Sample records for optimizing nucleotide sequence

  1. Modular and configurable optimal sequence alignment software: Cola.

    PubMed

    Zamani, Neda; Sundström, Görel; Höppner, Marc P; Grabherr, Manfred G

    2014-01-01

    The fundamental challenge in optimally aligning homologous sequences is to define a scoring scheme that best reflects the underlying biological processes. Maximising the overall number of matches in the alignment does not always reflect the patterns by which nucleotides mutate. Efficiently implemented algorithms that can be parameterised to accommodate more complex non-linear scoring schemes are thus desirable. We present Cola, alignment software that implements different optimal alignment algorithms, also allowing for scoring contiguous matches of nucleotides in a nonlinear manner. The latter places more emphasis on short, highly conserved motifs, and less on the surrounding nucleotides, which can be more diverged. To illustrate the differences, we report results from aligning 14,100 sequences from 3' untranslated regions of human genes to 25 of their mammalian counterparts, where we found that a nonlinear scoring scheme is more consistent than a linear scheme in detecting short, conserved motifs. Cola is freely available under LPGL from https://github.com/nedaz/cola.

  2. Optimization of algorithm of coding of genetic information of Chlamydia

    NASA Astrophysics Data System (ADS)

    Feodorova, Valentina A.; Ulyanov, Sergey S.; Zaytsev, Sergey S.; Saltykov, Yury V.; Ulianova, Onega V.

    2018-04-01

    New method of coding of genetic information using coherent optical fields is developed. Universal technique of transformation of nucleotide sequences of bacterial gene into laser speckle pattern is suggested. Reference speckle patterns of the nucleotide sequences of omp1 gene of typical wild strains of Chlamydia trachomatis of genovars D, E, F, G, J and K and Chlamydia psittaci serovar I as well are generated. Algorithm of coding of gene information into speckle pattern is optimized. Fully developed speckles with Gaussian statistics for gene-based speckles have been used as criterion of optimization.

  3. WEB-server for search of a periodicity in amino acid and nucleotide sequences

    NASA Astrophysics Data System (ADS)

    E Frenkel, F.; Skryabin, K. G.; Korotkov, E. V.

    2017-12-01

    A new web server (http://victoria.biengi.ac.ru/splinter/login.php) was designed and developed to search for periodicity in nucleotide and amino acid sequences. The web server operation is based upon a new mathematical method of searching for multiple alignments, which is founded on the position weight matrices optimization, as well as on implementation of the two-dimensional dynamic programming. This approach allows the construction of multiple alignments of the indistinctly similar amino acid and nucleotide sequences that accumulated more than 1.5 substitutions per a single amino acid or a nucleotide without performing the sequences paired comparisons. The article examines the principles of the web server operation and two examples of studying amino acid and nucleotide sequences, as well as information that could be obtained using the web server.

  4. Ancestral sequence reconstruction in primate mitochondrial DNA: compositional bias and effect on functional inference.

    PubMed

    Krishnan, Neeraja M; Seligmann, Hervé; Stewart, Caro-Beth; De Koning, A P Jason; Pollock, David D

    2004-10-01

    Reconstruction of ancestral DNA and amino acid sequences is an important means of inferring information about past evolutionary events. Such reconstructions suggest changes in molecular function and evolutionary processes over the course of evolution and are used to infer adaptation and convergence. Maximum likelihood (ML) is generally thought to provide relatively accurate reconstructed sequences compared to parsimony, but both methods lead to the inference of multiple directional changes in nucleotide frequencies in primate mitochondrial DNA (mtDNA). To better understand this surprising result, as well as to better understand how parsimony and ML differ, we constructed a series of computationally simple "conditional pathway" methods that differed in the number of substitutions allowed per site along each branch, and we also evaluated the entire Bayesian posterior frequency distribution of reconstructed ancestral states. We analyzed primate mitochondrial cytochrome b (Cyt-b) and cytochrome oxidase subunit I (COI) genes and found that ML reconstructs ancestral frequencies that are often more different from tip sequences than are parsimony reconstructions. In contrast, frequency reconstructions based on the posterior ensemble more closely resemble extant nucleotide frequencies. Simulations indicate that these differences in ancestral sequence inference are probably due to deterministic bias caused by high uncertainty in the optimization-based ancestral reconstruction methods (parsimony, ML, Bayesian maximum a posteriori). In contrast, ancestral nucleotide frequencies based on an average of the Bayesian set of credible ancestral sequences are much less biased. The methods involving simpler conditional pathway calculations have slightly reduced likelihood values compared to full likelihood calculations, but they can provide fairly unbiased nucleotide reconstructions and may be useful in more complex phylogenetic analyses than considered here due to their speed and flexibility. To determine whether biased reconstructions using optimization methods might affect inferences of functional properties, ancestral primate mitochondrial tRNA sequences were inferred and helix-forming propensities for conserved pairs were evaluated in silico. For ambiguously reconstructed nucleotides at sites with high base composition variability, ancestral tRNA sequences from Bayesian analyses were more compatible with canonical base pairing than were those inferred by other methods. Thus, nucleotide bias in reconstructed sequences apparently can lead to serious bias and inaccuracies in functional predictions.

  5. UrQt: an efficient software for the Unsupervised Quality trimming of NGS data.

    PubMed

    Modolo, Laurent; Lerat, Emmanuelle

    2015-04-29

    Quality control is a necessary step of any Next Generation Sequencing analysis. Although customary, this step still requires manual interventions to empirically choose tuning parameters according to various quality statistics. Moreover, current quality control procedures that provide a "good quality" data set, are not optimal and discard many informative nucleotides. To address these drawbacks, we present a new quality control method, implemented in UrQt software, for Unsupervised Quality trimming of Next Generation Sequencing reads. Our trimming procedure relies on a well-defined probabilistic framework to detect the best segmentation between two segments of unreliable nucleotides, framing a segment of informative nucleotides. Our software only requires one user-friendly parameter to define the minimal quality threshold (phred score) to consider a nucleotide to be informative, which is independent of both the experiment and the quality of the data. This procedure is implemented in C++ in an efficient and parallelized software with a low memory footprint. We tested the performances of UrQt compared to the best-known trimming programs, on seven RNA and DNA sequencing experiments and demonstrated its optimality in the resulting tradeoff between the number of trimmed nucleotides and the quality objective. By finding the best segmentation to delimit a segment of good quality nucleotides, UrQt greatly increases the number of reads and of nucleotides that can be retained for a given quality objective. UrQt source files, binary executables for different operating systems and documentation are freely available (under the GPLv3) at the following address: https://lbbe.univ-lyon1.fr/-UrQt-.html .

  6. Shuffle Optimizer: A Program to Optimize DNA Shuffling for Protein Engineering.

    PubMed

    Milligan, John N; Garry, Daniel J

    2017-01-01

    DNA shuffling is a powerful tool to develop libraries of variants for protein engineering. Here, we present a protocol to use our freely available and easy-to-use computer program, Shuffle Optimizer. Shuffle Optimizer is written in the Python computer language and increases the nucleotide homology between two pieces of DNA desired to be shuffled together without changing the amino acid sequence. In addition we also include sections on optimal primer design for DNA shuffling and library construction, a small-volume ultrasonicator method to create sheared DNA, and finally a method to reassemble the sheared fragments and recover and clone the library. The Shuffle Optimizer program and these protocols will be useful to anyone desiring to perform any of the nucleotide homology-dependent shuffling methods.

  7. T box transcription antitermination riboswitch: Influence of nucleotide sequence and orientation on tRNA binding by the antiterminator element

    PubMed Central

    Fauzi, Hamid; Agyeman, Akwasi; Hines, Jennifer V.

    2008-01-01

    Many bacteria utilize riboswitch transcription regulation to monitor and appropriately respond to cellular levels of important metabolites or effector molecules. The T box transcription antitermination riboswitch responds to cognate uncharged tRNA by specifically stabilizing an antiterminator element in the 5′-untranslated mRNA leader region and precluding formation of a thermodynamically more stable terminator element. Stabilization occurs when the tRNA acceptor end base pairs with the first four nucleotides in the seven nucleotide bulge of the highly conserved antiterminator element. The significance of the conservation of the antiterminator bulge nucleotides that do not base pair with the tRNA is unknown, but they are required for optimal function. In vitro selection was used to determine if the isolated antiterminator bulge context alone dictates the mode in which the tRNA acceptor end binds the bulge nucleotides. No sequence conservation beyond complementarity was observed and the location was not constrained to the first four bases of the bulge. The results indicate that formation of a structure that recognizes the tRNA acceptor end in isolation is not the determinant driving force for the high phylogenetic sequence conservation observed within the antiterminator bulge. Additional factors or T box leader features more likely influenced the phylogenetic sequence conservation. PMID:19152843

  8. Superstatistical model of bacterial DNA architecture

    NASA Astrophysics Data System (ADS)

    Bogachev, Mikhail I.; Markelov, Oleg A.; Kayumov, Airat R.; Bunde, Armin

    2017-02-01

    Understanding the physical principles that govern the complex DNA structural organization as well as its mechanical and thermodynamical properties is essential for the advancement in both life sciences and genetic engineering. Recently we have discovered that the complex DNA organization is explicitly reflected in the arrangement of nucleotides depicted by the universal power law tailed internucleotide interval distribution that is valid for complete genomes of various prokaryotic and eukaryotic organisms. Here we suggest a superstatistical model that represents a long DNA molecule by a series of consecutive ~150 bp DNA segments with the alternation of the local nucleotide composition between segments exhibiting long-range correlations. We show that the superstatistical model and the corresponding DNA generation algorithm explicitly reproduce the laws governing the empirical nucleotide arrangement properties of the DNA sequences for various global GC contents and optimal living temperatures. Finally, we discuss the relevance of our model in terms of the DNA mechanical properties. As an outlook, we focus on finding the DNA sequences that encode a given protein while simultaneously reproducing the nucleotide arrangement laws observed from empirical genomes, that may be of interest in the optimization of genetic engineering of long DNA molecules.

  9. Effect of the nucleotides surrounding the start codon on the translation of foot-and-mouth disease virus RNA.

    PubMed

    Ma, X X; Feng, Y P; Gu, Y X; Zhou, J H; Ma, Z R

    2016-06-01

    As for the alternative AUGs in foot-and-mouth disease virus (FMDV), nucleotide bias of the context flanking the AUG(2nd) could be used as a strong signal to initiate translation. To determine the role of the specific nucleotide context, dicistronic reporter constructs were engineered to contain different versions of nucleotide context linking between internal ribosome entry site (IRES) and downstream gene. The results indicate that under FMDV IRES-dependent mechanism, the nucleotide contexts flanking start codon can influence the translation initiation efficiencies. The most optimal sequences for both start codons have proved to be UUU AUG(1st) AAC and AAG AUG(2nd) GAA.

  10. Homology and the optimization of DNA sequence data

    NASA Technical Reports Server (NTRS)

    Wheeler, W.

    2001-01-01

    Three methods of nucleotide character analysis are discussed. Their implications for molecular sequence homology and phylogenetic analysis are compared. The criterion of inter-data set congruence, both character based and topological, are applied to two data sets to elucidate and potentially discriminate among these parsimony-based ideas. c2001 The Willi Hennig Society.

  11. Validation and optimization of the Ion Torrent S5 XL sequencer and Oncomine workflow for BRCA1 and BRCA2 genetic testing.

    PubMed

    Shin, Saeam; Kim, Yoonjung; Chul Oh, Seoung; Yu, Nae; Lee, Seung-Tae; Rak Choi, Jong; Lee, Kyung-A

    2017-05-23

    In this study, we validated the analytical performance of BRCA1/2 sequencing using Ion Torrent's new bench-top sequencer with amplicon panel with optimized bioinformatics pipelines. Using 43 samples that were previously validated by Illumina's MiSeq platform and/or by Sanger sequencing/multiplex ligation-dependent probe amplification, we amplified the target with the Oncomine™ BRCA Research Assay and sequenced on Ion Torrent S5 XL (Thermo Fisher Scientific, Waltham, MA, USA). We compared two bioinformatics pipelines for optimal processing of S5 XL sequence data: the Torrent Suite with a plug-in Torrent Variant Caller (Thermo Fisher Scientific), and commercial NextGENe software (Softgenetics, State College, PA, USA). All expected 681 single nucleotide variants, 15 small indels, and three copy number variants were correctly called, except one common variant adjacent to a rare variant on the primer-binding site. The sensitivity, specificity, false positive rate, and accuracy for detection of single nucleotide variant and small indels of S5 XL sequencing were 99.85%, 100%, 0%, and 99.99% for the Torrent Variant Caller and 99.85%, 99.99%, 0.14%, and 99.99% for NextGENe, respectively. The reproducibility of variant calling was 100%, and the precision of variant frequency also showed good performance with coefficients of variation between 0.32 and 5.29%. We obtained highly accurate data through uniform and sufficient coverage depth over all target regions and through optimization of the bioinformatics pipeline. We confirmed that our platform is accurate and practical for diagnostic BRCA1/2 testing in a clinical laboratory.

  12. E2FM: an encrypted and compressed full-text index for collections of genomic sequences.

    PubMed

    Montecuollo, Ferdinando; Schmid, Giovannni; Tagliaferri, Roberto

    2017-09-15

    Next Generation Sequencing (NGS) platforms and, more generally, high-throughput technologies are giving rise to an exponential growth in the size of nucleotide sequence databases. Moreover, many emerging applications of nucleotide datasets-as those related to personalized medicine-require the compliance with regulations about the storage and processing of sensitive data. We have designed and carefully engineered E 2 FM -index, a new full-text index in minute space which was optimized for compressing and encrypting nucleotide sequence collections in FASTA format and for performing fast pattern-search queries. E 2 FM -index allows to build self-indexes which occupy till to 1/20 of the storage required by the input FASTA file, thus permitting to save about 95% of storage when indexing collections of highly similar sequences; moreover, it can exactly search the built indexes for patterns in times ranging from few milliseconds to a few hundreds milliseconds, depending on pattern length. Source code is available at https://github.com/montecuollo/E2FM . ferdinando.montecuollo@unicampania.it. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  13. Multiplexed resequencing analysis to identify rare variants in pooled DNA with barcode indexing using next-generation sequencer.

    PubMed

    Mitsui, Jun; Fukuda, Yoko; Azuma, Kyo; Tozaki, Hirokazu; Ishiura, Hiroyuki; Takahashi, Yuji; Goto, Jun; Tsuji, Shoji

    2010-07-01

    We have recently found that multiple rare variants of the glucocerebrosidase gene (GBA) confer a robust risk for Parkinson disease, supporting the 'common disease-multiple rare variants' hypothesis. To develop an efficient method of identifying rare variants in a large number of samples, we applied multiplexed resequencing using a next-generation sequencer to identification of rare variants of GBA. Sixteen sets of pooled DNAs from six pooled DNA samples were prepared. Each set of pooled DNAs was subjected to polymerase chain reaction to amplify the target gene (GBA) covering 6.5 kb, pooled into one tube with barcode indexing, and then subjected to extensive sequence analysis using the SOLiD System. Individual samples were also subjected to direct nucleotide sequence analysis. With the optimization of data processing, we were able to extract all the variants from 96 samples with acceptable rates of false-positive single-nucleotide variants.

  14. MCMC-ODPR: primer design optimization using Markov Chain Monte Carlo sampling.

    PubMed

    Kitchen, James L; Moore, Jonathan D; Palmer, Sarah A; Allaby, Robin G

    2012-11-05

    Next generation sequencing technologies often require numerous primer designs that require good target coverage that can be financially costly. We aimed to develop a system that would implement primer reuse to design degenerate primers that could be designed around SNPs, thus find the fewest necessary primers and the lowest cost whilst maintaining an acceptable coverage and provide a cost effective solution. We have implemented Metropolis-Hastings Markov Chain Monte Carlo for optimizing primer reuse. We call it the Markov Chain Monte Carlo Optimized Degenerate Primer Reuse (MCMC-ODPR) algorithm. After repeating the program 1020 times to assess the variance, an average of 17.14% fewer primers were found to be necessary using MCMC-ODPR for an equivalent coverage without implementing primer reuse. The algorithm was able to reuse primers up to five times. We compared MCMC-ODPR with single sequence primer design programs Primer3 and Primer-BLAST and achieved a lower primer cost per amplicon base covered of 0.21 and 0.19 and 0.18 primer nucleotides on three separate gene sequences, respectively. With multiple sequences, MCMC-ODPR achieved a lower cost per base covered of 0.19 than programs BatchPrimer3 and PAMPS, which achieved 0.25 and 0.64 primer nucleotides, respectively. MCMC-ODPR is a useful tool for designing primers at various melting temperatures at good target coverage. By combining degeneracy with optimal primer reuse the user may increase coverage of sequences amplified by the designed primers at significantly lower costs. Our analyses showed that overall MCMC-ODPR outperformed the other primer-design programs in our study in terms of cost per covered base.

  15. MCMC-ODPR: Primer design optimization using Markov Chain Monte Carlo sampling

    PubMed Central

    2012-01-01

    Background Next generation sequencing technologies often require numerous primer designs that require good target coverage that can be financially costly. We aimed to develop a system that would implement primer reuse to design degenerate primers that could be designed around SNPs, thus find the fewest necessary primers and the lowest cost whilst maintaining an acceptable coverage and provide a cost effective solution. We have implemented Metropolis-Hastings Markov Chain Monte Carlo for optimizing primer reuse. We call it the Markov Chain Monte Carlo Optimized Degenerate Primer Reuse (MCMC-ODPR) algorithm. Results After repeating the program 1020 times to assess the variance, an average of 17.14% fewer primers were found to be necessary using MCMC-ODPR for an equivalent coverage without implementing primer reuse. The algorithm was able to reuse primers up to five times. We compared MCMC-ODPR with single sequence primer design programs Primer3 and Primer-BLAST and achieved a lower primer cost per amplicon base covered of 0.21 and 0.19 and 0.18 primer nucleotides on three separate gene sequences, respectively. With multiple sequences, MCMC-ODPR achieved a lower cost per base covered of 0.19 than programs BatchPrimer3 and PAMPS, which achieved 0.25 and 0.64 primer nucleotides, respectively. Conclusions MCMC-ODPR is a useful tool for designing primers at various melting temperatures at good target coverage. By combining degeneracy with optimal primer reuse the user may increase coverage of sequences amplified by the designed primers at significantly lower costs. Our analyses showed that overall MCMC-ODPR outperformed the other primer-design programs in our study in terms of cost per covered base. PMID:23126469

  16. Optimized Next-Generation Sequencing Genotype-Haplotype Calling for Genome Variability Analysis

    PubMed Central

    Navarro, Javier; Nevado, Bruno; Hernández, Porfidio; Vera, Gonzalo; Ramos-Onsins, Sebastián E

    2017-01-01

    The accurate estimation of nucleotide variability using next-generation sequencing data is challenged by the high number of sequencing errors produced by new sequencing technologies, especially for nonmodel species, where reference sequences may not be available and the read depth may be low due to limited budgets. The most popular single-nucleotide polymorphism (SNP) callers are designed to obtain a high SNP recovery and low false discovery rate but are not designed to account appropriately the frequency of the variants. Instead, algorithms designed to account for the frequency of SNPs give precise results for estimating the levels and the patterns of variability. These algorithms are focused on the unbiased estimation of the variability and not on the high recovery of SNPs. Here, we implemented a fast and optimized parallel algorithm that includes the method developed by Roesti et al and Lynch, which estimates the genotype of each individual at each site, considering the possibility to call both bases from the genotype, a single one or none. This algorithm does not consider the reference and therefore is independent of biases related to the reference nucleotide specified. The pipeline starts from a BAM file converted to pileup or mpileup format and the software outputs a FASTA file. The new program not only reduces the running times but also, given the improved use of resources, it allows its usage with smaller computers and large parallel computers, expanding its benefits to a wider range of researchers. The output file can be analyzed using software for population genetics analysis, such as the R library PopGenome, the software VariScan, and the program mstatspop for analysis considering positions with missing data. PMID:28894353

  17. Multi-modulus algorithm based on global artificial fish swarm intelligent optimization of DNA encoding sequences.

    PubMed

    Guo, Y C; Wang, H; Wu, H P; Zhang, M Q

    2015-12-21

    Aimed to address the defects of the large mean square error (MSE), and the slow convergence speed in equalizing the multi-modulus signals of the constant modulus algorithm (CMA), a multi-modulus algorithm (MMA) based on global artificial fish swarm (GAFS) intelligent optimization of DNA encoding sequences (GAFS-DNA-MMA) was proposed. To improve the convergence rate and reduce the MSE, this proposed algorithm adopted an encoding method based on DNA nucleotide chains to provide a possible solution to the problem. Furthermore, the GAFS algorithm, with its fast convergence and global search ability, was used to find the best sequence. The real and imaginary parts of the initial optimal weight vector of MMA were obtained through DNA coding of the best sequence. The simulation results show that the proposed algorithm has a faster convergence speed and smaller MSE in comparison with the CMA, the MMA, and the AFS-DNA-MMA.

  18. Combined hairpin-antisense compositions and methods for modulating expression

    DOEpatents

    Shanklin, John; Nguyen, Tam

    2014-08-05

    A nucleotide construct comprising a nucleotide sequence that forms a stem and a loop, wherein the loop comprises a nucleotide sequence that modulates expression of a target, wherein the stem comprises a nucleotide sequence that modulates expression of a target, and wherein the target modulated by the nucleotide sequence in the loop and the target modulated by the nucleotide sequence in the stem may be the same or different. Vectors, methods of regulating target expression, methods of providing a cell, and methods of treating conditions comprising the nucleotide sequence are also disclosed.

  19. Combined hairpin-antisense compositions and methods for modulating expression

    DOEpatents

    Shanklin, John; Nguyen, Tam Huu

    2015-11-24

    A nucleotide construct comprising a nucleotide sequence that forms a stem and a loop, wherein the loop comprises a nucleotide sequence that modulates expression of a target, wherein the stem comprises a nucleotide sequence that modulates expression of a target, and wherein the target modulated by the nucleotide sequence in the loop and the target modulated by the nucleotide sequence in the stem may be the same or different. Vectors, methods of regulating target expression, methods of providing a cell, and methods of treating conditions comprising the nucleotide sequence are also disclosed.

  20. A clustering package for nucleotide sequences using Laplacian Eigenmaps and Gaussian Mixture Model.

    PubMed

    Bruneau, Marine; Mottet, Thierry; Moulin, Serge; Kerbiriou, Maël; Chouly, Franz; Chretien, Stéphane; Guyeux, Christophe

    2018-02-01

    In this article, a new Python package for nucleotide sequences clustering is proposed. This package, freely available on-line, implements a Laplacian eigenmap embedding and a Gaussian Mixture Model for DNA clustering. It takes nucleotide sequences as input, and produces the optimal number of clusters along with a relevant visualization. Despite the fact that we did not optimise the computational speed, our method still performs reasonably well in practice. Our focus was mainly on data analytics and accuracy and as a result, our approach outperforms the state of the art, even in the case of divergent sequences. Furthermore, an a priori knowledge on the number of clusters is not required here. For the sake of illustration, this method is applied on a set of 100 DNA sequences taken from the mitochondrially encoded NADH dehydrogenase 3 (ND3) gene, extracted from a collection of Platyhelminthes and Nematoda species. The resulting clusters are tightly consistent with the phylogenetic tree computed using a maximum likelihood approach on gene alignment. They are coherent too with the NCBI taxonomy. Further test results based on synthesized data are then provided, showing that the proposed approach is better able to recover the clusters than the most widely used software, namely Cd-hit-est and BLASTClust. Copyright © 2017 Elsevier Ltd. All rights reserved.

  1. Sequence search on a supercomputer.

    PubMed

    Gotoh, O; Tagashira, Y

    1986-01-10

    A set of programs was developed for searching nucleic acid and protein sequence data bases for sequences similar to a given sequence. The programs, written in FORTRAN 77, were optimized for vector processing on a Hitachi S810-20 supercomputer. A search of a 500-residue protein sequence against the entire PIR data base Ver. 1.0 (1) (0.5 M residues) is carried out in a CPU time of 45 sec. About 4 min is required for an exhaustive search of a 1500-base nucleotide sequence against all mammalian sequences (1.2M bases) in Genbank Ver. 29.0. The CPU time is reduced to about a quarter with a faster version.

  2. Effects of transcriptional start site sequence and position on nucleotide-sensitive selection of alternative start sites at the pyrC promoter in Escherichia coli.

    PubMed Central

    Liu, J; Turnbough, C L

    1994-01-01

    In Escherichia coli, expression of the pyrC gene is regulated primarily by a translational control mechanism based on nucleotide-sensitive selection of transcriptional start sites at the pyrC promoter. When intracellular levels of CTP are high, pyrC transcripts are initiated predominantly with CTP at a site 7 bases downstream of the Pribnow box. These transcripts form a stable hairpin at their 5' ends that blocks ribosome binding. When the CTP level is low and the GTP level is high, conditions found in pyrimidine-limited cells, transcripts are initiated primarily with GTP at a site 9 bases downstream of the Pribnow box. These shorter transcripts are unable to form a hairpin at their 5' ends and are readily translated. In this study, we examined the effects of nucleotide sequence and position on the selection of transcriptional start sites at the pyrC promoter. We characterized promoter mutations that systematically alter the sequence at position 7 or 9 downstream of the Pribnow box or vary the spacing between the Pribnow box and wild-type transcriptional initiation region. The results reveal preferences for particular initiating nucleotides (ATP > or = GTP > UTP >> CTP) and for starting positions downstream of the Pribnow box (7 >> 6 and 8 > 9 > 10). The results indicate that optimal nucleotide-sensitive start site switching at the wild-type pyrC promoter is the result of competition between the preferred start site (position 7) that uses the poorest initiating nucleotide (CTP) and a weak start site (position 9) that uses a good initiating nucleotide (GTP). The sequence of the pyrC promoter also minimizes the synthesis of untranslatable transcripts and provides for maximum stability of the regulatory transcript hairpin. In addition, the results show that the effects of the mutations on pyrC expression and regulation are consistent with the current model for translational control. Possible effects of preferences for initiating nucleotides and start sites on the expression and regulation of other genes are discussed. Images PMID:7910603

  3. The Role of ABC Proteins in Drug-Resistant Breast Cancer Cells

    DTIC Science & Technology

    2007-04-01

    and a biotin acceptor domain) under control of the alcohol oxidase promoter (Figure 2). Upon methanol induction, the yeast expressed high levels of...as native cDNA. Therefore, we backtranslated the protein into a nucleotide sequence codon-optimized for expression in Pichia pastoris yeast. Yeast

  4. Apolipoprotein A-I mutant proteins having cysteine substitutions and polynucleotides encoding same

    DOEpatents

    Oda, Michael N [Benicia, CA; Forte, Trudy M [Berkeley, CA

    2007-05-29

    Functional Apolipoprotein A-I mutant proteins, having one or more cysteine substitutions and polynucleotides encoding same, can be used to modulate paraoxonase's arylesterase activity. These ApoA-I mutant proteins can be used as therapeutic agents to combat cardiovascular disease, atherosclerosis, acute phase response and other inflammatory related diseases. The invention also includes modifications and optimizations of the ApoA-I nucleotide sequence for purposes of increasing protein expression and optimization.

  5. Efficient Coproduction of Mannanase and Cellulase by the Transformation of a Codon-Optimized Endomannanase Gene from Aspergillus niger into Trichoderma reesei.

    PubMed

    Sun, Xianhua; Xue, Xianli; Li, Mengzhu; Gao, Fei; Hao, Zhenzhen; Huang, Huoqing; Luo, Huiying; Qin, Lina; Yao, Bin; Su, Xiaoyun

    2017-12-20

    Cellulase and mannanase are both important enzyme additives in animal feeds. Expressing the two enzymes simultaneously within one microbial host could potentially lead to cost reductions in the feeding of animals. For this purpose, we codon-optimized the Aspergillus niger Man5A gene to the codon-usage bias of Trichoderma reesei. By comparing the free energies and the local structures of the nucleotide sequences, one optimized sequence was finally selected and transformed into the T. reesei pyridine-auxotrophic strain TU-6. The codon-optimized gene was expressed to a higher level than the original one. Further expressing the codon-optimized gene in a mutated T. reesei strain through fed-batch cultivation resulted in coproduction of cellulase and mannanase up to 1376 U·mL -1 and 1204 U·mL -1 , respectively.

  6. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data shall...

  7. TargetM6A: Identifying N6-Methyladenosine Sites From RNA Sequences via Position-Specific Nucleotide Propensities and a Support Vector Machine.

    PubMed

    Li, Guang-Qing; Liu, Zi; Shen, Hong-Bin; Yu, Dong-Jun

    2016-10-01

    As one of the most ubiquitous post-transcriptional modifications of RNA, N 6 -methyladenosine ( [Formula: see text]) plays an essential role in many vital biological processes. The identification of [Formula: see text] sites in RNAs is significantly important for both basic biomedical research and practical drug development. In this study, we designed a computational-based method, called TargetM6A, to rapidly and accurately target [Formula: see text] sites solely from the primary RNA sequences. Two new features, i.e., position-specific nucleotide/dinucleotide propensities (PSNP/PSDP), are introduced and combined with the traditional nucleotide composition (NC) feature to formulate RNA sequences. The extracted features are further optimized to obtain a much more compact and discriminative feature subset by applying an incremental feature selection (IFS) procedure. Based on the optimized feature subset, we trained TargetM6A on the training dataset with a support vector machine (SVM) as the prediction engine. We compared the proposed TargetM6A method with existing methods for predicting [Formula: see text] sites by performing stringent jackknife tests and independent validation tests on benchmark datasets. The experimental results show that the proposed TargetM6A method outperformed the existing methods for predicting [Formula: see text] sites and remarkably improved the prediction performances, with MCC = 0.526 and AUC = 0.818. We also provided a user-friendly web server for TargetM6A, which is publicly accessible for academic use at http://csbio.njust.edu.cn/bioinf/TargetM6A.

  8. The maize stripe virus major noncapsid protein messenger RNA transcripts contain heterogeneous leader sequences at their 5' termini.

    PubMed

    Huiet, L; Feldstein, P A; Tsai, J H; Falk, B W

    1993-12-01

    Primer extension analyses and a PCR-based cloning strategy were used to identify and characterize 5' nucleotide sequences on the maize stripe virus (MStV) RNA4 mRNA transcripts encoding the major noncapsid protein (NCP). Direct RNA sequence analysis by primer extension showed that the NCP mRNA transcripts had 10-15 nucleotides beyond the 5' terminus of the MStV RNA4 nucleotide sequence. MStV genomic RNAs isolated from ribonucleoprotein particles (RNPs) lacked the additional 5' nucleotides. cDNA clones representing the 5' region of the mRNA transcripts were constructed, and the nucleotide sequences of the 5' regions were determined for 16 clones. Each was found to have a distinct 10-15 nucleotide sequence immediately 5' of the MStV RNA4 sequence. Eleven of 16 clones had the correct MStV RNA4 5' nucleotide sequence, while five showed minor variations at or near the 5' most MStV RNA4 nucleotide. These characteristics show strong similarities to other viral mRNA transcripts which are synthesized by cap snatching.

  9. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2010-07-01 2010-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide and...

  10. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2011-07-01 2011-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide and...

  11. Unique nucleotide sequence-guided assembly of repetitive DNA parts for synthetic biology applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Torella, JP; Lienert, F; Boehm, CR

    2014-08-07

    Recombination-based DNA construction methods, such as Gibson assembly, have made it possible to easily and simultaneously assemble multiple DNA parts, and they hold promise for the development and optimization of metabolic pathways and functional genetic circuits. Over time, however, these pathways and circuits have become more complex, and the increasing need for standardization and insulation of genetic parts has resulted in sequence redundancies-for example, repeated terminator and insulator sequences-that complicate recombination-based assembly. We and others have recently developed DNA assembly methods, which we refer to collectively as unique nucleotide sequence (UNS)-guided assembly, in which individual DNA parts are flanked withmore » UNSs to facilitate the ordered, recombination-based assembly of repetitive sequences. Here we present a detailed protocol for UNS-guided assembly that enables researchers to convert multiple DNA parts into sequenced, correctly assembled constructs, or into high-quality combinatorial libraries in only 2-3 d. If the DNA parts must be generated from scratch, an additional 2-5 d are necessary. This protocol requires no specialized equipment and can easily be implemented by a student with experience in basic cloning techniques.« less

  12. Unique nucleotide sequence (UNS)-guided assembly of repetitive DNA parts for synthetic biology applications

    PubMed Central

    Torella, Joseph P.; Lienert, Florian; Boehm, Christian R.; Chen, Jan-Hung; Way, Jeffrey C.; Silver, Pamela A.

    2016-01-01

    Recombination-based DNA construction methods, such as Gibson assembly, have made it possible to easily and simultaneously assemble multiple DNA parts and hold promise for the development and optimization of metabolic pathways and functional genetic circuits. Over time, however, these pathways and circuits have become more complex, and the increasing need for standardization and insulation of genetic parts has resulted in sequence redundancies — for example repeated terminator and insulator sequences — that complicate recombination-based assembly. We and others have recently developed DNA assembly methods that we refer to collectively as unique nucleotide sequence (UNS)-guided assembly, in which individual DNA parts are flanked with UNSs to facilitate the ordered, recombination-based assembly of repetitive sequences. Here we present a detailed protocol for UNS-guided assembly that enables researchers to convert multiple DNA parts into sequenced, correctly-assembled constructs, or into high-quality combinatorial libraries in only 2–3 days. If the DNA parts must be generated from scratch, an additional 2–5 days are necessary. This protocol requires no specialized equipment and can easily be implemented by a student with experience in basic cloning techniques. PMID:25101822

  13. Multiple sequence alignment using multi-objective based bacterial foraging optimization algorithm.

    PubMed

    Rani, R Ranjani; Ramyachitra, D

    2016-12-01

    Multiple sequence alignment (MSA) is a widespread approach in computational biology and bioinformatics. MSA deals with how the sequences of nucleotides and amino acids are sequenced with possible alignment and minimum number of gaps between them, which directs to the functional, evolutionary and structural relationships among the sequences. Still the computation of MSA is a challenging task to provide an efficient accuracy and statistically significant results of alignments. In this work, the Bacterial Foraging Optimization Algorithm was employed to align the biological sequences which resulted in a non-dominated optimal solution. It employs Multi-objective, such as: Maximization of Similarity, Non-gap percentage, Conserved blocks and Minimization of gap penalty. BAliBASE 3.0 benchmark database was utilized to examine the proposed algorithm against other methods In this paper, two algorithms have been proposed: Hybrid Genetic Algorithm with Artificial Bee Colony (GA-ABC) and Bacterial Foraging Optimization Algorithm. It was found that Hybrid Genetic Algorithm with Artificial Bee Colony performed better than the existing optimization algorithms. But still the conserved blocks were not obtained using GA-ABC. Then BFO was used for the alignment and the conserved blocks were obtained. The proposed Multi-Objective Bacterial Foraging Optimization Algorithm (MO-BFO) was compared with widely used MSA methods Clustal Omega, Kalign, MUSCLE, MAFFT, Genetic Algorithm (GA), Ant Colony Optimization (ACO), Artificial Bee Colony (ABC), Particle Swarm Optimization (PSO) and Hybrid Genetic Algorithm with Artificial Bee Colony (GA-ABC). The final results show that the proposed MO-BFO algorithm yields better alignment than most widely used methods. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  14. Centrifuge: rapid and sensitive classification of metagenomic sequences

    PubMed Central

    Song, Li; Breitwieser, Florian P.

    2016-01-01

    Centrifuge is a novel microbial classification engine that enables rapid, accurate, and sensitive labeling of reads and quantification of species on desktop computers. The system uses an indexing scheme based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index, optimized specifically for the metagenomic classification problem. Centrifuge requires a relatively small index (4.2 GB for 4078 bacterial and 200 archaeal genomes) and classifies sequences at very high speed, allowing it to process the millions of reads from a typical high-throughput DNA sequencing run within a few minutes. Together, these advances enable timely and accurate analysis of large metagenomics data sets on conventional desktop computers. Because of its space-optimized indexing schemes, Centrifuge also makes it possible to index the entire NCBI nonredundant nucleotide sequence database (a total of 109 billion bases) with an index size of 69 GB, in contrast to k-mer-based indexing schemes, which require far more extensive space. PMID:27852649

  15. Conserved features of eukaryotic hsp70 genes revealed by comparison with the nucleotide sequence of human hsp70.

    PubMed Central

    Hunt, C; Morimoto, R I

    1985-01-01

    We have determined the nucleotide sequence of the human hsp70 gene and 5' flanking region. The hsp70 gene is transcribed as an uninterrupted primary transcript of 2440 nucleotides composed of a 5' noncoding leader sequence of 212 nucleotides, a 3' noncoding region of 242 nucleotides, and a continuous open reading frame of 1986 nucleotides that encodes a protein with predicted molecular mass of 69,800 daltons. Upstream of the 5' terminus are the canonical TATAAA box, the sequence ATTGG that corresponds in the inverted orientation to the CCAAT motif, and the dyad sequence CTGGAAT/ATTCCCG that shares homology in 12 of 14 positions with the consensus transcription regulatory sequence common to Drosophila heat shock genes. Comparison of the predicted amino acid sequences of human hsp70 with the published sequences of Drosophila hsp70 and Escherichia coli dnaK reveals that human hsp70 is 73% identical to Drosophila hsp70 and 47% identical to E. coli dnaK. Surprisingly, the nucleotide sequences of the human and Drosophila genes are 72% identical and human and E. coli genes are 50% identical, which is more highly conserved than necessary given the degeneracy of the genetic code. The lack of accumulated silent nucleotide substitutions leads us to propose that there may be additional information in the nucleotide sequence of the hsp70 gene or the corresponding mRNA that precludes the maximum divergence allowed in the silent codon positions. PMID:3931075

  16. 77 FR 65537 - Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-29

    ... DEPARTMENT OF COMMERCE Patent and Trademark Office Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence Disclosures ACTION: Proposed collection; comment request... Patent applications that contain nucleotide and/or amino acid sequence disclosures must include a copy of...

  17. L1-mediated retrotransposition of murine B1 and B2 SINEs recapitulated in cultured cells.

    PubMed

    Dewannieux, Marie; Heidmann, Thierry

    2005-06-03

    SINEs are short interspersed nucleotide elements with transpositional activity, present at a high copy number (up to a million) in mammalian genomes. They are 80-400 bp long, non-coding sequences which derive either from the 7SL RNA (e.g. human Alus, murine B1s) or tRNA (e.g. murine B2s) polymerase III-driven genes. We have previously demonstrated that Alus very efficiently divert the enzymatic machinery of the autonomous L1 LINE (long interspersed nucleotide element) retrotransposons to transpose at a high rate. Here we show, using an ex vivo assay for transposition, that both B1 and B2 SINEs can be mobilized by murine LINEs, with the hallmarks of a bona fide retrotransposition process, including target site duplications of varying lengths and integrations into A-rich sequences. Despite different phylogenetic origins, transposition of the tRNA-derived B2 sequences is as efficient as that of the human Alus, whereas that of B1s is 20-100-fold lower despite a similar high copy number of these elements in the mouse genome. We provide evidence, via an appropriate nucleotide substitution within the B1 sequence in a domain essential for its intracellular targeting, that the current B1 SINEs are not optimal for transposition, a feature most probably selected for the host sake in the course of evolution.

  18. Nucleotide sequences specific to Yersinia pestis and methods for the detection of Yersinia pestis

    DOEpatents

    McCready, Paula M [Tracy, CA; Radnedge, Lyndsay [San Mateo, CA; Andersen, Gary L [Berkeley, CA; Ott, Linda L [Livermore, CA; Slezak, Thomas R [Livermore, CA; Kuczmarski, Thomas A [Livermore, CA; Motin, Vladinir L [League City, TX

    2009-02-24

    Nucleotide sequences specific to Yersinia pestis that serve as markers or signatures for identification of this bacterium were identified. In addition, forward and reverse primers and hybridization probes derived from these nucleotide sequences that are used in nucleotide detection methods to detect the presence of the bacterium are disclosed.

  19. Nucleotide sequences specific to Brucella and methods for the detection of Brucella

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McCready, Paula M; Radnedge, Lyndsay; Andersen, Gary L

    Nucleotide sequences specific to Brucella that serves as a marker or signature for identification of this bacterium were identified. In addition, forward and reverse primers and hybridization probes derived from these nucleotide sequences that are used in nucleotide detection methods to detect the presence of the bacterium are disclosed.

  20. Droplet-based pyrosequencing using digital microfluidics.

    PubMed

    Boles, Deborah J; Benton, Jonathan L; Siew, Germaine J; Levy, Miriam H; Thwar, Prasanna K; Sandahl, Melissa A; Rouse, Jeremy L; Perkins, Lisa C; Sudarsan, Arjun P; Jalili, Roxana; Pamula, Vamsee K; Srinivasan, Vijay; Fair, Richard B; Griffin, Peter B; Eckhardt, Allen E; Pollack, Michael G

    2011-11-15

    The feasibility of implementing pyrosequencing chemistry within droplets using electrowetting-based digital microfluidics is reported. An array of electrodes patterned on a printed-circuit board was used to control the formation, transportation, merging, mixing, and splitting of submicroliter-sized droplets contained within an oil-filled chamber. A three-enzyme pyrosequencing protocol was implemented in which individual droplets contained enzymes, deoxyribonucleotide triphosphates (dNTPs), and DNA templates. The DNA templates were anchored to magnetic beads which enabled them to be thoroughly washed between nucleotide additions. Reagents and protocols were optimized to maximize signal over background, linearity of response, cycle efficiency, and wash efficiency. As an initial demonstration of feasibility, a portion of a 229 bp Candida parapsilosis template was sequenced using both a de novo protocol and a resequencing protocol. The resequencing protocol generated over 60 bp of sequence with 100% sequence accuracy based on raw pyrogram levels. Excellent linearity was observed for all of the homopolymers (two, three, or four nucleotides) contained in the C. parapsilosis sequence. With improvements in microfluidic design it is expected that longer reads, higher throughput, and improved process integration (i.e., "sample-to-sequence" capability) could eventually be achieved using this low-cost platform.

  1. Single-Molecule Counting of Point Mutations by Transient DNA Binding

    NASA Astrophysics Data System (ADS)

    Su, Xin; Li, Lidan; Wang, Shanshan; Hao, Dandan; Wang, Lei; Yu, Changyuan

    2017-03-01

    High-confidence detection of point mutations is important for disease diagnosis and clinical practice. Hybridization probes are extensively used, but are hindered by their poor single-nucleotide selectivity. Shortening the length of DNA hybridization probes weakens the stability of the probe-target duplex, leading to transient binding between complementary sequences. The kinetics of probe-target binding events are highly dependent on the number of complementary base pairs. Here, we present a single-molecule assay for point mutation detection based on transient DNA binding and use of total internal reflection fluorescence microscopy. Statistical analysis of single-molecule kinetics enabled us to effectively discriminate between wild type DNA sequences and single-nucleotide variants at the single-molecule level. A higher single-nucleotide discrimination is achieved than in our previous work by optimizing the assay conditions, which is guided by statistical modeling of kinetics with a gamma distribution. The KRAS c.34 A mutation can be clearly differentiated from the wild type sequence (KRAS c.34 G) at a relative abundance as low as 0.01% mutant to WT. To demonstrate the feasibility of this method for analysis of clinically relevant biological samples, we used this technology to detect mutations in single-stranded DNA generated from asymmetric RT-PCR of mRNA from two cancer cell lines.

  2. Base Preferences in Non-Templated Nucleotide Incorporation by MMLV-Derived Reverse Transcriptases

    PubMed Central

    Zajac, Pawel; Islam, Saiful; Hochgerner, Hannah; Lönnerberg, Peter; Linnarsson, Sten

    2013-01-01

    Reverse transcriptases derived from Moloney Murine Leukemia Virus (MMLV) have an intrinsic terminal transferase activity, which causes the addition of a few non-templated nucleotides at the 3´ end of cDNA, with a preference for cytosine. This mechanism can be exploited to make the reverse transcriptase switch template from the RNA molecule to a secondary oligonucleotide during first-strand cDNA synthesis, and thereby to introduce arbitrary barcode or adaptor sequences in the cDNA. Because the mechanism is relatively efficient and occurs in a single reaction, it has recently found use in several protocols for single-cell RNA sequencing. However, the base preference of the terminal transferase activity is not known in detail, which may lead to inefficiencies in template switching when starting from tiny amounts of mRNA. Here, we used fully degenerate oligos to determine the exact base preference at the template switching site up to a distance of ten nucleotides. We found a strong preference for guanosine at the first non-templated nucleotide, with a greatly reduced bias at progressively more distant positions. Based on this result, and a number of careful optimizations, we report conditions for efficient template switching for cDNA amplification from single cells. PMID:24392002

  3. MaxAlign: maximizing usable data in an alignment.

    PubMed

    Gouveia-Oliveira, Rodrigo; Sackett, Peter W; Pedersen, Anders G

    2007-08-28

    The presence of gaps in an alignment of nucleotide or protein sequences is often an inconvenience for bioinformatical studies. In phylogenetic and other analyses, for instance, gapped columns are often discarded entirely from the alignment. MaxAlign is a program that optimizes the alignment prior to such analyses. Specifically, it maximizes the number of nucleotide (or amino acid) symbols that are present in gap-free columns - the alignment area - by selecting the optimal subset of sequences to exclude from the alignment. MaxAlign can be used prior to phylogenetic and bioinformatical analyses as well as in other situations where this form of alignment improvement is useful. In this work we test MaxAlign's performance in these tasks and compare the accuracy of phylogenetic estimates including and excluding gapped columns from the analysis, with and without processing with MaxAlign. In this paper we also introduce a new simple measure of tree similarity, Normalized Symmetric Similarity (NSS) that we consider useful for comparing tree topologies. We demonstrate how MaxAlign is helpful in detecting misaligned or defective sequences without requiring manual inspection. We also show that it is not advisable to exclude gapped columns from phylogenetic analyses unless MaxAlign is used first. Finally, we find that the sequences removed by MaxAlign from an alignment tend to be those that would otherwise be associated with low phylogenetic accuracy, and that the presence of gaps in any given sequence does not seem to disturb the phylogenetic estimates of other sequences. The MaxAlign web-server is freely available online at http://www.cbs.dtu.dk/services/MaxAlign where supplementary information can also be found. The program is also freely available as a Perl stand-alone package.

  4. In vitro optimization of truncated stem-loop II variants of the hammerhead ribozyme for cleavage in low concentrations of magnesium under non-turnover conditions.

    PubMed Central

    Zillmann, M; Limauro, S E; Goodchild, J

    1997-01-01

    By truncating helix II to two base pairs in a hammerhead ribozyme having long flanking sequences (greater than 30 bases), the rate of cleavage in 1 mM magnesium can be increased roughly 100-fold. Replacing most of the nucleotides in a typical stem-loop II with 1-4 randomized nucleotides gave an RNA library that, even before selection, was more active in 1 mM magnesium than the parent ribozyme, but considerably less active than the truncated stem-loop II ribozyme. A novel, multiround selection for intermolecular cleavage was exploited to optimize this library for cleavage in low concentrations of magnesium. After three rounds of selection at sequentially lower concentrations of magnesium, the library cleaved substrate RNA 20-fold faster than the initial pool and was cloned. This pool was heavily enriched for one particular sequence (5'-CGUG-3') that represented 16 of 52 isolates (the next most common sequence was represented only six times). This sequence also represented the most active sequence, exceeding the activity of the short helix II variant under the conditions of the selection, thereby demonstrating the effectiveness of the selection technique. Analysis of the cleavage rates of RNAs made from eight isolates having different four-base insert sequences allowed assignment of highly preferred bases at each position in the insert. Analysis of pool clones having insert of differing lengths showed that, in general, activity decreased as the length of the insert decreased from 4 to 1. This supports the suggested role of stem-loop II in stabilizing the non-Watson-Crick interactions between the conserved bases of the catalytic core. PMID:9214657

  5. Complete nucleotide sequence of a monopartite Begomovirus and associated satellites infecting Carica papaya in Nepal.

    PubMed

    Shahid, M S; Yoshida, S; Khatri-Chhetri, G B; Briddon, R W; Natsuaki, K T

    2013-06-01

    Carica papaya (papaya) is a fruit crop that is cultivated mostly in kitchen gardens throughout Nepal. Leaf samples of C. papaya plants with leaf curling, vein darkening, vein thickening, and a reduction in leaf size were collected from a garden in Darai village, Rampur, Nepal in 2010. Full-length clones of a monopartite Begomovirus, a betasatellite and an alphasatellite were isolated. The complete nucleotide sequence of the Begomovirus showed the arrangement of genes typical of Old World begomoviruses with the highest nucleotide sequence identity (>99 %) to an isolate of Ageratum yellow vein virus (AYVV), confirming it as an isolate of AYVV. The complete nucleotide sequence of betasatellite showed greater than 89 % nucleotide sequence identity to an isolate of Tomato leaf curl Java betasatellite originating from Indonesian. The sequence of the alphasatellite displayed 92 % nucleotide sequence identity to Sida yellow vein China alphasatellite. This is the first identification of these components in Nepal and the first time they have been identified in papaya.

  6. Nucleotide sequences encoding a thermostable alkaline protease

    DOEpatents

    Wilson, David B.; Lao, Guifang

    1998-01-01

    Nucleotide sequences, derived from a thermophilic actinomycete microorganism, which encode a thermostable alkaline protease are disclosed. Also disclosed are variants of the nucleotide sequences which encode a polypeptide having thermostable alkaline proteolytic activity. Recombinant thermostable alkaline protease or recombinant polypeptide may be obtained by culturing in a medium a host cell genetically engineered to contain and express a nucleotide sequence according to the present invention, and recovering the recombinant thermostable alkaline protease or recombinant polypeptide from the culture medium.

  7. Nucleotide sequences specific to Francisella tularensis and methods for the detection of Francisella tularensis

    DOEpatents

    McCready, Paula M [Tracy, CA; Radnedge, Lyndsay [San Mateo, CA; Andersen, Gary L [Berkeley, CA; Ott, Linda L [Livermore, CA; Slezak, Thomas R [Livermore, CA; Kuczmarski, Thomas A [Livermore, CA; Vitalis, Elizabeth A [Livermore, CA

    2007-02-06

    Described herein is the identification of nucleotide sequences specific to Francisella tularensis that serves as a marker or signature for identification of this bacterium. In addition, forward and reverse primers and hybridization probes derived from these nucleotide sequences that are used in nucleotide detection methods to detect the presence of the bacterium are disclosed.

  8. Nucleotide sequences specific to Francisella tularensis and methods for the detection of Francisella tularensis

    DOEpatents

    McCready, Paula M [Tracy, CA; Radnedge, Lyndsay [San Mateo, CA; Andersen, Gary L [Berkeley, CA; Ott, Linda L [Livermore, CA; Slezak, Thomas R [Livermore, CA; Kuczmarski, Thomas A [Livermore, CA; Vitalis, Elizabeth A [Livermore, CA

    2009-02-24

    Described herein is the identification of nucleotide sequences specific to Francisella tularensis that serves as a marker or signature for identification of this bacterium. In addition, forward and reverse primers and hybridization probes derived from these nucleotide sequences that are used in nucleotide detection methods to detect the presence of the bacterium are disclosed.

  9. Empirical Bayes Estimation of Coalescence Times from Nucleotide Sequence Data.

    PubMed

    King, Leandra; Wakeley, John

    2016-09-01

    We demonstrate the advantages of using information at many unlinked loci to better calibrate estimates of the time to the most recent common ancestor (TMRCA) at a given locus. To this end, we apply a simple empirical Bayes method to estimate the TMRCA. This method is both asymptotically optimal, in the sense that the estimator converges to the true value when the number of unlinked loci for which we have information is large, and has the advantage of not making any assumptions about demographic history. The algorithm works as follows: we first split the sample at each locus into inferred left and right clades to obtain many estimates of the TMRCA, which we can average to obtain an initial estimate of the TMRCA. We then use nucleotide sequence data from other unlinked loci to form an empirical distribution that we can use to improve this initial estimate. Copyright © 2016 by the Genetics Society of America.

  10. Nucleotide sequences encoding a thermostable alkaline protease

    DOEpatents

    Wilson, D.B.; Lao, G.

    1998-01-06

    Nucleotide sequences, derived from a thermophilic actinomycete microorganism, which encode a thermostable alkaline protease are disclosed. Also disclosed are variants of the nucleotide sequences which encode a polypeptide having thermostable alkaline proteolytic activity. Recombinant thermostable alkaline protease or recombinant polypeptide may be obtained by culturing in a medium a host cell genetically engineered to contain and express a nucleotide sequence according to the present invention, and recovering the recombinant thermostable alkaline protease or recombinant polypeptide from the culture medium. 3 figs.

  11. Choice of Reference Sequence and Assembler for Alignment of Listeria monocytogenes Short-Read Sequence Data Greatly Influences Rates of Error in SNP Analyses

    PubMed Central

    Pightling, Arthur W.; Petronella, Nicholas; Pagotto, Franco

    2014-01-01

    The wide availability of whole-genome sequencing (WGS) and an abundance of open-source software have made detection of single-nucleotide polymorphisms (SNPs) in bacterial genomes an increasingly accessible and effective tool for comparative analyses. Thus, ensuring that real nucleotide differences between genomes (i.e., true SNPs) are detected at high rates and that the influences of errors (such as false positive SNPs, ambiguously called sites, and gaps) are mitigated is of utmost importance. The choices researchers make regarding the generation and analysis of WGS data can greatly influence the accuracy of short-read sequence alignments and, therefore, the efficacy of such experiments. We studied the effects of some of these choices, including: i) depth of sequencing coverage, ii) choice of reference-guided short-read sequence assembler, iii) choice of reference genome, and iv) whether to perform read-quality filtering and trimming, on our ability to detect true SNPs and on the frequencies of errors. We performed benchmarking experiments, during which we assembled simulated and real Listeria monocytogenes strain 08-5578 short-read sequence datasets of varying quality with four commonly used assemblers (BWA, MOSAIK, Novoalign, and SMALT), using reference genomes of varying genetic distances, and with or without read pre-processing (i.e., quality filtering and trimming). We found that assemblies of at least 50-fold coverage provided the most accurate results. In addition, MOSAIK yielded the fewest errors when reads were aligned to a nearly identical reference genome, while using SMALT to align reads against a reference sequence that is ∼0.82% distant from 08-5578 at the nucleotide level resulted in the detection of the greatest numbers of true SNPs and the fewest errors. Finally, we show that whether read pre-processing improves SNP detection depends upon the choice of reference sequence and assembler. In total, this study demonstrates that researchers should test a variety of conditions to achieve optimal results. PMID:25144537

  12. Centrifuge: rapid and sensitive classification of metagenomic sequences.

    PubMed

    Kim, Daehwan; Song, Li; Breitwieser, Florian P; Salzberg, Steven L

    2016-12-01

    Centrifuge is a novel microbial classification engine that enables rapid, accurate, and sensitive labeling of reads and quantification of species on desktop computers. The system uses an indexing scheme based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index, optimized specifically for the metagenomic classification problem. Centrifuge requires a relatively small index (4.2 GB for 4078 bacterial and 200 archaeal genomes) and classifies sequences at very high speed, allowing it to process the millions of reads from a typical high-throughput DNA sequencing run within a few minutes. Together, these advances enable timely and accurate analysis of large metagenomics data sets on conventional desktop computers. Because of its space-optimized indexing schemes, Centrifuge also makes it possible to index the entire NCBI nonredundant nucleotide sequence database (a total of 109 billion bases) with an index size of 69 GB, in contrast to k-mer-based indexing schemes, which require far more extensive space. © 2016 Kim et al.; Published by Cold Spring Harbor Laboratory Press.

  13. Short-read, high-throughput sequencing technology for STR genotyping

    PubMed Central

    Bornman, Daniel M.; Hester, Mark E.; Schuetter, Jared M.; Kasoji, Manjula D.; Minard-Smith, Angela; Barden, Curt A.; Nelson, Scott C.; Godbold, Gene D.; Baker, Christine H.; Yang, Boyu; Walther, Jacquelyn E.; Tornes, Ivan E.; Yan, Pearlly S.; Rodriguez, Benjamin; Bundschuh, Ralf; Dickens, Michael L.; Young, Brian A.; Faith, Seth A.

    2013-01-01

    DNA-based methods for human identification principally rely upon genotyping of short tandem repeat (STR) loci. Electrophoretic-based techniques for variable-length classification of STRs are universally utilized, but are limited in that they have relatively low throughput and do not yield nucleotide sequence information. High-throughput sequencing technology may provide a more powerful instrument for human identification, but is not currently validated for forensic casework. Here, we present a systematic method to perform high-throughput genotyping analysis of the Combined DNA Index System (CODIS) STR loci using short-read (150 bp) massively parallel sequencing technology. Open source reference alignment tools were optimized to evaluate PCR-amplified STR loci using a custom designed STR genome reference. Evaluation of this approach demonstrated that the 13 CODIS STR loci and amelogenin (AMEL) locus could be accurately called from individual and mixture samples. Sensitivity analysis showed that as few as 18,500 reads, aligned to an in silico referenced genome, were required to genotype an individual (>99% confidence) for the CODIS loci. The power of this technology was further demonstrated by identification of variant alleles containing single nucleotide polymorphisms (SNPs) and the development of quantitative measurements (reads) for resolving mixed samples. PMID:25621315

  14. Droplet-Based Pyrosequencing Using Digital Microfluidics

    PubMed Central

    Boles, Deborah J.; Benton, Jonathan L.; Siew, Germaine J.; Levy, Miriam H.; Thwar, Prasanna K.; Sandahl, Melissa A.; Rouse, Jeremy L.; Perkins, Lisa C.; Sudarsan, Arjun P.; Jalili, Roxana; Pamula, Vamsee K.; Srinivasan, Vijay; Fair, Richard B.; Griffin, Peter B.; Eckhardt, Allen E.; Pollack, Michael G.

    2013-01-01

    The feasibility of implementing pyrosequencing chemistry within droplets using electrowetting-based digital microfluidics is reported. An array of electrodes patterned on a printed-circuit board was used to control the formation, transportation, merging, mixing, and splitting of submicroliter-sized droplets contained within an oil-filled chamber. A three-enzyme pyrosequencing protocol was implemented in which individual droplets contained enzymes, deoxyribonucleotide triphosphates (dNTPs), and DNA templates. The DNA templates were anchored to magnetic beads which enabled them to be thoroughly washed between nucleotide additions. Reagents and protocols were optimized to maximize signal over background, linearity of response, cycle efficiency, and wash efficiency. As an initial demonstration of feasibility, a portion of a 229 bp Candida parapsilosis template was sequenced using both a de novo protocol and a resequencing protocol. The resequencing protocol generated over 60 bp of sequence with 100% sequence accuracy based on raw pyrogram levels. Excellent linearity was observed for all of the homopolymers (two, three, or four nucleotides) contained in the C. parapsilosis sequence. With improvements in microfluidic design it is expected that longer reads, higher throughput, and improved process integration (i.e., “sample-to-sequence” capability) could eventually be achieved using this low-cost platform. PMID:21932784

  15. Characterization of apple stem grooving virus and apple chlorotic leaf spot virus identified in a crab apple tree.

    PubMed

    Li, Yongqiang; Deng, Congliang; Bian, Yong; Zhao, Xiaoli; Zhou, Qi

    2017-04-01

    Apple stem grooving virus (ASGV), apple chlorotic leaf spot virus (ACLSV), and prunus necrotic ringspot virus (PNRSV) were identified in a crab apple tree by small RNA deep sequencing. The complete genome sequence of ACLSV isolate BJ (ACLSV-BJ) was 7554 nucleotides and shared 67.0%-83.0% nucleotide sequence identity with other ACLSV isolates. A phylogenetic tree based on the complete genome sequence of all available ACLSV isolates showed that ACLSV-BJ clustered with the isolates SY01 from hawthorn, MO5 from apple, and JB, KMS and YH from pear. The complete nucleotide sequence of ASGV-BJ was 6509 nucleotides (nt) long and shared 78.2%-80.7% nucleotide sequence identity with other isolates. ASGV-BJ and the isolate ASGV_kfp clustered together in the phylogenetic tree as an independent clade. Recombination analysis showed that isolate ASGV-BJ was a naturally occurring recombinant.

  16. Universal digital high-resolution melt: a novel approach to broad-based profiling of heterogeneous biological samples.

    PubMed

    Fraley, Stephanie I; Hardick, Justin; Masek, Billie J; Jo Masek, Billie; Athamanolap, Pornpat; Rothman, Richard E; Gaydos, Charlotte A; Carroll, Karen C; Wakefield, Teresa; Wang, Tza-Huei; Yang, Samuel

    2013-10-01

    Comprehensive profiling of nucleic acids in genetically heterogeneous samples is important for clinical and basic research applications. Universal digital high-resolution melt (U-dHRM) is a new approach to broad-based PCR diagnostics and profiling technologies that can overcome issues of poor sensitivity due to contaminating nucleic acids and poor specificity due to primer or probe hybridization inaccuracies for single nucleotide variations. The U-dHRM approach uses broad-based primers or ligated adapter sequences to universally amplify all nucleic acid molecules in a heterogeneous sample, which have been partitioned, as in digital PCR. Extensive assay optimization enables direct sequence identification by algorithm-based matching of melt curve shape and Tm to a database of known sequence-specific melt curves. We show that single-molecule detection and single nucleotide sensitivity is possible. The feasibility and utility of U-dHRM is demonstrated through detection of bacteria associated with polymicrobial blood infection and microRNAs (miRNAs) associated with host response to infection. U-dHRM using broad-based 16S rRNA gene primers demonstrates universal single cell detection of bacterial pathogens, even in the presence of larger amounts of contaminating bacteria; U-dHRM using universally adapted Lethal-7 miRNAs in a heterogeneous mixture showcases the single copy sensitivity and single nucleotide specificity of this approach.

  17. Composition for nucleic acid sequencing

    DOEpatents

    Korlach, Jonas [Ithaca, NY; Webb, Watt W [Ithaca, NY; Levene, Michael [Ithaca, NY; Turner, Stephen [Ithaca, NY; Craighead, Harold G [Ithaca, NY; Foquet, Mathieu [Ithaca, NY

    2008-08-26

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  18. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-06-06

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  19. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-05-30

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  20. Labeled nucleotide phosphate (NP) probes

    DOEpatents

    Korlach, Jonas [Ithaca, NY; Webb, Watt W [Ithaca, NY; Levene, Michael [Ithaca, NY; Turner, Stephen [Ithaca, NY; Craighead, Harold G [Ithaca, NY; Foquet, Mathieu [Ithaca, NY

    2009-02-03

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  1. The primary structure of the Saccharomyces cerevisiae gene for 3-phosphoglycerate kinase.

    PubMed Central

    Hitzeman, R A; Hagie, F E; Hayflick, J S; Chen, C Y; Seeburg, P H; Derynck, R

    1982-01-01

    The DNA sequence of the gene for the yeast glycolytic enzyme, 3-phosphoglycerate kinase (PGK), has been obtained by sequencing part of a 3.1 kbp HindIII fragment obtained from the yeast genome. The structural gene sequence corresponds to a reading frame of 1251 bp coding for 416 amino acids with no intervening DNA sequences. The amino acid sequence is approximately 65 percent homologous with human and horse PGK protein sequences and is in general agreement with the published protein sequence for yeast PGK. As for other highly expressed structural genes in yeast, the coding sequence is highly codon biased with 95 percent of the amino acids coded for by a select 25 codons (out of 61 possible). Besides structural DNA sequence, 291 bp of 5'-flanking sequence and 286 bp of 3'-flanking sequence were determined. Transcription starts 36 nucleotides upstream from the translational start and stops 86-93 nucleotides downstream from the translational stop. These results suggest a non-polyadenylated mRNA length of 1373 to 1380 nucleotides, which is consistent with the observed length of 1500 nucleotides for polyadenylated PGK mRNA. A sequence TATATATAAA is found at 145 nucleotides upstream from the translational start. This sequence resembles the TATAAA box that is possibly associated with RNA polymerase II binding. Images PMID:6296791

  2. 37 CFR 5.31-5.33 - [Reserved

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... from abandonment 1.135 Amino Acid Sequences. (See Nucleotide and/or Amino Acid Sequences) Appeal to... Appeals and Interference 41.47 Of rejection of an application 1.104(a) Nucleotide and/or Amino Acid...) Symbols for nucleotide and/or amino acid sequence data 1.822 T Tables in patent applications 1.58 Terminal...

  3. Codon Optimization of the Human Papillomavirus E7 Oncogene Induces a CD8+ T Cell Response to a Cryptic Epitope Not Harbored by Wild-Type E7

    PubMed Central

    Lorenz, Felix K. M.; Wilde, Susanne; Voigt, Katrin; Kieback, Elisa; Mosetter, Barbara; Schendel, Dolores J.; Uckert, Wolfgang

    2015-01-01

    Codon optimization of nucleotide sequences is a widely used method to achieve high levels of transgene expression for basic and clinical research. Until now, immunological side effects have not been described. To trigger T cell responses against human papillomavirus, we incubated T cells with dendritic cells that were pulsed with RNA encoding the codon-optimized E7 oncogene. All T cell receptors isolated from responding T cell clones recognized target cells expressing the codon-optimized E7 gene but not the wild type E7 sequence. Epitope mapping revealed recognition of a cryptic epitope from the +3 alternative reading frame of codon-optimized E7, which is not encoded by the wild type E7 sequence. The introduction of a stop codon into the +3 alternative reading frame protected the transgene product from recognition by T cell receptor gene-modified T cells. This is the first experimental study demonstrating that codon optimization can render a transgene artificially immunogenic through generation of a dominant cryptic epitope. This finding may be of great importance for the clinical field of gene therapy to avoid rejection of gene-corrected cells and for the design of DNA- and RNA-based vaccines, where codon optimization may artificially add a strong immunogenic component to the vaccine. PMID:25799237

  4. Nucleotide sequence and genetic organization of barley stripe mosaic virus RNA gamma.

    PubMed

    Gustafson, G; Hunter, B; Hanau, R; Armour, S L; Jackson, A O

    1987-06-01

    The complete nucleotide sequences of RNA gamma from the Type and ND18 strains of barley stripe mosaic virus (BSMV) have been determined. The sequences are 3164 (Type) and 2791 (ND18) nucleotides in length. Both sequences contain a 5'-noncoding region (87 or 88 nucleotides) which is followed by a long open reading frame (ORF1). A 42-nucleotide intercistronic region separates ORF1 from a second, shorter open reading frame (ORF2) located near the 3'-end of the RNA. There is a high degree of homology between the Type and ND18 strains in the nucleotide sequence of ORF1. However, the Type strain contains a 366 nucleotide direct tandem repeat within ORF1 which is absent in the ND18 strain. Consequently, the predicted translation product of Type RNA gamma ORF1 (mol wt 87,312) is significantly larger than that of ND18 RNA gamma ORF1 (mol wt 74,011). The amino acid sequence of the ORF1 polypeptide contains homologies with putative RNA polymerases from other RNA viruses, suggesting that this protein may function in replication of the BSMV genome. The nucleotide sequence of RNA gamma ORF2 is nearly identical in the Type and ND18 strains. ORF2 codes for a polypeptide with a predicted molecular weight of 17,209 (Type) or 17,074 (ND18) which is known to be translated from a subgenomic (sg) RNA. The initiation point of this sgRNA has been mapped to a location 27 nucleotides upstream of the ORF2 initiation codon in the intercistronic region between ORF1 and ORF2. The sgRNA is not coterminal with the 3'-end of the genomic RNA, but instead contains heterogeneous poly(A) termini up to 150 nucleotides long (J. Stanley, R. Hanau, and A. O. Jackson, 1984, Virology 139, 375-383). In the genomic RNA gamma, ORF2 is followed by a short poly(A) tract and a 238-nucleotide tRNA-like structure.

  5. Nucleotide cleaving agents and method

    DOEpatents

    Que, Jr., Lawrence; Hanson, Richard S.; Schnaith, Leah M. T.

    2000-01-01

    The present invention provides a unique series of nucleotide cleaving agents and a method for cleaving a nucleotide sequence, whether single-stranded or double-stranded DNA or RNA, using and a cationic metal complex having at least one polydentate ligand to cleave the nucleotide sequence phosphate backbone to yield a hydroxyl end and a phosphate end.

  6. Nucleotide sequence analysis of the recA gene and discrimination of the three isolates of urease-positive thermophilic Campylobacter (UPTC) isolated from seagulls (Larus spp.) in Northern Ireland.

    PubMed

    Matsuda, M; Tai, K; Moore, J E; Millar, B C; Murayama, O

    2004-01-01

    Nucleotide sequencing after TA cloning of the amplicon of the almost-full length recA gene from three strains of UPTC (A1, A2, and A3) isolated from seagulls in Northern Ireland, the phenotypical and genotypical characteristics of which have been demonstrated to be indistinguishable, clarified nucleotide differences at three nucleotide positions among the three strains. In conclusion, the nucleotide sequences of the recA gene were found to discriminate among the three strains of UPTC, A1, A2, and A3, which are indistinguishable phenotypically and genotypically. Thus, the present study strongly suggests that nucleotide sequence data of the amplicon of a suitable gene or region could aid in discriminating among isolates of the UPTC group, which are indistinguishable phenotypically and genotypically. Copyright 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

  7. Nucleic acid analysis using terminal-phosphate-labeled nucleotides

    DOEpatents

    Korlach, Jonas [Ithaca, NY; Webb, Watt W [Ithaca, NY; Levene, Michael [Ithaca, NY; Turner, Stephen [Ithaca, NY; Craighead, Harold G [Ithaca, NY; Foquet, Mathieu [Ithaca, NY

    2008-04-22

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  8. Nucleotide sequence analysis of the L gene of Newcastle disease virus: homologies with Sendai and vesicular stomatitis viruses.

    PubMed Central

    Yusoff, K; Millar, N S; Chambers, P; Emmerson, P T

    1987-01-01

    The nucleotide sequence of the L gene of the Beaudette C strain of Newcastle disease virus (NDV) has been determined. The L gene is 6704 nucleotides long and encodes a protein of 2204 amino acids with a calculated molecular weight of 248822. Mung bean nuclease mapping of the 5' terminus of the L gene mRNA indicates that the transcription of the L gene is initiated 11 nucleotides upstream of the translational start site. Comparison with the amino acid sequences of the L genes of Sendai virus and vesicular stomatitis virus (VSV) suggests that there are several regions of homology between the sequences. These data provide further evidence for an evolutionary relationship between the Paramyxoviridae and the Rhabdoviridae. A non-coding sequence of 46 nucleotides downstream of the presumed polyadenylation site of the L gene may be part of a negative strand leader RNA. Images PMID:3035486

  9. Software for optimization of SNP and PCR-RFLP genotyping to discriminate many genomes with the fewest assays

    PubMed Central

    Gardner, Shea N; Wagner, Mark C

    2005-01-01

    Background Microbial forensics is important in tracking the source of a pathogen, whether the disease is a naturally occurring outbreak or part of a criminal investigation. Results A method and SPR Opt (SNP and PCR-RFLP Optimization) software to perform a comprehensive, whole-genome analysis to forensically discriminate multiple sequences is presented. Tools for the optimization of forensic typing using Single Nucleotide Polymorphism (SNP) and PCR-Restriction Fragment Length Polymorphism (PCR-RFLP) analyses across multiple isolate sequences of a species are described. The PCR-RFLP analysis includes prediction and selection of optimal primers and restriction enzymes to enable maximum isolate discrimination based on sequence information. SPR Opt calculates all SNP or PCR-RFLP variations present in the sequences, groups them into haplotypes according to their co-segregation across those sequences, and performs combinatoric analyses to determine which sets of haplotypes provide maximal discrimination among all the input sequences. Those set combinations requiring that membership in the fewest haplotypes be queried (i.e. the fewest assays be performed) are found. These analyses highlight variable regions based on existing sequence data. These markers may be heterogeneous among unsequenced isolates as well, and thus may be useful for characterizing the relationships among unsequenced as well as sequenced isolates. The predictions are multi-locus. Analyses of mumps and SARS viruses are summarized. Phylogenetic trees created based on SNPs, PCR-RFLPs, and full genomes are compared for SARS virus, illustrating that purported phylogenies based only on SNP or PCR-RFLP variations do not match those based on multiple sequence alignment of the full genomes. Conclusion This is the first software to optimize the selection of forensic markers to maximize information gained from the fewest assays, accepting whole or partial genome sequence data as input. As more sequence data becomes available for multiple strains and isolates of a species, automated, computational approaches such as those described here will be essential to make sense of large amounts of information, and to guide and optimize efforts in the laboratory. The software and source code for SPR Opt is publicly available and free for non-profit use at . PMID:15904493

  10. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ...” means those amino acids other than “Xaa” and those nucleotide bases other than “n”defined in accordance... 37 Patents, Trademarks, and Copyrights 1 2014-07-01 2014-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences...

  11. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ...” means those amino acids other than “Xaa” and those nucleotide bases other than “n”defined in accordance... 37 Patents, Trademarks, and Copyrights 1 2013-07-01 2013-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences...

  12. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ...” means those amino acids other than “Xaa” and those nucleotide bases other than “n”defined in accordance... 37 Patents, Trademarks, and Copyrights 1 2012-07-01 2012-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences...

  13. Synthetic oligonucleotide probes deduced from amino acid sequence data. Theoretical and practical considerations.

    PubMed

    Lathe, R

    1985-05-05

    Synthetic probes deduced from amino acid sequence data are widely used to detect cognate coding sequences in libraries of cloned DNA segments. The redundancy of the genetic code dictates that a choice must be made between (1) a mixture of probes reflecting all codon combinations, and (2) a single longer "optimal" probe. The second strategy is examined in detail. The frequency of sequences matching a given probe by chance alone can be determined and also the frequency of sequences closely resembling the probe and contributing to the hybridization background. Gene banks cannot be treated as random associations of the four nucleotides, and probe sequences deduced from amino acid sequence data occur more often than predicted by chance alone. Probe lengths must be increased to confer the necessary specificity. Examination of hybrids formed between unique homologous probes and their cognate targets reveals that short stretches of perfect homology occurring by chance make a significant contribution to the hybridization background. Statistical methods for improving homology are examined, taking human coding sequences as an example, and considerations of codon utilization and dinucleotide frequencies yield an overall homology of greater than 82%. Recommendations for probe design and hybridization are presented, and the choice between using multiple probes reflecting all codon possibilities and a unique optimal probe is discussed.

  14. Adenovirus EIIA early promoter: transcriptional control elements and induction by the viral pre-early EIA gene, which appears to be sequence independent.

    PubMed Central

    Murthy, S C; Bhat, G P; Thimmappaya, B

    1985-01-01

    A molecular dissection of the adenovirus EIIA early (E) promoter was undertaken to study the sequence elements required for transcription and to examine the nucleotide sequences, if any, specific for its trans-activation by the viral pre-early EIA gene product. A chimeric gene in which the EIIA-E promoter region fused to the coding sequences of the bacterial chloramphenicol acetyltransferase (CAT) gene was used in transient assays to identify the transcriptional control regions. Deletion mapping studies revealed that the upstream DNA sequences up to -86 were sufficient for the optimal basal level transcription in HeLa cells and also for the EIA-induced transcription. A series of linker-scanning (LS) mutants were constructed to precisely identify the nucleotide sequences that control transcription. Analysis of these LS mutants allowed us to identify two regions of the promoter that are critical for the EIIA-E transcription. These regions are located between -29 and -21 (region I) and between -82 and -66 (region II). Mutations in region I affected initiation and appeared functionally similar to the "TATA" sequence of the commonly studied promoters. To examine whether or not the EIIA-E promoter contained DNA sequences specific for the trans-activation by the EIA, the LS mutants were analyzed in a cotransfection assay containing a plasmid carrying the EIA gene. CAT activity of all of the LS mutants was induced by the EIA gene in this assay, suggesting that the induction of transcription of the EIIA-E promoter by the EIA gene is not sequence-specific. Images PMID:3857577

  15. Nucleotide sequence determination of guinea-pig casein B mRNA reveals homology with bovine and rat alpha s1 caseins and conservation of the non-coding regions of the mRNA.

    PubMed Central

    Hall, L; Laird, J E; Craig, R K

    1984-01-01

    Nucleotide sequence analysis of cloned guinea-pig casein B cDNA sequences has identified two casein B variants related to the bovine and rat alpha s1 caseins. Amino acid homology was largely confined to the known bovine or predicted rat phosphorylation sites and within the 'signal' precursor sequence. Comparison of the deduced nucleotide sequence of the guinea-pig and rat alpha s1 casein mRNA species showed greater sequence conservation in the non-coding than in the coding regions, suggesting a functional and possibly regulatory role for the non-coding regions of casein mRNA. The results provide insight into the evolution of the casein genes, and raise questions as to the role of conserved nucleotide sequences within the non-coding regions of mRNA species. Images Fig. 1. PMID:6548375

  16. Homogeneity of the 16S rDNA sequence among geographically disparate isolates of Taylorella equigenitalis

    PubMed Central

    Matsuda, M; Tazumi, A; Kagawa, S; Sekizuka, T; Murayama, O; Moore, JE; Millar, BC

    2006-01-01

    Background At present, six accessible sequences of 16S rDNA from Taylorella equigenitalis (T. equigenitalis) are available, whose sequence differences occur at a few nucleotide positions. Thus it is important to determine these sequences from additional strains in other countries, if possible, in order to clarify any anomalies regarding 16S rDNA sequence heterogeneity. Here, we clone and sequence the approximate full-length 16S rDNA from additional strains of T. equigenitalis isolated in Japan, Australia and France and compare these sequences to the existing published sequences. Results Clarification of any anomalies regarding 16S rDNA sequence heterogeneity of T. equigenitalis was carried out. When cloning, sequencing and comparison of the approximate full-length 16S rDNA from 17 strains of T. equigenitalis isolated in Japan, Australia and France, nucleotide sequence differences were demonstrated at the six loci in the 1,469 nucleotide sequence. Moreover, 12 polymorphic sites occurred among 23 sequences of the 16S rDNA, including the six reference sequences. Conclusion High sequence similarity (99.5% or more) was observed throughout, except from nucleotide positions 138 to 501 where substitutions and deletions were noted. PMID:16398935

  17. Complete nucleotide sequence of Alfalfa mosaic virus isolated from alfalfa (Medicago sativa L.) in Argentina.

    PubMed

    Trucco, Verónica; de Breuil, Soledad; Bejerman, Nicolás; Lenardon, Sergio; Giolitti, Fabián

    2014-06-01

    The complete nucleotide sequence of an Alfalfa mosaic virus (AMV) isolate infecting alfalfa (Medicago sativa L.) in Argentina, AMV-Arg, was determined. The virus genome has the typical organization described for AMV, and comprises 3,643, 2,593, and 2,038 nucleotides for RNA1, 2 and 3, respectively. The whole genome sequence and each encoding region were compared with those of other four isolates that have been completely sequenced from China, Italy, Spain and USA. The nucleotide identity percentages ranged from 95.9 to 99.1 % for the three RNAs and from 93.7 to 99 % for the protein 1 (P1), protein 2 (P2), movement protein and coat protein (CP) encoding regions, whereas the amino acid identity percentages of these proteins ranged from 93.4 to 99.5 %, the lowest value corresponding to P2. CP sequences of AMV-Arg were compared with those of other 25 available isolates, and the phylogenetic analysis based on the CP gene was carried out. The highest percentage of nucleotide sequence identity of the CP gene was 98.3 % with a Chinese isolate and 98.6 % at the amino acid level with four isolates, two from Italy, one from Brazil and the remaining one from China. The phylogenetic analysis showed that AMV-Arg is closely related to subgroup I of AMV isolates. To our knowledge, this is the first report of a complete nucleotide sequence of AMV from South America and the first worldwide report of complete nucleotide sequence of AMV isolated from alfalfa as natural host.

  18. Human ribosomal RNA gene: nucleotide sequence of the transcription initiation region and comparison of three mammalian genes.

    PubMed Central

    Financsek, I; Mizumoto, K; Mishima, Y; Muramatsu, M

    1982-01-01

    The transcription initiation site of the human ribosomal RNA gene (rDNA) was located by using the single-strand specific nuclease protection method and by determining the first nucleotide of the in vitro capped 45S preribosomal RNA. The sequence of 1,211 nucleotides surrounding the initiation site was determined. The sequenced region was found to consist of 75% G and C and to contain a number of short direct and inverted repeats and palindromes. By comparison of the corresponding initiation regions of three mammalian species, several conserved sequences were found upstream and downstream from the transcription starting point. Two short A + T-rich sequences are present on human, mouse, and rat ribosomal RNA genes between the initiation site and 40 nucleotides upstream, and a C + T cluster is located at a position around -60. At and downstream from the initiation site, a common sequence, T-AG-C-T-G-A-C-A-C-G-C-T-G-T-C-C-T-CT-T, was found in the three genes from position -1 through +18. The strong conservation of these sequences suggests their functional significance in rDNA. The S1 nuclease protection experiments with cloned rDNA fragments indicated the presence in human 45S RNA of molecules several hundred nucleotides shorter than the supposed primary transcript. The first 19 nucleotides of these molecules appear identical--except for one mismatch--to the nucleotide sequence of the 5' end of a supposed early processing product of the mouse 45S RNA. Images PMID:6954460

  19. Quantum Point Contact Single-Nucleotide Conductance for DNA and RNA Sequence Identification.

    PubMed

    Afsari, Sepideh; Korshoj, Lee E; Abel, Gary R; Khan, Sajida; Chatterjee, Anushree; Nagpal, Prashant

    2017-11-28

    Several nanoscale electronic methods have been proposed for high-throughput single-molecule nucleic acid sequence identification. While many studies display a large ensemble of measurements as "electronic fingerprints" with some promise for distinguishing the DNA and RNA nucleobases (adenine, guanine, cytosine, thymine, and uracil), important metrics such as accuracy and confidence of base calling fall well below the current genomic methods. Issues such as unreliable metal-molecule junction formation, variation of nucleotide conformations, insufficient differences between the molecular orbitals responsible for single-nucleotide conduction, and lack of rigorous base calling algorithms lead to overlapping nanoelectronic measurements and poor nucleotide discrimination, especially at low coverage on single molecules. Here, we demonstrate a technique for reproducible conductance measurements on conformation-constrained single nucleotides and an advanced algorithmic approach for distinguishing the nucleobases. Our quantum point contact single-nucleotide conductance sequencing (QPICS) method uses combed and electrostatically bound single DNA and RNA nucleotides on a self-assembled monolayer of cysteamine molecules. We demonstrate that by varying the applied bias and pH conditions, molecular conductance can be switched ON and OFF, leading to reversible nucleotide perturbation for electronic recognition (NPER). We utilize NPER as a method to achieve >99.7% accuracy for DNA and RNA base calling at low molecular coverage (∼12×) using unbiased single measurements on DNA/RNA nucleotides, which represents a significant advance compared to existing sequencing methods. These results demonstrate the potential for utilizing simple surface modifications and existing biochemical moieties in individual nucleobases for a reliable, direct, single-molecule, nanoelectronic DNA and RNA nucleotide identification method for sequencing.

  20. Sequence-based prediction of protein-binding sites in DNA: comparative study of two SVM models.

    PubMed

    Park, Byungkyu; Im, Jinyong; Tuvshinjargal, Narankhuu; Lee, Wook; Han, Kyungsook

    2014-11-01

    As many structures of protein-DNA complexes have been known in the past years, several computational methods have been developed to predict DNA-binding sites in proteins. However, its inverse problem (i.e., predicting protein-binding sites in DNA) has received much less attention. One of the reasons is that the differences between the interaction propensities of nucleotides are much smaller than those between amino acids. Another reason is that DNA exhibits less diverse sequence patterns than protein. Therefore, predicting protein-binding DNA nucleotides is much harder than predicting DNA-binding amino acids. We computed the interaction propensity (IP) of nucleotide triplets with amino acids using an extensive dataset of protein-DNA complexes, and developed two support vector machine (SVM) models that predict protein-binding nucleotides from sequence data alone. One SVM model predicts protein-binding nucleotides using DNA sequence data alone, and the other SVM model predicts protein-binding nucleotides using both DNA and protein sequences. In a 10-fold cross-validation with 1519 DNA sequences, the SVM model that uses DNA sequence data only predicted protein-binding nucleotides with an accuracy of 67.0%, an F-measure of 67.1%, and a Matthews correlation coefficient (MCC) of 0.340. With an independent dataset of 181 DNAs that were not used in training, it achieved an accuracy of 66.2%, an F-measure 66.3% and a MCC of 0.324. Another SVM model that uses both DNA and protein sequences achieved an accuracy of 69.6%, an F-measure of 69.6%, and a MCC of 0.383 in a 10-fold cross-validation with 1519 DNA sequences and 859 protein sequences. With an independent dataset of 181 DNAs and 143 proteins, it showed an accuracy of 67.3%, an F-measure of 66.5% and a MCC of 0.329. Both in cross-validation and independent testing, the second SVM model that used both DNA and protein sequence data showed better performance than the first model that used DNA sequence data. To the best of our knowledge, this is the first attempt to predict protein-binding nucleotides in a given DNA sequence from the sequence data alone. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  1. Array of nucleic acid probes on biological chips for diagnosis of HIV and methods of using the same

    DOEpatents

    Chee, Mark; Gingeras, Thomas R.; Fodor, Stephen P. A.; Hubble, Earl A.; Morris, MacDonald S.

    1999-01-19

    The invention provides an array of oligonucleotide probes immobilized on a solid support for analysis of a target sequence from a human immunodeficiency virus. The array comprises at least four sets of oligonucleotide probes 9 to 21 nucleotides in length. A first probe set has a probe corresponding to each nucleotide in a reference sequence from a human immunodeficiency virus. A probe is related to its corresponding nucleotide by being exactly complementary to a subsequence of the reference sequence that includes the corresponding nucleotide. Thus, each probe has a position, designated an interrogation position, that is occupied by a complementary nucleotide to the corresponding nucleotide. The three additional probe sets each have a corresponding probe for each probe in the first probe set. Thus, for each nucleotide in the reference sequence, there are four corresponding probes, one from each of the probe sets. The three corresponding probes in the three additional probe sets are identical to the corresponding probe from the first probe or a subsequence thereof that includes the interrogation position, except that the interrogation position is occupied by a different nucleotide in each of the four corresponding probes.

  2. The use of sequence-based SSR mining for the development of a vast collection of microsatellites in Aquilegia Formosa

    Treesearch

    Brandon Schlautman; Vera Pfeiffer; Juan Zalapa; Johanne Brunet

    2014-01-01

    Numerous microsatellite markers were developed for Aquilegia formosafrom sequences deposited within the Expressed Sequence Tag (EST), Genomic Survey Sequence (GSS), and Nucleotide databases in NCBI. Microsatellites (SSRs) were identified and primers were designed for 9 SSR containing sequences in the Nucleotide database, 3803 sequences in the EST...

  3. Switchgrass ubiquitin promoter (PVUBI2) and uses thereof

    DOEpatents

    Stewart, C. Neal; Mann, David George James

    2013-12-10

    The subject application provides polynucleotides, compositions thereof and methods for regulating gene expression in a plant. Polynucleotides disclosed herein comprise novel sequences for a promoter isolated from Panicum virgatum (switchgrass) that initiates transcription of an operably linked nucleotide sequence. Thus, various embodiments of the invention comprise the nucleotide sequence of SEQ ID NO: 2 or fragments thereof comprising nucleotides 1 to 692 of SEQ ID NO: 2 that are capable of driving the expression of an operably linked nucleic acid sequence.

  4. In vivo therapeutic potential of Dicer-hunting siRNAs targeting infectious hepatitis C virus.

    PubMed

    Watanabe, Tsunamasa; Hatakeyama, Hiroto; Matsuda-Yasui, Chiho; Sato, Yusuke; Sudoh, Masayuki; Takagi, Asako; Hirata, Yuichi; Ohtsuki, Takahiro; Arai, Masaaki; Inoue, Kazuaki; Harashima, Hideyoshi; Kohara, Michinori

    2014-04-23

    The development of RNA interference (RNAi)-based therapy faces two major obstacles: selecting small interfering RNA (siRNA) sequences with strong activity, and identifying a carrier that allows efficient delivery to target organs. Additionally, conservative region at nucleotide level must be targeted for RNAi in applying to virus because hepatitis C virus (HCV) could escape from therapeutic pressure with genome mutations. In vitro preparation of Dicer-generated siRNAs targeting a conserved, highly ordered HCV 5' untranslated region are capable of inducing strong RNAi activity. By dissecting the 5'-end of an RNAi-mediated cleavage site in the HCV genome, we identified potent siRNA sequences, which we designate as Dicer-hunting siRNAs (dh-siRNAs). Furthermore, formulation of the dh-siRNAs in an optimized multifunctional envelope-type nano device inhibited ongoing infectious HCV replication in human hepatocytes in vivo. Our efforts using both identification of optimal siRNA sequences and delivery to human hepatocytes suggest therapeutic potential of siRNA for a virus.

  5. Optimization of single-base-pair mismatch discrimination in oligonucleotide microarrays

    NASA Technical Reports Server (NTRS)

    Urakawa, Hidetoshi; El Fantroussi, Said; Smidt, Hauke; Smoot, James C.; Tribou, Erik H.; Kelly, John J.; Noble, Peter A.; Stahl, David A.

    2003-01-01

    The discrimination between perfect-match and single-base-pair-mismatched nucleic acid duplexes was investigated by using oligonucleotide DNA microarrays and nonequilibrium dissociation rates (melting profiles). DNA and RNA versions of two synthetic targets corresponding to the 16S rRNA sequences of Staphylococcus epidermidis (38 nucleotides) and Nitrosomonas eutropha (39 nucleotides) were hybridized to perfect-match probes (18-mer and 19-mer) and to a set of probes having all possible single-base-pair mismatches. The melting profiles of all probe-target duplexes were determined in parallel by using an imposed temperature step gradient. We derived an optimum wash temperature for each probe and target by using a simple formula to calculate a discrimination index for each temperature of the step gradient. This optimum corresponded to the output of an independent analysis using a customized neural network program. These results together provide an experimental and analytical framework for optimizing mismatch discrimination among all probes on a DNA microarray.

  6. Evaluation of anonymous and expressed sequence tag derived polymorphic microsatellite markers in the tobacco budworm Heliothis virescens (Lepidoptera: noctuidae)

    USDA-ARS?s Scientific Manuscript database

    Polymorphic genetic markers were identified and characterized using a partial genomic library of Heliothis virescens enriched for simple sequence repeats (SSR) and nucleotide sequences of expressed sequence tags (EST). Nucleotide sequences of 192 clones from the partial genomic library yielded 147 u...

  7. Extension of the COG and arCOG databases by amino acid and nucleotide sequences

    PubMed Central

    Meereis, Florian; Kaufmann, Michael

    2008-01-01

    Background The current versions of the COG and arCOG databases, both excellent frameworks for studies in comparative and functional genomics, do not contain the nucleotide sequences corresponding to their protein or protein domain entries. Results Using sequence information obtained from GenBank flat files covering the completely sequenced genomes of the COG and arCOG databases, we constructed NUCOCOG (nucleotide sequences containing COG databases) as an extended version including all nucleotide sequences and in addition the amino acid sequences originally utilized to construct the current COG and arCOG databases. We make available three comprehensive single XML files containing the complete databases including all sequence information. In addition, we provide a web interface as a utility suitable to browse the NUCOCOG database for sequence retrieval. The database is accessible at . Conclusion NUCOCOG offers the possibility to analyze any sequence related property in the context of the COG and arCOG framework simply by using script languages such as PERL applied to a large but single XML document. PMID:19014535

  8. DNA Nucleotide Sequence Restricted by the RI Endonuclease

    PubMed Central

    Hedgpeth, Joe; Goodman, Howard M.; Boyer, Herbert W.

    1972-01-01

    The sequence of DNA base pairs adjacent to the phosphodiester bonds cleaved by the RI restriction endonuclease in unmodified DNA from coliphage λ has been determined. The 5′-terminal nucleotide labeled with 32P and oligonucleotides up to the heptamer were analyzed from a pancreatic DNase digest. The following sequence of nucleotides adjacent to the RI break made in λ DNA was deduced from these data and from the 3′-dinucleotide sequence and nearest-neighbor analysis obtained from repair synthesis with the DNA polymerase of Rous sarcoma virus [Formula: see text] The RI endonuclease cleavage of the phosphodiester bonds (indicated by arrows) generates 5′-phosphoryls and short cohesive termini of four nucleotides, pApApTpT. The most striking feature of the sequence is its symmetry. PMID:4343974

  9. Sequence of a cDNA encoding pancreatic preprosomatostatin-22.

    PubMed Central

    Magazin, M; Minth, C D; Funckes, C L; Deschenes, R; Tavianini, M A; Dixon, J E

    1982-01-01

    We report the nucleotide sequence of a precursor to somatostatin that upon proteolytic processing may give rise to a hormone of 22 amino acids. The nucleotide sequence of a cDNA from the channel catfish (Ictalurus punctatus) encodes a precursor to somatostatin that is 105 amino acids (Mr, 11,500). The cDNA coding for somatostatin-22 consists of 36 nucleotides in the 5' untranslated region, 315 nucleotides that code for the precursor to somatostatin-22, 269 nucleotides at the 3' untranslated region, and a variable length of poly(A). The putative preprohormone contains a sequence of hydrophobic amino acids at the amino terminus that has the properties of a "signal" peptide. A connecting sequence of approximately 57 amino acids is followed by a single Arg-Arg sequence, which immediately precedes the hormone. Somatostatin-22 is homologous to somatostatin-14 in 7 of the 14 amino acids, including the Phe-Trp-Lys sequence. Hybridization selection of mRNA, followed by its translation in a wheat germ cell-free system, resulted in the synthesis of a single polypeptide having a molecular weight of approximately 10,000 as estimated on Na-DodSO4/polyacrylamide gels. Images PMID:6127673

  10. Plant nitrogen regulatory P-PII genes

    DOEpatents

    Coruzzi, Gloria M.; Lam, Hon-Ming; Hsieh, Ming-Hsiun

    2001-01-01

    The present invention generally relates to plant nitrogen regulatory PII gene (hereinafter P-PII gene), a gene involved in regulating plant nitrogen metabolism. The invention provides P-PII nucleotide sequences, expression constructs comprising said nucleotide sequences, and host cells and plants having said constructs and, optionally expressing the P-PII gene from said constructs. The invention also provides substantially pure P-PII proteins. The P-PII nucleotide sequences and constructs of the

  11. Rapid construction of insulated genetic circuits via synthetic sequence-guided isothermal assembly

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Torella, JP; Boehm, CR; Lienert, F

    2013-12-28

    In vitro recombination methods have enabled one-step construction of large DNA sequences from multiple parts. Although synthetic biological circuits can in principle be assembled in the same fashion, they typically contain repeated sequence elements such as standard promoters and terminators that interfere with homologous recombination. Here we use a computational approach to design synthetic, biologically inactive unique nucleotide sequences (UNSes) that facilitate accurate ordered assembly. Importantly, our designed UNSes make it possible to assemble parts with repeated terminator and insulator sequences, and thereby create insulated functional genetic circuits in bacteria and mammalian cells. Using UNS-guided assembly to construct repeating promoter-gene-terminatormore » parts, we systematically varied gene expression to optimize production of a deoxychromoviridans biosynthetic pathway in Escherichia coli. We then used this system to construct complex eukaryotic AND-logic gates for genomic integration into embryonic stem cells. Construction was performed by using a standardized series of UNS-bearing BioBrick-compatible vectors, which enable modular assembly and facilitate reuse of individual parts. UNS-guided isothermal assembly is broadly applicable to the construction and optimization of genetic circuits and particularly those requiring tight insulation, such as complex biosynthetic pathways, sensors, counters and logic gates.« less

  12. Synthesis and evaluations of an acid-cleavable, fluorescently labeled nucleotide as a reversible terminator for DNA sequencing.

    PubMed

    Tan, Lianjiang; Liu, Yazhi; Li, Xiaowei; Wu, Xin-Yan; Gong, Bing; Shen, Yu-Mei; Shao, Zhifeng

    2016-02-11

    An acid-cleavable linker based on a dimethylketal moiety was synthesized and used to connect a nucleotide with a fluorophore to produce a 3'-OH unblocked nucleotide analogue as an excellent reversible terminator for DNA sequencing by synthesis.

  13. Real-time single-molecule electronic DNA sequencing by synthesis using polymer-tagged nucleotides on a nanopore array

    PubMed Central

    Fuller, Carl W.; Kumar, Shiv; Porel, Mintu; Chien, Minchen; Bibillo, Arek; Stranges, P. Benjamin; Dorwart, Michael; Tao, Chuanjuan; Li, Zengmin; Guo, Wenjing; Shi, Shundi; Korenblum, Daniel; Trans, Andrew; Aguirre, Anne; Liu, Edward; Harada, Eric T.; Pollard, James; Bhat, Ashwini; Cech, Cynthia; Yang, Alexander; Arnold, Cleoma; Palla, Mirkó; Hovis, Jennifer; Chen, Roger; Morozova, Irina; Kalachikov, Sergey; Russo, James J.; Kasianowicz, John J.; Davis, Randy; Roever, Stefan; Church, George M.; Ju, Jingyue

    2016-01-01

    DNA sequencing by synthesis (SBS) offers a robust platform to decipher nucleic acid sequences. Recently, we reported a single-molecule nanopore-based SBS strategy that accurately distinguishes four bases by electronically detecting and differentiating four different polymer tags attached to the 5′-phosphate of the nucleotides during their incorporation into a growing DNA strand catalyzed by DNA polymerase. Further developing this approach, we report here the use of nucleotides tagged at the terminal phosphate with oligonucleotide-based polymers to perform nanopore SBS on an α-hemolysin nanopore array platform. We designed and synthesized several polymer-tagged nucleotides using tags that produce different electrical current blockade levels and verified they are active substrates for DNA polymerase. A highly processive DNA polymerase was conjugated to the nanopore, and the conjugates were complexed with primer/template DNA and inserted into lipid bilayers over individually addressable electrodes of the nanopore chip. When an incoming complementary-tagged nucleotide forms a tight ternary complex with the primer/template and polymerase, the tag enters the pore, and the current blockade level is measured. The levels displayed by the four nucleotides tagged with four different polymers captured in the nanopore in such ternary complexes were clearly distinguishable and sequence-specific, enabling continuous sequence determination during the polymerase reaction. Thus, real-time single-molecule electronic DNA sequencing data with single-base resolution were obtained. The use of these polymer-tagged nucleotides, combined with polymerase tethering to nanopores and multiplexed nanopore sensors, should lead to new high-throughput sequencing methods. PMID:27091962

  14. Molecular Cloning and Sequencing of Hemoglobin-Beta Gene of Channel Catfish, Ictalurus Punctatus Rafinesque

    USDA-ARS?s Scientific Manuscript database

    : Hemoglobin-y gene of channel catfish , lctalurus punctatus, was cloned and sequenced . Total RNA from head kidneys was isolated, reverse transcribed and amplified . The sequence of the channel catfish hemoglobin-y gene consists of 600 nucleotides . Analysis of the nucleotide sequence reveals one o...

  15. Novel methodologies for spectral classification of exon and intron sequences

    NASA Astrophysics Data System (ADS)

    Kwan, Hon Keung; Kwan, Benjamin Y. M.; Kwan, Jennifer Y. Y.

    2012-12-01

    Digital processing of a nucleotide sequence requires it to be mapped to a numerical sequence in which the choice of nucleotide to numeric mapping affects how well its biological properties can be preserved and reflected from nucleotide domain to numerical domain. Digital spectral analysis of nucleotide sequences unfolds a period-3 power spectral value which is more prominent in an exon sequence as compared to that of an intron sequence. The success of a period-3 based exon and intron classification depends on the choice of a threshold value. The main purposes of this article are to introduce novel codes for 1-sequence numerical representations for spectral analysis and compare them to existing codes to determine appropriate representation, and to introduce novel thresholding methods for more accurate period-3 based exon and intron classification of an unknown sequence. The main findings of this study are summarized as follows: Among sixteen 1-sequence numerical representations, the K-Quaternary Code I offers an attractive performance. A windowed 1-sequence numerical representation (with window length of 9, 15, and 24 bases) offers a possible speed gain over non-windowed 4-sequence Voss representation which increases as sequence length increases. A winner threshold value (chosen from the best among two defined threshold values and one other threshold value) offers a top precision for classifying an unknown sequence of specified fixed lengths. An interpolated winner threshold value applicable to an unknown and arbitrary length sequence can be estimated from the winner threshold values of fixed length sequences with a comparable performance. In general, precision increases as sequence length increases. The study contributes an effective spectral analysis of nucleotide sequences to better reveal embedded properties, and has potential applications in improved genome annotation.

  16. Complete nucleotide sequence of a novel Hibiscus-infecting Cilevirus from Florida and its relationship with closely associated Cileviruses

    USDA-ARS?s Scientific Manuscript database

    The complete nucleotide sequence of a recently discovered Florida (FL) isolate of Hibiscus infecting Cilevirus (HiCV) was determined by Sanger sequencing. The movement- and coat- protein gene sequences of the HiCV-FL isolate are more divergent than other genes of the previously sequenced HiCV-HA (Ha...

  17. Statistical analysis of nucleotide sequences of the hemagglutinin gene of human influenza A viruses.

    PubMed Central

    Ina, Y; Gojobori, T

    1994-01-01

    To examine whether positive selection operates on the hemagglutinin 1 (HA1) gene of human influenza A viruses (H1 subtype), 21 nucleotide sequences of the HA1 gene were statistically analyzed. The nucleotide sequences were divided into antigenic and nonantigenic sites. The nucleotide diversities for antigenic and nonantigenic sites of the HA1 gene were computed at synonymous and nonsynonymous sites separately. For nonantigenic sites, the nucleotide diversities were larger at synonymous sites than at nonsynonymous sites. This is consistent with the neutral theory of molecular evolution. For antigenic sites, however, the nucleotide diversities at nonsynonymous sites were larger than those at synonymous sites. These results suggest that positive selection operates on antigenic sites of the HA1 gene of human influenza A viruses (H1 subtype). PMID:8078892

  18. 40 CFR 174.3 - Definitions.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ..., flowers, and pollen. Noncoding, nonexpressed nucleotide sequences means the nucleotide sequences are not... surgical alteration of the plant pistil, bud pollination, mentor pollen, immunosuppressants, in vitro...

  19. 40 CFR 174.3 - Definitions.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ..., flowers, and pollen. Noncoding, nonexpressed nucleotide sequences means the nucleotide sequences are not... surgical alteration of the plant pistil, bud pollination, mentor pollen, immunosuppressants, in vitro...

  20. 40 CFR 174.3 - Definitions.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ..., flowers, and pollen. Noncoding, nonexpressed nucleotide sequences means the nucleotide sequences are not... surgical alteration of the plant pistil, bud pollination, mentor pollen, immunosuppressants, in vitro...

  1. 40 CFR 174.3 - Definitions.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ..., flowers, and pollen. Noncoding, nonexpressed nucleotide sequences means the nucleotide sequences are not... surgical alteration of the plant pistil, bud pollination, mentor pollen, immunosuppressants, in vitro...

  2. 40 CFR 174.3 - Definitions.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ..., flowers, and pollen. Noncoding, nonexpressed nucleotide sequences means the nucleotide sequences are not... surgical alteration of the plant pistil, bud pollination, mentor pollen, immunosuppressants, in vitro...

  3. Genome sequences of a mouse-avirulent and a mouse-virulent strain of Ross River virus.

    PubMed

    Faragher, S G; Meek, A D; Rice, C M; Dalgarno, L

    1988-04-01

    The nucleotide sequence of the genomic RNA of a mouse-avirulent strain of Ross River virus, RRV NB5092 (isolated in 1969), has been determined and the corresponding sequence for the prototype mouse-virulent strain, RRV T48 (isolated in 1959), has been completed. The RRV NB5092 genome is approximately 11,674 nucleotides in length, compared with 11,853 nucleotides for RRV T48. RRV NB5092 and RRV T48 have the same genome organization. For both viruses an untranslated region of 80 nucleotides at the 5' end of the genome is followed by a 7440-nucleotide open reading frame which is interrupted after 5586 nucleotides by a single opal termination codon. By homology with other alphaviruses, the 5586-nucleotide open reading frame encodes the nonstructural proteins nsP1, nsP2, and nsP3; a fourth nonstructural protein, nsP4, is produced by read-through of the opal codon. The RRV nonstructural proteins show strong homology with the corresponding proteins of Sindbis virus and Semliki Forest virus in terms of size, net charge, and hydropathy characteristics. However, homology is not uniform between or within the proteins; nsP1, nsP2, and nsP4 contain extended domains which are highly conserved between alphaviruses, while the C-terminal region of nsP3 shows little conservation in sequence or length between alphaviruses. An untranslated "junction" region of 44 nucleotides (for RRV NB5092) or 47 nucleotides (for RRV T48) separates the nonstructural and structural protein coding regions. The structural proteins (capsid-E3-E2-6K-E1) are translated from an open reading frame of 3762 nucleotides which is followed by a 3'-untranslated region of approximately 348 nucleotides (for RRV NB5092) or 524 nucleotides (for RRV T48). Excluding deletions and insertions, the genomes of RRV NB5092 and RRV T48 differ at 284 nucleotides, representing a sequence divergence of 2.38%. Sequence deletions or insertions were found only in the noncoding regions and include a 173-nucleotide deletion in the 3'-untranslated region of RRV NB5092, compared with RRV T48. In the coding regions, most of the nucleotide differences are silent; there are 36 amino acid differences in the nonstructural proteins and 12 in the structural proteins. The distribution of amino acid differences between the two RRV strains correlates with the location of domains which are poorly conserved in sequence between alphaviruses. The possible role of amino acid differences in envelope glycoproteins E1 and E2 in determining the different antigenic and biological properties of RRV NB5092 and RRV T48 is discussed.

  4. The complete nucleotide sequence of the glnALG operon of Escherichia coli K12.

    PubMed Central

    Miranda-Ríos, J; Sánchez-Pescador, R; Urdea, M; Covarrubias, A A

    1987-01-01

    The nucleotide sequence of the E. coli glnALG operon has been determined. The glnL (ntrB) and glnG (ntrC) genes present a high homology, at the nucleotide and aminoacid levels, with the corresponding genes of Klebsiella pneumoniae. The predicted aminoacid sequence for glutamine synthetase allowed us to locate some of the enzyme domains. The structure of this operon is discussed. PMID:2882477

  5. The nucleotide sequences of 5S rRNAs from a rotifer, Brachionus plicatilis, and two nematodes, Rhabditis tokai and Caenorhabditis elegans.

    PubMed

    Kumazaki, T; Hori, H; Osawa, S; Ishii, N; Suzuki, K

    1982-11-11

    The nucleotide sequences of 5S rRNAs from a rotifer, Brachionus plicatilis, and two nematodes, Rhabditis tokai and Caenorhabditis elegans have been determined. The rotifer has two 5S rRNA species that are composed of 120 and 121 nucleotides, respectively. The sequences of these two 5S rRNAs are the same except that the latter has an additional base at its 3'-terminus. The 5S rRNAs from the two nematode species are both 119 nucleotides long. The sequence similarity percents are 79% (Brachionus/Rhabditis), 80% (Brachionus/Caenorhabditis), and 95% (Rhabditis/Caenorhabditis) among these three species. Brachionus revealed the highest similarity to Lingula (89%), but not to the nematodes (79%).

  6. Setting up a probe based, closed tube real-time PCR assay for focused detection of variable sequence alterations.

    PubMed

    Becságh, Péter; Szakács, Orsolya

    2014-10-01

    During diagnostic workflow when detecting sequence alterations, sometimes it is important to design an algorithm that includes screening and direct tests in combination. Normally the use of direct test, which is mainly sequencing, is limited. There is an increased need for effective screening tests, with "closed tube" during the whole process and therefore decreasing the risk of PCR product contamination. The aim of this study was to design such a closed tube, detection probe based screening assay to detect different kind of sequence alterations in the exon 11 of the human c-kit gene region. Inside this region there are variable possible deletions and single nucleotide changes. During assay setup, more probe chemistry formats were screened and tested. After some optimization steps the taqman probe format was selected.

  7. Optimization of sequence alignment for simple sequence repeat regions.

    PubMed

    Jighly, Abdulqader; Hamwieh, Aladdin; Ogbonnaya, Francis C

    2011-07-20

    Microsatellites, or simple sequence repeats (SSRs), are tandemly repeated DNA sequences, including tandem copies of specific sequences no longer than six bases, that are distributed in the genome. SSR has been used as a molecular marker because it is easy to detect and is used in a range of applications, including genetic diversity, genome mapping, and marker assisted selection. It is also very mutable because of slipping in the DNA polymerase during DNA replication. This unique mutation increases the insertion/deletion (INDELs) mutation frequency to a high ratio - more than other types of molecular markers such as single nucleotide polymorphism (SNPs).SNPs are more frequent than INDELs. Therefore, all designed algorithms for sequence alignment fit the vast majority of the genomic sequence without considering microsatellite regions, as unique sequences that require special consideration. The old algorithm is limited in its application because there are many overlaps between different repeat units which result in false evolutionary relationships. To overcome the limitation of the aligning algorithm when dealing with SSR loci, a new algorithm was developed using PERL script with a Tk graphical interface. This program is based on aligning sequences after determining the repeated units first, and the last SSR nucleotides positions. This results in a shifting process according to the inserted repeated unit type.When studying the phylogenic relations before and after applying the new algorithm, many differences in the trees were obtained by increasing the SSR length and complexity. However, less distance between different linage had been observed after applying the new algorithm. The new algorithm produces better estimates for aligning SSR loci because it reflects more reliable evolutionary relations between different linages. It reduces overlapping during SSR alignment, which results in a more realistic phylogenic relationship.

  8. The EMBL nucleotide sequence database

    PubMed Central

    Stoesser, Guenter; Baker, Wendy; van den Broek, Alexandra; Camon, Evelyn; Garcia-Pastor, Maria; Kanz, Carola; Kulikova, Tamara; Lombard, Vincent; Lopez, Rodrigo; Parkinson, Helen; Redaschi, Nicole; Sterk, Peter; Stoehr, Peter; Tuli, Mary Ann

    2001-01-01

    The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank at the NCBI (USA). Data is exchanged amongst the collaborating databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via ftp, email and World Wide Web interfaces. EBI’s Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many specialized databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. PMID:11125039

  9. Interactive computer programs for the graphic analysis of nucleotide sequence data.

    PubMed Central

    Luckow, V A; Littlewood, R K; Rownd, R H

    1984-01-01

    A group of interactive computer programs have been developed which aid in the collection and graphical analysis of nucleotide and protein sequence data. The programs perform the following basic functions: a) enter, edit, list, and rearrange sequence data; b) permit automatic entry of nucleotide sequence data directly from an autoradiograph into the computer; c) search for restriction sites or other specified patterns and plot a linear or circular restriction map, or print their locations; d) plot base composition; e) analyze homology between sequences by plotting a two-dimensional graphic matrix; and f) aid in plotting predicted secondary structures of RNA molecules. PMID:6546437

  10. Optimizing complex phenotypes through model-guided multiplex genome engineering

    DOE PAGES

    Kuznetsov, Gleb; Goodman, Daniel B.; Filsinger, Gabriel T.; ...

    2017-05-25

    Here, we present a method for identifying genomic modifications that optimize a complex phenotype through multiplex genome engineering and predictive modeling. We apply our method to identify six single nucleotide mutations that recover 59% of the fitness defect exhibited by the 63-codon E. coli strain C321.ΔA. By introducing targeted combinations of changes in multiplex we generate rich genotypic and phenotypic diversity and characterize clones using whole-genome sequencing and doubling time measurements. Regularized multivariate linear regression accurately quantifies individual allelic effects and overcomes bias from hitchhiking mutations and context-dependence of genome editing efficiency that would confound other strategies.

  11. Optimizing complex phenotypes through model-guided multiplex genome engineering

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kuznetsov, Gleb; Goodman, Daniel B.; Filsinger, Gabriel T.

    Here, we present a method for identifying genomic modifications that optimize a complex phenotype through multiplex genome engineering and predictive modeling. We apply our method to identify six single nucleotide mutations that recover 59% of the fitness defect exhibited by the 63-codon E. coli strain C321.ΔA. By introducing targeted combinations of changes in multiplex we generate rich genotypic and phenotypic diversity and characterize clones using whole-genome sequencing and doubling time measurements. Regularized multivariate linear regression accurately quantifies individual allelic effects and overcomes bias from hitchhiking mutations and context-dependence of genome editing efficiency that would confound other strategies.

  12. Nucleotide sequence analysis establishes the role of endogenous murine leukemia virus DNA segments in formation of recombinant mink cell focus-forming murine leukemia viruses.

    PubMed Central

    Khan, A S

    1984-01-01

    The sequence of 363 nucleotides near the 3' end of the pol gene and 564 nucleotides from the 5' terminus of the env gene in an endogenous murine leukemia viral (MuLV) DNA segment, cloned from AKR/J mouse DNA and designated as A-12, was obtained. For comparison, the nucleotide sequence in an analogous portion of AKR mink cell focus-forming (MCF) 247 MuLV provirus was also determined. Sequence features unique to MCF247 MuLV DNA in the 3' pol and 5' env regions were identified by comparison with nucleotide sequences in analogous regions of NFS -Th-1 xenotropic and AKR ecotropic MuLV proviruses. These included (i) an insertion of 12 base pairs encoding four amino acids located 60 base pairs from the 3' terminus of the pol gene and immediately preceding the env gene, (ii) the deletion of 12 base pairs (encoding four amino acids) and the insertion of 3 base pairs (encoding one amino acid) in the 5' portion of the env gene, and (iii) single base substitutions resulting in 2 MCF247 -specific amino acids in the 3' pol and 23 in the 5' env regions. Nucleotide sequence comparison involving the 3' pol and 5' env regions of AKR MCF247 , NFS xenotropic, and AKR ecotropic MuLV proviruses with the cloned endogenous MuLV DNA indicated that MCF247 proviral DNA sequences were conserved in the cloned endogenous MuLV proviral segment. In fact, total nucleotide sequence identity existed between the endogenous MuLV DNA and the MCF247 MuLV provirus in the 3' portion of the pol gene. In the 5' env region, only 4 of 564 nucleotides were different, resulting in three amino acid changes between AKR MCF247 MuLV DNA and the endogenous MuLV DNA present in clone A-12. In addition, nucleotide sequence comparison indicated that Moloney-and Friend-MCF MuLVs were also highly related in the 3' pol and 5' env regions to the cloned endogenous MuLV DNA. These results establish the role of endogenous MuLV DNA segments in generation of recombinant MCF viruses. PMID:6328017

  13. Complete nucleotide sequence and genome organization of a novel allexivirus from alfalfa (Medicago sativa)

    USDA-ARS?s Scientific Manuscript database

    A new species of the family Alphaflexiviridae provisionally named Alfalfa virus S (AVS) was diagnosed in alfalfa samples originating from Sudan. A complete nucleotide sequence of the viral genome consisting of 8,349 nucleotides excluding the 3’ poly(A) tail was determined by Illumina NGS technology ...

  14. The primary structure of the thymidine kinase gene of fish lymphocystis disease virus.

    PubMed

    Schnitzler, P; Handermann, M; Szépe, O; Darai, G

    1991-06-01

    The DNA nucleotide sequence of the thymidine kinase (TK) gene of fish lymphocystis disease virus (FLDV) which has been localized between the coordinates 0.678 to 0.688 of the viral genome was determined. The analysis of the DNA nucleotide sequence located between the recognition sites of HindIII (0.669 map unit; nucleotide position 1) and AccI (nucleotide position 2032) revealed the presence of an open reading frame of 954 bp on the lower strand of this region between nucleotide positions 1868 (ATG) and 915 (TAA). It encodes for a protein of 318 amino acid residues. The evolutionary relationships of the TK gene of FLDV to the other known TK genes was investigated using the method of progressive sequence alignment. These analyses revealed a high degree of diversity between the protein sequence of FLDV TK gene and the amino acid composition of other TKs tested. However, significant conservations were detected at several regions of amino acid residues of the FLDV TK protein when compared to the amino acid sequence of TKs of African swine fever virus, fowlpox virus, shope fibroma virus, and vaccinia virus and to the amino acid sequences of the cellular cytoplasmic TK of chicken, mouse, and man.

  15. Sequence diversity within the reovirus S2 gene: reovirus genes reassort in nature, and their termini are predicted to form a panhandle motif.

    PubMed Central

    Chapell, J D; Goral, M I; Rodgers, S E; dePamphilis, C W; Dermody, T S

    1994-01-01

    To better understand genetic diversity within mammalian reoviruses, we determined S2 nucleotide and deduced sigma 2 amino acid sequences of nine reovirus strains and compared these sequences with those of prototype strains of the three reovirus serotypes. The S2 gene and sigma 2 protein are highly conserved among the four type 1, one type 2, and seven type 3 strains studied. Phylogenetic analyses based on S2 nucleotide sequences of the 12 reovirus strains indicate that diversity within the S2 gene is independent of viral serotype. Additionally, we found marked topological differences between phylogenetic trees generated from S1 and S2 gene nucleotide sequences of the seven type 3 strains. These results demonstrate that reovirus S1 and S2 genes have distinct evolutionary histories, thus providing phylogenetic evidence for lateral transfer of reovirus genes in nature. When variability among the 12 sigma 2-encoding S2 nucleotide sequences was analyzed at synonymous positions, we found that approximately 60 nucleotides at the 5' terminus and 30 nucleotides at the 3' terminus were markedly conserved in comparison with other sigma 2-encoding regions of S2. Predictions of RNA secondary structures indicate that the more conserved S2 sequences participate in the formation of an extended region of duplex RNA interrupted by a pair of stem-loops. Among the 12 deduced sigma 2 amino acid sequences examined, substitutions were observed at only 11% of amino acid positions. This finding suggests that constraints on the structure or function of sigma 2, perhaps in part because of its location in the virion core, have limited sequence diversity within this protein. PMID:8289378

  16. Optimization of subculture and DNA extraction steps within the whole genome sequencing workflow for source tracking of Salmonella enterica and Listeria monocytogenes.

    PubMed

    Gimonet, Johan; Portmann, Anne-Catherine; Fournier, Coralie; Baert, Leen

    2018-06-16

    This work shows that an incubation time reduced to 4-5 h to prepare a culture for DNA extraction followed by an automated DNA extraction can shorten the hands-on time, the turnaround time by 30% and increase the throughput while maintaining the WGS quality assessed by high quality Single Nucleotide Polymorphism analysis. Copyright © 2018. Published by Elsevier B.V.

  17. DNA sequence analysis of simian virus 40 mutants with deletions mapping in the leader region of the late viral mRNA's: mutants with deletions similar in size and position exhibit varied phenotypes.

    PubMed

    Barkan, A; Mertz, J E

    1981-02-01

    The nucleotide sequences of 10 viable yet partially defective deletion mutants of simian virus 40 were determined. The deletions mapped within, and, in many cases, 5' to, the predominant leader sequence of the late viral mRNA's. They ranged from 74 to 187 nucleotide pairs in length. Six of the mutants had lost the sequence that corresponds to the "cap" site (5' terminus) of the most abundant class of 16S mRNA's. One of these mutants had a deletion that extended 103 nucleotide pairs into the region preceding this primary cap site and, therefore, was missing many secondary cap sites as well. A seventh mutant lacked the entire major 16S leader sequence except for the first six nucleotides at its 5' end and the last nine at its 3' end. Although these mutants differed in the size and position of their deletions, we were unable to discover any simple correlations between their growth characteristics and their DNA sequences. This finding indicates that the secondary structures of the RNA transcripts may play a more important role than the exact nucleotide sequence of the RNAs in determining how they function within the cell.

  18. Optimization of a Low Cost and Broadly Sensitive Genotyping Assay for HIV-1 Drug Resistance Surveillance and Monitoring in Resource-Limited Settings

    PubMed Central

    Zhou, Zhiyong; Wagar, Nick; DeVos, Joshua R.; Rottinghaus, Erin; Diallo, Karidia; Nguyen, Duc B.; Bassey, Orji; Ugbena, Richard; Wadonda-Kabondo, Nellie; McConnell, Michelle S.; Zulu, Isaac; Chilima, Benson; Nkengasong, John; Yang, Chunfu

    2011-01-01

    Commercially available HIV-1 drug resistance (HIVDR) genotyping assays are expensive and have limitations in detecting non-B subtypes and circulating recombinant forms that are co-circulating in resource-limited settings (RLS). This study aimed to optimize a low cost and broadly sensitive in-house assay in detecting HIVDR mutations in the protease (PR) and reverse transcriptase (RT) regions of pol gene. The overall plasma genotyping sensitivity was 95.8% (N = 96). Compared to the original in-house assay and two commercially available genotyping systems, TRUGENE® and ViroSeq®, the optimized in-house assay showed a nucleotide sequence concordance of 99.3%, 99.6% and 99.1%, respectively. The optimized in-house assay was more sensitive in detecting mixture bases than the original in-house (N = 87, P<0.001) and TRUGENE® and ViroSeq® assays. When the optimized in-house assay was applied to genotype samples collected for HIVDR surveys (N = 230), all 72 (100%) plasma and 69 (95.8%) of the matched dried blood spots (DBS) in the Vietnam transmitted HIVDR survey were genotyped and nucleotide sequence concordance was 98.8%; Testing of treatment-experienced patient plasmas with viral load (VL) ≥ and <3 log10 copies/ml from the Nigeria and Malawi surveys yielded 100% (N = 46) and 78.6% (N = 14) genotyping rates, respectively. Furthermore, all 18 matched DBS stored at room temperature from the Nigeria survey were genotyped. Phylogenetic analysis of the 236 sequences revealed that 43.6% were CRF01_AE, 25.9% subtype C, 13.1% CRF02_AG, 5.1% subtype G, 4.2% subtype B, 2.5% subtype A, 2.1% each subtype F and unclassifiable, 0.4% each CRF06_CPX, CRF07_BC and CRF09_CPX. Conclusions The optimized in-house assay is broadly sensitive in genotyping HIV-1 group M viral strains and more sensitive than the original in-house, TRUGENE® and ViroSeq® in detecting mixed viral populations. The broad sensitivity and substantial reagent cost saving make this assay more accessible for RLS where HIVDR surveillance is recommended to minimize the development and transmission of HIVDR. PMID:22132237

  19. Antifungal polypeptides

    DOEpatents

    Altier, Daniel J.; Dahlbacka, Glen; Ellanskaya, legal representative, Natalia; Herrmann, Rafael; Hunter-Cevera, Jennie; McCutchen, Billy F.; Presnail, James K.; Rice, Janet A.; Schepers, Eric; Simmons, Carl R.; Torok, Tamas; Yalpani, Nasser; Ellanskaya, deceased, Irina

    2007-12-11

    Compositions and methods for protecting a plant from a pathogen, particularly a fungal pathogen, are provided. Compositions include novel amino acid sequences, and variants and fragments thereof, for antipathogenic polypeptides that were isolated from microbial fermentation broths. Nucleic acid molecules comprising nucleotide sequences that encode the antipathogenic polypeptides of the invention are also provided. A method for inducing pathogen resistance in a plant using the nucleotide sequences disclosed herein is further provided. The method comprises introducing into a plant an expression cassette comprising a promoter operably linked to a nucleotide sequence that encodes an antipathogenic polypeptide of the invention. Compositions comprising an antipathogenic polypeptide or a transformed microorganism comprising a nucleic acid of the invention in combination with a carrier and methods of using these compositions to protect a plant from a pathogen are further provided. Transformed plants, plant cells, seeds, and microorganisms comprising a nucleotide sequence that encodes an antipathogenic polypeptide of the invention, or variant or fragment thereof, are also disclosed.

  20. Antifungal polypeptides

    DOEpatents

    Altier, Daniel J.; Dahlbacka, Glen; Elleskaya, Irina; Ellanskaya, legal representative; Natalia; Herrmann, Rafael; Hunter-Cevera, Jennie; McCutchen, Billy F.; Presnail, James K.; Rice, Janet A.; Schepers, Eric; Simmons, Carl R.; Torok, Tamas; Yalpani, Nasser

    2010-08-10

    Compositions and methods for protecting a plant from a pathogen, particularly a fungal pathogen, are provided. Compositions include novel amino acid sequences, and variants and fragments thereof, for antipathogenic polypeptides that were isolated from microbial fermentation broths. Nucleic acid molecules comprising nucleotide sequences that encode the antipathogenic polypeptides of the invention are also provided. A method for inducing pathogen resistance in a plant using the nucleotide sequences disclosed herein is further provided. The method comprises introducing into a plant an expression cassette comprising a promoter operably linked to a nucleotide sequence that encodes an antipathogenic polypeptide of the invention. Compositions comprising an antipathogenic polypeptide or a transformed microorganism comprising a nucleic acid of the invention in combination with a carrier and methods of using these compositions to protect a plant from a pathogen are further provided. Transformed plants, plant cells, seeds, and microorganisms comprising a nucleotide sequence that encodes an antipathogenic polypeptide of the invention, or variant or fragment thereof, are also disclosed.

  1. Antifungal polypeptides

    DOEpatents

    Altier, Daniel J [Waukee, IA; Dahlbacka, Glen [Oakland, CA; Elleskaya, Irina [Kyiv, UA; Ellanskaya, legal representative, Natalia; Herrmann, Rafael [Wilmington, DE; Hunter-Cevera, Jennie [Elliott City, MD; McCutchen, Billy F [College Station, IA; Presnail, James K [Avondale, PA; Rice, Janet A [Wilmington, DE; Schepers, Eric [Port Deposit, MD; Simmons, Carl R [Des Moines, IA; Torok, Tamas [Richmond, CA; Yalpani, Nasser [Johnston, IA

    2011-04-12

    Compositions and methods for protecting a plant from a pathogen, particularly a fungal pathogen, are provided. Compositions include novel amino acid sequences, and variants and fragments thereof, for antipathogenic polypeptides that were isolated from microbial fermentation broths. Nucleic acid molecules comprising nucleotide sequences that encode the antipathogenic polypeptides of the invention are also provided. A method for inducing pathogen resistance in a plant using the nucleotide sequences disclosed herein is further provided. The method comprises introducing into a plant an expression cassette comprising a promoter operably linked to a nucleotide sequence that encodes an antipathogenic polypeptide of the invention. Compositions comprising an antipathogenic polypeptide or a transformed microorganism comprising a nucleic acid of the invention in combination with a carrier and methods of using these compositions to protect a plant from a pathogen are further provided. Transformed plants, plant cells, seeds, and microorganisms comprising a nucleotide sequence that encodes an antipathogenic polypeptide of the invention, or variant or fragment thereof, are also disclosed.

  2. Antifungal polypeptides

    DOEpatents

    Altier, Daniel J [Granger, IA; Dahlbacka, Glen [Oakland, CA; Ellanskaya, Irina [Kyiv, UA; Ellanskaya, legal representative, Natalia; Herrmann, Rafael [Wilmington, DE; Hunter-Cevera, Jennie [Elliott City, MD; McCutchen, Billy F [College Station, TX; Presnail, James K [Avondale, PA; Rice, Janet A [Wilmington, DE; Schepers, Eric [Port Deposit, MD; Simmons, Carl R [Des Moines, IA; Torok, Tamas [Richmond, CA; Yalpani, Nasser [Johnston, IA

    2012-04-03

    Compositions and methods for protecting a plant from a pathogen, particularly a fungal pathogen, are provided. Compositions include novel amino acid sequences, and variants and fragments thereof, for antipathogenic polypeptides that were isolated from microbial fermentation broths. Nucleic acid molecules comprising nucleotide sequences that encode the antipathogenic polypeptides of the invention are also provided. A method for inducing pathogen resistance in a plant using the nucleotide sequences disclosed herein is further provided. The method comprises introducing into a plant an expression cassette comprising a promoter operably linked to a nucleotide sequence that encodes an antipathogenic polypeptide of the invention. Compositions comprising an antipathogenic polypeptide or a transformed microorganism comprising a nucleic acid of the invention in combination with a carrier and methods of using these compositions to protect a plant from a pathogen are further provided. Transformed plants, plant cells, seeds, and microorganisms comprising a nucleotide sequence that encodes an antipathogenic polypeptide of the invention, or variant or fragment thereof, are also disclosed.

  3. Intercalation of XR5944 with the estrogen response element is modulated by the tri-nucleotide spacer sequence between half-sites

    PubMed Central

    Sidell, Neil; Mathad, Raveendra I.; Shu, Feng-jue; Zhang, Zhenjiang; Kallen, Caleb B.; Yang, Danzhou

    2011-01-01

    DNA-intercalating molecules can impair DNA replication, DNA repair, and gene transcription. We previously demonstrated that XR5944, a DNA bis-intercalator, specifically blocks binding of estrogen receptor-α (ERα) to the consensus estrogen response element (ERE). The consensus ERE sequence is AGGTCAnnnTGACCT, where nnn is known as the tri-nucleotide spacer. Recent work has shown that the tri-nucleotide spacer can modulate ERα-ERE binding affinity and ligand-mediated transcriptional responses. To further understand the mechanism by which XR5944 inhibits ERα-ERE binding, we tested its ability to interact with consensus EREs with variable tri-nucleotide spacer sequences and with natural but non-consensus ERE sequences using one dimensional nuclear magnetic resonance (1D 1H NMR) titration studies. We found that the tri-nucleotide spacer sequence significantly modulates the binding of XR5944 to EREs. Of the sequences that were tested, EREs with CGG and AGG spacers showed the best binding specificity with XR5944, while those spaced with TTT demonstrated the least specific binding. The binding stoichiometry of XR5944 with EREs was 2:1, which can explain why the spacer influences the drug-DNA interaction; each XR5944 spans four nucleotides (including portions of the spacer) when intercalating with DNA. To validate our NMR results, we conducted functional studies using reporter constructs containing consensus EREs with tri-nucleotide spacers CGG, CTG, and TTT. Results of reporter assays in MCF-7 cells indicated that XR5944 was significantly more potent in inhibiting the activity of CGG- than TTT-spaced EREs, consistent with our NMR results. Taken together, these findings predict that the anti-estrogenic effects of XR5944 will depend not only on ERE half-site composition but also on the tri-nucleotide spacer sequence of EREs located in the promoters of estrogen-responsive genes. PMID:21333738

  4. Typing of canine parvovirus isolates using mini-sequencing based single nucleotide polymorphism analysis.

    PubMed

    Naidu, Hariprasad; Subramanian, B Mohana; Chinchkar, Shankar Ramchandra; Sriraman, Rajan; Rana, Samir Kumar; Srinivasan, V A

    2012-05-01

    The antigenic types of canine parvovirus (CPV) are defined based on differences in the amino acids of the major capsid protein VP2. Type specificity is conferred by a limited number of amino acid changes and in particular by few nucleotide substitutions. PCR based methods are not particularly suitable for typing circulating variants which differ in a few specific nucleotide substitutions. Assays for determining SNPs can detect efficiently nucleotide substitutions and can thus be adapted to identify CPV types. In the present study, CPV typing was performed by single nucleotide extension using the mini-sequencing technique. A mini-sequencing signature was established for all the four CPV types (CPV2, 2a, 2b and 2c) and feline panleukopenia virus. The CPV typing using the mini-sequencing reaction was performed for 13 CPV field isolates and the two vaccine strains available in our repository. All the isolates had been typed earlier by full-length sequencing of the VP2 gene. The typing results obtained from mini-sequencing matched completely with that of sequencing. Typing could be achieved with less than 100 copies of standard plasmid DNA constructs or ≤10¹ FAID₅₀ of virus by mini-sequencing technique. The technique was also efficient for detecting multiple types in mixed infections. Copyright © 2012 Elsevier B.V. All rights reserved.

  5. The nucleotide sequences of 5S rRNAs from a rotifer, Brachionus plicatilis, and two nematodes, Rhabditis tokai and Caenorhabditis elegans.

    PubMed Central

    Kumazaki, T; Hori, H; Osawa, S; Ishii, N; Suzuki, K

    1982-01-01

    The nucleotide sequences of 5S rRNAs from a rotifer, Brachionus plicatilis, and two nematodes, Rhabditis tokai and Caenorhabditis elegans have been determined. The rotifer has two 5S rRNA species that are composed of 120 and 121 nucleotides, respectively. The sequences of these two 5S rRNAs are the same except that the latter has an additional base at its 3'-terminus. The 5S rRNAs from the two nematode species are both 119 nucleotides long. The sequence similarity percents are 79% (Brachionus/Rhabditis), 80% (Brachionus/Caenorhabditis), and 95% (Rhabditis/Caenorhabditis) among these three species. Brachionus revealed the highest similarity to Lingula (89%), but not to the nematodes (79%). PMID:6891053

  6. Detection and quantitation of single nucleotide polymorphisms, DNA sequence variations, DNA mutations, DNA damage and DNA mismatches

    DOEpatents

    McCutchen-Maloney, Sandra L.

    2002-01-01

    DNA mutation binding proteins alone and as chimeric proteins with nucleases are used with solid supports to detect DNA sequence variations, DNA mutations and single nucleotide polymorphisms. The solid supports may be flow cytometry beads, DNA chips, glass slides or DNA dips sticks. DNA molecules are coupled to solid supports to form DNA-support complexes. Labeled DNA is used with unlabeled DNA mutation binding proteins such at TthMutS to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by binding which gives an increase in signal. Unlabeled DNA is utilized with labeled chimeras to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by nuclease activity of the chimera which gives a decrease in signal.

  7. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...

  8. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...

  9. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...

  10. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...

  11. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...

  12. Biological nanopore MspA for DNA sequencing

    NASA Astrophysics Data System (ADS)

    Manrao, Elizabeth A.

    Unlocking the information hidden in the human genome provides insight into the inner workings of complex biological systems and can be used to greatly improve health-care. In order to allow for widespread sequencing, new technologies are required that provide fast and inexpensive readings of DNA. Nanopore sequencing is a third generation DNA sequencing technology that is currently being developed to fulfill this need. In nanopore sequencing, a voltage is applied across a small pore in an electrolyte solution and the resulting ionic current is recorded. When DNA passes through the channel, the ionic current is partially blocked. If the DNA bases uniquely modulate the ionic current flowing through the channel, the time trace of the current can be related to the sequence of DNA passing through the pore. There are two main challenges to realizing nanopore sequencing: identifying a pore with sensitivity to single nucleotides and controlling the translocation of DNA through the pore so that the small single nucleotide current signatures are distinguishable from background noise. In this dissertation, I explore the use of Mycobacterium smegmatis porin A (MspA) for nanopore sequencing. In order to determine MspA's sensitivity to single nucleotides, DNA strands of various compositions are held in the pore as the resulting ionic current is measured. DNA is immobilized in MspA by attaching it to a large molecule which acts as an anchor. This technique confirms the single nucleotide resolution of the pore and additionally shows that MspA is sensitive to epigenetic modifications and single nucleotide polymorphisms. The forces from the electric field within MspA, the effective charge of nucleotides, and elasticity of DNA are estimated using a Freely Jointed Chain model of single stranded DNA. These results offer insight into the interactions of DNA within the pore. With the nucleotide sensitivity of MspA confirmed, a method is introduced to controllably pass DNA through the pore. Using a DNA polymerase, DNA strands are stepped through MspA one nucleotide at a time. The steps are observable as distinct levels on the ionic-current time-trace and are related to the DNA sequence. These experiments overcome the two fundamental challenges to realizing MspA nanopore sequencing and pave the way to the development of a commercial technology.

  13. Improving the prospects of cleavage-based nanopore sequencing engines

    NASA Astrophysics Data System (ADS)

    Brady, Kyle T.; Reiner, Joseph E.

    2015-08-01

    Recently proposed methods for DNA sequencing involve the use of cleavage-based enzymes attached to the opening of a nanopore. The idea is that DNA interacting with either an exonuclease or polymerase protein will lead to a small molecule being cleaved near the mouth of the nanopore, and subsequent entry into the pore will yield information about the DNA sequence. The prospects for this approach seem promising, but it has been shown that diffusion related effects impose a limit on the capture probability of molecules by the pore, which limits the efficacy of the technique. Here, we revisit the problem with the goal of optimizing the capture probability via a step decrease in the nucleotide diffusion coefficient between the pore and bulk solutions. It is shown through random walk simulations and a simplified analytical model that decreasing the molecule's diffusion coefficient in the bulk relative to its value in the pore increases the nucleotide capture probability. Specifically, we show that at sufficiently high applied transmembrane potentials (≥100 mV), increasing the potential by a factor f is equivalent to decreasing the diffusion coefficient ratio Dbulk/Dpore by the same factor f. This suggests a promising route toward implementation of cleavage-based sequencing protocols. We also discuss the feasibility of forming a step function in the diffusion coefficient across the pore-bulk interface.

  14. Identifying Likely Transmission Pathways within a 10-Year Community Outbreak of Tuberculosis by High-Depth Whole Genome Sequencing

    PubMed Central

    Sadsad, Rosemarie; Martinez, Elena; Jelfs, Peter; Hill-Cawthorne, Grant A.; Gilbert, Gwendolyn L.; Marais, Ben J.; Sintchenko, Vitali

    2016-01-01

    Background Improved tuberculosis control and the need to contain the spread of drug-resistant strains provide a strong rationale for exploring tuberculosis transmission dynamics at the population level. Whole-genome sequencing provides optimal strain resolution, facilitating detailed mapping of potential transmission pathways. Methods We sequenced 22 isolates from a Mycobacterium tuberculosis cluster in New South Wales, Australia, identified during routine 24-locus mycobacterial interspersed repetitive unit typing. Following high-depth paired-end sequencing using the Illumina HiSeq 2000 platform, two independent pipelines were employed for analysis, both employing read mapping onto reference genomes as well as de novo assembly, to control biases in variant detection. In addition to single-nucleotide polymorphisms, the analyses also sought to identify insertions, deletions and structural variants. Results Isolates were highly similar, with a distance of 13 variants between the most distant members of the cluster. The most sensitive analysis classified the 22 isolates into 18 groups. Four of the isolates did not appear to share a recent common ancestor with the largest clade; another four isolates had an uncertain ancestral relationship with the largest clade. Conclusion Whole genome sequencing, with analysis of single-nucleotide polymorphisms, insertions, deletions, structural variants and subpopulations, enabled the highest possible level of discrimination between cluster members, clarifying likely transmission pathways and exposing the complexity of strain origin. The analysis provides a basis for targeted public health intervention and enhanced classification of future isolates linked to the cluster. PMID:26938641

  15. Structurally complex and highly active RNA ligases derived from random RNA sequences

    NASA Technical Reports Server (NTRS)

    Ekland, E. H.; Szostak, J. W.; Bartel, D. P.

    1995-01-01

    Seven families of RNA ligases, previously isolated from random RNA sequences, fall into three classes on the basis of secondary structure and regiospecificity of ligation. Two of the three classes of ribozymes have been engineered to act as true enzymes, catalyzing the multiple-turnover transformation of substrates into products. The most complex of these ribozymes has a minimal catalytic domain of 93 nucleotides. An optimized version of this ribozyme has a kcat exceeding one per second, a value far greater than that of most natural RNA catalysts and approaching that of comparable protein enzymes. The fact that such a large and complex ligase emerged from a very limited sampling of sequence space implies the existence of a large number of distinct RNA structures of equivalent complexity and activity.

  16. First report of Beet western yellows virus infecting Epiphyllum spp

    USDA-ARS?s Scientific Manuscript database

    Beet western yellow virus (BWYV) was identified from an orchid cactus (Epiphyllum spp.) hybrid without obvious symptoms by high-throughput sequencing. The nearly complete genomic sequence of 5,458 nucleotides of the virus was determined. The isolate has the highest nucleotide sequence identity (93%)...

  17. A new single-nucleotide polymorphism database for rainbow trout generated through whole genome re-sequencing

    USDA-ARS?s Scientific Manuscript database

    Single-nucleotide polymorphisms (SNPs) are highly abundant markers, which are broadly distributed in animal genomes. For rainbow trout, SNP discovery has been done through sequencing of restriction-site associated DNA (RAD) libraries, reduced representation libraries (RRL), RNA sequencing, and whole...

  18. Sequence analysis of the internal transcribed spacer (ITS) region reveals a novel clade of Ichthyophonus sp. from rainbow trout

    USGS Publications Warehouse

    Rasmussen, C.; Purcell, M.K.; Gregg, J.L.; LaPatra, S.E.; Winton, J.R.; Hershberger, P.K.

    2010-01-01

    The mesomycetozoean parasite Ichthyophonus hoferi is most commonly associated with marine fish hosts but also occurs in some components of the freshwater rainbow trout Oncorhynchus mykiss aquaculture industry in Idaho, USA. It is not certain how the parasite was introduced into rainbow trout culture, but it might have been associated with the historical practice of feeding raw, ground common carp Cyprinus carpio that were caught by commercial fisherman. Here, we report a major genetic division between west coast freshwater and marine isolates of Ichthyophonus hoferi. Sequence differences were not detected in 2 regions of the highly conserved small subunit (18S) rDNA gene; however, nucleotide variation was seen in internal transcribed spacer loci (ITS1 and ITS2), both within and among the isolates. Intra-isolate variation ranged from 2.4 to 7.6 nucleotides over a region consisting of ~740 bp. Majority consensus sequences from marine/anadromous hosts differed in only 0 to 3 nucleotides (99.6 to 100% nucleotide identity), while those derived from freshwater rainbow trout had no nucleotide substitutions relative to each other. However, the consensus sequences between isolates from freshwater rainbow trout and those from marine/anadromous hosts differed in 13 to 16 nucleotides (97.8 to 98.2% nucleotide identity).

  19. [Study on the genetic difference of SEO type Hantaviruses].

    PubMed

    Zhang, X; Zhou, S; Wang, H; Hu, J; Guan, Z; Liu, H

    2000-10-01

    To understand the genetic type of Hantaviruses and the difference between them caused by rodents in Beijing and to furhter explore the source of the infectious factors. Hantavirus RNA, isolated from lungs of rodents captured in Beijing and positive with Hantavirus antigens with frozen sectioning and Immunofluorescent assay, were reverse-transcribed and amplified with PCR with Hantavirus-specific primers. Five of the PCR amplifications were discovered and sequenced with 300 bp sequence data of M segments (from 2003 - 2302nt according cDNA of seoul 8039 strain). Nucleotide sequence homology showed that they were sequences of SEO-type Hantavirus. Compared with SEO type Hantavirus, the nucleotide sequence homology of these samples was more than 94% while the homology of amonia acid sequence was more than 98%. When compared with HNT type Hantavirus, the homology of nucleotide sequence became less than 72% with the homology of amonia acid sequence less than 81%. Similar to other Hantavirus of SEO type, their nucleotide sequences and deduced amino acid sequences were highly preserved. Phylogenetic tree analysis showed that the five viruses could be divided into at least 4 branches. It was quite likely that there were at least two sub-type SEO viruses with 4 branches that were circulating in Beijing.

  20. Real-Time Pathogen Detection in the Era of Whole-Genome Sequencing and Big Data: Comparison of k-mer and Site-Based Methods for Inferring the Genetic Distances among Tens of Thousands of Salmonella Samples.

    PubMed

    Pettengill, James B; Pightling, Arthur W; Baugher, Joseph D; Rand, Hugh; Strain, Errol

    2016-01-01

    The adoption of whole-genome sequencing within the public health realm for molecular characterization of bacterial pathogens has been followed by an increased emphasis on real-time detection of emerging outbreaks (e.g., food-borne Salmonellosis). In turn, large databases of whole-genome sequence data are being populated. These databases currently contain tens of thousands of samples and are expected to grow to hundreds of thousands within a few years. For these databases to be of optimal use one must be able to quickly interrogate them to accurately determine the genetic distances among a set of samples. Being able to do so is challenging due to both biological (evolutionary diverse samples) and computational (petabytes of sequence data) issues. We evaluated seven measures of genetic distance, which were estimated from either k-mer profiles (Jaccard, Euclidean, Manhattan, Mash Jaccard, and Mash distances) or nucleotide sites (NUCmer and an extended multi-locus sequence typing (MLST) scheme). When analyzing empirical data (whole-genome sequence data from 18,997 Salmonella isolates) there are features (e.g., genomic, assembly, and contamination) that cause distances inferred from k-mer profiles, which treat absent data as informative, to fail to accurately capture the distance between samples when compared to distances inferred from differences in nucleotide sites. Thus, site-based distances, like NUCmer and extended MLST, are superior in performance, but accessing the computing resources necessary to perform them may be challenging when analyzing large databases.

  1. Real-Time Pathogen Detection in the Era of Whole-Genome Sequencing and Big Data: Comparison of k-mer and Site-Based Methods for Inferring the Genetic Distances among Tens of Thousands of Salmonella Samples

    DOE PAGES

    Pettengill, James B.; Pightling, Arthur W.; Baugher, Joseph D.; ...

    2016-11-10

    The adoption of whole-genome sequencing within the public health realm for molecular characterization of bacterial pathogens has been followed by an increased emphasis on real-time detection of emerging outbreaks (e.g., food-borne Salmonellosis). In turn, large databases of whole-genome sequence data are being populated. These databases currently contain tens of thousands of samples and are expected to grow to hundreds of thousands within a few years. For these databases to be of optimal use one must be able to quickly interrogate them to accurately determine the genetic distances among a set of samples. Being able to do so is challenging duemore » to both biological (evolutionary diverse samples) and computational (petabytes of sequence data) issues. We evaluated seven measures of genetic distance, which were estimated from either k-mer profiles (Jaccard, Euclidean, Manhattan, Mash Jaccard, and Mash distances) or nucleotide sites (NUCmer and an extended multi-locus sequence typing (MLST) scheme). Finally, when analyzing empirical data (wholegenome sequence data from 18,997 Salmonella isolates) there are features (e.g., genomic, assembly, and contamination) that cause distances inferred from k-mer profiles, which treat absent data as informative, to fail to accurately capture the distance between samples when compared to distances inferred from differences in nucleotide sites. Thus, site-based distances, like NUCmer and extended MLST, are superior in performance, but accessing the computing resources necessary to perform them may be challenging when analyzing large databases.« less

  2. Real-Time Pathogen Detection in the Era of Whole-Genome Sequencing and Big Data: Comparison of k-mer and Site-Based Methods for Inferring the Genetic Distances among Tens of Thousands of Salmonella Samples

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pettengill, James B.; Pightling, Arthur W.; Baugher, Joseph D.

    The adoption of whole-genome sequencing within the public health realm for molecular characterization of bacterial pathogens has been followed by an increased emphasis on real-time detection of emerging outbreaks (e.g., food-borne Salmonellosis). In turn, large databases of whole-genome sequence data are being populated. These databases currently contain tens of thousands of samples and are expected to grow to hundreds of thousands within a few years. For these databases to be of optimal use one must be able to quickly interrogate them to accurately determine the genetic distances among a set of samples. Being able to do so is challenging duemore » to both biological (evolutionary diverse samples) and computational (petabytes of sequence data) issues. We evaluated seven measures of genetic distance, which were estimated from either k-mer profiles (Jaccard, Euclidean, Manhattan, Mash Jaccard, and Mash distances) or nucleotide sites (NUCmer and an extended multi-locus sequence typing (MLST) scheme). Finally, when analyzing empirical data (wholegenome sequence data from 18,997 Salmonella isolates) there are features (e.g., genomic, assembly, and contamination) that cause distances inferred from k-mer profiles, which treat absent data as informative, to fail to accurately capture the distance between samples when compared to distances inferred from differences in nucleotide sites. Thus, site-based distances, like NUCmer and extended MLST, are superior in performance, but accessing the computing resources necessary to perform them may be challenging when analyzing large databases.« less

  3. Optimization of the genotyping-by-sequencing strategy for population genomic analysis in conifers.

    PubMed

    Pan, Jin; Wang, Baosheng; Pei, Zhi-Yong; Zhao, Wei; Gao, Jie; Mao, Jian-Feng; Wang, Xiao-Ru

    2015-07-01

    Flexibility and low cost make genotyping-by-sequencing (GBS) an ideal tool for population genomic studies of nonmodel species. However, to utilize the potential of the method fully, many parameters affecting library quality and single nucleotide polymorphism (SNP) discovery require optimization, especially for conifer genomes with a high repetitive DNA content. In this study, we explored strategies for effective GBS analysis in pine species. We constructed GBS libraries using HpaII, PstI and EcoRI-MseI digestions with different multiplexing levels and examined the effect of restriction enzymes on library complexity and the impact of sequencing depth and size selection of restriction fragments on sequence coverage bias. We tested and compared UNEAK, Stacks and GATK pipelines for the GBS data, and then developed a reference-free SNP calling strategy for haploid pine genomes. Our GBS procedure proved to be effective in SNP discovery, producing 7000-11 000 and 14 751 SNPs within and among three pine species, respectively, from a PstI library. This investigation provides guidance for the design and analysis of GBS experiments, particularly for organisms for which genomic information is lacking. © 2014 John Wiley & Sons Ltd.

  4. Sequencing and phylogenetic analysis of tobacco virus 2, a polerovirus from Nicotiana tabacum.

    PubMed

    Zhou, Benguo; Wang, Fang; Zhang, Xuesong; Zhang, Lina; Lin, Huafeng

    2017-07-01

    The complete genome sequence of a new virus, provisionally named tobacco virus 2 (TV2), was determined and identified from leaves of tobacco (Nicotiana tabacum) exhibiting leaf mosaic, yellowing, and deformity, in Anhui Province, China. The genome sequence of TV2 comprises 5,979 nucleotides, with 87% nucleotide sequence identity to potato leafroll virus (PLRV). Its genome organization is similar to that of PLRV, containing six open reading frames (ORFs) that potentially encode proteins with putative functions in cell-to-cell movement and suppression of RNA silencing. Phylogenetic analysis of the nucleotide sequence placed TV2 alongside members of the genus Polerovirus in the family Luteoviridae. To the best our knowledge, this study is the first report of a complete genome sequence of a new polerovirus identified in tobacco.

  5. Complete nucleotide sequences of the coat protein messenger RNAs of brome mosaic virus and cowpea chlorotic mottle virus.

    PubMed Central

    Dasgupta, R; Kaesberg, P

    1982-01-01

    The nucleotide sequences of the subgenomic coat protein messengers (RNA4's) of two related bromoviruses, brome mosaic virus (BMV) and cowpea chlorotic mottle virus (CCMV), have been determined by direct RNA and CDNA sequencing without cloning. BMV RNA4 is 876 b long including a 5' noncoding region of nine nucleotides and a 3' noncoding region of 300 nucleotides. CCMV RNA 4 is 824 b long, including a 5' noncoding region of 10 nucleotides and a 3' noncoding region of 244 nucleotides. The encoded coat proteins are similar in length (188 amino acids for BMV and 189 amino acids for CCMV) and display about 70% homology in their amino acid sequences. Length difference between the two RNAs is due mostly to a single deletion, in CCMV with respect to BMV, of about 57 b immediately following the coding region. Allowing for this deletion the RNAs are indicate that mutations leading to divergence were constrained in the coding region primarily by the requirement of maintaining a favorable coat protein structure and in the 3' noncoding region primarily by the requirement of maintaining a favorable RNA spatial configuration. PMID:6895941

  6. Nucleotide sequence analysis of the 3' terminal region of a wasabi strain of crucifer tobamovirus genomic RNA: subgrouping of crucifer tobamoviruses.

    PubMed

    Shimamoto, I; Sonoda, S; Vazquez, P; Minaka, N; Nishiguchi, M

    1998-01-01

    The 3' terminal 2378 nucleotides of a wasabi strain of crucifer tobamovirus (CTMV-W) infectious to crucifer plants was determined. This includes the 3' non-coding region of 235 nucleotides, coat protein (CP) gene (468 nucleotides), movement protein (MP) gene (798 nucleotides) and C-terminal partial readthrough portion of 180 K protein gene (940 nucleotides). Comparison of the sequence with homologous regions of thirteen other tobamovirus genomes showed that it had much higher identity to those of four other crucifer tobamoviruses, 85.2% to cr-TMV and turnip vein-clearing virus (TVCV), 87.4% to oilseed rape mosaic virus (ORMV) and 87.1% to TMV-Cg, than to those of other tobamoviruses. Thus CTMV-W was most similar to ORMV and TMV-Cg in sequence, but only marginally so, whereas the location and size of its MP gene was the same as cr-TMV amd TVCV. These results, together with other analyses, show that CTMV-W is a new crucifer tobamovirus, that the five crucifer tobamoviruses can be classified into two subgroups based on MP gene organization, and that the rate of sequence change is not the same in all lineages.

  7. Comparative genomic sequence analysis of novel Helicoverpa armigera nucleopolyhedrovirus (NPV) isolated from Kenya and three other previously sequenced Helicoverpa spp. NPVs.

    PubMed

    Ogembo, Javier Gordon; Caoili, Barbara L; Shikata, Masamitsu; Chaeychomsri, Sudawan; Kobayashi, Michihiro; Ikeda, Motoko

    2009-10-01

    A newly cloned Helicoverpa armigera nucleopolyhedrovirus (HearNPV) from Kenya, HearNPV-NNg1, has a higher insecticidal activity than HearNPV-G4, which also exhibits lower insecticidal activity than HearNPV-C1. In the search for genes and/or nucleotide sequences that might be involved in the observed virulence differences among Helicoverpa spp. NPVs, the entire genome of NNg1 was sequenced and compared with previously sequenced genomes of G4, C1 and Helicoverpa zea single-nucleocapsid NPV (Hz). The NNg1 genome was 132,425 bp in length, with a total of 143 putative open reading frames (ORFs), and shared high levels of overall amino acid and nucleotide sequence identities with G4, C1 and Hz. Three NNg1 ORFs, ORF5, ORF100 and ORF124, which were shared with C1, were absent in G4 and Hz, while NNg1 and C1 were missing a homologue of G4/Hz ORF5. Another three ORFs, ORF60 (bro-b), ORF119 and ORF120, and one direct repeat sequence (dr) were unique to NNg1. Relative to the overall nucleotide sequence identity, lower sequence identities were observed between NNg1 hrs and the homologous hrs in the other three Helicoverpa spp. NPVs, despite containing the same number of hrs located at essentially the same positions on the genomes. Differences were also observed between NNg1 and each of the other three Helicoverpa spp. NPVs in the diversity of bro genes encoded on the genomes. These results indicate several putative genes and nucleotide sequences that may be responsible for the virulence differences observed among Helicoverpa spp., yet the specific genes and/or nucleotide sequences responsible have not been identified.

  8. Optimized MOL-PCR for Characterization of Microbial Pathogens.

    PubMed

    Wuyts, Véronique; Roosens, Nancy H C; Bertrand, Sophie; Marchal, Kathleen; De Keersmaecker, Sigrid C J

    2016-01-06

    Characterization of microbial pathogens is necessary for surveillance, outbreak detection, and tracing of outbreak sources. This unit describes a multiplex oligonucleotide ligation-PCR (MOL-PCR) optimized for characterization of microbial pathogens. With MOL-PCR, different types of markers, like unique sequences, single-nucleotide polymorphisms (SNPs) and indels, can be simultaneously analyzed in one assay. This assay consists of a multiplex ligation for detection of the markers, a singleplex PCR for signal amplification, and hybridization to MagPlex-TAG beads for readout on a Luminex platform after fluorescent staining. The current protocol describes the MOL-PCR, as well as methods for DNA isolation, probe design, and data interpretation and it is based on an optimized MOL-PCR assay for subtyping of Salmonella Typhimurium. Copyright © 2016 John Wiley & Sons, Inc.

  9. Predicted stem-loop structures and variation in nucleotide sequence of 3' noncoding regions among animal calicivirus genomes.

    PubMed

    Seal, B S; Neill, J D; Ridpath, J F

    1994-07-01

    Caliciviruses are nonenveloped with a polyadenylated genome of approximately 7.6 kb and a single capsid protein. The "RNA Fold" computer program was used to analyze 3'-terminal noncoding sequences of five feline calicivirus (FCV), rabbit hemorrhagic disease virus (RHDV), and two San Miguel sea lion virus (SMSV) isolates. The FCV 3'-terminal sequences are 40-46 nucleotides in length and 72-91% similar. The FCV sequences were predicted to contain two possible duplex structures and one stem-loop structure with free energies of -2.1 to -18.2 kcal/mole. The RHDV genomic 3'-terminal RNA sequences are 54 nucleotides in length and share 49% sequence similarity to homologous regions of the FCV genome. The RHDV sequence was predicted to form two duplex structures in the 3'-terminal noncoding region with a single stem-loop structure, resembling that of FCV. In contrast, the SMSV 1 and 4 genomic 3'-terminal noncoding sequences were 185 and 182 nucleotides in length, respectively. Ten possible duplex structures were predicted with an average structural free energy of -35 kcal/mole. Sequence similarity between the two SMSV isolates was 75%. Furthermore, extensive cloverleaflike structures are predicted in the 3' noncoding region of the SMSV genome, in contrast to the predicted single stem-loop structures of FCV or RHDV.

  10. Terminal Duplex Stability and Nucleotide Identity Differentially Control siRNA Loading and Activity in RNA Interference

    PubMed Central

    Angart, Phillip A.; Carlson, Rebecca J.; Adu-Berchie, Kwasi

    2016-01-01

    Efficient short interfering RNA (siRNA)-mediated gene silencing requires selection of a sequence that is complementary to the intended target and possesses sequence and structural features that encourage favorable functional interactions with the RNA interference (RNAi) pathway proteins. In this study, we investigated how terminal sequence and structural characteristics of siRNAs contribute to siRNA strand loading and silencing activity and how these characteristics ultimately result in a functionally asymmetric duplex in cultured HeLa cells. Our results reiterate that the most important characteristic in determining siRNA activity is the 5′ terminal nucleotide identity. Our findings further suggest that siRNA loading is controlled principally by the hybridization stability of the 5′ terminus (Nucleotides: 1–2) of each siRNA strand, independent of the opposing terminus. Postloading, RNA-induced silencing complex (RISC)–specific activity was found to be improved by lower hybridization stability in the 5′ terminus (Nucleotides: 3–4) of the loaded siRNA strand and greater hybridization stability toward the 3′ terminus (Nucleotides: 17–18). Concomitantly, specific recognition of the 5′ terminal nucleotide sequence by human Argonaute 2 (Ago2) improves RISC half-life. These findings indicate that careful selection of siRNA sequences can maximize both the loading and the specific activity of the intended guide strand. PMID:27399870

  11. Correlation approach to identify coding regions in DNA sequences

    NASA Technical Reports Server (NTRS)

    Ossadnik, S. M.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1994-01-01

    Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.

  12. Porcine insulin receptor substrate 4 (IRS4) gene: cloning, polymorphism and association study

    USDA-ARS?s Scientific Manuscript database

    Using PCR and IPCR techniques we obtained a 4498 bp nucleotide sequence FN424076 encompassing the complete coding sequence of the porcine IRS4 gene and its proximal promoter. The 1269-amino acid porcine protein deduced from the nucleotide sequence shares 92% identity with the human IRS4 and possesse...

  13. The nucleotide sequence of 5S ribosomal RNA from Micrococcus lysodeikticus.

    PubMed Central

    Hori, H; Osawa, S; Murao, K; Ishikura, H

    1980-01-01

    The nucleotide sequence of ribosomal 5S RNA from Micrococcus lysodeikticus is pGUUACGGCGGCUAUAGCGUGGGGGAAACGCCCGGCCGUAUAUCGAACCCGGAAGCUAAGCCCCAUAGCGCCGAUGGUUACUGUAACCGGGAGGUUGUGGGAGAGUAGGUCGCCGCCGUGAOH. When compared to other 5S RNAs, the sequence homology is greatest with Thermus aquaticus, and these two 5S RNAs reveal several features intermediate between those of typical gram-positive bacteria and gram-negative bacteria. PMID:6780979

  14. Regions of conservation and divergence in the 3' untranslated sequences of genomic RNA from Ross River virus isolates.

    PubMed

    Faragher, S G; Dalgarno, L

    1986-07-20

    The 3' untranslated (UT) sequences of the genomic RNAs of five geographic variants of the alphavirus Ross River virus (RRV) were determined and compared with the 3' UT sequence of RRV T48, the prototype strain. Part of the 3' UT region of Getah virus, a close serological relative of RRV, was also sequenced. The RRV 3' UT region varies markedly in length between variants. Large deletions or insertions, sequence rearrangements and single nucleotide substitutions are observed. A sequence tract of 49 to 58 nucleotides, which is repeated as four blocks in the RRV T48 3' UT region, occurs only once in the 3' UT region of one RRV strain (NB5092), indicating that the existence of repeat sequence blocks is not essential for RRV replication. However, the precise sequence of the 3' proximal copy of the repeat block and its position relative to the poly(A) tail were identical in all RRV isolates examined, suggesting that it has an important role in RRV replication. Nucleotide substitutions between RRV variants are distributed non-randomly along the length of the 3' UT region. The sequence of 120 to 130 nucleotides adjacent to the poly(A) tail is strongly conserved. Getah virus RNA contains three repeat sequence blocks in the 3' UT region. These are similar in sequence to those in RRV RNA but differ in their arrangement. Homology between the RRV and Getah 3' UT sequences is greatest in the 3' proximal repeat sequence block that shows three differences in 49 nucleotides. The 3' proximal repeat in Getah RNA occurs at the same position, relative to the poly(A) tail, as in all RRV variants. The RRV and Getah virus 3' UT sequences show extensive homology in the region between the 3' proximal repeat and the poly(A) tail but, apart from the repeat blocks themselves, they show no significant homology elsewhere.

  15. Algorithms for optimizing cross-overs in DNA shuffling.

    PubMed

    He, Lu; Friedman, Alan M; Bailey-Kellogg, Chris

    2012-03-21

    DNA shuffling generates combinatorial libraries of chimeric genes by stochastically recombining parent genes. The resulting libraries are subjected to large-scale genetic selection or screening to identify those chimeras with favorable properties (e.g., enhanced stability or enzymatic activity). While DNA shuffling has been applied quite successfully, it is limited by its homology-dependent, stochastic nature. Consequently, it is used only with parents of sufficient overall sequence identity, and provides no control over the resulting chimeric library. This paper presents efficient methods to extend the scope of DNA shuffling to handle significantly more diverse parents and to generate more predictable, optimized libraries. Our CODNS (cross-over optimization for DNA shuffling) approach employs polynomial-time dynamic programming algorithms to select codons for the parental amino acids, allowing for zero or a fixed number of conservative substitutions. We first present efficient algorithms to optimize the local sequence identity or the nearest-neighbor approximation of the change in free energy upon annealing, objectives that were previously optimized by computationally-expensive integer programming methods. We then present efficient algorithms for more powerful objectives that seek to localize and enhance the frequency of recombination by producing "runs" of common nucleotides either overall or according to the sequence diversity of the resulting chimeras. We demonstrate the effectiveness of CODNS in choosing codons and allocating substitutions to promote recombination between parents targeted in earlier studies: two GAR transformylases (41% amino acid sequence identity), two very distantly related DNA polymerases, Pol X and β (15%), and beta-lactamases of varying identity (26-47%). Our methods provide the protein engineer with a new approach to DNA shuffling that supports substantially more diverse parents, is more deterministic, and generates more predictable and more diverse chimeric libraries.

  16. Update on Pneumocystis carinii f. sp. hominis Typing Based on Nucleotide Sequence Variations in Internal Transcribed Spacer Regions of rRNA Genes

    PubMed Central

    Lee, Chao-Hung; Helweg-Larsen, Jannik; Tang, Xing; Jin, Shaoling; Li, Baozheng; Bartlett, Marilyn S.; Lu, Jang-Jih; Lundgren, Bettina; Lundgren, Jens D.; Olsson, Mats; Lucas, Sebastian B.; Roux, Patricia; Cargnel, Antonietta; Atzori, Chiara; Matos, Olga; Smith, James W.

    1998-01-01

    Pneumocystis carinii f. sp. hominis isolates from 207 clinical specimens from nine countries were typed based on nucleotide sequence variations in the internal transcribed spacer regions I and II (ITS1 and ITS2, respectively) of rRNA genes. The number of ITS1 nucleotides has been revised from the previously reported 157 bp to 161 bp. Likewise, the number of ITS2 nucleotides has been changed from 177 to 192 bp. The number of ITS1 sequence types has increased from 2 to 15, and that of ITS2 has increased from 3 to 14. The 15 ITS1 sequence types are designated types A through O, and the 14 ITS2 types are named types a through n. A total of 59 types of P. carinii f. sp. hominis were found in this study. PMID:9508304

  17. Nucleotide Sequence Diversity and Linkage Disequilibrium of Four Nuclear Loci in Foxtail Millet (Setaria italica).

    PubMed

    He, Shui-Lian; Yang, Yang; Morrell, Peter L; Yi, Ting-Shuang

    2015-01-01

    Foxtail millet (Setaria italica (L.) Beauv) is one of the earliest domesticated grains, which has been cultivated in northern China by 8,700 years before present (YBP) and across Eurasia by 4,000 YBP. Owing to a small genome and diploid nature, foxtail millet is a tractable model crop for studying functional genomics of millets and bioenergy grasses. In this study, we examined nucleotide sequence diversity, geographic structure, and levels of linkage disequilibrium at four nuclear loci (ADH1, G3PDH, IGS1 and TPI1) in representative samples of 311 landrace accessions across its cultivated range. Higher levels of nucleotide sequence and haplotype diversity were observed in samples from China relative to other sampled regions. Genetic assignment analysis classified the accessions into seven clusters based on nucleotide sequence polymorphisms. Intralocus LD decayed rapidly to half the initial value within ~1.2 kb or less.

  18. Complete genomic sequence of Powassan virus: evaluation of genetic elements in tick-borne versus mosquito-borne flaviviruses.

    PubMed

    Mandl, C W; Holzmann, H; Kunz, C; Heinz, F X

    1993-05-01

    The complete nucleotide sequence of the positive-stranded RNA genome of the tick-borne flavivirus Powassan (10,839 nucleotides) was elucidated and the amino acid sequence of all viral proteins was derived. Based on this sequence as well as serological data, Powassan virus represents the most divergent member of the tick-borne serocomplex within the genus flaviviruses, family Flaviviridae. The primary nucleotide sequence and potential RNA secondary structures of the Powassan virus genome as well as the protein sequences and the reactivities of the virion with a panel of monoclonal antibodies were compared to other tick-borne and mosquito-borne flaviviruses. These analyses corroborated significant differences between tick-borne and mosquito-borne flaviviruses, but also emphasized structural elements that are conserved among both vector groups. The comparisons among tick-borne flaviviruses revealed conserved sequence elements that might represent important determinants of the tick-borne flavivirus phenotype.

  19. Detection of a divergent variant of grapevine virus F by next-generation sequencing.

    PubMed

    Molenaar, Nicholas; Burger, Johan T; Maree, Hans J

    2015-08-01

    The complete genome sequence of a South African isolate of grapevine virus F (GVF) is presented. It was first detected by metagenomic next-generation sequencing of field samples and validated through direct Sanger sequencing. The genome sequence of GVF isolate V5 consists of 7539 nucleotides and contains a poly(A) tail. It has a typical vitivirus genome arrangement that comprises five open reading frames (ORFs), which share only 88.96 % nucleotide sequence identity with the existing complete GVF genome sequence (JX105428).

  20. DEEP MOTIF DASHBOARD: VISUALIZING AND UNDERSTANDING GENOMIC SEQUENCES USING DEEP NEURAL NETWORKS.

    PubMed

    Lanchantin, Jack; Singh, Ritambhara; Wang, Beilun; Qi, Yanjun

    2017-01-01

    Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding (TFBS) site classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence's saliency map which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs as well as dependencies among them.

  1. A Comparison Study for DNA Motif Modeling on Protein Binding Microarray.

    PubMed

    Wong, Ka-Chun; Li, Yue; Peng, Chengbin; Wong, Hau-San

    2016-01-01

    Transcription factor binding sites (TFBSs) are relatively short (5-15 bp) and degenerate. Identifying them is a computationally challenging task. In particular, protein binding microarray (PBM) is a high-throughput platform that can measure the DNA binding preference of a protein in a comprehensive and unbiased manner; for instance, a typical PBM experiment can measure binding signal intensities of a protein to all possible DNA k-mers (k = 8∼10). Since proteins can often bind to DNA with different binding intensities, one of the major challenges is to build TFBS (also known as DNA motif) models which can fully capture the quantitative binding affinity data. To learn DNA motif models from the non-convex objective function landscape, several optimization methods are compared and applied to the PBM motif model building problem. In particular, representative methods from different optimization paradigms have been chosen for modeling performance comparison on hundreds of PBM datasets. The results suggest that the multimodal optimization methods are very effective for capturing the binding preference information from PBM data. In particular, we observe a general performance improvement if choosing di-nucleotide modeling over mono-nucleotide modeling. In addition, the models learned by the best-performing method are applied to two independent applications: PBM probe rotation testing and ChIP-Seq peak sequence prediction, demonstrating its biological applicability.

  2. Nucleotide sequence composition and method for detection of neisseria gonorrhoeae

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lo, A.; Yang, H.L.

    1990-02-13

    This patent describes a composition of matter that is specific for {ital Neisseria gonorrhoeae}. It comprises: at least one nucleotide sequence for which the ratio of the amount of the sequence which hybridizes to chromosomal DNA of {ital Neisseria gonorrhoeae} to the amount of the sequence which hybridizes to chromosomal DNA of {ital Neisseria meningitidis} is greater than about five. The ratio being obtained by a method described.

  3. Haplotag: Software for Haplotype-Based Genotyping-by-Sequencing Analysis

    PubMed Central

    Tinker, Nicholas A.; Bekele, Wubishet A.; Hattori, Jiro

    2016-01-01

    Genotyping-by-sequencing (GBS), and related methods, are based on high-throughput short-read sequencing of genomic complexity reductions followed by discovery of single nucleotide polymorphisms (SNPs) within sequence tags. This provides a powerful and economical approach to whole-genome genotyping, facilitating applications in genomics, diversity analysis, and molecular breeding. However, due to the complexity of analyzing large data sets, applications of GBS may require substantial time, expertise, and computational resources. Haplotag, the novel GBS software described here, is freely available, and operates with minimal user-investment on widely available computer platforms. Haplotag is unique in fulfilling the following set of criteria: (1) operates without a reference genome; (2) can be used in a polyploid species; (3) provides a discovery mode, and a production mode; (4) discovers polymorphisms based on a model of tag-level haplotypes within sequenced tags; (5) reports SNPs as well as haplotype-based genotypes; and (6) provides an intuitive visual “passport” for each inferred locus. Haplotag is optimized for use in a self-pollinating plant species. PMID:26818073

  4. Whole exome sequencing to estimate alloreactivity potential between donors and recipients in stem cell transplantation

    PubMed Central

    Sampson, Juliana K.; Sheth, Nihar U.; Koparde, Vishal N.; Scalora, Allison F.; Serrano, Myrna G.; Lee, Vladimir; Roberts, Catherine H.; Jameson-Lee, Max; Ferreira-Gonzalez, Andrea; Manjili, Masoud H.; Buck, Gregory A.; Neale, Michael C.; Toor, Amir A.

    2016-01-01

    Summary Whole exome sequencing (WES) was performed on stem cell transplant donor-recipient (D-R) pairs to determine the extent of potential antigenic variation at a molecular level. In a small cohort of D-R pairs, a high frequency of sequence variation was observed between the donor and recipient exomes independent of human leucocyte antigen (HLA) matching. Nonsynonymous, nonconservative single nucleotide polymorphisms were approximately twice as frequent in HLA-matched unrelated, compared with related D-R pairs. When mapped to individual chromosomes, these polymorphic nucleotides were uniformly distributed across the entire exome. In conclusion, WES reveals extensive nucleotide sequence variation in the exomes of HLA-matched donors and recipients. PMID:24749631

  5. The nucleotide sequences of 5S rRNAs from a fern Dryopteris acuminata and a horsetail Equisetum arvense.

    PubMed Central

    Hori, H; Osawa, S; Takaiwa, F; Sugiura, M

    1984-01-01

    The nucleotide sequences from two Pteridophyta species, a fern Dryopteris acuminata and a horsetail Equisetum arvense have been determined. These two sequences are more related to those of the Bryophyta species (88% identity on average) than to those of seed plants (84% identity on average). PMID:6538332

  6. Energy efficiency trade-offs drive nucleotide usage in transcribed regions

    PubMed Central

    Chen, Wei-Hua; Lu, Guanting; Bork, Peer; Hu, Songnian; Lercher, Martin J.

    2016-01-01

    Efficient nutrient usage is a trait under universal selection. A substantial part of cellular resources is spent on making nucleotides. We thus expect preferential use of cheaper nucleotides especially in transcribed sequences, which are often amplified thousand-fold compared with genomic sequences. To test this hypothesis, we derive a mutation-selection-drift equilibrium model for nucleotide skews (strand-specific usage of ‘A' versus ‘T' and ‘G' versus ‘C'), which explains nucleotide skews across 1,550 prokaryotic genomes as a consequence of selection on efficient resource usage. Transcription-related selection generally favours the cheaper nucleotides ‘U' and ‘C' at synonymous sites. However, the information encoded in mRNA is further amplified through translation. Due to unexpected trade-offs in the codon table, cheaper nucleotides encode on average energetically more expensive amino acids. These trade-offs apply to both strand-specific nucleotide usage and GC content, causing a universal bias towards the more expensive nucleotides ‘A' and ‘G' at non-synonymous coding sites. PMID:27098217

  7. Detection of possible restriction sites for type II restriction enzymes in DNA sequences.

    PubMed

    Gagniuc, P; Cimponeriu, D; Ionescu-Tîrgovişte, C; Mihai, Andrada; Stavarachi, Monica; Mihai, T; Gavrilă, L

    2011-01-01

    In order to make a step forward in the knowledge of the mechanism operating in complex polygenic disorders such as diabetes and obesity, this paper proposes a new algorithm (PRSD -possible restriction site detection) and its implementation in Applied Genetics software. This software can be used for in silico detection of potential (hidden) recognition sites for endonucleases and for nucleotide repeats identification. The recognition sites for endonucleases may result from hidden sequences through deletion or insertion of a specific number of nucleotides. Tests were conducted on DNA sequences downloaded from NCBI servers using specific recognition sites for common type II restriction enzymes introduced in the software database (n = 126). Each possible recognition site indicated by the PRSD algorithm implemented in Applied Genetics was checked and confirmed by NEBcutter V2.0 and Webcutter 2.0 software. In the sequence NG_008724.1 (which includes 63632 nucleotides) we found a high number of potential restriction sites for ECO R1 that may be produced by deletion (n = 43 sites) or insertion (n = 591 sites) of one nucleotide. The second module of Applied Genetics has been designed to find simple repeats sizes with a real future in understanding the role of SNPs (Single Nucleotide Polymorphisms) in the pathogenesis of the complex metabolic disorders. We have tested the presence of simple repetitive sequences in five DNA sequence. The software indicated exact position of each repeats detected in the tested sequences. Future development of Applied Genetics can provide an alternative for powerful tools used to search for restriction sites or repetitive sequences or to improve genotyping methods.

  8. Information Entropy of Influenza A Segment 7

    NASA Astrophysics Data System (ADS)

    Thompson, William A.; Fan, Shaohua; Weltman, Joel K.

    2008-12-01

    Information entropy (H) is a measure of uncertainty at each position within in a sequence of nucleotides.H was used to characterize a set of influenza A segment 7 nucleotide sequences. Nucleotide locations of high entropy were identified near the 5’ start of all of the sequences and the sequences were assigned to subsets according to synonymous nucleotide variants at those positions: either uracil at position six (U6), cytosine at position six (C6), adenine (A12) at position 12, guanine at position 12 (G12), adenine at position 15 (A15) or cytosine (C15) at position 15. H values were found to be correlated/corresponding (Kendall tau) along the lengths of the nucleotide segments of the subset pairs at each position. However, the H values of each subset of sequences were statistically distinguishable from those of the other member of the pair (Kolmogorov-Smirnov test). The joint probability of uncorrelated distributions of U6 and C6 sequences to viral subtypes and to viral host species was 34 times greater than for the A12:G12 subset pair and 214 times greater than for the A15:C15 pair. This result indicates that the high entropy position six of segment 7 is either a reporter or a sentinel location. The fact that not one of the H5N1 sequences in the dataset was a member of the C6 subset, but all 125 H5N1 sequences are members of the U6 subset suggests a non-random sentinel function.

  9. PUTATIVE GENE PROMOTER SEQUENCES IN THE CHLORELLA VIRUSES

    PubMed Central

    Fitzgerald, Lisa A.; Boucher, Philip T.; Yanai-Balser, Giane; Suhre, Karsten; Graves, Michael V.; Van Etten, James L.

    2008-01-01

    Three short (7 to 9 nucleotides) highly conserved nucleotide sequences were identified in the putative promoter regions (150 bp upstream and 50 bp downstream of the ATG translation start site) of three members of the genus Chlorovirus, family Phycodnaviridae. Most of these sequences occurred in similar locations within the defined promoter regions. The sequence and location of the motifs were often conserved among homologous ORFs within the Chlorovirus family. One of these conserved sequences (AATGACA) is predominately associated with genes expressed early in virus replication. PMID:18768195

  10. The complete sequence of Cymbidium mosaic virus from Vanilla fragrans in Hainan, China.

    PubMed

    He, Zhen; Jiang, Dongmei; Liu, Aiqin; Sang, Liwei; Li, Wenfeng; Li, Shifang

    2011-06-01

    The complete nucleotide sequence of Cymbidium mosaic virus (CymMV) isolated from vanilla in Hainan province, China was determined for the first time. It comprised 6,224 nucleotides; sequence analysis suggested that the isolate we obtained was a member of the genus Potexvirus, and its sequence shared 86.67-96.61% identities with previously reported sequences. Phylogenetic analysis suggested that CymMV from vanilla fragrans was clustered into subgroup A and the isolates in this subgroup displayed little regional difference.

  11. Stereochemical analysis of the functional significance of the conserved inverted CCAAT and TATA elements in the rat bone sialoprotein gene promoter.

    PubMed

    Su, Ming; Lee, Daniel; Ganss, Bernhard; Sodek, Jaro

    2006-04-14

    Basal transcription of the bone sialoprotein gene is mediated by highly conserved inverted CCAAT (ICE; ATTGG) and TATA elements (TTTATA) separated by precisely 21 nucleotides. Here we studied the importance of the relative position and orientation of the CCAAT and TATA elements in the proximal promoter by measuring the transcriptional activity of a series of mutated reporter constructs in transient transfection assays. Whereas inverting the TTTATA (wild type) to a TATAAA (consensus TATA) sequence increased transcription slightly, transcription was reduced when the flanking dinucleotides were also inverted. In contrast, reversing the ATTGG (wild type; ICE) to a CCAAT (RICE) sequence caused a marked reduction in transcription, whereas both transcription and NF-Y binding were progressively increased with the simultaneous inversion of flanking nucleotides (f-RICE-f). Reducing the distance between the ICE and TATA elements produced cyclical changes in transcriptional activity that correlated with progressive alterations in the relative positions of the CCAAT and TATA elements on the face of the DNA helix. Minimal transcription was observed after 5 nucleotides were deleted (equivalent to approximately one half turn of the helix), whereas transcription was fully restored after deleting 10 nucleotides (approximately one full turn of the DNA helix), transcriptional activity being progressively lost with deletions beyond 10 nucleotides. In comparison, when deletions were made with the ICE in the reversed (f-RICE-f) orientation transcriptional activity was progressively lost with no recovery. These results show that, although transcription can still occur when the CCAAT box is reversed and/or displaced relative to the TATA box, the activity is dependent upon the flexibility of the intervening DNA helix needed to align the NF-Y complex on the CCAAT box with preinitiation complex proteins that bind to the TATA box. Thus, the precise location and orientation of the CCAAT element is necessary for optimizing basal transcription of the bone sialoprotein gene.

  12. Human papillomavirus type 18 variant lineages in United States populations characterized by sequence analysis of LCR-E6, E2, and L1 regions.

    PubMed

    Arias-Pulido, Hugo; Peyton, Cheri L; Torrez-Martínez, Norah; Anderson, D Nelson; Wheeler, Cosette M

    2005-07-20

    While HPV 16 variant lineages have been well characterized, the knowledge about HPV 18 variants is limited. In this study, HPV 18 nucleotide variations in the E2 hinge region were characterized by sequence analysis in 47 control and 51 tumor specimens. Fifty of these specimens were randomly selected for sequencing of an LCR-E6 segment and 20 samples representative of LCR-E6 and E2 sequence variants were examined across the L1 region. A total of 2770 nucleotides per HPV 18 variant genome were considered in this study. HPV 18 variant nucleotides were linked among all gene segments analyzed and grouped into three main branches: Asian-American (AA), European (E), and African (Af). These three branches were equally distributed among controls and cases and when stratified by Hispanic and non-Hispanic ethnicities. Among invasive cervical cancer cases, no significant differences in the three HPV variant branches were observed among ethnic groups or when stratified by histopathology (squamous vs. adenocarcinoma). The Af branch showed the greatest nucleotide variability when compared to the HPV 18 reference sequence and was more closely related to HPV 45 than either AA or E branches. Our data also characterize nucleotide and amino acid variations in the L1 capsid gene among HPV 18 variants, which may be relevant to vaccine strategies and subsequent studies of naturally occurring HPV 18 variants. Several novel HPV 18 nucleotide variations were identified in this study.

  13. The repeating nucleotide sequence in the repetitive mitochondrial DNA from a "low-density" petite mutant of yeast.

    PubMed Central

    Van Kreijl, C F; Bos, J L

    1977-01-01

    The repeating nucleotide sequence of 68 base pairs in the mtDNA from an ethidium-induced cytoplasmic petite mutant of yeast has been determined. For sequence analysis specifically primed and terminated RNA copies, obtained by in vitro transcription of the separated strands, were use. The sequence consists of 66 consecutive AT base pairs flanked by two GC pairs and comprises nearly all of the mutant mitochondrial genome. The sequence, moreover, also represents the first part of wild-type mtDNA sequence so far. Images PMID:198740

  14. [Detection of gene mutation in glucose-6-phosphate dehydrogenase deficiency by RT-PCR sequencing].

    PubMed

    Lyu, Rong-Yu; Chen, Xiao-Wen; Zhang, Min; Chen, Yun-Sheng; Yu, Jie; Wen, Fei-Qiu

    2016-07-01

    Since glucose-6-phosphate dehydrogenase (G6PD) deficiency is the most common hereditary hemolytic erythrocyte enzyme deficiency, most cases have single nucleotide mutations in the coding region, and current test methods for gene mutation have some missed detections, this study aimed to investigate the feasibility of RT-PCR sequencing in the detection of gene mutation in G6PD deficiency. According to the G6PD/6GPD ratio, 195 children with anemia of unknown cause or who underwent physical examination between August 2013 and July 2014 were classified into G6PD-deficiency group with 130 children (G6PD/6GPD ratio <1.00) and control group with 65 children (G6PD/6GPD ratio≥1.00). The primer design and PCR amplification conditions were optimized, and RT-PCR sequencing was used to analyze the complete coding sequence and verify the genomic DNA sequence in the two groups. In the G6PD-deficiency group, the detection rate of gene mutation was 100% and 13 missense mutations were detected, including one new mutation. In the control group, no missense mutation was detected in 28 boys; 13 heterozygous missense mutations, 1 homozygous same-sense mutation (C1191T) which had not been reported in China and abroad, and 14 single nucleotide polymorphisms of C1311T were detected in 37 girls. The control group showed a high rate of missed detection of G6PD deficiency (carriers) in the specimens from girls (35%, 13/37). RT-PCR sequencing has a high detection rate of G6PD gene mutation and a certain value in clinical diagnosis of G6PD deficiency.

  15. [Correlation of codon biases and potential secondary structures with mRNA translation efficiency in unicellular organisms].

    PubMed

    Vladimirov, N V; Likhoshvaĭ, V A; Matushkin, Iu G

    2007-01-01

    Gene expression is known to correlate with degree of codon bias in many unicellular organisms. However, such correlation is absent in some organisms. Recently we demonstrated that inverted complementary repeats within coding DNA sequence must be considered for proper estimation of translation efficiency, since they may form secondary structures that obstruct ribosome movement. We have developed a program for estimation of potential coding DNA sequence expression in defined unicellular organism using its genome sequence. The program computes elongation efficiency index. Computation is based on estimation of coding DNA sequence elongation efficiency, taking into account three key factors: codon bias, average number of inverted complementary repeats, and free energy of potential stem-loop structures formed by the repeats. The influence of these factors on translation is numerically estimated. An optimal proportion of these factors is computed for each organism individually. Quantitative translational characteristics of 384 unicellular organisms (351 bacteria, 28 archaea, 5 eukaryota) have been computed using their annotated genomes from NCBI GenBank. Five potential evolutionary strategies of translational optimization have been determined among studied organisms. A considerable difference of preferred translational strategies between Bacteria and Archaea has been revealed. Significant correlations between elongation efficiency index and gene expression levels have been shown for two organisms (S. cerevisiae and H. pylori) using available microarray data. The proposed method allows to estimate numerically the coding DNA sequence translation efficiency and to optimize nucleotide composition of heterologous genes in unicellular organisms. http://www.mgs.bionet.nsc.ru/mgs/programs/eei-calculator/.

  16. Thermodynamically balanced inside-out (TBIO) PCR-based gene synthesis: a novel method of primer design for high-fidelity assembly of longer gene sequences

    PubMed Central

    Gao, Xinxin; Yo, Peggy; Keith, Andrew; Ragan, Timothy J.; Harris, Thomas K.

    2003-01-01

    A novel thermodynamically-balanced inside-out (TBIO) method of primer design was developed and compared with a thermodynamically-balanced conventional (TBC) method of primer design for PCR-based gene synthesis of codon-optimized gene sequences for the human protein kinase B-2 (PKB2; 1494 bp), p70 ribosomal S6 subunit protein kinase-1 (S6K1; 1622 bp) and phosphoinositide-dependent protein kinase-1 (PDK1; 1712 bp). Each of the 60mer TBIO primers coded for identical nucleotide regions that the 60mer TBC primers covered, except that half of the TBIO primers were reverse complement sequences. In addition, the TBIO and TBC primers contained identical regions of temperature- optimized primer overlaps. The TBC method was optimized to generate sequential overlapping fragments (∼0.4–0.5 kb) for each of the gene sequences, and simultaneous and sequential combinations of overlapping fragments were tested for their ability to be assembled under an array of PCR conditions. However, no fully synthesized gene sequences could be obtained by this approach. In contrast, the TBIO method generated an initial central fragment (∼0.4–0.5 kb), which could be gel purified and used for further inside-out bidirectional elongation by additional increments of 0.4–0.5 kb. By using the newly developed TBIO method of PCR-based gene synthesis, error-free synthetic genes for the human protein kinases PKB2, S6K1 and PDK1 were obtained with little or no corrective mutagenesis. PMID:14602936

  17. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2010-07-01 2010-07-01 false Form and format for... And/or Amino Acid Sequences § 1.824 Form and format for nucleotide and/or amino acid sequence... Code for Information Interchange (ASCII) text. No other formats shall be allowed. (3) The computer...

  18. The nucleotide sequence of 5S rRNA from a cellular slime mold Dictyostelium discoideum.

    PubMed Central

    Hori, H; Osawa, S; Iwabuchi, M

    1980-01-01

    The nucleotide sequence of ribosomal 5S rRNA from a cellular slime mold Dictyostelium discoideum is GUAUACGGCCAUACUAGGUUGGAAACACAUCAUCCCGUUCGAUCUGAUA AGUAAAUCGACCUCAGGCCUUCCAAGUACUCUGGUUGGAGACAACAGGGGAACAUAGGGUGCUGUAUACU. A model for the secondary structure of this 5S rRNA is proposed. The sequence is more similar to those of animals (62% similarity on the average) rather than those of yeasts (56%). Images PMID:7465421

  19. A nucleotide sequence comparison of coxsackievirus B4 isolates from aquatic samples and clinical specimens.

    PubMed Central

    Hughes, M. S.; Hoey, E. M.; Coyle, P. V.

    1993-01-01

    Ten coxsackievirus B4 (CVB4) strains isolated from clinical and environmental sources in Northern Ireland in 1985-7, were compared at the nucleotide sequence level. Dideoxynucleotide sequencing of a polymerase chain reaction (PCR) amplified fragment, spanning the VP1/P2A genomic region, classified the isolates into two distinct groups or genotypes as defined by Rico-Hesse and colleagues for poliovirus type 1. Isolates within each group shared approximately 99% sequence identity at the nucleotide level whereas < or = 86% sequence identity was shared between groups. One isolate derived from a clinical specimen in 1987 was grouped with six CVB4 isolates recovered from the aquatic environment in 1986-7. The second group comprised CVB4 isolates from clinical specimens in 1985-6. Both groups were different at the nucleotide level from the prototype strain isolated in 1950. It was concluded that the method could be used to sub-type CVB4 isolates and would be of value in epidemiological studies of CVB4. Predicted amino acid sequences revealed non-conservation of the tyrosine residue at the VP1/P2A cleavage site but were of little value in distinguishing CVB4 variants. PMID:8386098

  20. The complete nucleotide sequence of RNA beta from the type strain of barley stripe mosaic virus.

    PubMed Central

    Gustafson, G; Armour, S L

    1986-01-01

    The complete nucleotide sequence of RNA beta from the type strain of barley stripe mosaic virus (BSMV) has been determined. The sequence is 3289 nucleotides in length and contains four open reading frames (ORFs) which code for proteins of Mr 22,147 (ORF1), Mr 58,098 (ORF2), Mr 17,378 (ORF3), and Mr 14,119 (ORF4). The predicted N-terminal amino acid sequence of the polypeptide encoded by the ORF nearest the 5'-end of the RNA (ORF1) is identical (after the initiator methionine) to the published N-terminal amino acid sequence of BSMV coat protein for 29 of the first 30 amino acids. ORF2 occupies the central portion of the coding region of RNA beta and ORF3 is located at the 3'-end. The ORF4 sequence overlaps the 3'-region of ORF2 and the 5'-region of ORF3 and differs in codon usage from the other three RNA beta ORFs. The coding region of RNA beta is followed by a poly(A) tract and a 238 nucleotide tRNA-like structure which are common to all three BSMV genomic RNAs. Images PMID:3754962

  1. GrigoraSNPs: Optimized Analysis of SNPs for DNA Forensics.

    PubMed

    Ricke, Darrell O; Shcherbina, Anna; Michaleas, Adam; Fremont-Smith, Philip

    2018-04-16

    High-throughput sequencing (HTS) of single nucleotide polymorphisms (SNPs) enables additional DNA forensic capabilities not attainable using traditional STR panels. However, the inclusion of sets of loci selected for mixture analysis, extended kinship, phenotype, biogeographic ancestry prediction, etc., can result in large panel sizes that are difficult to analyze in a rapid fashion. GrigoraSNP was developed to address the allele-calling bottleneck that was encountered when analyzing SNP panels with more than 5000 loci using HTS. GrigoraSNPs uses a MapReduce parallel data processing on multiple computational threads plus a novel locus-identification hashing strategy leveraging target sequence tags. This tool optimizes the SNP calling module of the DNA analysis pipeline with runtimes that scale linearly with the number of HTS reads. Results are compared with SNP analysis pipelines implemented with SAMtools and GATK. GrigoraSNPs removes a computational bottleneck for processing forensic samples with large HTS SNP panels. Published 2018. This article is a U.S. Government work and is in the public domain in the USA.

  2. Control of total GFP expression by alterations to the 3′ region nucleotide sequence

    PubMed Central

    2013-01-01

    Background Previously, we distinguished the Escherichia coli type II cytoplasmic membrane translocation pathways of Tat, Yid, and Sec for unfolded and folded soluble target proteins. The translocation of folded protein to the periplasm for soluble expression via the Tat pathway was controlled by an N-terminal hydrophilic leader sequence. In this study, we investigated the effect of the hydrophilic C-terminal end and its nucleotide sequence on total and soluble protein expression. Results The native hydrophilic C-terminal end of GFP was obtained by deleting the C-terminal peptide LeuGlu-6×His, derived from pET22b(+). The corresponding clones induced total and soluble GFP expression that was either slightly increased or dramatically reduced, apparently through reconstruction of the nucleotide sequence around the stop codon in the 3′ region. In the expression-induced clones, the hydrophilic C-terminus showed increased Tat pathway specificity for soluble expression. However, in the expression-reduced clone, after analyzing the role of the 5′ poly(A) coding sequence with a substituted synonymous codon, we proved that the longer 5′ poly(A) coding sequence interacted with the reconstructed 3′ region nucleotide sequence to create a new mRNA tertiary structure between the 5′ and 3′ regions, which resulted in reduced total GFP expression. Further, to recover the reduced expression by changing the 3′ nucleotide sequence, after replacing selected C-terminal 5′ codons and the stop codon in the ORF with synonymous codons, total GFP expression in most of the clones was recovered to the undeleted control level. The insertion of trinucleotides after the stop codon in the 3′-UTR recovered or reduced total GFP expression. RT-PCR revealed that the level of total protein expression was controlled by changes in translational or transcriptional regulation, which were induced or reduced by the substitution or insertion of 3′ region nucleotides. Conclusions We found that the hydrophilic C-terminal end of GFP increased Tat pathway specificity and that the 3′ nucleotide sequence played an important role in total protein expression through translational and transcriptional regulation. These findings may be useful for efficiently producing recombinant proteins as well as for potentially controlling the expression level of specific genes in the body for therapeutic purposes. PMID:23834827

  3. Molecular characterization of the vitamin D receptor (VDR) gene in Holstein cows.

    PubMed

    Ali, Mayar O; El-Adl, Mohamed A; Ibrahim, Hussam M M; Elseedy, Youssef Y; Rizk, Mohamed A; El-Khodery, Sabry A

    2018-06-01

    Vitamin D plays a vital role in calcium homeostasis, growth, and immunoregulation. Because little is known about the vitamin D receptor (VDR) gene in cattle, the aim of the present investigation was to present the molecular characterization of exons 5 and 6 of the VDR gene in Holstein cows. DNA extraction, genomic sequencing, phylogenetic analysis, synteny mapping and single nucleotide gene polymorphism analysis of the VDR gene were performed to assess blood samples collected from 50 clinically healthy Holstein cows. The results revealed the presence of a 450-base pair (bp) nucleotide sequence that resembled exons 5 and 6 with intron 5 enclosed between these exons. Sequence alignment and phylogenetic analysis revealed a close relationship between the sequenced VDR region and that found in Hereford cattle. A close association between this region and the corresponding region in small ruminants was also documented. Moreover, a single nucleotide polymorphism (SNP) that caused the replacement of a glutamate with an arginine in the deduced amino acid sequence was detected at position 7 of exon 5. In conclusion, Holstein and Hereford cattle differ with respect to exon 5 of the VDR gene. Phylogenetic analysis of the VDR gene based on nucleotide sequence produced different results from prior analyses based on amino acid sequence. Copyright © 2018 Elsevier Ltd. All rights reserved.

  4. The CD8α gene in duck (Anatidae): cloning, characterization, and expression during viral infection.

    PubMed

    Xu, Qi; Chen, Yang; Zhao, Wen Ming; Huang, Zheng Yang; Duan, Xiu Jun; Tong, Yi Yu; Zhang, Yang; Li, Xiu; Chang, Guo Bin; Chen, Guo Hong

    2015-02-01

    Cluster of differentiation 8 alpha (CD8α) is critical for cell-mediated immune defense and T-cell development. Although CD8α sequences have been reported for several species, very little is known about CD8α in ducks. To elucidate the mechanisms involved in the innate and adaptive immune responses of ducks, we cloned CD8α coding sequences from domestic, Muscovy, Mallard, and Spotbill ducks using reverse transcription polymerase chain reaction (RT-PCR). Each sequence consisted of 714 nucleotides and encoded a signal peptide, an IgV-like domain, a stalk region, a transmembrane region, and a cytoplasmic tail. We identified 58 nucleotide differences and 37 amino acid differences among the four types of duck; of these, 53 nucleotide and 33 amino acid differences were between Muscovy ducks and the other duck species. The CD8α cDNA sequence from domestic duck consisted of a 61-nucleotide 5' untranslated region (UTR), a 714-nucleotide open reading frame, and an 849-nucleotide 3' UTR. Multiple sequence alignments showed that the amino acid sequence of CD8α is conserved in vertebrates. RT-PCR revealed that expression of CD8α mRNA of domestic ducks was highest in the thymus and very low in the kidney, cerebrum, cerebellum, and muscle. Immunohistochemical analyses detected CD8α on the splenic corpuscle and periarterial lymphatic sheath of the spleen. CD8α mRNA in domestic ducklings was initially up-regulated, and then down-regulated, in the thymus, spleen, and liver after treatment with duck hepatitis virus type I (DHV-1) or the immunostimulant polyriboinosinic polyribocytidylic acid (poly I:C).

  5. Whole exome sequencing to estimate alloreactivity potential between donors and recipients in stem cell transplantation.

    PubMed

    Sampson, Juliana K; Sheth, Nihar U; Koparde, Vishal N; Scalora, Allison F; Serrano, Myrna G; Lee, Vladimir; Roberts, Catherine H; Jameson-Lee, Max; Ferreira-Gonzalez, Andrea; Manjili, Masoud H; Buck, Gregory A; Neale, Michael C; Toor, Amir A

    2014-08-01

    Whole exome sequencing (WES) was performed on stem cell transplant donor-recipient (D-R) pairs to determine the extent of potential antigenic variation at a molecular level. In a small cohort of D-R pairs, a high frequency of sequence variation was observed between the donor and recipient exomes independent of human leucocyte antigen (HLA) matching. Nonsynonymous, nonconservative single nucleotide polymorphisms were approximately twice as frequent in HLA-matched unrelated, compared with related D-R pairs. When mapped to individual chromosomes, these polymorphic nucleotides were uniformly distributed across the entire exome. In conclusion, WES reveals extensive nucleotide sequence variation in the exomes of HLA-matched donors and recipients. © 2014 John Wiley & Sons Ltd.

  6. Molecular characterization of the virulent infectious hematopoietic necrosis virus (IHNV) strain 220-90

    PubMed Central

    2010-01-01

    Background Infectious hematopoietic necrosis virus (IHNV) is the type species of the genus Novirhabdovirus, within the family Rhabdoviridae, infecting several species of wild and hatchery reared salmonids. Similar to other rhabdoviruses, IHNV has a linear single-stranded, negative-sense RNA genome of approximately 11,000 nucleotides. The IHNV genome encodes six genes; the nucleocapsid, phosphoprotein, matrix protein, glycoprotein, non-virion protein and polymerase protein genes, respectively. This study describes molecular characterization of the virulent IHNV strain 220-90, belonging to the M genogroup, and its phylogenetic relationships with available sequences of IHNV isolates worldwide. Results The complete genomic sequence of IHNV strain 220-90 was determined from the DNA of six overlapping clones obtained by RT-PCR amplification of genomic RNA. The complete genome sequence of 220-90 comprises 11,133 nucleotides (GenBank GQ413939) with the gene order of 3'-N-P-M-G-NV-L-5'. These genes are separated by conserved gene junctions, with di-nucleotide gene spacers. An additional uracil nucleotide was found at the end of the 5'-trailer region, which was not reported before in other IHNV strains. The first 15 of the 16 nucleotides at the 3'- and 5'-termini of the genome are complementary, and the first 4 nucleotides at 3'-ends of the IHNV are identical to other novirhadoviruses. Sequence homology and phylogenetic analysis of the glycoprotein genes show that 220-90 strain is 97% identical to most of the IHNV strains. Comparison of the virulent 220-90 genomic sequences with less virulent WRAC isolate shows more than 300 nucleotides changes in the genome, which doesn't allow one to speculate putative residues involved in the virulence of IHNV. Conclusion We have molecularly characterized one of the well studied IHNV isolates, 220-90 of genogroup M, which is virulent for rainbow trout, and compared phylogenetic relationship with North American and other strains. Determination of the complete nucleotide sequence is essential for future studies on pathogenesis of IHNV using a reverse genetics approach and developing efficient control strategies. PMID:20085652

  7. Comparative analyses of two Geraniaceae transcriptomes using next-generation sequencing.

    PubMed

    Zhang, Jin; Ruhlman, Tracey A; Mower, Jeffrey P; Jansen, Robert K

    2013-12-29

    Organelle genomes of Geraniaceae exhibit several unusual evolutionary phenomena compared to other angiosperm families including accelerated nucleotide substitution rates, widespread gene loss, reduced RNA editing, and extensive genomic rearrangements. Since most organelle-encoded proteins function in multi-subunit complexes that also contain nuclear-encoded proteins, it is likely that the atypical organellar phenomena affect the evolution of nuclear genes encoding organellar proteins. To begin to unravel the complex co-evolutionary interplay between organellar and nuclear genomes in this family, we sequenced nuclear transcriptomes of two species, Geranium maderense and Pelargonium x hortorum. Normalized cDNA libraries of G. maderense and P. x hortorum were used for transcriptome sequencing. Five assemblers (MIRA, Newbler, SOAPdenovo, SOAPdenovo-trans [SOAPtrans], Trinity) and two next-generation technologies (454 and Illumina) were compared to determine the optimal transcriptome sequencing approach. Trinity provided the highest quality assembly of Illumina data with the deepest transcriptome coverage. An analysis to determine the amount of sequencing needed for de novo assembly revealed diminishing returns of coverage and quality with data sets larger than sixty million Illumina paired end reads for both species. The G. maderense and P. x hortorum transcriptomes contained fewer transcripts encoding the PLS subclass of PPR proteins relative to other angiosperms, consistent with reduced mitochondrial RNA editing activity in Geraniaceae. In addition, transcripts for all six plastid targeted sigma factors were identified in both transcriptomes, suggesting that one of the highly divergent rpoA-like ORFs in the P. x hortorum plastid genome is functional. The findings support the use of the Illumina platform and assemblers optimized for transcriptome assembly, such as Trinity or SOAPtrans, to generate high-quality de novo transcriptomes with broad coverage. In addition, results indicated no major improvements in breadth of coverage with data sets larger than six billion nucleotides or when sampling RNA from four tissue types rather than from a single tissue. Finally, this work demonstrates the power of cross-compartmental genomic analyses to deepen our understanding of the correlated evolution of the nuclear, plastid, and mitochondrial genomes in plants.

  8. Comparative analyses of two Geraniaceae transcriptomes using next-generation sequencing

    PubMed Central

    2013-01-01

    Background Organelle genomes of Geraniaceae exhibit several unusual evolutionary phenomena compared to other angiosperm families including accelerated nucleotide substitution rates, widespread gene loss, reduced RNA editing, and extensive genomic rearrangements. Since most organelle-encoded proteins function in multi-subunit complexes that also contain nuclear-encoded proteins, it is likely that the atypical organellar phenomena affect the evolution of nuclear genes encoding organellar proteins. To begin to unravel the complex co-evolutionary interplay between organellar and nuclear genomes in this family, we sequenced nuclear transcriptomes of two species, Geranium maderense and Pelargonium x hortorum. Results Normalized cDNA libraries of G. maderense and P. x hortorum were used for transcriptome sequencing. Five assemblers (MIRA, Newbler, SOAPdenovo, SOAPdenovo-trans [SOAPtrans], Trinity) and two next-generation technologies (454 and Illumina) were compared to determine the optimal transcriptome sequencing approach. Trinity provided the highest quality assembly of Illumina data with the deepest transcriptome coverage. An analysis to determine the amount of sequencing needed for de novo assembly revealed diminishing returns of coverage and quality with data sets larger than sixty million Illumina paired end reads for both species. The G. maderense and P. x hortorum transcriptomes contained fewer transcripts encoding the PLS subclass of PPR proteins relative to other angiosperms, consistent with reduced mitochondrial RNA editing activity in Geraniaceae. In addition, transcripts for all six plastid targeted sigma factors were identified in both transcriptomes, suggesting that one of the highly divergent rpoA-like ORFs in the P. x hortorum plastid genome is functional. Conclusions The findings support the use of the Illumina platform and assemblers optimized for transcriptome assembly, such as Trinity or SOAPtrans, to generate high-quality de novo transcriptomes with broad coverage. In addition, results indicated no major improvements in breadth of coverage with data sets larger than six billion nucleotides or when sampling RNA from four tissue types rather than from a single tissue. Finally, this work demonstrates the power of cross-compartmental genomic analyses to deepen our understanding of the correlated evolution of the nuclear, plastid, and mitochondrial genomes in plants. PMID:24373163

  9. Hop stunt viroid: molecular cloning and nucleotide sequence of the complete cDNA copy.

    PubMed Central

    Ohno, T; Takamatsu, N; Meshi, T; Okada, Y

    1983-01-01

    The complete cDNA of hop stunt viroid (HSV) has been cloned by the method of Okayama and Berg (Mol.Cell.Biol.2,161-170. (1982] and the complete nucleotide sequence has been established. The covalently closed circular single-stranded HSV RNA consists of 297 nucleotides. The secondary structure predicted for HSV contains 67% of its residues base-paired. The native HSV can possess an extended rod-like structure characteristic of viroids previously established. The central region of the native HSV has a similar structure to the conserved region found in all viroids sequenced so far except for avocado sunblotch viroid. The sequence homologous to the 5'-end of U1a RNA is also found in the sequence of HSV but not in the central conserved region. Images PMID:6312412

  10. Nucleotide sequences of Japanese isolates of citrus vein enation virus.

    PubMed

    Nakazono-Nagaoka, Eiko; Fujikawa, Takashi; Iwanami, Toru

    2017-03-01

    The genomic sequences of five Japanese isolates of citrus vein enation virus (CVEV) isolates that induce vein enation were determined and compared with that of the Spanish isolate VE-1. The nucleotide sequences of all Japanese isolates were 5,983 nt in length. The genomic RNA of Japanese isolates had five potential open reading frames (ORF 0, ORF 1, ORF 2, ORF 3, and ORF 5) in the positive-sense strand. The nucleotide sequence identity among the Japanese isolates and Spanish isolate VE-1 ranged from 98.0% to 99.8%. Comparison of the partial amino acid sequences of ten Japanese isolates and three Spanish isolates suggested that four amino acid residues, at positions of 83, 104, and 113 in ORF 2 and position 41 in ORF 5, might be unique to some Japanese isolates.

  11. Detecting and Analyzing Genetic Recombination Using RDP4.

    PubMed

    Martin, Darren P; Murrell, Ben; Khoosal, Arjun; Muhire, Brejnev

    2017-01-01

    Recombination between nucleotide sequences is a major process influencing the evolution of most species on Earth. The evolutionary value of recombination has been widely debated and so too has its influence on evolutionary analysis methods that assume nucleotide sequences replicate without recombining. When nucleic acids recombine, the evolution of the daughter or recombinant molecule cannot be accurately described by a single phylogeny. This simple fact can seriously undermine the accuracy of any phylogenetics-based analytical approach which assumes that the evolutionary history of a set of recombining sequences can be adequately described by a single phylogenetic tree. There are presently a large number of available methods and associated computer programs for analyzing and characterizing recombination in various classes of nucleotide sequence datasets. Here we examine the use of some of these methods to derive and test recombination hypotheses using multiple sequence alignments.

  12. Technical Considerations for Reduced Representation Bisulfite Sequencing with Multiplexed Libraries

    PubMed Central

    Chatterjee, Aniruddha; Rodger, Euan J.; Stockwell, Peter A.; Weeks, Robert J.; Morison, Ian M.

    2012-01-01

    Reduced representation bisulfite sequencing (RRBS), which couples bisulfite conversion and next generation sequencing, is an innovative method that specifically enriches genomic regions with a high density of potential methylation sites and enables investigation of DNA methylation at single-nucleotide resolution. Recent advances in the Illumina DNA sample preparation protocol and sequencing technology have vastly improved sequencing throughput capacity. Although the new Illumina technology is now widely used, the unique challenges associated with multiplexed RRBS libraries on this platform have not been previously described. We have made modifications to the RRBS library preparation protocol to sequence multiplexed libraries on a single flow cell lane of the Illumina HiSeq 2000. Furthermore, our analysis incorporates a bioinformatics pipeline specifically designed to process bisulfite-converted sequencing reads and evaluate the output and quality of the sequencing data generated from the multiplexed libraries. We obtained an average of 42 million paired-end reads per sample for each flow-cell lane, with a high unique mapping efficiency to the reference human genome. Here we provide a roadmap of modifications, strategies, and trouble shooting approaches we implemented to optimize sequencing of multiplexed libraries on an a RRBS background. PMID:23193365

  13. Dimeric PROP1 binding to diverse palindromic TAAT sequences promotes its transcriptional activity.

    PubMed

    Nakayama, Michie; Kato, Takako; Susa, Takao; Sano, Akiko; Kitahara, Kousuke; Kato, Yukio

    2009-08-13

    Mutations in the Prop1 gene are responsible for murine Ames dwarfism and human combined pituitary hormone deficiency with hypogonadism. Recently, we reported that PROP1 is a possible transcription factor for gonadotropin subunit genes through plural cis-acting sites composed of AT-rich sequences containing a TAAT motif which differs from its consensus binding sequence known as PRDQ9 (TAATTGAATTA). This study aimed to verify the binding specificity and sequence of PROP1 by applying the method of SELEX (Systematic Evolution of Ligands by EXponential enrichment), EMSA (electrophoretic mobility shift assay) and transient transfection assay. SELEX, after 5, 7 and 9 generations of selection using a random sequence library, showed that nucleotides containing one or two TAAT motifs were accumulated and accounted for 98.5% at the 9th generation. Aligned sequences and EMSA demonstrated that PROP1 binds preferentially to 11 nucleotides composed of an inverted TAAT motif separated by 3 nucleotides with variation in the half site of palindromic TAAT motifs and with preferential requirement of T at the nucleotide number 5 immediately 3' to a TAAT motif. Transient transfection assay demonstrated first that dimeric binding of PROP1 to an inverted TAAT motif and its cognates resulted in transcriptional activation, whereas monomeric binding of PROP1 to a single TAAT motif and an inverted ATTA motif did not mediate activation. Thus, this study demonstrated that dimeric binding of PROP1 is able to recognize diverse palindromic TAAT sequences separated by 3 nucleotides and to exhibit its transcriptional activity.

  14. Nucleotide sequence of a resistance breaking mutant of southern bean mosaic virus.

    PubMed

    Lee, L; Anderson, E J

    1998-01-01

    SBMV-S is a resistance-breaking mutant of an Arkansas isolate of the bean strain of southern bean mosaic virus (SBMV-BARK) that is able to move systemically in Phaseolus vulgaris cvs. Pinto and Great Northern, whereas the wild-type SBMV-BARK causes local necrotic lesions and is restricted to the inoculated leaves of these hosts. Sequence analysis of the 4136 nucleotide genomes of SBMV-BARK and SBMV-S revealed seven nucleotide differences, but only four deduced amino acid changes. A single amino acid change occurred in the C-terminal region of the putative RNA-dependent RNA polymerase and three differences were identified in the N-terminal portion of the virus coat protein. SBMV-BARK and SBMV-S were compared with other sobemoviruses and were found to contain a high level of nucleotide sequence identity (91.3%) to SBMV-B. Unlike SBMV-B however, SBMV-BARK and SBMV-S contained four putative overlapping open reading frames, making them more similar in genome organization to the cowpea strain, SBMV-C. The possibility exists that mutations or even errors, that resulted in mis-identification of open reading frames, occurred in previously published information on nucleotide sequence and genomic organization for SBMV-B.

  15. Normalization of Complete Genome Characteristics: Application to Evolution from Primitive Organisms to Homo sapiens.

    PubMed

    Sorimachi, Kenji; Okayasu, Teiji; Ohhira, Shuji

    2015-04-01

    Normalized nucleotide and amino acid contents of complete genome sequences can be visualized as radar charts. The shapes of these charts depict the characteristics of an organism's genome. The normalized values calculated from the genome sequence theoretically exclude experimental errors. Further, because normalization is independent of both target size and kind, this procedure is applicable not only to single genes but also to whole genomes, which consist of a huge number of different genes. In this review, we discuss the applications of the normalization of the nucleotide and predicted amino acid contents of complete genomes to the investigation of genome structure and to evolutionary research from primitive organisms to Homo sapiens. Some of the results could never have been obtained from the analysis of individual nucleotide or amino acid sequences but were revealed only after the normalization of nucleotide and amino acid contents was applied to genome research. The discovery that genome structure was homogeneous was obtained only after normalization methods were applied to the nucleotide or predicted amino acid contents of genome sequences. Normalization procedures are also applicable to evolutionary research. Thus, normalization of the contents of whole genomes is a useful procedure that can help to characterize organisms.

  16. Detection of a new bat gammaherpesvirus in the Philippines.

    PubMed

    Watanabe, Shumpei; Ueda, Naoya; Iha, Koichiro; Masangkay, Joseph S; Fujii, Hikaru; Alviola, Phillip; Mizutani, Tetsuya; Maeda, Ken; Yamane, Daisuke; Walid, Azab; Kato, Kentaro; Kyuwa, Shigeru; Tohya, Yukinobu; Yoshikawa, Yasuhiro; Akashi, Hiroomi

    2009-08-01

    A new bat herpesvirus was detected in the spleen of an insectivorous bat (Hipposideros diadema, family Hipposideridae) collected on Panay Island, the Philippines. PCR analyses were performed using COnsensus-DEgenerate Hybrid Oligonucleotide Primers (CODEHOPs) targeting the herpesvirus DNA polymerase (DPOL) gene. Although we obtained PCR products with CODEHOPs, direct sequencing using the primers was not possible because of high degree of degeneracy. Direct sequencing technology developed in our rapid determination system of viral RNA sequences (RDV) was applied in this study, and a partial DPOL nucleotide sequence was determined. In addition, a partial gB gene nucleotide sequence was also determined using the same strategy. We connected the partial gB and DPOL sequences with long-distance PCR, and a 3741-bp nucleotide fragment, including the 3' part of the gB gene and the 5' part of the DPOL gene, was finally determined. Phylogenetic analysis showed that the sequence was novel and most similar to those of the subfamily Gammaherpesvirinae.

  17. Plant nitrogen regulatory P-PII polypeptides

    DOEpatents

    Coruzzi, Gloria M.; Lam, Hon-Ming; Hsieh, Ming-Hsiun

    2004-11-23

    The present invention generally relates to plant nitrogen regulatory PII gene (hereinafter P-PII gene), a gene involved in regulating plant nitrogen metabolism. The invention provides P-PII nucleotide sequences, expression constructs comprising said nucleotide sequences, and host cells and plants having said constructs and, optionally expressing the P-PII gene from said constructs. The invention also provides substantially pure P-PII proteins. The P-PII nucleotide sequences and constructs of the invention may be used to engineer organisms to overexpress wild-type or mutant P-PII regulatory protein. Engineered plants that overexpress or underexpress P-PII regulatory protein may have increased nitrogen assimilation capacity. Engineered organisms may be used to produce P-PII proteins which, in turn, can be used for a variety of purposes including in vitro screening of herbicides. P-PII nucleotide sequences have additional uses as probes for isolating additional genomic clones having the promoters of P-PII gene. P-PII promoters are light- and/or sucrose-inducible and may be advantageously used in genetic engineering of plants.

  18. Greater numbers of nucleotide substitutions are introduced into the genomic RNA of bovine viral diarrhea virus during acute infections of pregnant cattle than of non-pregnant cattle.

    PubMed

    Neill, John D; Newcomer, Benjamin W; Marley, Shonda D; Ridpath, Julia F; Givens, M Daniel

    2012-08-06

    Bovine viral diarrhea virus (BVDV) strains circulating in livestock herds show significant sequence variation. Conventional wisdom states that most sequence variation arises during acute infections in response to immune or other environmental pressures. A recent study showed that more nucleotide changes were introduced into the BVDV genomic RNA during the establishment of a single fetal persistent infection than following a series of acute infections of naïve cattle. However, it was not known if nucleotide changes were introduce when the virus crossed the placenta and infected the fetus or during the acute infection of the dam. The sequence of the open reading frame (ORF) from viruses isolated from four acutely infected pregnant heifers following exposure to persistently infected (PI) calves was compared to the sequences of the virus from the progenitor PI calf and the virus from the resulting progeny PI calf to determine when genetic change was introduced. This was compared to genetic change found in viruses isolated from a pregnant PI cow and its PI calf, and in three viruses isolated from acutely infected, non-pregnant cattle exposed to PI calves. Most genetic changes previously identified between the progenitor and progeny PI viruses were in place in the acute phase viruses isolated from the dams six days post-exposure to the progenitor PI calf. Additionally, each progeny PI virus had two to three unique nucleotide substitutions that were introduced in crossing the placenta and infection of the fetus. The nucleotide sequence of two acute phase viruses isolated from steers exposed to PI calves revealed that six and seven nucleotide changes were introduced during the acute infection. The sequence of the BVDV-2 virus isolated from an acute infection of a PI calf (BVDV-1a) co-housed with a BVDV-2 PI calf had ten nucleotides that were different from the progenitor PI virus. Finally, twenty nucleotide changes were identified in the PI virus of a calf born to a PI dam. These results demonstrate that nucleotide changes are introduced into the BVDV infecting pregnant cattle at rates of 2.3 to 8 fold higher then during the acute infection of non-pregnant animals.

  19. Human Genome Sequencing in Health and Disease

    PubMed Central

    Gonzaga-Jauregui, Claudia; Lupski, James R.; Gibbs, Richard A.

    2013-01-01

    Following the “finished,” euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges. PMID:22248320

  20. Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks

    PubMed Central

    Lanchantin, Jack; Singh, Ritambhara; Wang, Beilun; Qi, Yanjun

    2018-01-01

    Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding (TFBS) site classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence’s saliency map which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs as well as dependencies among them. PMID:27896980

  1. Eukaryotic tRNAs fingerprint invertebrates vis-à-vis vertebrates.

    PubMed

    Mitra, Sanga; Das, Pijush; Samadder, Arpa; Das, Smarajit; Betai, Rupal; Chakrabarti, Jayprokas

    2015-01-01

    During translation, aminoacyl-tRNA synthetases recognize the identities of the tRNAs to charge them with their respective amino acids. The conserved identities of 58,244 eukaryotic tRNAs of 24 invertebrates and 45 vertebrates in genomic tRNA database were analyzed and their novel features extracted. The internal promoter sequences, namely, A-Box and B-Box, were investigated and evidence gathered that the intervention of optional nucleotides at 17a and 17b correlated with the optimal length of the A-Box. The presence of canonical transcription terminator sequences at the immediate vicinity of tRNA genes was ventured. Even though non-canonical introns had been reported in red alga, green alga, and nucleomorph so far, fairly motivating evidence of their existence emerged in tRNA genes of other eukaryotes. Non-canonical introns were seen to interfere with the internal promoters in two cases, questioning their transcription fidelity. In a first of its kind, phylogenetic constructs based on tRNA molecules delineated and built the trees of the vast and diverse invertebrates and vertebrates. Finally, two tRNA models representing the invertebrates and the vertebrates were drawn, by isolating the dominant consensus in the positional fluctuations of nucleotide compositions.

  2. Silicon nanowire sensor for DNA detection and sequencing: an ab initio simulation

    NASA Astrophysics Data System (ADS)

    Lu, Wenchang; Li, Yan; Hodak, Miroslav; Xiao, Zhongcan; Bernholc, Jerry

    Electrical sensors able to detect DNA replication and determine its sequence would enable fast and relatively cheap diagnosis of gene-related vulnerabilities and cancers. At present, it is already possible to electrically monitor DNA replication events using a Klenow fragment of polymerase I attached to a carbon nanotube. Since devices based on Si nanowires would be much easier to produce in quantity, we examine theoretically the sensitivity of a Si nanowire/Klenow fragment for electrical detection of nucleotide addition. A highly parallel real-space multigrid code is used for DFT-based non-equilibrium Green's function calculations involving up to 16,000 atoms, employing highly-accurate variationally-optimized localized orbitals. We find that the open and closed Klenow fragment configurations, prior and during nucleotide addition, respectively, screen the Si nanowire differently and result in a detectable current difference. The sensitivity is the largest in the subthreshold regime while the absolute current difference is maximized in the turn-on state. The sensitivity decreases with an increase of the nanowire size, as expected, but the current difference between different enzymatic states is nearly independent on the nanowire size up to 800 Å2 cross section.

  3. Intramolecular interactions in aminoacyl nucleotides: Implications regarding the origin of genetic coding and protein synthesis

    NASA Technical Reports Server (NTRS)

    Lacey, J. C., Jr.; Mullins, D. W., Jr.; Watkins, C. L.; Hall, L. M.

    1986-01-01

    Cellular organisms store information as sequences of nucleotides in double stranded DNA. This information is useless unless it can be converted into the active molecular species, protein. This is done in contemporary creatures first by transcription of one strand to give a complementary strand of mRNA. The sequence of nucleotides is then translated into a specific sequence of amino acids in a protein. Translation is made possible by a genetic coding system in which a sequence of three nucleotides codes for a specific amino acid. The origin and evolution of any chemical system can be understood through elucidation of the properties of the chemical entities which make up the system. There is an underlying logic to the coding system revealed by a correlation of the hydrophobicities of amino acids and their anticodonic nucleotides (i.e., the complement of the codon). Its importance lies in the fact that every amino acid going into protein synthesis must first be activated. This is universally accomplished with ATP. Past studies have concentrated on the chemistry of the adenylates, but more recently we have found, through the use of NMR, that we can observe intramolecular interactions even at low concentrations, between amino acid side chains and nucleotide base rings in these adenylates. The use of this type of compound thus affords a novel way of elucidating the manner in which amino acids and nucleotides interact with each other. In aqueous solution, when a hydrophobic amino acid is attached to the most hydrophobic nucleotide, AMP, a hydrophobic interaction takes place between the amino acid side chain and the adenine ring. The studies to be reported concern these hydrophobic interactions.

  4. High speed nucleic acid sequencing

    DOEpatents

    Korlach, Jonas [Ithaca, NY; Webb, Watt W [Ithaca, NY; Levene, Michael [Ithaca, NY; Turner, Stephen [Ithaca, NY; Craighead, Harold G [Ithaca, NY; Foquet, Mathieu [Ithaca, NY

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid. Each type of labeled nucleotide comprises an acceptor fluorophore attached to a phosphate portion of the nucleotide such that the fluorophore is removed upon incorporation into a growing strand. Fluorescent signal is emitted via fluorescent resonance energy transfer between the donor fluorophore and the acceptor fluorophore as each nucleotide is incorporated into the growing strand. The sequence is deduced by identifying which base is being incorporated into the growing strand.

  5. The sequence specificity of UV-induced DNA damage in a systematically altered DNA sequence.

    PubMed

    Khoe, Clairine V; Chung, Long H; Murray, Vincent

    2018-06-01

    The sequence specificity of UV-induced DNA damage was investigated in a specifically designed DNA plasmid using two procedures: end-labelling and linear amplification. Absorption of UV photons by DNA leads to dimerisation of pyrimidine bases and produces two major photoproducts, cyclobutane pyrimidine dimers (CPDs) and pyrimidine(6-4)pyrimidone photoproducts (6-4PPs). A previous study had determined that two hexanucleotide sequences, 5'-GCTC*AC and 5'-TATT*AA, were high intensity UV-induced DNA damage sites. The UV clone plasmid was constructed by systematically altering each nucleotide of these two hexanucleotide sequences. One of the main goals of this study was to determine the influence of single nucleotide alterations on the intensity of UV-induced DNA damage. The sequence 5'-GCTC*AC was designed to examine the sequence specificity of 6-4PPs and the highest intensity 6-4PP damage sites were found at 5'-GTTC*CC nucleotides. The sequence 5'-TATT*AA was devised to investigate the sequence specificity of CPDs and the highest intensity CPD damage sites were found at 5'-TTTT*CG nucleotides. It was proposed that the tetranucleotide DNA sequence, 5'-YTC*Y (where Y is T or C), was the consensus sequence for the highest intensity UV-induced 6-4PP adduct sites; while it was 5'-YTT*C for the highest intensity UV-induced CPD damage sites. These consensus tetranucleotides are composed entirely of consecutive pyrimidines and must have a DNA conformation that is highly productive for the absorption of UV photons. Crown Copyright © 2018. Published by Elsevier B.V. All rights reserved.

  6. Information capacity of nucleotide sequences and its applications.

    PubMed

    Sadovsky, M G

    2006-05-01

    The information capacity of nucleotide sequences is defined through the specific entropy of frequency dictionary of a sequence determined with respect to another one containing the most probable continuations of shorter strings. This measure distinguishes a sequence both from a random one, and from ordered entity. A comparison of sequences based on their information capacity is studied. An order within the genetic entities is found at the length scale ranged from 3 to 8. Some other applications of the developed methodology to genetics, bioinformatics, and molecular biology are discussed.

  7. Nucleic acid constructs containing orthogonal site selective recombinases (OSSRs)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gilmore, Joshua M.; Anderson, J. Christopher; Dueber, John E.

    The present invention provides for a recombinant nucleic acid comprising a nucleotide sequence comprising a plurality of constructs, wherein each construct independently comprises a nucleotide sequence of interest flanked by a pair of recombinase recognition sequences. Each pair of recombinase recognition sequences is recognized by a distinct recombinase. Optionally, each construct can, independently, further comprise one or more genes encoding a recombinase capable of recognizing the pair of recombinase recognition sequences of the construct. The recombinase can be an orthogonal (non-cross reacting), site-selective recombinase (OSSR).

  8. Complete genome sequence and phylogenetic analyses of an aquabirnavirus isolated from a diseased marbled eel culture in Taiwan.

    PubMed

    Wen, Chiu-Ming

    2017-08-01

    An aquabirnavirus was isolated from diseased marbled eels (Anguilla marmorata; MEIPNV1310) with gill haemorrhages and associated mortality. Its genome segment sequences were obtained through next-generation sequencing and compared with published aquabirnavirus sequences. The results indicated that the genome sequence of MEIPNV1310 contains segment A (3099 nucleotides) and segment B (2789 nucleotides). Phylogenetic analysis showed that MEIPNV1310 is closely related to the infectious pancreatic necrosis Ab strain within genogroup II. This genome sequence is beneficial for studying the geographic distribution and evolution of aquabirnaviruses.

  9. Unlinking the methylome pattern from nucleotide sequence, revealed by large-scale in vivo genome engineering and methylome editing in medaka fish

    PubMed Central

    Nakamura, Ryohei; Uno, Ayako; Kumagai, Masahiko; Fukushima, Hiroto S.; Morishita, Shinichi; Takeda, Hiroyuki

    2017-01-01

    The heavily methylated vertebrate genomes are punctuated by stretches of poorly methylated DNA sequences that usually mark gene regulatory regions. It is known that the methylation state of these regions confers transcriptional control over their associated genes. Given its governance on the transcriptome, cellular functions and identity, genome-wide DNA methylation pattern is tightly regulated and evidently predefined. However, how is the methylation pattern determined in vivo remains enigmatic. Based on in silico and in vitro evidence, recent studies proposed that the regional hypomethylated state is primarily determined by local DNA sequence, e.g., high CpG density and presence of specific transcription factor binding sites. Nonetheless, the dependency of DNA methylation on nucleotide sequence has not been carefully validated in vertebrates in vivo. Herein, with the use of medaka (Oryzias latipes) as a model, the sequence dependency of DNA methylation was intensively tested in vivo. Our statistical modeling confirmed the strong statistical association between nucleotide sequence pattern and methylation state in the medaka genome. However, by manipulating the methylation state of a number of genomic sequences and reintegrating them into medaka embryos, we demonstrated that artificially conferred DNA methylation states were predominantly and robustly maintained in vivo, regardless of their sequences and endogenous states. This feature was also observed in the medaka transgene that had passed across generations. Thus, despite the observed statistical association, nucleotide sequence was unable to autonomously determine its own methylation state in medaka in vivo. Our results apparently argue against the notion of the governance on the DNA methylation by nucleotide sequence, but instead suggest the involvement of other epigenetic factors in defining and maintaining the DNA methylation landscape. Further investigation in other vertebrate models in vivo will be needed for the generalization of our observations made in medaka. PMID:29267279

  10. DHSpred: support-vector-machine-based human DNase I hypersensitive sites prediction using the optimal features selected by random forest.

    PubMed

    Manavalan, Balachandran; Shin, Tae Hwan; Lee, Gwang

    2018-01-05

    DNase I hypersensitive sites (DHSs) are genomic regions that provide important information regarding the presence of transcriptional regulatory elements and the state of chromatin. Therefore, identifying DHSs in uncharacterized DNA sequences is crucial for understanding their biological functions and mechanisms. Although many experimental methods have been proposed to identify DHSs, they have proven to be expensive for genome-wide application. Therefore, it is necessary to develop computational methods for DHS prediction. In this study, we proposed a support vector machine (SVM)-based method for predicting DHSs, called DHSpred (DNase I Hypersensitive Site predictor in human DNA sequences), which was trained with 174 optimal features. The optimal combination of features was identified from a large set that included nucleotide composition and di- and trinucleotide physicochemical properties, using a random forest algorithm. DHSpred achieved a Matthews correlation coefficient and accuracy of 0.660 and 0.871, respectively, which were 3% higher than those of control SVM predictors trained with non-optimized features, indicating the efficiency of the feature selection method. Furthermore, the performance of DHSpred was superior to that of state-of-the-art predictors. An online prediction server has been developed to assist the scientific community, and is freely available at: http://www.thegleelab.org/DHSpred.html.

  11. DHSpred: support-vector-machine-based human DNase I hypersensitive sites prediction using the optimal features selected by random forest

    PubMed Central

    Manavalan, Balachandran; Shin, Tae Hwan; Lee, Gwang

    2018-01-01

    DNase I hypersensitive sites (DHSs) are genomic regions that provide important information regarding the presence of transcriptional regulatory elements and the state of chromatin. Therefore, identifying DHSs in uncharacterized DNA sequences is crucial for understanding their biological functions and mechanisms. Although many experimental methods have been proposed to identify DHSs, they have proven to be expensive for genome-wide application. Therefore, it is necessary to develop computational methods for DHS prediction. In this study, we proposed a support vector machine (SVM)-based method for predicting DHSs, called DHSpred (DNase I Hypersensitive Site predictor in human DNA sequences), which was trained with 174 optimal features. The optimal combination of features was identified from a large set that included nucleotide composition and di- and trinucleotide physicochemical properties, using a random forest algorithm. DHSpred achieved a Matthews correlation coefficient and accuracy of 0.660 and 0.871, respectively, which were 3% higher than those of control SVM predictors trained with non-optimized features, indicating the efficiency of the feature selection method. Furthermore, the performance of DHSpred was superior to that of state-of-the-art predictors. An online prediction server has been developed to assist the scientific community, and is freely available at: http://www.thegleelab.org/DHSpred.html PMID:29416743

  12. Using expected sequence features to improve basecalling accuracy of amplicon pyrosequencing data.

    PubMed

    Rask, Thomas S; Petersen, Bent; Chen, Donald S; Day, Karen P; Pedersen, Anders Gorm

    2016-04-22

    Amplicon pyrosequencing targets a known genetic region and thus inherently produces reads highly anticipated to have certain features, such as conserved nucleotide sequence, and in the case of protein coding DNA, an open reading frame. Pyrosequencing errors, consisting mainly of nucleotide insertions and deletions, are on the other hand likely to disrupt open reading frames. Such an inverse relationship between errors and expectation based on prior knowledge can be used advantageously to guide the process known as basecalling, i.e. the inference of nucleotide sequence from raw sequencing data. The new basecalling method described here, named Multipass, implements a probabilistic framework for working with the raw flowgrams obtained by pyrosequencing. For each sequence variant Multipass calculates the likelihood and nucleotide sequence of several most likely sequences given the flowgram data. This probabilistic approach enables integration of basecalling into a larger model where other parameters can be incorporated, such as the likelihood for observing a full-length open reading frame at the targeted region. We apply the method to 454 amplicon pyrosequencing data obtained from a malaria virulence gene family, where Multipass generates 20 % more error-free sequences than current state of the art methods, and provides sequence characteristics that allow generation of a set of high confidence error-free sequences. This novel method can be used to increase accuracy of existing and future amplicon sequencing data, particularly where extensive prior knowledge is available about the obtained sequences, for example in analysis of the immunoglobulin VDJ region where Multipass can be combined with a model for the known recombining germline genes. Multipass is available for Roche 454 data at http://www.cbs.dtu.dk/services/MultiPass-1.0 , and the concept can potentially be implemented for other sequencing technologies as well.

  13. The nucleotide sequence and genome organization of Plasmopara halstedii virus.

    PubMed

    Heller-Dohmen, Marion; Göpfert, Jens C; Pfannstiel, Jens; Spring, Otmar

    2011-03-17

    Only very few viruses of Oomycetes have been studied in detail. Isometric virions were found in different isolates of the oomycete Plasmopara halstedii, the downy mildew pathogen of sunflower. However, complete nucleotide sequences and data on the genome organization were lacking. Viral RNA of different P. halstedii isolates was subjected to nucleotide sequencing and analysis of the viral genome. The N-terminal sequence of the viral coat protein was determined using Top-Down MALDI-TOF analysis. The complete nucleotide sequences of both single-stranded RNA segments (RNA1 and RNA2) were established. RNA1 consisted of 2793 nucleotides (nt) exclusive its 3' poly(A) tract and a single open-reading frame (ORF1) of 2745 nt. ORF1 was framed by a 5' untranslated region (5' UTR) of 18 nt and a 3' untranslated region (3' UTR) of 30 nt. ORF1 contained motifs of RNA-dependent RNA polymerases (RdRp) and showed similarities to RdRp of Scleropthora macrospora virus A (SmV A) and viruses within the Nodaviridae family. RNA2 consisted of 1526 nt exclusive its 3' poly(A) tract and a second ORF (ORF2) of 1128 nt. ORF2 coded for the single viral coat protein (CP) and was framed by a 5' UTR of 164 nt and a 3' UTR of 234 nt. The deduced amino acid sequence of ORF2 was verified by nano-LC-ESI-MS/MS experiments. Top-Down MALDI-TOF analysis revealed the N-terminal sequence of the CP. The N-terminal sequence represented a region within ORF2 suggesting a proteolytic processing of the CP in vivo. The CP showed similarities to CP of SmV A and viruses within the Tombusviridae family. Fragments of RNA1 (ca. 1.9 kb) and RNA2 (ca. 1.4 kb) were used to analyze the nucleotide sequence variation of virions in different P. halstedii isolates. Viral sequence variation was 0.3% or less regardless of their host's pathotypes, the geographical origin and the sensitivity towards the fungicide metalaxyl. The results showed the presence of a single and new virus type in different P. halstedii isolates. Insignificant viral sequence variation indicated that the virus did not account for differences in pathogenicity of the oomycete P. halstedii.

  14. [Isolation and characterization of a new glyphosate-resistant strain from extremely polluted environment].

    PubMed

    Sh, Jiying; Jin, Dan; Lu, Wei; Zhang, Xiaoyu; Zhang, Chao; Li, Liang; Ma, Ruiqiang; Xiao, Lei; Wang, Yiding; Lin, Min

    2008-06-01

    To isolate and characterize a glyphosate-resistant strain from extremely polluted environment. A glyphosate-resistant strain was isolated from extremely polluted soil taking glyphosate as the selection pressure. Its glyphosate resistance, growth optimal pH and antibiotic sensitivity were detected. Its morphology, cultural characteristics, physiological and biochemical properties, chemotaxonomy and 16S rDNA sequences were studied. Based on these results, the strain was identified according to the ninth edition of Bergey's manual of determinative bacteriology. The isolate was named SL06500. It could grow in M9 minimal medium containing up to 500 mmol/L glyphosate. The cell growth optimal pH of SL06500 was 4.0. It was resistant to ampicillin, kanamycin, tetracycline and chloromycetin. The 16S rDNA of SL06500 was amplified by PCR and sequenced. Compared with the published nucleotide sequence of 16S rDNA in NCBI (National Center for Biotechnology Information), SL06500 showed high identity with Achromobacter and Alcaligenes. Based on morphological, physiological and biochemical characteristics, the strain was identified as Alcaligenes xylosoxidans subsp.xylosoxidans SL06500 according to the ninth edition of Bergey's manual of determinative bacteriology. Strain SL06500 is worthy to be studied because of its high glyphosate resistance.

  15. The immediate upstream region of the 5′-UTR from the AUG start codon has a pronounced effect on the translational efficiency in Arabidopsis thaliana

    PubMed Central

    Kim, Younghyun; Lee, Goeun; Jeon, Eunhyun; Sohn, Eun ju; Lee, Yongjik; Kang, Hyangju; Lee, Dong wook; Kim, Dae Heon; Hwang, Inhwan

    2014-01-01

    The nucleotide sequence around the translational initiation site is an important cis-acting element for post-transcriptional regulation. However, it has not been fully understood how the sequence context at the 5′-untranslated region (5′-UTR) affects the translational efficiency of individual mRNAs. In this study, we provide evidence that the 5′-UTRs of Arabidopsis genes showing a great difference in the nucleotide sequence vary greatly in translational efficiency with more than a 200-fold difference. Of the four types of nucleotides, the A residue was the most favourable nucleotide from positions −1 to −21 of the 5′-UTRs in Arabidopsis genes. In particular, the A residue in the 5′-UTR from positions −1 to −5 was required for a high-level translational efficiency. In contrast, the T residue in the 5′-UTR from positions −1 to −5 was the least favourable nucleotide in translational efficiency. Furthermore, the effect of the sequence context in the −1 to −21 region of the 5′-UTR was conserved in different plant species. Based on these observations, we propose that the sequence context immediately upstream of the AUG initiation codon plays a crucial role in determining the translational efficiency of plant genes. PMID:24084084

  16. eShadow: A tool for comparing closely related sequences

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ovcharenko, Ivan; Boffelli, Dario; Loots, Gabriela G.

    2004-01-15

    Primate sequence comparisons are difficult to interpret due to the high degree of sequence similarity shared between such closely related species. Recently, a novel method, phylogenetic shadowing, has been pioneered for predicting functional elements in the human genome through the analysis of multiple primate sequence alignments. We have expanded this theoretical approach to create a computational tool, eShadow, for the identification of elements under selective pressure in multiple sequence alignments of closely related genomes, such as in comparisons of human to primate or mouse to rat DNA. This tool integrates two different statistical methods and allows for the dynamic visualizationmore » of the resulting conservation profile. eShadow also includes a versatile optimization module capable of training the underlying Hidden Markov Model to differentially predict functional sequences. This module grants the tool high flexibility in the analysis of multiple sequence alignments and in comparing sequences with different divergence rates. Here, we describe the eShadow comparative tool and its potential uses for analyzing both multiple nucleotide and protein alignments to predict putative functional elements. The eShadow tool is publicly available at http://eshadow.dcode.org/« less

  17. Nucleotide Sequence and Genetic Structure of a Novel Carbaryl Hydrolase Gene (cehA) from Rhizobium sp. Strain AC100

    PubMed Central

    Hashimoto, Masayuki; Fukui, Mitsuru; Hayano, Kouichi; Hayatsu, Masahito

    2002-01-01

    Rhizobium sp. strain AC100, which is capable of degrading carbaryl (1-naphthyl-N-methylcarbamate), was isolated from soil treated with carbaryl. This bacterium hydrolyzed carbaryl to 1-naphthol and methylamine. Carbaryl hydrolase from the strain was purified to homogeneity, and its N-terminal sequence, molecular mass (82 kDa), and enzymatic properties were determined. The purified enzyme hydrolyzed 1-naphthyl acetate and 4-nitrophenyl acetate indicating that the enzyme is an esterase. We then cloned the carbaryl hydrolase gene (cehA) from the plasmid DNA of the strain and determined the nucleotide sequence of the 10-kb region containing cehA. No homologous sequences were found by a database homology search using the nucleotide and deduced amino acid sequences of the cehA gene. Six open reading frames including the cehA gene were found in the 10-kb region, and sequencing analysis shows that the cehA gene is flanked by two copies of insertion sequence-like sequence, suggesting that it makes part of a composite transposon. PMID:11872471

  18. The complete nucleotide sequence and genome organization of a novel betaflexivirus infecting Citrullus lanatus.

    PubMed

    Xin, Min; Zhang, Peipei; Liu, Wenwen; Ren, Yingdang; Cao, Mengji; Wang, Xifeng

    2017-10-01

    The complete nucleotide sequence of a novel positive single-stranded (+ss) RNA virus, tentatively named watermelon virus A (WVA), was determined using a combination of three methods: RNA sequencing, small RNA sequencing, and Sanger sequencing. The full genome of WVA is comprised of 8,372 nucleotides (nt), excluding the poly (A) tail, and contains four open reading frames (ORFs). The largest ORF, ORF1 encodes a putative replication-associated polyprotein (RP) with three conserved domains. ORF2 and ORF4 encode a movement protein (MP) and coat protein (CP), respectively. The putative product encoded by ORF3, of an estimated molecular mass of 25 kDa, has no significant similarity with other proteins. Identity and phylogenetic analysis indicate that WVA is a new virus, closely related to members of the family Betaflexiviridae. However, the final taxonomic allocation of WVA within the family is yet to be determined.

  19. Manipulation of lignin composition in plants using a tissue-specific promoter

    DOEpatents

    Chapple, Clinton C. S.

    2003-08-26

    The present invention relates to methods and materials in the field of molecular biology, the manipulation of the phenylpropanoid pathway and the regulation of proteins synthesis through plant genetic engineering. More particularly, the invention relates to the introduction of a foreign nucleotide sequence into a plant genome, wherein the introduction of the nucleotide sequence effects an increase in the syringyl content of the plant's lignin. In one specific aspect, the invention relates to methods for modifying the plant lignin composition in a plant cell by the introduction there into of a foreign nucleotide sequence comprising at issue specific plant promoter sequence and a sequence encoding an active ferulate-5-hydroxylase (F5H) enzyme. Plant transformants harboring an inventive promoter-F5H construct demonstrate increased levels of syringyl monomer residues in their lignin, rendering the polymer more readily delignified and, thereby, rendering the plant more readily pulped or digested.

  20. Dietary nitrogen alters codon bias and genome composition in parasitic microorganisms.

    PubMed

    Seward, Emily A; Kelly, Steven

    2016-11-15

    Genomes are composed of long strings of nucleotide monomers (A, C, G and T) that are either scavenged from the organism's environment or built from metabolic precursors. The biosynthesis of each nucleotide differs in atomic requirements with different nucleotides requiring different quantities of nitrogen atoms. However, the impact of the relative availability of dietary nitrogen on genome composition and codon bias is poorly understood. Here we show that differential nitrogen availability, due to differences in environment and dietary inputs, is a major determinant of genome nucleotide composition and synonymous codon use in both bacterial and eukaryotic microorganisms. Specifically, low nitrogen availability species use nucleotides that require fewer nitrogen atoms to encode the same genes compared to high nitrogen availability species. Furthermore, we provide a novel selection-mutation framework for the evaluation of the impact of metabolism on gene sequence evolution and show that it is possible to predict the metabolic inputs of related organisms from an analysis of the raw nucleotide sequence of their genes. Taken together, these results reveal a previously hidden relationship between cellular metabolism and genome evolution and provide new insight into how genome sequence evolution can be influenced by adaptation to different diets and environments.

  1. Alfalfa virus S, a new species in the family Alphaflexiviridae

    USDA-ARS?s Scientific Manuscript database

    A new species of the family Alphaflexiviridae provisionally named alfalfa virus S (AVS) was discovered in alfalfa samples originating from Sudan. A complete nucleotide sequence of the viral genome consisting of 8,349 nucleotides excluding the 3’ poly(A) tail was determined by high throughput sequenc...

  2. Developing Single Nucleotide Polymorphism (SNP) markers from transcriptome sequences for the identification of longan (Dimocarpus longan) germplasm

    USDA-ARS?s Scientific Manuscript database

    Longan (Dimocarpus longan Lour.) is an important tropical fruit tree crop. Accurate varietal identification is essential for germplasm management and breeding. Using longan transcriptome sequences from public databases, we developed single nucleotide polymorphism (SNP) markers; validated 60 SNPs in...

  3. Molecular Characterization of an Avian Astrovirus

    PubMed Central

    Koci, Matthew D.; Seal, Bruce S.; Schultz-Cherry, Stacey

    2000-01-01

    Astroviruses are known to cause enteric disease in several animal species, including turkeys. However, only human astroviruses have been well characterized at the nucleotide level. Herein we report the nucleotide sequence, genomic organization, and predicted amino acid sequence of a turkey astrovirus isolated from poults with an emerging enteric disease. PMID:10846102

  4. A new single-nucleotide polymorphisms database for rainbow trout generated through whole genome resequencing of selected samples

    USDA-ARS?s Scientific Manuscript database

    Single-nucleotide polymorphisms (SNPs) are highly abundant markers, which are broadly distributed in animal genomes. For rainbow trout, SNP discovery has been done through sequencing of restriction-site associated DNA (RAD) libraries, reduced representation libraries (RRL), RNA sequencing, and whole...

  5. Nucleotide sequencing and identification of some wild mushrooms.

    PubMed

    Das, Sudip Kumar; Mandal, Aninda; Datta, Animesh K; Gupta, Sudha; Paul, Rita; Saha, Aditi; Sengupta, Sonali; Dubey, Priyanka Kumari

    2013-01-01

    The rDNA-ITS (Ribosomal DNA Internal Transcribed Spacers) fragment of the genomic DNA of 8 wild edible mushrooms (collected from Eastern Chota Nagpur Plateau of West Bengal, India) was amplified using ITS1 (Internal Transcribed Spacers 1) and ITS2 primers and subjected to nucleotide sequence determination for identification of mushrooms as mentioned. The sequences were aligned using ClustalW software program. The aligned sequences revealed identity (homology percentage from GenBank data base) of Amanita hemibapha [CN (Chota Nagpur) 1, % identity 99 (JX844716.1)], Amanita sp. [CN 2, % identity 98 (JX844763.1)], Astraeus hygrometricus [CN 3, % identity 87 (FJ536664.1)], Termitomyces sp. [CN 4, % identity 90 (JF746992.1)], Termitomyces sp. [CN 5, % identity 99 (GU001667.1)], T. microcarpus [CN 6, % identity 82 (EF421077.1)], Termitomyces sp. [CN 7, % identity 76 (JF746993.1)], and Volvariella volvacea [CN 8, % identity 100 (JN086680.1)]. Although out of 8 mushrooms 4 could be identified up to species level, the nucleotide sequences of the rest may be relevant to further characterization. A phylogenetic tree is constructed using Neighbor-Joining method showing interrelationship between/among the mushrooms. The determined nucleotide sequences of the mushrooms may provide additional information enriching GenBank database aiding to molecular taxonomy and facilitating its domestication and characterization for human benefits.

  6. Sequence variation and phylogenetic analysis of envelope glycoprotein of hepatitis G virus.

    PubMed

    Lim, M Y; Fry, K; Yun, A; Chong, S; Linnen, J; Fung, K; Kim, J P

    1997-11-01

    A transfusion-transmissible agent provisionally designated hepatitis G virus (HGV) was recently identified. In this study, we examined the variability of the HGV genome by analysing sequences in the putative envelope region from 72 isolates obtained from diverse geographical sources. The 1561 nucleotide sequence of the E1/E2/NS2a region of HGV was determined from 12 isolates, and compared with three published sequences. The most variability was observed in 400 nucleotides at the N terminus of E2. We next analysed this 400 nucleotide envelope variable region (EV) from an additional 60 HGV isolates. This sequence varied considerably among the 75 isolates, with overall identity ranging from 79.3% to 99.5% at the nucleotide level, and from 83.5% to 100% at the amino acid level. However, hypervariable regions were not identified. Phylogenetic analyses indicated that the 75 HGV isolates belong to a single genotype. A single-tier distribution of evolutionary distances was observed among the 15 E1/E2/NS2a sequences and the 75 EV sequences. In contrast, 11 isolates of HCV were analysed and showed a three-tiered distribution, representing genotypes, subtypes, and isolates. The 75 isolates of HGV fell into four clusters on the phylogenetic tree. Tight geographical clustering was observed among the HGV isolates from Japan and Korea.

  7. Ariadne: a database search engine for identification and chemical analysis of RNA using tandem mass spectrometry data.

    PubMed

    Nakayama, Hiroshi; Akiyama, Misaki; Taoka, Masato; Yamauchi, Yoshio; Nobe, Yuko; Ishikawa, Hideaki; Takahashi, Nobuhiro; Isobe, Toshiaki

    2009-04-01

    We present here a method to correlate tandem mass spectra of sample RNA nucleolytic fragments with an RNA nucleotide sequence in a DNA/RNA sequence database, thereby allowing tandem mass spectrometry (MS/MS)-based identification of RNA in biological samples. Ariadne, a unique web-based database search engine, identifies RNA by two probability-based evaluation steps of MS/MS data. In the first step, the software evaluates the matches between the masses of product ions generated by MS/MS of an RNase digest of sample RNA and those calculated from a candidate nucleotide sequence in a DNA/RNA sequence database, which then predicts the nucleotide sequences of these RNase fragments. In the second step, the candidate sequences are mapped for all RNA entries in the database, and each entry is scored for a function of occurrences of the candidate sequences to identify a particular RNA. Ariadne can also predict post-transcriptional modifications of RNA, such as methylation of nucleotide bases and/or ribose, by estimating mass shifts from the theoretical mass values. The method was validated with MS/MS data of RNase T1 digests of in vitro transcripts. It was applied successfully to identify an unknown RNA component in a tRNA mixture and to analyze post-transcriptional modification in yeast tRNA(Phe-1).

  8. PCV: An Alignment Free Method for Finding Homologous Nucleotide Sequences and its Application in Phylogenetic Study.

    PubMed

    Kumar, Rajnish; Mishra, Bharat Kumar; Lahiri, Tapobrata; Kumar, Gautam; Kumar, Nilesh; Gupta, Rahul; Pal, Manoj Kumar

    2017-06-01

    Online retrieval of the homologous nucleotide sequences through existing alignment techniques is a common practice against the given database of sequences. The salient point of these techniques is their dependence on local alignment techniques and scoring matrices the reliability of which is limited by computational complexity and accuracy. Toward this direction, this work offers a novel way for numerical representation of genes which can further help in dividing the data space into smaller partitions helping formation of a search tree. In this context, this paper introduces a 36-dimensional Periodicity Count Value (PCV) which is representative of a particular nucleotide sequence and created through adaptation from the concept of stochastic model of Kolekar et al. (American Institute of Physics 1298:307-312, 2010. doi: 10.1063/1.3516320 ). The PCV construct uses information on physicochemical properties of nucleotides and their positional distribution pattern within a gene. It is observed that PCV representation of gene reduces computational cost in the calculation of distances between a pair of genes while being consistent with the existing methods. The validity of PCV-based method was further tested through their use in molecular phylogeny constructs in comparison with that using existing sequence alignment methods.

  9. Sequence of rat alpha- and gamma-casein mRNAs: evolutionary comparison of the calcium-dependent rat casein multigene family.

    PubMed Central

    Hobbs, A A; Rosen, J M

    1982-01-01

    The complete sequences of rat alpha- and gamma-casein mRNAs have been determined. The 1402-nucleotide alpha- and 864-nucleotide gamma-casein mRNAs both encode 15 amino acid signal peptides and mature proteins of 269 and 164 residues, respectively. Considerable homology between the 5' non-coding regions, and the regions encoding the signal peptides and the phosphorylation sites, in these mRNAs as compared to several other rodent casein mRNAs, was observed. Significant homology was also detected between rat alpha- and bovine alpha s1-casein. Comparison of the rodent and bovine sequences suggests that the caseins evolved at about the time of the appearance of the primitive mammals. This may have occurred by intragenic duplication of a nucleotide sequence encoding a primitive phosphorylation site, -(Ser)n-Glu-Glu-, and intergenic duplication resulting in the small casein multigene family. A unique feature of the rat alpha-casein sequence is an insertion in the coding region containing 10 repeated elements of 18 nucleotides each. This insertion appears to have occurred 7-12 million years ago, just prior to the divergence of rat and mouse. Images PMID:6298707

  10. EvoDB: a database of evolutionary rate profiles, associated protein domains and phylogenetic trees for PFAM-A

    PubMed Central

    Ndhlovu, Andrew; Durand, Pierre M.; Hazelhurst, Scott

    2015-01-01

    The evolutionary rate at codon sites across protein-coding nucleotide sequences represents a valuable tier of information for aligning sequences, inferring homology and constructing phylogenetic profiles. However, a comprehensive resource for cataloguing the evolutionary rate at codon sites and their corresponding nucleotide and protein domain sequence alignments has not been developed. To address this gap in knowledge, EvoDB (an Evolutionary rates DataBase) was compiled. Nucleotide sequences and their corresponding protein domain data including the associated seed alignments from the PFAM-A (protein family) database were used to estimate evolutionary rate (ω = dN/dS) profiles at codon sites for each entry. EvoDB contains 98.83% of the gapped nucleotide sequence alignments and 97.1% of the evolutionary rate profiles for the corresponding information in PFAM-A. As the identification of codon sites under positive selection and their position in a sequence profile is usually the most sought after information for molecular evolutionary biologists, evolutionary rate profiles were determined under the M2a model using the CODEML algorithm in the PAML (Phylogenetic Analysis by Maximum Likelihood) suite of software. Validation of nucleotide sequences against amino acid data was implemented to ensure high data quality. EvoDB is a catalogue of the evolutionary rate profiles and provides the corresponding phylogenetic trees, PFAM-A alignments and annotated accession identifier data. In addition, the database can be explored and queried using known evolutionary rate profiles to identify domains under similar evolutionary constraints and pressures. EvoDB is a resource for evolutionary, phylogenetic studies and presents a tier of information untapped by current databases. Database URL: http://www.bioinf.wits.ac.za/software/fire/evodb PMID:26140928

  11. EvoDB: a database of evolutionary rate profiles, associated protein domains and phylogenetic trees for PFAM-A.

    PubMed

    Ndhlovu, Andrew; Durand, Pierre M; Hazelhurst, Scott

    2015-01-01

    The evolutionary rate at codon sites across protein-coding nucleotide sequences represents a valuable tier of information for aligning sequences, inferring homology and constructing phylogenetic profiles. However, a comprehensive resource for cataloguing the evolutionary rate at codon sites and their corresponding nucleotide and protein domain sequence alignments has not been developed. To address this gap in knowledge, EvoDB (an Evolutionary rates DataBase) was compiled. Nucleotide sequences and their corresponding protein domain data including the associated seed alignments from the PFAM-A (protein family) database were used to estimate evolutionary rate (ω = dN/dS) profiles at codon sites for each entry. EvoDB contains 98.83% of the gapped nucleotide sequence alignments and 97.1% of the evolutionary rate profiles for the corresponding information in PFAM-A. As the identification of codon sites under positive selection and their position in a sequence profile is usually the most sought after information for molecular evolutionary biologists, evolutionary rate profiles were determined under the M2a model using the CODEML algorithm in the PAML (Phylogenetic Analysis by Maximum Likelihood) suite of software. Validation of nucleotide sequences against amino acid data was implemented to ensure high data quality. EvoDB is a catalogue of the evolutionary rate profiles and provides the corresponding phylogenetic trees, PFAM-A alignments and annotated accession identifier data. In addition, the database can be explored and queried using known evolutionary rate profiles to identify domains under similar evolutionary constraints and pressures. EvoDB is a resource for evolutionary, phylogenetic studies and presents a tier of information untapped by current databases. © The Author(s) 2015. Published by Oxford University Press.

  12. DNA Sequence-Dependent Ionic Currents in Ultra-Small Solid-State Nanopores†

    PubMed Central

    Comer, Jeffrey

    2016-01-01

    Measurements of ionic currents through nanopores partially blocked by DNA have emerged as a powerful method for characterization of the DNA nucleotide sequence. Although the effect of the nucleotide sequence on the nanopore blockade current has been experimentally demonstrated, prediction and interpretation of such measurements remain a formidable challenge. Using atomic resolution computational approaches, here we show how the sequence, molecular conformation, and pore geometry affect the blockade ionic current in model solid-state nanopores. We demonstrate that the blockade current from a DNA molecule is determined by the chemical identities and conformations of at least three consecutive nucleotides. We find the blockade currents produced by the nucleotide triplets to vary considerably with their nucleotide sequence despite having nearly identical molecular conformations. Encouragingly, we find blockade current differences as large as 25% for single-base substitutions in ultra small (1.6 nm × 1.1 nm cross section; 2 nm length) solid-state nanopores. Despite the complex dependence of the blockade current on the sequence and conformation of the DNA triplets, we find that, under many conditions, the number of thymine bases is positively correlated with the current, whereas the number of purine bases and the presence of both purine and pyrimidines in the triplet are negatively correlated with the current. Based on these observations, we construct a simple theoretical model that relates the ion current to the base content of a solid-state nanopore. Furthermore, we show that compact conformations of DNA in narrow pores provide the greatest signal-to-noise ratio for single base detection, whereas reduction of the nanopore length increases the ionic current noise. Thus, the sequence dependence of nanopore blockade current can be theoretically rationalized, although the predictions will likely need to be customized for each nanopore type. PMID:27103233

  13. SeqReporter: automating next-generation sequencing result interpretation and reporting workflow in a clinical laboratory.

    PubMed

    Roy, Somak; Durso, Mary Beth; Wald, Abigail; Nikiforov, Yuri E; Nikiforova, Marina N

    2014-01-01

    A wide repertoire of bioinformatics applications exist for next-generation sequencing data analysis; however, certain requirements of the clinical molecular laboratory limit their use: i) comprehensive report generation, ii) compatibility with existing laboratory information systems and computer operating system, iii) knowledgebase development, iv) quality management, and v) data security. SeqReporter is a web-based application developed using ASP.NET framework version 4.0. The client-side was designed using HTML5, CSS3, and Javascript. The server-side processing (VB.NET) relied on interaction with a customized SQL server 2008 R2 database. Overall, 104 cases (1062 variant calls) were analyzed by SeqReporter. Each variant call was classified into one of five report levels: i) known clinical significance, ii) uncertain clinical significance, iii) pending pathologists' review, iv) synonymous and deep intronic, and v) platform and panel-specific sequence errors. SeqReporter correctly annotated and classified 99.9% (859 of 860) of sequence variants, including 68.7% synonymous single-nucleotide variants, 28.3% nonsynonymous single-nucleotide variants, 1.7% insertions, and 1.3% deletions. One variant of potential clinical significance was re-classified after pathologist review. Laboratory information system-compatible clinical reports were generated automatically. SeqReporter also facilitated quality management activities. SeqReporter is an example of a customized and well-designed informatics solution to optimize and automate the downstream analysis of clinical next-generation sequencing data. We propose it as a model that may envisage the development of a comprehensive clinical informatics solution. Copyright © 2014 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

  14. VirVarSeq: a low-frequency virus variant detection pipeline for Illumina sequencing using adaptive base-calling accuracy filtering.

    PubMed

    Verbist, Bie M P; Thys, Kim; Reumers, Joke; Wetzels, Yves; Van der Borght, Koen; Talloen, Willem; Aerssens, Jeroen; Clement, Lieven; Thas, Olivier

    2015-01-01

    In virology, massively parallel sequencing (MPS) opens many opportunities for studying viral quasi-species, e.g. in HIV-1- and HCV-infected patients. This is essential for understanding pathways to resistance, which can substantially improve treatment. Although MPS platforms allow in-depth characterization of sequence variation, their measurements still involve substantial technical noise. For Illumina sequencing, single base substitutions are the main error source and impede powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores (Qs) that are useful for differentiating errors from the real low-frequency mutations. A variant calling tool, Q-cpileup, is proposed, which exploits the Qs of nucleotides in a filtering strategy to increase specificity. The tool is imbedded in an open-source pipeline, VirVarSeq, which allows variant calling starting from fastq files. Using both plasmid mixtures and clinical samples, we show that Q-cpileup is able to reduce the number of false-positive findings. The filtering strategy is adaptive and provides an optimized threshold for individual samples in each sequencing run. Additionally, linkage information is kept between single-nucleotide polymorphisms as variants are called at the codon level. This enables virologists to have an immediate biological interpretation of the reported variants with respect to their antiviral drug responses. A comparison with existing SNP caller tools reveals that calling variants at the codon level with Q-cpileup results in an outstanding sensitivity while maintaining a good specificity for variants with frequencies down to 0.5%. The VirVarSeq is available, together with a user's guide and test data, at sourceforge: http://sourceforge.net/projects/virtools/?source=directory. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  15. [Replication of Streptomyces plasmids: the DNA nucleotide sequence of plasmid pSB 24.2].

    PubMed

    Bolotin, A P; Sorokin, A V; Aleksandrov, N N; Danilenko, V N; Kozlov, Iu I

    1985-11-01

    The nucleotide sequence of DNA in plasmid pSB 24.2, a natural deletion derivative of plasmid pSB 24.1 isolated from S. cyanogenus was studied. The plasmid amounted by its size to 3706 nucleotide pairs. The G-C composition was equal to 73 per cent. The analysis of the DNA structure in plasmid pSB 24.2 revealed the protein-encoding sequence of DNA, the continuity of which was significant for replication of the plasmid containing more than 1300 nucleotide pairs. The analysis also revealed two A-T-rich areas of DNA, the G-C composition of which was less than 55 per cent and a DNA area with a branched pin structure. The results may be of value in investigation of plasmid replication in actinomycetes and experimental cloning of DNA with this plasmid as a vector.

  16. Sequence-specific bias correction for RNA-seq data using recurrent neural networks.

    PubMed

    Zhang, Yao-Zhong; Yamaguchi, Rui; Imoto, Seiya; Miyano, Satoru

    2017-01-25

    The recent success of deep learning techniques in machine learning and artificial intelligence has stimulated a great deal of interest among bioinformaticians, who now wish to bring the power of deep learning to bare on a host of bioinformatical problems. Deep learning is ideally suited for biological problems that require automatic or hierarchical feature representation for biological data when prior knowledge is limited. In this work, we address the sequence-specific bias correction problem for RNA-seq data redusing Recurrent Neural Networks (RNNs) to model nucleotide sequences without pre-determining sequence structures. The sequence-specific bias of a read is then calculated based on the sequence probabilities estimated by RNNs, and used in the estimation of gene abundance. We explore the application of two popular RNN recurrent units for this task and demonstrate that RNN-based approaches provide a flexible way to model nucleotide sequences without knowledge of predetermined sequence structures. Our experiments show that training a RNN-based nucleotide sequence model is efficient and RNN-based bias correction methods compare well with the-state-of-the-art sequence-specific bias correction method on the commonly used MAQC-III data set. RNNs provides an alternative and flexible way to calculate sequence-specific bias without explicitly pre-determining sequence structures.

  17. pDHS-SVM: A prediction method for plant DNase I hypersensitive sites based on support vector machine.

    PubMed

    Zhang, Shanxin; Zhou, Zhiping; Chen, Xinmeng; Hu, Yong; Yang, Lindong

    2017-08-07

    DNase I hypersensitive sites (DHSs) are accessible chromatin regions hypersensitive to cleavages by DNase I endonucleases. DHSs are indicative of cis-regulatory DNA elements (CREs), all of which play important roles in global gene expression regulation. It is helpful for discovering CREs by recognition of DHSs in genome. To accelerate the investigation, it is an important complement to develop cost-effective computational methods to identify DHSs. However, there is a lack of tools used for identifying DHSs in plant genome. Here we presented pDHS-SVM, a computational predictor to identify plant DHSs. To integrate the global sequence-order information and local DNA properties, reverse complement kmer and dinucleotide-based auto covariance of DNA sequences were applied to construct the feature space. In this work, fifteen physical-chemical properties of dinucleotides were used and Support Vector Machine (SVM) was employed. To further improve the performance of the predictor and extract an optimized subset of nucleotide physical-chemical properties positive for the DHSs, a heuristic nucleotide physical-chemical property selection algorithm was introduced. With the optimized subset of properties, experimental results of Arabidopsis thaliana and rice (Oryza sativa) showed that pDHS-SVM could achieve accuracies up to 87.00%, and 85.79%, respectively. The results indicated the effectiveness of proposed method for predicting DHSs. Furthermore, pDHS-SVM could provide a helpful complement for predicting CREs in plant genome. Our implementation of the novel proposed method pDHS-SVM is freely available as source code, at https://github.com/shanxinzhang/pDHS-SVM. Copyright © 2017 Elsevier Ltd. All rights reserved.

  18. Promoter for Sindbis virus RNA-dependent subgenomic RNA transcription.

    PubMed

    Levis, R; Schlesinger, S; Huang, H V

    1990-04-01

    Sindbis virus is a positive-strand RNA enveloped virus, a member of the Alphavirus genus of the Togaviridae family. Two species of mRNA are synthesized in cells infected with Sindbis virus; one, the 49S RNA, is the genomic RNA; the other, the 26S RNA, is a subgenomic RNA that is identical in sequence to the 3' one-third of the genomic RNA. Ou et al. (J.-H. Ou, C. M. Rice, L. Dalgarno, E. G. Strauss, and J. H. Strauss, Proc. Natl. Acad. Sci. USA 79:5235-5239, 1982) identified a highly conserved region 19 nucleotides upstream and 2 nucleotides downstream from the start of the 26S RNA and proposed that in the negative-strand template, these nucleotides compose the promoter for directing the synthesis of the subgenomic RNA. Defective interfering (DI) RNAs of Sindbis virus were used to test this proposal. A 227-nucleotide sequence encompassing 98 nucleotides upstream and 117 nucleotides downstream from the start site of the Sindbis virus subgenomic RNA was inserted into a DI genome. The DI RNA containing the insert was replicated and packaged in the presence of helper virus, and cells infected with these DI particles produced a subgenomic RNA of the size and sequence expected if the promoter was functional. The initiating nucleotide was identical to that used for Sindbis virus subgenomic mRNA synthesis. Deletion analysis showed that the minimal region required to detect transcription of a subgenomic RNA from the negative-strand template of a DI RNA was 18 or 19 nucleotides upstream and 5 nucleotides downstream from the start of the subgenomic RNA.

  19. Design and characterization of a nanopore-coupled polymerase for single-molecule DNA sequencing by synthesis on an electrode array

    PubMed Central

    Stranges, P. Benjamin; Palla, Mirkó; Kalachikov, Sergey; Nivala, Jeff; Dorwart, Michael; Trans, Andrew; Kumar, Shiv; Porel, Mintu; Chien, Minchen; Tao, Chuanjuan; Morozova, Irina; Li, Zengmin; Shi, Shundi; Aberra, Aman; Arnold, Cleoma; Yang, Alexander; Aguirre, Anne; Harada, Eric T.; Korenblum, Daniel; Pollard, James; Bhat, Ashwini; Gremyachinskiy, Dmitriy; Bibillo, Arek; Chen, Roger; Davis, Randy; Russo, James J.; Fuller, Carl W.; Roever, Stefan; Ju, Jingyue; Church, George M.

    2016-01-01

    Scalable, high-throughput DNA sequencing is a prerequisite for precision medicine and biomedical research. Recently, we presented a nanopore-based sequencing-by-synthesis (Nanopore-SBS) approach, which used a set of nucleotides with polymer tags that allow discrimination of the nucleotides in a biological nanopore. Here, we designed and covalently coupled a DNA polymerase to an α-hemolysin (αHL) heptamer using the SpyCatcher/SpyTag conjugation approach. These porin–polymerase conjugates were inserted into lipid bilayers on a complementary metal oxide semiconductor (CMOS)-based electrode array for high-throughput electrical recording of DNA synthesis. The designed nanopore construct successfully detected the capture of tagged nucleotides complementary to a DNA base on a provided template. We measured over 200 tagged-nucleotide signals for each of the four bases and developed a classification method to uniquely distinguish them from each other and background signals. The probability of falsely identifying a background event as a true capture event was less than 1.2%. In the presence of all four tagged nucleotides, we observed sequential additions in real time during polymerase-catalyzed DNA synthesis. Single-polymerase coupling to a nanopore, in combination with the Nanopore-SBS approach, can provide the foundation for a low-cost, single-molecule, electronic DNA-sequencing platform. PMID:27729524

  20. Molecular detection of viral agents in free-ranging and captive neotropical felids in Brazil.

    PubMed

    Furtado, Mariana M; Taniwaki, Sueli A; de Barros, Iracema N; Brandão, Paulo E; Catão-Dias, José L; Cavalcanti, Sandra; Cullen, Laury; Filoni, Claudia; Jácomo, Anah T de Almeida; Jorge, Rodrigo S P; Silva, Nairléia Dos Santos; Silveira, Leandro; Ferreira Neto, José S

    2017-09-01

    We describe molecular testing for felid alphaherpesvirus 1 (FHV-1), carnivore protoparvovirus 1 (CPPV-1), feline calicivirus (FCV), alphacoronavirus 1 (feline coronavirus [FCoV]), feline leukemia virus (FeLV), feline immunodeficiency virus (FIV), and canine distemper virus (CDV) in whole blood samples of 109 free-ranging and 68 captive neotropical felids from Brazil. Samples from 2 jaguars ( Panthera onca) and 1 oncilla ( Leopardus tigrinus) were positive for FHV-1; 2 jaguars, 1 puma ( Puma concolor), and 1 jaguarundi ( Herpairulus yagouaroundi) tested positive for CPPV-1; and 1 puma was positive for FIV. Based on comparison of 103 nucleotides of the UL24-UL25 gene, the FHV-1 sequences were 99-100% similar to the FHV-1 strain of domestic cats. Nucleotide sequences of CPPV-1 were closely related to sequences detected in other wild carnivores, comparing 294 nucleotides of the VP1 gene. The FIV nucleotide sequence detected in the free-ranging puma, based on comparison of 444 nucleotides of the pol gene, grouped with other lentiviruses described in pumas, and had 82.4% identity with a free-ranging puma from Yellowstone Park and 79.5% with a captive puma from Brazil. Our data document the circulation of FHV-1, CPPV-1, and FIV in neotropical felids in Brazil.

  1. A space-efficient algorithm for local similarities.

    PubMed

    Huang, X Q; Hardison, R C; Miller, W

    1990-10-01

    Existing dynamic-programming algorithms for identifying similar regions of two sequences require time and space proportional to the product of the sequence lengths. Often this space requirement is more limiting than the time requirement. We describe a dynamic-programming local-similarity algorithm that needs only space proportional to the sum of the sequence lengths. The method can also find repeats within a single long sequence. To illustrate the algorithm's potential, we discuss comparison of a 73,360 nucleotide sequence containing the human beta-like globin gene cluster and a corresponding 44,594 nucleotide sequence for rabbit, a problem well beyond the capabilities of other dynamic-programming software.

  2. RoboOligo: software for mass spectrometry data to support manual and de novo sequencing of post-transcriptionally modified ribonucleic acids

    PubMed Central

    Sample, Paul J.; Gaston, Kirk W.; Alfonzo, Juan D.; Limbach, Patrick A.

    2015-01-01

    Ribosomal ribonucleic acid (RNA), transfer RNA and other biological or synthetic RNA polymers can contain nucleotides that have been modified by the addition of chemical groups. Traditional Sanger sequencing methods cannot establish the chemical nature and sequence of these modified-nucleotide containing oligomers. Mass spectrometry (MS) has become the conventional approach for determining the nucleotide composition, modification status and sequence of modified RNAs. Modified RNAs are analyzed by MS using collision-induced dissociation tandem mass spectrometry (CID MS/MS), which produces a complex dataset of oligomeric fragments that must be interpreted to identify and place modified nucleosides within the RNA sequence. Here we report the development of RoboOligo, an interactive software program for the robust analysis of data generated by CID MS/MS of RNA oligomers. There are three main functions of RoboOligo: (i) automated de novo sequencing via the local search paradigm. (ii) Manual sequencing with real-time spectrum labeling and cumulative intensity scoring. (iii) A hybrid approach, coined ‘variable sequencing’, which combines the user intuition of manual sequencing with the high-throughput sampling of automated de novo sequencing. PMID:25820423

  3. Molecular characterization of a novel rhabdovirus infecting blackcurrant identified by high-throughput sequencing.

    PubMed

    Wu, L-P; Yang, T; Liu, H-W; Postman, J; Li, R

    2018-05-01

    A large contig with sequence similarities to several nucleorhabdoviruses was identified by high-throughput sequencing analysis from a black currant (Ribes nigrum L.) cultivar. The complete genome sequence of this new nucleorhabdovirus is 14,432 nucleotides long. Its genomic organization is very similar to those of unsegmented plant rhabdoviruses, containing six open reading frames in the order 3'-N-P-P3-M-G-L-5. The virus, which is provisionally named "black currant-associated rhabdovirus", is 41-52% identical in its genome nucleotide sequence to other nucleorhabdoviruses and may represent a new species in the genus Nucleorhabdovirus.

  4. Differential sequence diversity at merozoite surface protein-1 locus of Plasmodium knowlesi from humans and macaques in Thailand.

    PubMed

    Putaporntip, Chaturong; Thongaree, Siriporn; Jongwutiwes, Somchai

    2013-08-01

    To determine the genetic diversity and potential transmission routes of Plasmodium knowlesi, we analyzed the complete nucleotide sequence of the gene encoding the merozoite surface protein-1 of this simian malaria (Pkmsp-1), an asexual blood-stage vaccine candidate, from naturally infected humans and macaques in Thailand. Analysis of Pkmsp-1 sequences from humans (n=12) and monkeys (n=12) reveals five conserved and four variable domains. Most nucleotide substitutions in conserved domains were dimorphic whereas three of four variable domains contained complex repeats with extensive sequence and size variation. Besides purifying selection in conserved domains, evidence of intragenic recombination scattering across Pkmsp-1 was detected. The number of haplotypes, haplotype diversity, nucleotide diversity and recombination sites of human-derived sequences exceeded that of monkey-derived sequences. Phylogenetic networks based on concatenated conserved sequences of Pkmsp-1 displayed a character pattern that could have arisen from sampling process or the presence of two independent routes of P. knowlesi transmission, i.e. from macaques to human and from human to humans in Thailand. Copyright © 2013 Elsevier B.V. All rights reserved.

  5. First Complete Genome Sequence of an Isolate of Tomato Mottle Mosaic Virus Infecting Plants of Solanum lycopersicum in South America.

    PubMed

    Nagai, Alice; Duarte, Lígia M L; Chaves, Alexandre L R; Alexandre, Maria A V; Ramos-González, Pedro L; Chabi-Jesus, Camila; Harakava, Ricardo; Dos Santos, Déborah Y A C

    2018-05-10

    The complete nucleotide sequence of an isolate of tomato mottle mosaic virus (ToMMV) was determined. The virus, originally isolated from symptomatic tomato plants found in a county near the city of São Paulo, Brazil, has a genome with 99% nucleotide sequence identity with ToMMV from Mexico, China, Spain, and the United States. Copyright © 2018 Nagai et al.

  6. Use of signal sequences as an in situ removable sequence element to stimulate protein synthesis in cell-free extracts

    PubMed Central

    Ahn, Jin-Ho; Hwang, Mi-Yeon; Lee, Kyung-Ho; Choi, Cha-Yong; Kim, Dong-Myung

    2007-01-01

    This study developed a method to boost the expression of recombinant proteins in a cell-free protein synthesis system without leaving additional amino acid residues. It was found that the nucleotide sequences of the signal peptides serve as an efficient downstream box to stimulate protein synthesis when they were fused upstream of the target genes. The extent of stimulation was critically affected by the identity of the second codons of the signal sequences. Moreover, the yield of the synthesized protein was enhanced by as much as 10 times in the presence of an optimal second codon. The signal peptides were in situ cleaved and the target proteins were produced in their native sizes by carrying out the cell-free synthesis reactions in the presence of Triton X-100, most likely through the activation of signal peptidase in the S30 extract. The amplification of the template DNA and the addition of the signal sequences were accomplished by PCR. Hence, elevated levels of recombinant proteins were generated within several hours. PMID:17185295

  7. Transcription Factor Map Alignment of Promoter Regions

    PubMed Central

    Blanco, Enrique; Messeguer, Xavier; Smith, Temple F; Guigó, Roderic

    2006-01-01

    We address the problem of comparing and characterizing the promoter regions of genes with similar expression patterns. This remains a challenging problem in sequence analysis, because often the promoter regions of co-expressed genes do not show discernible sequence conservation. In our approach, thus, we have not directly compared the nucleotide sequence of promoters. Instead, we have obtained predictions of transcription factor binding sites, annotated the predicted sites with the labels of the corresponding binding factors, and aligned the resulting sequences of labels—to which we refer here as transcription factor maps (TF-maps). To obtain the global pairwise alignment of two TF-maps, we have adapted an algorithm initially developed to align restriction enzyme maps. We have optimized the parameters of the algorithm in a small, but well-curated, collection of human–mouse orthologous gene pairs. Results in this dataset, as well as in an independent much larger dataset from the CISRED database, indicate that TF-map alignments are able to uncover conserved regulatory elements, which cannot be detected by the typical sequence alignments. PMID:16733547

  8. A resource of single-nucleotide polymorphisms for rainbow trout generated by restriction-site associated DNA sequencing of doubled haploids

    USDA-ARS?s Scientific Manuscript database

    Salmonid genomes are considered to be in a pseudo-tetraploid state as a result of an evolutionarily recent genome duplication event. This situation complicates single nucleotide polymorphism (SNP) discovery in rainbow trout as many putative SNPs are actually paralogous sequence variants (PSVs) and ...

  9. Nucleotide sequencing and serological evidence that the recently recognized deer tick virus is a genotype of Powassan virus.

    PubMed

    Beasley, D W; Suderman, M T; Holbrook, M R; Barrett, A D

    2001-11-05

    Deer tick virus (DTV) is a recently recognized North American virus isolated from Ixodes dammini ticks. Nucleotide sequencing of fragments of structural and non-structural protein genes suggested that this virus was most closely related to the tick-borne flavivirus Powassan (POW), which causes potentially fatal encephalitis in humans. To determine whether DTV represents a new and distinct member of the Flavivirus genus of the family Flaviviridae, we sequenced the structural protein genes and 5' and 3' non-coding regions of this virus. In addition, we compared the reactivity of DTV and POW in hemagglutination inhibition tests with a panel of polyclonal and monoclonal antisera, and performed cross-neutralization experiments using anti-DTV antisera. Nucleotide sequencing revealed a high degree of homology between DTV and POW at both nucleotide (>80% homology) and amino acid (>90% homology) levels, and the two viruses were indistinguishable in serological assays and mouse neuroinvasiveness. On the basis of these results, we suggest that DTV should be classified as a genotype of POW virus.

  10. Tidying Up International Nucleotide Sequence Databases: Ecological, Geographical and Sequence Quality Annotation of ITS Sequences of Mycorrhizal Fungi

    PubMed Central

    Tedersoo, Leho; Abarenkov, Kessy; Nilsson, R. Henrik; Schüssler, Arthur; Grelet, Gwen-Aëlle; Kohout, Petr; Oja, Jane; Bonito, Gregory M.; Veldre, Vilmar; Jairus, Teele; Ryberg, Martin; Larsson, Karl-Henrik; Kõljalg, Urmas

    2011-01-01

    Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information detailing country of collection, geographical coordinates, interacting taxon and isolation source were supplemented to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi. PMID:21949797

  11. Gause's Principle and the Effect of Resource Partitioning on the Dynamical Coexistence of Replicating Templates

    PubMed Central

    Szilágyi, András; Zachar, István; Szathmáry, Eörs

    2013-01-01

    Models of competitive template replication, although basic for replicator dynamics and primordial evolution, have not yet taken different sequences explicitly into account, neither have they analyzed the effect of resource partitioning (feeding on different resources) on coexistence. Here we show by analytical and numerical calculations that Gause's principle of competitive exclusion holds for template replicators if resources (nucleotides) affect growth linearly and coexistence is at fixed point attractors. Cases of complementary or homologous pairing between building blocks with parallel or antiparallel strands show no deviation from the rule that the nucleotide compositions of stably coexisting species must be different and there cannot be more coexisting replicator species than nucleotide types. Besides this overlooked mechanism of template coexistence we show also that interesting sequence effects prevail as parts of sequences that are copied earlier affect coexistence more strongly due to the higher concentration of the corresponding replication intermediates. Template and copy always count as one species due their constraint of strict stoichiometric coupling. Stability of fixed-point coexistence tends to decrease with the length of sequences, although this effect is unlikely to be detrimental for sequences below 100 nucleotides. In sum, resource partitioning (niche differentiation) is the default form of competitive coexistence for replicating templates feeding on a cocktail of different nucleotides, as it may have been the case in the RNA world. Our analysis of different pairing and strand orientation schemes is relevant for artificial and potentially astrobiological genetics. PMID:23990769

  12. Genome Survey Sequencing of Luffa Cylindrica L. and Microsatellite High Resolution Melting (SSR-HRM) Analysis for Genetic Relationship of Luffa Genotypes.

    PubMed

    An, Jianyu; Yin, Mengqi; Zhang, Qin; Gong, Dongting; Jia, Xiaowen; Guan, Yajing; Hu, Jin

    2017-09-11

    Luffa cylindrica (L.) Roem. is an economically important vegetable crop in China. However, the genomic information on this species is currently unknown. In this study, for the first time, a genome survey of L. cylindrica was carried out using next-generation sequencing (NGS) technology. In total, 43.40 Gb sequence data of L. cylindrica , about 54.94× coverage of the estimated genome size of 789.97 Mb, were obtained from HiSeq 2500 sequencing, in which the guanine plus cytosine (GC) content was calculated to be 37.90%. The heterozygosity of genome sequences was only 0.24%. In total, 1,913,731 contigs (>200 bp) with 525 bp N 50 length and 1,410,117 scaffolds (>200 bp) with 885.01 Mb total length were obtained. From the initial assembled L. cylindrica genome, 431,234 microsatellites (SSRs) (≥5 repeats) were identified. The motif types of SSR repeats included 62.88% di-nucleotide, 31.03% tri-nucleotide, 4.59% tetra-nucleotide, 0.96% penta-nucleotide and 0.54% hexa-nucleotide. Eighty genomic SSR markers were developed, and 51/80 primers could be used in both "Zheda 23" and "Zheda 83". Nineteen SSRs were used to investigate the genetic diversity among 32 accessions through SSR-HRM analysis. The unweighted pair group method analysis (UPGMA) dendrogram tree was built by calculating the SSR-HRM raw data. SSR-HRM could be effectively used for genotype relationship analysis of Luffa species.

  13. DNA sequence-based comparative studies between non-extremophile and extremophile organisms with implications in exobiology

    NASA Astrophysics Data System (ADS)

    Holden, Todd; Marchese, P.; Tremberger, G., Jr.; Cheung, E.; Subramaniam, R.; Sullivan, R.; Schneider, P.; Flamholz, A.; Lieberman, D.; Cheung, T.

    2008-08-01

    We have characterized function related DNA sequences of various organisms using informatics techniques, including fractal dimension calculation, nucleotide and multi-nucleotide statistics, and sequence fluctuation analysis. Our analysis shows trends which differentiate extremophile from non-extremophile organisms, which could be reproduced in extraterrestrial life. Among the systems studied are radiation repair genes, genes involved in thermal shocks, and genes involved in drug resistance. We also evaluate sequence level changes that have occurred during short term evolution (several thousand generations) under extreme conditions.

  14. Nucleotide Interdependency in Transcription Factor Binding Sites in the Drosophila Genome.

    PubMed

    Dresch, Jacqueline M; Zellers, Rowan G; Bork, Daniel K; Drewell, Robert A

    2016-01-01

    A long-standing objective in modern biology is to characterize the molecular components that drive the development of an organism. At the heart of eukaryotic development lies gene regulation. On the molecular level, much of the research in this field has focused on the binding of transcription factors (TFs) to regulatory regions in the genome known as cis-regulatory modules (CRMs). However, relatively little is known about the sequence-specific binding preferences of many TFs, especially with respect to the possible interdependencies between the nucleotides that make up binding sites. A particular limitation of many existing algorithms that aim to predict binding site sequences is that they do not allow for dependencies between nonadjacent nucleotides. In this study, we use a recently developed computational algorithm, MARZ, to compare binding site sequences using 32 distinct models in a systematic and unbiased approach to explore nucleotide dependencies within binding sites for 15 distinct TFs known to be critical to Drosophila development. Our results indicate that many of these proteins have varying levels of nucleotide interdependencies within their DNA recognition sequences, and that, in some cases, models that account for these dependencies greatly outperform traditional models that are used to predict binding sites. We also directly compare the ability of different models to identify the known KRUPPEL TF binding sites in CRMs and demonstrate that a more complex model that accounts for nucleotide interdependencies performs better when compared with simple models. This ability to identify TFs with critical nucleotide interdependencies in their binding sites will lead to a deeper understanding of how these molecular characteristics contribute to the architecture of CRMs and the precise regulation of transcription during organismal development.

  15. Nucleotide Interdependency in Transcription Factor Binding Sites in the Drosophila Genome

    PubMed Central

    Dresch, Jacqueline M.; Zellers, Rowan G.; Bork, Daniel K.; Drewell, Robert A.

    2016-01-01

    A long-standing objective in modern biology is to characterize the molecular components that drive the development of an organism. At the heart of eukaryotic development lies gene regulation. On the molecular level, much of the research in this field has focused on the binding of transcription factors (TFs) to regulatory regions in the genome known as cis-regulatory modules (CRMs). However, relatively little is known about the sequence-specific binding preferences of many TFs, especially with respect to the possible interdependencies between the nucleotides that make up binding sites. A particular limitation of many existing algorithms that aim to predict binding site sequences is that they do not allow for dependencies between nonadjacent nucleotides. In this study, we use a recently developed computational algorithm, MARZ, to compare binding site sequences using 32 distinct models in a systematic and unbiased approach to explore nucleotide dependencies within binding sites for 15 distinct TFs known to be critical to Drosophila development. Our results indicate that many of these proteins have varying levels of nucleotide interdependencies within their DNA recognition sequences, and that, in some cases, models that account for these dependencies greatly outperform traditional models that are used to predict binding sites. We also directly compare the ability of different models to identify the known KRUPPEL TF binding sites in CRMs and demonstrate that a more complex model that accounts for nucleotide interdependencies performs better when compared with simple models. This ability to identify TFs with critical nucleotide interdependencies in their binding sites will lead to a deeper understanding of how these molecular characteristics contribute to the architecture of CRMs and the precise regulation of transcription during organismal development. PMID:27330274

  16. Nucleotide Sequence Database Comparison for Routine Dermatophyte Identification by Internal Transcribed Spacer 2 Genetic Region DNA Barcoding.

    PubMed

    Normand, A C; Packeu, A; Cassagne, C; Hendrickx, M; Ranque, S; Piarroux, R

    2018-05-01

    Conventional dermatophyte identification is based on morphological features. However, recent studies have proposed to use the nucleotide sequences of the rRNA internal transcribed spacer (ITS) region as an identification barcode of all fungi, including dermatophytes. Several nucleotide databases are available to compare sequences and thus identify isolates; however, these databases often contain mislabeled sequences that impair sequence-based identification. We evaluated five of these databases on a clinical isolate panel. We selected 292 clinical dermatophyte strains that were prospectively subjected to an ITS2 nucleotide sequence analysis. Sequences were analyzed against the databases, and the results were compared to clusters obtained via DNA alignment of sequence segments. The DNA tree served as the identification standard throughout the study. According to the ITS2 sequence identification, the majority of strains (255/292) belonged to the genus Trichophyton , mainly T. rubrum complex ( n = 184), T. interdigitale ( n = 40), T. tonsurans ( n = 26), and T. benhamiae ( n = 5). Other genera included Microsporum (e.g., M. canis [ n = 21], M. audouinii [ n = 10], Nannizzia gypsea [ n = 3], and Epidermophyton [ n = 3]). Species-level identification of T. rubrum complex isolates was an issue. Overall, ITS DNA sequencing is a reliable tool to identify dermatophyte species given that a comprehensive and correctly labeled database is consulted. Since many inaccurate identification results exist in the DNA databases used for this study, reference databases must be verified frequently and amended in line with the current revisions of fungal taxonomy. Before describing a new species or adding a new DNA reference to the available databases, its position in the phylogenetic tree must be verified. Copyright © 2018 American Society for Microbiology.

  17. Complete genome sequence analysis of novel human bocavirus reveals genetic recombination between human bocavirus 2 and human bocavirus 4.

    PubMed

    Khamrin, Pattara; Okitsu, Shoko; Ushijima, Hiroshi; Maneekarn, Niwat

    2013-07-01

    Epidemiological surveillance of human bocavirus (HBoV) was conducted on fecal specimens collected from hospitalized children with diarrhea in Chiang Mai, Thailand in 2011. By partial sequence analysis of VP1 gene, an unusual strain of HBoV (CMH-S011-11), was initially identified as HBoV4. The complete genome sequence of CMH-S011-11 was performed and analyzed further to clarify whether it was a recombinant strain or a new HBoV variant. Analysis of complete genome sequence revealed that the coding sequence starting from NS1, NP1 to VP1/VP2 was 4795 nucleotides long. Interestingly, the nucleotide sequence of NS1 gene of CMH-S011-11 was most closely related to the HBoV2 reference strains detected in Pakistan, which contradicted to the initial genotyping result of the partial VP1 region in the previous study. In addition, comparison of NP1 nucleotide sequence of CMH-S011-11 with those of other HBoV1-4 reference strains also revealed a high level of sequence identity with HBoV2. On the other hand, nucleotide sequence of VP1/VP2 gene of CMH-S011-11 was most closely related to those of HBoV4 reference strains detected in Nigeria. The overall full-length sequence analysis revealed that this CMH-S011-11 was grouped within HBoV4 species, but located in a separate branch from other HBoV4 prototype strains. Recombination analysis revealed that CMH-S011-11 was the result of recombination between HBoV2 and HBoV4 strains with the break point located near the start codon of VP2. Copyright © 2013 Elsevier B.V. All rights reserved.

  18. Deep whole-genome sequencing of 90 Han Chinese genomes.

    PubMed

    Lan, Tianming; Lin, Haoxiang; Zhu, Wenjuan; Laurent, Tellier Christian Asker Melchior; Yang, Mengcheng; Liu, Xin; Wang, Jun; Wang, Jian; Yang, Huanming; Xu, Xun; Guo, Xiaosen

    2017-09-01

    Next-generation sequencing provides a high-resolution insight into human genetic information. However, the focus of previous studies has primarily been on low-coverage data due to the high cost of sequencing. Although the 1000 Genomes Project and the Haplotype Reference Consortium have both provided powerful reference panels for imputation, low-frequency and novel variants remain difficult to discover and call with accuracy on the basis of low-coverage data. Deep sequencing provides an optimal solution for the problem of these low-frequency and novel variants. Although whole-exome sequencing is also a viable choice for exome regions, it cannot account for noncoding regions, sometimes resulting in the absence of important, causal variants. For Han Chinese populations, the majority of variants have been discovered based upon low-coverage data from the 1000 Genomes Project. However, high-coverage, whole-genome sequencing data are limited for any population, and a large amount of low-frequency, population-specific variants remain uncharacterized. We have performed whole-genome sequencing at a high depth (∼×80) of 90 unrelated individuals of Chinese ancestry, collected from the 1000 Genomes Project samples, including 45 Northern Han Chinese and 45 Southern Han Chinese samples. Eighty-three of these 90 have been sequenced by the 1000 Genomes Project. We have identified 12 568 804 single nucleotide polymorphisms, 2 074 210 short InDels, and 26 142 structural variations from these 90 samples. Compared to the Han Chinese data from the 1000 Genomes Project, we have found 7 000 629 novel variants with low frequency (defined as minor allele frequency < 5%), including 5 813 503 single nucleotide polymorphisms, 1 169 199 InDels, and 17 927 structural variants. Using deep sequencing data, we have built a greatly expanded spectrum of genetic variation for the Han Chinese genome. Compared to the 1000 Genomes Project, these Han Chinese deep sequencing data enhance the characterization of a large number of low-frequency, novel variants. This will be a valuable resource for promoting Chinese genetics research and medical development. Additionally, it will provide a valuable supplement to the 1000 Genomes Project, as well as to other human genome projects. © The Authors 2017. Published by Oxford University Press.

  19. Masking as an effective quality control method for next-generation sequencing data analysis.

    PubMed

    Yun, Sajung; Yun, Sijung

    2014-12-13

    Next generation sequencing produces base calls with low quality scores that can affect the accuracy of identifying simple nucleotide variation calls, including single nucleotide polymorphisms and small insertions and deletions. Here we compare the effectiveness of two data preprocessing methods, masking and trimming, and the accuracy of simple nucleotide variation calls on whole-genome sequence data from Caenorhabditis elegans. Masking substitutes low quality base calls with 'N's (undetermined bases), whereas trimming removes low quality bases that results in a shorter read lengths. We demonstrate that masking is more effective than trimming in reducing the false-positive rate in single nucleotide polymorphism (SNP) calling. However, both of the preprocessing methods did not affect the false-negative rate in SNP calling with statistical significance compared to the data analysis without preprocessing. False-positive rate and false-negative rate for small insertions and deletions did not show differences between masking and trimming. We recommend masking over trimming as a more effective preprocessing method for next generation sequencing data analysis since masking reduces the false-positive rate in SNP calling without sacrificing the false-negative rate although trimming is more commonly used currently in the field. The perl script for masking is available at http://code.google.com/p/subn/. The sequencing data used in the study were deposited in the Sequence Read Archive (SRX450968 and SRX451773).

  20. Characterization of the repetitive DNA elements in the genome of fish lymphocystis disease viruses.

    PubMed

    Schnitzler, P; Darai, G

    1989-09-01

    The complete DNA nucleotide sequence of the repetitive DNA elements in the genome of fish lymphocystis disease virus (FLDV) isolated from two different species (flounder and dab) was determined. The size of these repetitive DNA elements was found to be 1413 bp which corresponds to the DNA sequences of the 5' terminus of the EcoRI DNA fragment B (0.034 to 0.052 m.u.) and to the EcoRI DNA fragment M (0.718 to 0.736 m.u.) of the FLDV genome causing lymphocystis disease in flounder and plaice. The degree of DNA nucleotide homology between both regions was found to be 99%. The repetitive DNA element in the genome of FLDV isolated from other fish species (dab) was identified and is located within the EcoRI DNA fragment B and J of the viral genome. The DNA nucleotide sequence of one duplicate of this repetition (EcoRI DNA fragment J) was determined (1410 bp) and compared to the DNA nucleotide sequences of the repetitive DNA elements of the genome of FLDV isolated from flounder. It was found that the repetitive DNA elements of the genome of FLDV derived from two different fish species are highly conserved and possess a degree of DNA sequence homology of 94%. The DNA sequences of each strand of the individual repetitive element possess one open reading frame.

  1. ANCAC: amino acid, nucleotide, and codon analysis of COGs--a tool for sequence bias analysis in microbial orthologs.

    PubMed

    Meiler, Arno; Klinger, Claudia; Kaufmann, Michael

    2012-09-08

    The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. Per definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC's NUCOCOG dataset as the largest one available for that purpose thus far. Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence-bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming-skills.

  2. ANCAC: amino acid, nucleotide, and codon analysis of COGs – a tool for sequence bias analysis in microbial orthologs

    PubMed Central

    2012-01-01

    Background The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. Per definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Results Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC’s NUCOCOG dataset as the largest one available for that purpose thus far. Conclusions Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence-bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming-skills. PMID:22958836

  3. Prediction of RNA secondary structures: from theory to models and real molecules

    NASA Astrophysics Data System (ADS)

    Schuster, Peter

    2006-05-01

    RNA secondary structures are derived from RNA sequences, which are strings built form the natural four letter nucleotide alphabet, {AUGC}. These coarse-grained structures, in turn, are tantamount to constrained strings over a three letter alphabet. Hence, the secondary structures are discrete objects and the number of sequences always exceeds the number of structures. The sequences built from two letter alphabets form perfect structures when the nucleotides can form a base pair, as is the case with {GC} or {AU}, but the relation between the sequences and structures differs strongly from the four letter alphabet. A comprehensive theory of RNA structure is presented, which is based on the concepts of sequence space and shape space, being a space of structures. It sets the stage for modelling processes in ensembles of RNA molecules like evolutionary optimization or kinetic folding as dynamical phenomena guided by mappings between the two spaces. The number of minimum free energy (mfe) structures is always smaller than the number of sequences, even for two letter alphabets. Folding of RNA molecules into mfe energy structures constitutes a non-invertible mapping from sequence space onto shape space. The preimage of a structure in sequence space is defined as its neutral network. Similarly the set of suboptimal structures is the preimage of a sequence in shape space. This set represents the conformation space of a given sequence. The evolutionary optimization of structures in populations is a process taking place in sequence space, whereas kinetic folding occurs in molecular ensembles that optimize free energy in conformation space. Efficient folding algorithms based on dynamic programming are available for the prediction of secondary structures for given sequences. The inverse problem, the computation of sequences for predefined structures, is an important tool for the design of RNA molecules with tailored properties. Simultaneous folding or cofolding of two or more RNA molecules can be modelled readily at the secondary structure level and allows prediction of the most stable (mfe) conformations of complexes together with suboptimal states. Cofolding algorithms are important tools for efficient and highly specific primer design in the polymerase chain reaction (PCR) and help to explain the mechanisms of small interference RNA (si-RNA) molecules in gene regulation. The evolutionary optimization of RNA structures is illustrated by the search for a target structure and mimics aptamer selection in evolutionary biotechnology. It occurs typically in steps consisting of short adaptive phases interrupted by long epochs of little or no obvious progress in optimization. During these quasi-stationary epochs the populations are essentially confined to neutral networks where they search for sequences that allow a continuation of the adaptive process. Modelling RNA evolution as a simultaneous process in sequence and shape space provides answers to questions of the optimal population size and mutation rates. Kinetic folding is a stochastic process in conformation space. Exact solutions are derived by direct simulation in the form of trajectory sampling or by solving the master equation. The exact solutions can be approximated straightforwardly by Arrhenius kinetics on barrier trees, which represent simplified versions of conformational energy landscapes. The existence of at least one sequence forming any arbitrarily chosen pair of structures is granted by the intersection theorem. Folding kinetics is the key to understanding and designing multistable RNA molecules or RNA switches. These RNAs form two or more long lived conformations, and conformational changes occur either spontaneously or are induced through binding of small molecules or other biopolymers. RNA switches are found in nature where they act as elements in genetic and metabolic regulation. The reliability of RNA secondary structure prediction is limited by the accuracy with which the empirical parameters can be determined and by principal deficiencies, for example by the lack of energy contributions resulting from tertiary interactions. In addition, native structures may be determined by folding kinetics rather than by thermodynamics. We address the first problem by considering base pair probabilities or base pairing entropies, which are derived from the partition function of conformations. A high base pair probability corresponding to a low pairing entropy is taken as an indicator of a high reliability of prediction. Pseudoknots are discussed as an example of a tertiary interaction that is highly important for RNA function. Moreover, pseudoknot formation is readily incorporated into structure prediction algorithms. Some examples of experimental data on RNA secondary structures that are readily explained using the landscape concept are presented. They deal with (i) properties of RNA molecules with random sequences, (ii) RNA molecules from restricted alphabets, (iii) existence of neutral networks, (iv) shape space covering, (v) riboswitches and (vi) evolution of non-coding RNAs as an example of evolution restricted to neutral networks.

  4. Two combinatorial optimization problems for SNP discovery using base-specific cleavage and mass spectrometry.

    PubMed

    Chen, Xin; Wu, Qiong; Sun, Ruimin; Zhang, Louxin

    2012-01-01

    The discovery of single-nucleotide polymorphisms (SNPs) has important implications in a variety of genetic studies on human diseases and biological functions. One valuable approach proposed for SNP discovery is based on base-specific cleavage and mass spectrometry. However, it is still very challenging to achieve the full potential of this SNP discovery approach. In this study, we formulate two new combinatorial optimization problems. While both problems are aimed at reconstructing the sample sequence that would attain the minimum number of SNPs, they search over different candidate sequence spaces. The first problem, denoted as SNP - MSP, limits its search to sequences whose in silico predicted mass spectra have all their signals contained in the measured mass spectra. In contrast, the second problem, denoted as SNP - MSQ, limits its search to sequences whose in silico predicted mass spectra instead contain all the signals of the measured mass spectra. We present an exact dynamic programming algorithm for solving the SNP - MSP problem and also show that the SNP - MSQ problem is NP-hard by a reduction from a restricted variation of the 3-partition problem. We believe that an efficient solution to either problem above could offer a seamless integration of information in four complementary base-specific cleavage reactions, thereby improving the capability of the underlying biotechnology for sensitive and accurate SNP discovery.

  5. Screening of Pro-Asp Sequences Exposed on Bacteriophage M13 as an Ideal Anchor for Gold Nanocubes.

    PubMed

    Lee, Hwa Kyoung; Lee, Yujean; Kim, Hyori; Lee, Hye-Eun; Chang, Hyejin; Nam, Ki Tae; Jeong, Dae Hong; Chung, Junho

    2017-09-15

    Bacteriophages are thought to be ideal vehicles for linking antibodies to nanoparticles. Here, we define the sequence of peptides exposed as a fusion protein on M13 bacteriophages to yield optimal binding of gold nanocubes and efficient bacteriophage amplification. We generated five helper bacteriophage libraries using AE(X) 2 DP, AE(X) 3 DP, AE(X) 4 DP, AE(X) 5 DP, and AE(X) 6 DP as the exposed portion of pVIII, in which X was a randomized amino acid residue encoded by the nucleotide sequence NNK. Efficient phage amplification was achievable only in the AE(X) 2 DP, AE(X) 3 DP, and AE(X) 4 DP libraries. Through biopanning with gold nanocubes, we enriched the phage clones and selected the clone with the highest fold change after enrichment. This clone displayed Pro-Asp on the surface of the bacteriophage and had amplification yields similar to those of the wild-type helper bacteriophage (VCSM13). The clone displayed even binding of gold nanocubes along its length and minimal aggregation after binding. We conclude that, for efficient amplification, the exposed pVIII amino acid length should be limited to six residues and Ala-Glu-Pro-Asp-Asp-Pro (AEPDDP) is the ideal fusion protein sequence for guaranteeing the optimal formation of a complex with gold nanocubes.

  6. Single nucleotide polymorphisms in common bean: their discovery and genotyping using a multiplex detection system

    USDA-ARS?s Scientific Manuscript database

    Single-nucleotide Polymorphism (SNP) markers are by far the most common form of DNA polymorphism in a genome. The objectives of this study were to discover SNPs in common bean comparing sequences from coding and non-coding regions obtained from Genbank and genomic DNA and to compare sequencing resu...

  7. An integrated genetic linkage map of watermelon and genetic diversity based on single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers

    USDA-ARS?s Scientific Manuscript database

    Watermelon (Citrullus lanatus var. lanatus) is an important vegetable fruit throughout the world. A high number of single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers should provide large coverage of the watermelon genome and high phylogenetic resolution of germplasm acces...

  8. A new begomovirus associated with alpha- and betasatellite molecules isolated from Vernonia cinerea in China.

    PubMed

    Zulfiqar, Awais; Zhang, Jie; Cui, Xiaofeng; Qian, Yajuan; Zhou, Xueping; Xie, Yan

    2012-01-01

    A begomovirus disease complex associated with Vernonia cinerea showing yellow vein symptoms was studied. The full-length genomic DNA was comprised of 2739 nucleotides (nt) and contained the typical genome structure of begomoviruses. Comparison analysis showed that it shared the highest (78.9%) nucleotide sequence identity with recently characterized Vernonia yellow vein virus (VeYVV) from India. For associated satellites, betasatellite showed the highest nucleotide sequence identity (52.1%) with Vernonia yellow vein virus betasatellite (VeYVVB) and alphasatellite shared the highest sequence identity (70.7%) with Gossypium mustelinium symptomless alphasatellite (GMusSLA). It is a member of a distinct species with cognate alpha- and betasatellites for which the name Vernonia yellow vein Fujian virus (VeYVFjV) is proposed.

  9. The complete nucleotide sequence of the barley yellow dwarf GPV isolate from China shows that it is a new member of the genus Polerovirus.

    PubMed

    Zhang, Wenwei; Cheng, Zhuomin; Xu, Lei; Wu, Maosen; Waterhouse, Peter; Zhou, Guanghe; Li, Shifang

    2009-01-01

    The complete nucleotide sequence of the ssRNA genome of a Chinese GPV isolate of barley yellow dwarf virus (BYDV) was determined. It comprised 5673 nucleotides, and the deduced genome organization resembled that of members of the genus Polerovirus. It was most closely related to cereal yellow dwarf virus-RPV (77% nt identity over the entire genome; coat protein amino acid identity 79%). The GPV isolate also differs in vector specificity from other BYDV strains. Biological properties, phylogenetic analyses and detailed sequence comparisons suggest that GPV should be considered a member of a new species within the genus, and the name Wheat yellow dwarf virus-GPV is proposed.

  10. Promoter for Sindbis virus RNA-dependent subgenomic RNA transcription.

    PubMed Central

    Levis, R; Schlesinger, S; Huang, H V

    1990-01-01

    Sindbis virus is a positive-strand RNA enveloped virus, a member of the Alphavirus genus of the Togaviridae family. Two species of mRNA are synthesized in cells infected with Sindbis virus; one, the 49S RNA, is the genomic RNA; the other, the 26S RNA, is a subgenomic RNA that is identical in sequence to the 3' one-third of the genomic RNA. Ou et al. (J.-H. Ou, C. M. Rice, L. Dalgarno, E. G. Strauss, and J. H. Strauss, Proc. Natl. Acad. Sci. USA 79:5235-5239, 1982) identified a highly conserved region 19 nucleotides upstream and 2 nucleotides downstream from the start of the 26S RNA and proposed that in the negative-strand template, these nucleotides compose the promoter for directing the synthesis of the subgenomic RNA. Defective interfering (DI) RNAs of Sindbis virus were used to test this proposal. A 227-nucleotide sequence encompassing 98 nucleotides upstream and 117 nucleotides downstream from the start site of the Sindbis virus subgenomic RNA was inserted into a DI genome. The DI RNA containing the insert was replicated and packaged in the presence of helper virus, and cells infected with these DI particles produced a subgenomic RNA of the size and sequence expected if the promoter was functional. The initiating nucleotide was identical to that used for Sindbis virus subgenomic mRNA synthesis. Deletion analysis showed that the minimal region required to detect transcription of a subgenomic RNA from the negative-strand template of a DI RNA was 18 or 19 nucleotides upstream and 5 nucleotides downstream from the start of the subgenomic RNA. Images PMID:2319651

  11. Molecular identification of Trichuris vulpis and Trichuris suis isolated from different hosts.

    PubMed

    Cutillas, Cristina; de Rojas, Manuel; Ariza, Concepción; Ubeda, José Manuel; Guevara, Diego

    2007-01-01

    Trichuris suis was isolated from the cecum of two different hosts (Sus scrofa domestica -- swine and Sus scrofa scrofa -- wild boar) and Trichuris vulpis from dogs in Sevilla, Spain. Genomic DNA was isolated and internal transcribed spacers (ITS)1-5.8S-ITS2 segment from the ribosomal DNA (rDNA) was amplified and sequenced using polymerase chain reaction techniques. The sequence of T. suis from both hosts was 1,396 bp in length while that of T. vulpis was 1,044 bp. ITS1 of both populations isolated of T. suis was 661 nucleotides in length, while the ITS2 was 534 nucleotides in length. Furthermore, the ITS1 of T. vulpis was 410 nucleotides in length, while the ITS2 was 433 nucleotides in length. One hundred fifty-four nucleotides were observed along the 5.8S gene of T. suis and T. vulpis. Intraindividual and intraspecific variations were detected in the rDNA of both species. The presence of microsatellites was observed in all the individuals assayed. Sequence analysis of the ITSs and the 5.8S gene has demonstrated no sequence differences between T. suis isolated from both hosts (S. scrofa domestica -- swine and S. scrofa scrofa -- wild boar). Nevertheless, clear differences were detected between the ITS1 and ITS2 of T. suis and T. vulpis. Furthermore, a comparative molecular analysis between both species and the previously published ITS1-5.8S-ITS2 sequence data of Trichuris ovis, Trichuris leporis, Trichuris muris, Trichuris arvicolae, and Trichuris skrjabini was carried out. A common homology zone was detected in the ITS1 sequence of all species of trichurids.

  12. Genetic variation in potential Giardia vaccine candidates cyst wall protein 2 and α1-giardin.

    PubMed

    Radunovic, Matej; Klotz, Christian; Saghaug, Christina Skår; Brattbakk, Hans-Richard; Aebischer, Toni; Langeland, Nina; Hanevik, Kurt

    2017-08-01

    Giardia is a prevalent intestinal parasitic infection. The trophozoite structural protein a1-giardin (a1-g) and the cyst protein cyst wall protein 2 (CWP2) have shown promise as Giardia vaccine antigen candidates in murine models. The present study assesses the genetic diversity of a1-g and CWP2 between and within assemblages A and B in human clinical isolates. a1-g and CWP2 sequences were acquired from 15 Norwegian isolates by PCR amplification and 20 sequences from German cultured isolates by whole genome sequencing. Sequences were aligned to reference genomes from assemblage A2 and B to identify genetic variance. Genetic diversity was found between assemblage A and B reference sequences for both a1-g (90.8% nucleotide identity) and CWP2 (82.5% nucleotide identity). However, for a1-g, this translated into only 3 amino acid (aa) substitutions, while for CWP2 there were 41 aa substitutions, and also one aa deletion. Genetic diversity within assemblage B was larger; nucleotide identity 92.0% for a1-g and 94.3% for CWP2, than within assemblage A (nucleotide identity 99.0% for a1-g and 99.7% for CWP2). For CWP2, the diversity on both nucleotide and protein level was higher in the C-terminal end. Predicted antigenic epitopes were not affected for a1-g, but partially for CWP2. Despite genetic diversity in a1-g, we found aa sequence, characteristics, and antigenicity to be well preserved. CWP2 showed more aa variance and potential antigenic differences. Several CWP2 antigens might be necessary in a future Giardia vaccine to provide cross protection against both Giardia assemblages infecting humans.

  13. Complete nucleotide sequence and genome structure of a Japanese isolate of hibiscus latent Fort Pierce virus, a unique tobamovirus that contains an internal poly(A) region in its 3' end.

    PubMed

    Yoshida, Tetsuya; Kitazawa, Yugo; Komatsu, Ken; Neriya, Yutaro; Ishikawa, Kazuya; Fujita, Naoko; Hashimoto, Masayoshi; Maejima, Kensaku; Yamaji, Yasuyuki; Namba, Shigetou

    2014-11-01

    In this study, we detected a Japanese isolate of hibiscus latent Fort Pierce virus (HLFPV-J), a member of the genus Tobamovirus, in a hibiscus plant in Japan and determined the complete sequence and organization of its genome. HLFPV-J has four open reading frames (ORFs), each of which shares more than 98 % nucleotide sequence identity with those of other HLFPV isolates. Moreover, HLFPV-J contains a unique internal poly(A) region of variable length, ranging from 44 to 78 nucleotides, in its 3'-untranslated region (UTR), as is the case with hibiscus latent Singapore virus (HLSV), another hibiscus-infecting tobamovirus. The length of the HLFPV-J genome was 6431 nucleotides, including the shortest internal poly(A) region. The sequence identities of ORFs 1, 2, 3 and 4 of HLFPV-J to other tobamoviruses were 46.6-68.7, 49.9-70.8, 31.0-70.8 and 39.4-70.1 %, respectively, at the nucleotide level and 39.8-75.0, 43.6-77.8, 19.2-70.4 and 31.2-74.2 %, respectively, at the amino acid level. The 5'- and 3'-UTRs of HLFPV-J showed 24.3-58.6 and 13.0-79.8 % identity, respectively, to other tobamoviruses. In particular, when compared to other tobamoviruses, each ORF and UTR of HLFPV-J showed the highest sequence identity to those of HLSV. Phylogenetic analysis showed that HLFPV-J, other HLFPV isolates and HLSV constitute a malvaceous-plant-infecting tobamovirus cluster. These results indicate that the genomic structure of HLFPV-J has unique features similar to those of HLSV. To our knowledge, this is the first report of the complete genome sequence of HLFPV.

  14. PGen: large-scale genomic variations analysis workflow and browser in SoyKB.

    PubMed

    Liu, Yang; Khan, Saad M; Wang, Juexin; Rynge, Mats; Zhang, Yuanxun; Zeng, Shuai; Chen, Shiyuan; Maldonado Dos Santos, Joao V; Valliyodan, Babu; Calyam, Prasad P; Merchant, Nirav; Nguyen, Henry T; Xu, Dong; Joshi, Trupti

    2016-10-06

    With the advances in next-generation sequencing (NGS) technology and significant reductions in sequencing costs, it is now possible to sequence large collections of germplasm in crops for detecting genome-scale genetic variations and to apply the knowledge towards improvements in traits. To efficiently facilitate large-scale NGS resequencing data analysis of genomic variations, we have developed "PGen", an integrated and optimized workflow using the Extreme Science and Engineering Discovery Environment (XSEDE) high-performance computing (HPC) virtual system, iPlant cloud data storage resources and Pegasus workflow management system (Pegasus-WMS). The workflow allows users to identify single nucleotide polymorphisms (SNPs) and insertion-deletions (indels), perform SNP annotations and conduct copy number variation analyses on multiple resequencing datasets in a user-friendly and seamless way. We have developed both a Linux version in GitHub ( https://github.com/pegasus-isi/PGen-GenomicVariations-Workflow ) and a web-based implementation of the PGen workflow integrated within the Soybean Knowledge Base (SoyKB), ( http://soykb.org/Pegasus/index.php ). Using PGen, we identified 10,218,140 single-nucleotide polymorphisms (SNPs) and 1,398,982 indels from analysis of 106 soybean lines sequenced at 15X coverage. 297,245 non-synonymous SNPs and 3330 copy number variation (CNV) regions were identified from this analysis. SNPs identified using PGen from additional soybean resequencing projects adding to 500+ soybean germplasm lines in total have been integrated. These SNPs are being utilized for trait improvement using genotype to phenotype prediction approaches developed in-house. In order to browse and access NGS data easily, we have also developed an NGS resequencing data browser ( http://soykb.org/NGS_Resequence/NGS_index.php ) within SoyKB to provide easy access to SNP and downstream analysis results for soybean researchers. PGen workflow has been optimized for the most efficient analysis of soybean data using thorough testing and validation. This research serves as an example of best practices for development of genomics data analysis workflows by integrating remote HPC resources and efficient data management with ease of use for biological users. PGen workflow can also be easily customized for analysis of data in other species.

  15. Detection of a novel herpesvirus from bats in the Philippines.

    PubMed

    Sano, Kaori; Okazaki, Sachiko; Taniguchi, Satoshi; Masangkay, Joseph S; Puentespina, Roberto; Eres, Eduardo; Cosico, Edison; Quibod, Niña; Kondo, Taisuke; Shimoda, Hiroshi; Hatta, Yuuki; Mitomo, Shumpei; Oba, Mami; Katayama, Yukie; Sassa, Yukiko; Furuya, Tetsuya; Nagai, Makoto; Une, Yumi; Maeda, Ken; Kyuwa, Shigeru; Yoshikawa, Yasuhiro; Akashi, Hiroomi; Omatsu, Tsutomu; Mizutani, Tetsuya

    2015-08-01

    Bats are natural hosts of many zoonotic viruses. Monitoring bat viruses is important to detect novel bat-borne infectious diseases. In this study, next generation sequencing techniques and conventional PCR were used to analyze intestine, lung, and blood clot samples collected from wild bats captured at three locations in Davao region, in the Philippines in 2012. Different viral genes belonging to the Retroviridae and Herpesviridae families were identified using next generation sequencing. The existence of herpesvirus in the samples was confirmed by PCR using herpesvirus consensus primers. The nucleotide sequences of the resulting PCR amplicons were 166-bp. Further phylogenetic analysis identified that the virus from which this nucleotide sequence was obtained belonged to the Gammaherpesvirinae subfamily. PCR using primers specific to the nucleotide sequence obtained revealed that the infection rate among the captured bats was 30 %. In this study, we present the partial genome of a novel gammaherpesvirus detected from wild bats. Our observations also indicate that this herpesvirus may be widely distributed in bat populations in Davao region.

  16. A mixed integer linear programming model to reconstruct phylogenies from single nucleotide polymorphism haplotypes under the maximum parsimony criterion

    PubMed Central

    2013-01-01

    Background Phylogeny estimation from aligned haplotype sequences has attracted more and more attention in the recent years due to its importance in analysis of many fine-scale genetic data. Its application fields range from medical research, to drug discovery, to epidemiology, to population dynamics. The literature on molecular phylogenetics proposes a number of criteria for selecting a phylogeny from among plausible alternatives. Usually, such criteria can be expressed by means of objective functions, and the phylogenies that optimize them are referred to as optimal. One of the most important estimation criteria is the parsimony which states that the optimal phylogeny T∗for a set H of n haplotype sequences over a common set of variable loci is the one that satisfies the following requirements: (i) it has the shortest length and (ii) it is such that, for each pair of distinct haplotypes hi,hj∈H, the sum of the edge weights belonging to the path from hi to hj in T∗ is not smaller than the observed number of changes between hi and hj. Finding the most parsimonious phylogeny for H involves solving an optimization problem, called the Most Parsimonious Phylogeny Estimation Problem (MPPEP), which is NP-hard in many of its versions. Results In this article we investigate a recent version of the MPPEP that arises when input data consist of single nucleotide polymorphism haplotypes extracted from a population of individuals on a common genomic region. Specifically, we explore the prospects for improving on the implicit enumeration strategy of implicit enumeration strategy used in previous work using a novel problem formulation and a series of strengthening valid inequalities and preliminary symmetry breaking constraints to more precisely bound the solution space and accelerate implicit enumeration of possible optimal phylogenies. We present the basic formulation and then introduce a series of provable valid constraints to reduce the solution space. We then prove that these constraints can often lead to significant reductions in the gap between the optimal solution and its non-integral linear programming bound relative to the prior art as well as often substantially faster processing of moderately hard problem instances. Conclusion We provide an indication of the conditions under which such an optimal enumeration approach is likely to be feasible, suggesting that these strategies are usable for relatively large numbers of taxa, although with stricter limits on numbers of variable sites. The work thus provides methodology suitable for provably optimal solution of some harder instances that resist all prior approaches. PMID:23343437

  17. De-MetaST-BLAST: A Tool for the Validation of Degenerate Primer Sets and Data Mining of Publicly Available Metagenomes

    PubMed Central

    Gulvik, Christopher A.; Effler, T. Chad; Wilhelm, Steven W.; Buchan, Alison

    2012-01-01

    Development and use of primer sets to amplify nucleic acid sequences of interest is fundamental to studies spanning many life science disciplines. As such, the validation of primer sets is essential. Several computer programs have been created to aid in the initial selection of primer sequences that may or may not require multiple nucleotide combinations (i.e., degeneracies). Conversely, validation of primer specificity has remained largely unchanged for several decades, and there are currently few available programs that allows for an evaluation of primers containing degenerate nucleotide bases. To alleviate this gap, we developed the program De-MetaST that performs an in silico amplification using user defined nucleotide sequence dataset(s) and primer sequences that may contain degenerate bases. The program returns an output file that contains the in silico amplicons. When De-MetaST is paired with NCBI’s BLAST (De-MetaST-BLAST), the program also returns the top 10 nr NCBI database hits for each recovered in silico amplicon. While the original motivation for development of this search tool was degenerate primer validation using the wealth of nucleotide sequences available in environmental metagenome and metatranscriptome databases, this search tool has potential utility in many data mining applications. PMID:23189198

  18. Iterative Correction of Reference Nucleotides (iCORN) using second generation sequencing technology.

    PubMed

    Otto, Thomas D; Sanders, Mandy; Berriman, Matthew; Newbold, Chris

    2010-07-15

    The accuracy of reference genomes is important for downstream analysis but a low error rate requires expensive manual interrogation of the sequence. Here, we describe a novel algorithm (Iterative Correction of Reference Nucleotides) that iteratively aligns deep coverage of short sequencing reads to correct errors in reference genome sequences and evaluate their accuracy. Using Plasmodium falciparum (81% A + T content) as an extreme example, we show that the algorithm is highly accurate and corrects over 2000 errors in the reference sequence. We give examples of its application to numerous other eukaryotic and prokaryotic genomes and suggest additional applications. The software is available at http://icorn.sourceforge.net

  19. Isolated nucleic acids encoding antipathogenic polypeptides and uses thereof

    DOEpatents

    Altier, Daniel J.; Crane, Virginia C.; Ellanskaya, Irina; Ellanskaya, Natalia; Gilliam, Jacob T.; Hunter-Cevera, Jennie; Presnail, James K.; Schepers, Eric J.; Simmons, Carl R.; Torok, Tamas; Yalpani, Nasser

    2010-04-20

    Compositions and methods for protecting a plant from a pathogen, particularly a fungal pathogen, are provided. Compositions include amino acid sequences, and variants and fragments thereof, for antipathogenic polypeptides that were isolated from fungal fermentation broths. Nucleic acids that encode the antipathogenic polypeptides are also provided. A method for inducing pathogen resistance in a plant using the nucleotide sequences disclosed herein is further provided. The method comprises introducing into a plant an expression cassette comprising a promoter operably linked to a nucleotide sequence that encodes an antipathogenic polypeptide of the invention. Compositions comprising an antipathogenic polypeptide or a transformed microorganism comprising a nucleic acid of the invention in combination with a carrier and methods of using these compositions to protect a plant from a pathogen are further provided. Transformed plants, plant cells, seeds, and microorganisms comprising a nucleotide sequence that encodes an antipathogenic polypeptide of the invention are also disclosed.

  20. Nucleotide sequence and proposed secondary structure of Columnea latent viroid: a natural mosaic of viroid sequences.

    PubMed Central

    Hammond, R; Smith, D R; Diener, T O

    1989-01-01

    The Columnea latent viroid (CLV) occurs latently in certain Columnea erythrophae plants grown commercially. In potato and tomato, CLV causes potato spindle tuber viroid (PSTV)-like symptoms. Its nucleotide sequence and proposed secondary structure reveal that CLV consists of a single-stranded circular RNA of 370 nucleotides which can assume a rod-like structure with extensive base-pairing characteristic of all known viroids. The electrophoretic mobility of circular CLV under nondenaturing conditions suggests a potential tertiary structure. CLV contains extensive sequence homologies to the PSTV group of viroids but contains a central conserved region identical to that of hop stunt viroid (HSV). CLV also shares some biological properties with each of the two types of viroids. Most probably, CLV is the result of intracellular RNA recombination between an HSV-type and one or more PSTV-type viroids replicating in the same plant. Images PMID:2602114

  1. iPro54-PseKNC: a sequence-based predictor for identifying sigma-54 promoters in prokaryote with pseudo k-tuple nucleotide composition

    PubMed Central

    Lin, Hao; Deng, En-Ze; Ding, Hui; Chen, Wei; Chou, Kuo-Chen

    2014-01-01

    The σ54 promoters are unique in prokaryotic genome and responsible for transcripting carbon and nitrogen-related genes. With the avalanche of genome sequences generated in the postgenomic age, it is highly desired to develop automated methods for rapidly and effectively identifying the σ54 promoters. Here, a predictor called ‘iPro54-PseKNC’ was developed. In the predictor, the samples of DNA sequences were formulated by a novel feature vector called ‘pseudo k-tuple nucleotide composition’, which was further optimized by the incremental feature selection procedure. The performance of iPro54-PseKNC was examined by the rigorous jackknife cross-validation tests on a stringent benchmark data set. As a user-friendly web-server, iPro54-PseKNC is freely accessible at http://lin.uestc.edu.cn/server/iPro54-PseKNC. For the convenience of the vast majority of experimental scientists, a step-by-step protocol guide was provided on how to use the web-server to get the desired results without the need to follow the complicated mathematics that were presented in this paper just for its integrity. Meanwhile, we also discovered through an in-depth statistical analysis that the distribution of distances between the transcription start sites and the translation initiation sites were governed by the gamma distribution, which may provide a fundamental physical principle for studying the σ54 promoters. PMID:25361964

  2. Differentiation of Campylobacter jejuni and Campylobacter coli Using Multiplex-PCR and High Resolution Melt Curve Analysis

    PubMed Central

    Banowary, Banya; Dang, Van Tuan; Sarker, Subir; Connolly, Joanne H.; Chenu, Jeremy; Groves, Peter; Ayton, Michelle; Raidal, Shane; Devi, Aruna; Vanniasinkam, Thiru; Ghorashi, Seyed A.

    2015-01-01

    Campylobacter spp. are important causes of bacterial gastroenteritis in humans in developed countries. Among Campylobacter spp. Campylobacter jejuni (C. jejuni) and C. coli are the most common causes of human infection. In this study, a multiplex PCR (mPCR) and high resolution melt (HRM) curve analysis were optimized for simultaneous detection and differentiation of C. jejuni and C. coli isolates. A segment of the hippuricase gene (hipO) of C. jejuni and putative aspartokinase (asp) gene of C. coli were amplified from 26 Campylobacter isolates and amplicons were subjected to HRM curve analysis. The mPCR-HRM was able to differentiate between C. jejuni and C. coli species. All DNA amplicons generated by mPCR were sequenced. Analysis of the nucleotide sequences from each isolate revealed that the HRM curves were correlated with the nucleotide sequences of the amplicons. Minor variation in melting point temperatures of C. coli or C. jejuni isolates was also observed and enabled some intraspecies differentiation between C. coli and/or C. jejuni isolates. The potential of PCR-HRM curve analysis for the detection and speciation of Campylobacter in additional human clinical specimens and chicken swab samples was also confirmed. The sensitivity and specificity of the test were found to be 100% and 92%, respectively. The results indicated that mPCR followed by HRM curve analysis provides a rapid (8 hours) technique for differentiation between C. jejuni and C. coli isolates. PMID:26394042

  3. GGRNA: an ultrafast, transcript-oriented search engine for genes and transcripts

    PubMed Central

    Naito, Yuki; Bono, Hidemasa

    2012-01-01

    GGRNA (http://GGRNA.dbcls.jp/) is a Google-like, ultrafast search engine for genes and transcripts. The web server accepts arbitrary words and phrases, such as gene names, IDs, gene descriptions, annotations of gene and even nucleotide/amino acid sequences through one simple search box, and quickly returns relevant RefSeq transcripts. A typical search takes just a few seconds, which dramatically enhances the usability of routine searching. In particular, GGRNA can search sequences as short as 10 nt or 4 amino acids, which cannot be handled easily by popular sequence analysis tools. Nucleotide sequences can be searched allowing up to three mismatches, or the query sequences may contain degenerate nucleotide codes (e.g. N, R, Y, S). Furthermore, Gene Ontology annotations, Enzyme Commission numbers and probe sequences of catalog microarrays are also incorporated into GGRNA, which may help users to conduct searches by various types of keywords. GGRNA web server will provide a simple and powerful interface for finding genes and transcripts for a wide range of users. All services at GGRNA are provided free of charge to all users. PMID:22641850

  4. GGRNA: an ultrafast, transcript-oriented search engine for genes and transcripts.

    PubMed

    Naito, Yuki; Bono, Hidemasa

    2012-07-01

    GGRNA (http://GGRNA.dbcls.jp/) is a Google-like, ultrafast search engine for genes and transcripts. The web server accepts arbitrary words and phrases, such as gene names, IDs, gene descriptions, annotations of gene and even nucleotide/amino acid sequences through one simple search box, and quickly returns relevant RefSeq transcripts. A typical search takes just a few seconds, which dramatically enhances the usability of routine searching. In particular, GGRNA can search sequences as short as 10 nt or 4 amino acids, which cannot be handled easily by popular sequence analysis tools. Nucleotide sequences can be searched allowing up to three mismatches, or the query sequences may contain degenerate nucleotide codes (e.g. N, R, Y, S). Furthermore, Gene Ontology annotations, Enzyme Commission numbers and probe sequences of catalog microarrays are also incorporated into GGRNA, which may help users to conduct searches by various types of keywords. GGRNA web server will provide a simple and powerful interface for finding genes and transcripts for a wide range of users. All services at GGRNA are provided free of charge to all users.

  5. Nucleotide sequence of an exceptionally long 5.8S ribosomal RNA from Crithidia fasciculata.

    PubMed Central

    Schnare, M N; Gray, M W

    1982-01-01

    In Crithidia fasciculata, a trypanosomatid protozoan, the large ribosomal subunit contains five small RNA species (e, f, g, i, j) in addition to 5S rRNA [Gray, M.W. (1981) Mol. Cell. Biol. 1, 347-357]. The complete primary sequence of species i is shown here to be pAACGUGUmCGCGAUGGAUGACUUGGCUUCCUAUCUCGUUGA ... AGAmACGCAGUAAAGUGCGAUAAGUGGUApsiCAAUUGmCAGAAUCAUUCAAUUACCGAAUCUUUGAACGAAACGG ... CGCAUGGGAGAAGCUCUUUUGAGUCAUCCCCGUGCAUGCCAUAUUCUCCAmGUGUCGAA(C)OH. This sequence establishes that species i is a 5.8S rRNA, despite its exceptional length (171-172 nucleotides). The extra nucleotides in C. fasciculata 5.8S rRNA are located in a region whose primary sequence and length are highly variable among 5.8S rRNAs, but which is capable of forming a stable hairpin loop structure (the "G+C-rich hairpin"). The sequence of C. fasciculata 5.8S rRNA is no more closely related to that of another protozoan, Acanthamoeba castellanii, than it is to representative 5.8S rRNA sequences from the other eukaryotic kingdoms, emphasizing the deep phylogenetic divisions that seem to exist within the Kingdom Protista. Images PMID:7079176

  6. The C-terminal Helix of Pseudomonas aeruginosa Elongation Factor Ts Tunes EF-Tu Dynamics to Modulate Nucleotide Exchange.

    PubMed

    De Laurentiis, Evelina Ines; Mercier, Evan; Wieden, Hans-Joachim

    2016-10-28

    Little is known about the conservation of critical kinetic parameters and the mechanistic strategies of elongation factor (EF) Ts-catalyzed nucleotide exchange in EF-Tu in bacteria and particularly in clinically relevant pathogens. EF-Tu from the clinically relevant pathogen Pseudomonas aeruginosa shares over 84% sequence identity with the corresponding elongation factor from Escherichia coli Interestingly, the functionally closely linked EF-Ts only shares 55% sequence identity. To identify any differences in the nucleotide binding properties, as well as in the EF-Ts-mediated nucleotide exchange reaction, we performed a comparative rapid kinetics and mutagenesis analysis of the nucleotide exchange mechanism for both the E. coli and P. aeruginosa systems, identifying helix 13 of EF-Ts as a previously unnoticed regulatory element in the nucleotide exchange mechanism with species-specific elements. Our findings support the base side-first entry of the nucleotide into the binding pocket of the EF-Tu·EF-Ts binary complex, followed by displacement of helix 13 and rapid binding of the phosphate side of the nucleotide, ultimately leading to the release of EF-Ts. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.

  7. GFinisher: a new strategy to refine and finish bacterial genome assemblies

    NASA Astrophysics Data System (ADS)

    Guizelini, Dieval; Raittz, Roberto T.; Cruz, Leonardo M.; Souza, Emanuel M.; Steffens, Maria B. R.; Pedrosa, Fabio O.

    2016-10-01

    Despite the development in DNA sequencing technology, improving the number and the length of reads, the process of reconstruction of complete genome sequences, the so called genome assembly, is still complex. Only 13% of the prokaryotic genome sequencing projects have been completed. Draft genome sequences deposited in public databases are fragmented in contigs and may lack the full gene complement. The aim of the present work is to identify assembly errors and improve the assembly process of bacterial genomes. The biological patterns observed in genomic sequences and the application of a priori information can allow the identification of misassembled regions, and the reorganization and improvement of the overall de novo genome assembly. GFinisher starts generating a Fuzzy GC skew graphs for each contig in an assembly and follows breaking down the contigs in critical points in order to reassemble and close them using jFGap. This has been successfully applied to dataset from 96 genome assemblies, decreasing the number of contigs by up to 86%. GFinisher can easily optimize assemblies of prokaryotic draft genomes and can be used to improve the assembly programs based on nucleotide sequence patterns in the genome. The software and source code are available at http://gfinisher.sourceforge.net/.

  8. A review of bioinformatic methods for forensic DNA analyses.

    PubMed

    Liu, Yao-Yuan; Harbison, SallyAnn

    2018-03-01

    Short tandem repeats, single nucleotide polymorphisms, and whole mitochondrial analyses are three classes of markers which will play an important role in the future of forensic DNA typing. The arrival of massively parallel sequencing platforms in forensic science reveals new information such as insights into the complexity and variability of the markers that were previously unseen, along with amounts of data too immense for analyses by manual means. Along with the sequencing chemistries employed, bioinformatic methods are required to process and interpret this new and extensive data. As more is learnt about the use of these new technologies for forensic applications, development and standardization of efficient, favourable tools for each stage of data processing is being carried out, and faster, more accurate methods that improve on the original approaches have been developed. As forensic laboratories search for the optimal pipeline of tools, sequencer manufacturers have incorporated pipelines into sequencer software to make analyses convenient. This review explores the current state of bioinformatic methods and tools used for the analyses of forensic markers sequenced on the massively parallel sequencing (MPS) platforms currently most widely used. Copyright © 2017 Elsevier B.V. All rights reserved.

  9. GFinisher: a new strategy to refine and finish bacterial genome assemblies.

    PubMed

    Guizelini, Dieval; Raittz, Roberto T; Cruz, Leonardo M; Souza, Emanuel M; Steffens, Maria B R; Pedrosa, Fabio O

    2016-10-10

    Despite the development in DNA sequencing technology, improving the number and the length of reads, the process of reconstruction of complete genome sequences, the so called genome assembly, is still complex. Only 13% of the prokaryotic genome sequencing projects have been completed. Draft genome sequences deposited in public databases are fragmented in contigs and may lack the full gene complement. The aim of the present work is to identify assembly errors and improve the assembly process of bacterial genomes. The biological patterns observed in genomic sequences and the application of a priori information can allow the identification of misassembled regions, and the reorganization and improvement of the overall de novo genome assembly. GFinisher starts generating a Fuzzy GC skew graphs for each contig in an assembly and follows breaking down the contigs in critical points in order to reassemble and close them using jFGap. This has been successfully applied to dataset from 96 genome assemblies, decreasing the number of contigs by up to 86%. GFinisher can easily optimize assemblies of prokaryotic draft genomes and can be used to improve the assembly programs based on nucleotide sequence patterns in the genome. The software and source code are available at http://gfinisher.sourceforge.net/.

  10. Simian virus 40 major late promoter: an upstream DNA sequence required for efficient in vitro transcription.

    PubMed Central

    Brady, J; Radonovich, M; Thoren, M; Das, G; Salzman, N P

    1984-01-01

    We have previously identified an 11-base DNA sequence, 5'-G-G-T-A-C-C-T-A-A-C-C-3' (simian virus 40 [SV40] map position 294 to 304), which is important in the control of SV40 late RNA expression in vitro and in vivo (Brady et al., Cell 31:625-633, 1982). We report here the identification of another domain of the SV40 late promoter. A series of mutants with deletions extending from SV40 map position 0 to 300 was prepared by nuclease BAL 31 treatment. The cloned templates were then analyzed for efficiency and accuracy of late SV40 RNA expression in the Manley in vitro transcription system. Our studies showed that, in addition to the promoter domain near map position 300, there are essential DNA sequences between nucleotide positions 74 and 95 that are required for efficient expression of late SV40 RNA. Included in this SV40 DNA sequence were two of the six GGGCGG SV40 repeat sequences and an 11-nucleotide segment which showed strong homology with the upstream sequences required for the efficient in vitro and in vivo expression of the histone H2A gene. This upstream promoter sequence supported transcription with the same efficiency even when it was moved 72 nucleotides closer to the major late cap site. In vitro promoter competition analysis demonstrated that the upstream promoter sequence, independent of the 294 to 304 promoter element, is capable of binding polymerase-transcription factors required for SV40 late gene transcription. Finally, we show that DNA sequences which control the specificity of RNA initiation at nucleotide 325 lie downstream of map position 294. Images PMID:6321950

  11. Nucleotide sequence of Hungarian grapevine chrome mosaic nepovirus RNA1.

    PubMed Central

    Le Gall, O; Candresse, T; Brault, V; Dunez, J

    1989-01-01

    The nucleotide sequence of the RNA1 of hungarian grapevine chrome mosaic virus, a nepovirus very closely related to tomato black ring virus, has been determined from cDNA clones. It is 7212 nucleotides in length excluding the 3' terminal poly(A) tail and contains a large open reading frame extending from nucleotides 216 to 6971. The presumably encoded polyprotein is 2252 amino acids in length with a molecular weight of 250 kDa. The primary structure of the polyprotein was compared with that of other viral polyproteins, revealing the same general genetic organization as that of other picorna-like viruses (comoviruses, potyviruses and picornaviruses), except that an additional protein is suspected to occupy the N-terminus of the polyprotein. PMID:2798128

  12. A novel representation of the conformational structure of transfer RNAs. Correlation of the folding patterns of the polynucleotide chain with the base sequence and the nucleotide backbone torsions.

    PubMed Central

    Srinivasan, A R; Yathindra, N

    1977-01-01

    A novel description of the conformational characteristics of all the individual nucleotides and the phosphodiesters in tRNAs is presented in the form of a circular plot. This representation furnishes information of the base sequence with the folding patterns of the polynucleotide chain as one traverses along the circumference and with the individual nucleotide and phosphodiester linkage torsions along the radii. The circular plot obtained for yeast tRNAPhe strikingly distinguishes the helical and the loop regions. The variation of the different nucleotide torsions along the entire chain length and their effect on the secondary helical and tertiary loop regions become readily apparent. PMID:339206

  13. Synthetic oligonucleotide separations by mixed-mode reversed-phase/weak anion-exchange liquid chromatography.

    PubMed

    Zimmermann, Aleksandra; Greco, Roberto; Walker, Isabel; Horak, Jeannie; Cavazzini, Alberto; Lämmerhofer, Michael

    2014-08-08

    Synthetic oligonucleotides gain increasing importance in new therapeutic concepts and as probes in biological sciences. If pharmaceutical-grade purities are required, chromatographic purification using ion-pair reversed-phase chromatography is commonly carried out. However, separation selectivity for structurally closely related impurities is often insufficient, especially at high sample loads. In this study, a "mixed-mode" reversed-phase/weak anion exchanger stationary phase has been investigated as an alternative tool for chromatographic separation of synthetic oligonucleotides with minor sequence variations. The employed mixed-mode phase shows great flexibility in method development. It has been run in various gradient elution modes, viz. one, two or three parameter (mixed) gradients (altering buffer pH, buffer concentration, and organic modifier) to find optimal elution conditions and gain further insight into retention mechanisms. Compared to ion-pair reversed-phase and mere anion-exchange separation, enhanced selectivities were observed with the mixed-mode phase for 20-23 nucleotide (nt) long oligonucleotides with similar sequences. Oligonucleotides differing by 1, 2 or 3 nucleotides in length could be readily resolved and separation factors for single nucleotide replacements declined in the order Cytosine (C)/Guanine (G)>Adenine (A)/Guanine∼Guanine/Thymine (T)>Adenine/Cytosine∼Cytosine/Thymine>Adenine/Thymine. Selectivities were larger when the modification was at the 3' terminal-end, declined when it was in the middle of the sequence and was smallest when it was located at the 5' terminus. Due to the lower surface area of the 200Å pore size mixed-mode stationary phase compared to the corresponding 100Å material, lower retention times with equal selectivities under milder elution conditions were achievable. Considering high sample loading capacities of the mixed-mode anion-exchanger phase, it should have great potential for chromatographic oligonucleotide separation and purification. Copyright © 2014 Elsevier B.V. All rights reserved.

  14. A Bioluminometric Method of DNA Sequencing

    NASA Technical Reports Server (NTRS)

    Ronaghi, Mostafa; Pourmand, Nader; Stolc, Viktor; Arnold, Jim (Technical Monitor)

    2001-01-01

    Pyrosequencing is a bioluminometric single-tube DNA sequencing method that takes advantage of co-operativity between four enzymes to monitor DNA synthesis. In this sequencing-by-synthesis method, a cascade of enzymatic reactions yields detectable light, which is proportional to incorporated nucleotides. Pyrosequencing has the advantages of accuracy, flexibility and parallel processing. It can be easily automated. Furthermore, the technique dispenses with the need for labeled primers, labeled nucleotides and gel-electrophoresis. In this chapter, the use of this technique for different applications is discussed.

  15. Information Entropy Analysis of the H1N1 Genetic Code

    NASA Astrophysics Data System (ADS)

    Martwick, Andy

    2010-03-01

    During the current H1N1 pandemic, viral samples are being obtained from large numbers of infected people world-wide and are being sequenced on the NCBI Influenza Virus Resource Database. The information entropy of the sequences was computed from the probability of occurrence of each nucleotide base at every position of each set of sequences using Shannon's definition of information entropy, [ H=∑bpb,2( 1pb ) ] where H is the observed information entropy at each nucleotide position and pb is the probability of the base pair of the nucleotides A, C, G, U. Information entropy of the current H1N1 pandemic is compared to reference human and swine H1N1 entropy. As expected, the current H1N1 entropy is in a low entropy state and has a very large mutation potential. Using the entropy method in mature genes we can identify low entropy regions of nucleotides that generally correlate to critical protein function.

  16. Statistical Analysis of Readthrough Levels for Nonsense Mutations in Mammalian Cells Reveals a Major Determinant of Response to Gentamicin

    PubMed Central

    Floquet, Célia; Hatin, Isabelle; Rousset, Jean-Pierre; Bidou, Laure

    2012-01-01

    The efficiency of translation termination depends on the nature of the stop codon and the surrounding nucleotides. Some molecules, such as aminoglycoside antibiotics (gentamicin), decrease termination efficiency and are currently being evaluated for diseases caused by premature termination codons. However, the readthrough response to treatment is highly variable and little is known about the rules governing readthrough level and response to aminoglycosides. In this study, we carried out in-depth statistical analysis on a very large set of nonsense mutations to decipher the elements of nucleotide context responsible for modulating readthrough levels and gentamicin response. We quantified readthrough for 66 sequences containing a stop codon, in the presence and absence of gentamicin, in cultured mammalian cells. We demonstrated that the efficiency of readthrough after treatment is determined by the complex interplay between the stop codon and a larger sequence context. There was a strong positive correlation between basal and induced readthrough levels, and a weak negative correlation between basal readthrough level and gentamicin response (i.e. the factor of increase from basal to induced readthrough levels). The identity of the stop codon did not affect the response to gentamicin treatment. In agreement with a previous report, we confirm that the presence of a cytosine in +4 position promotes higher basal and gentamicin-induced readthrough than other nucleotides. We highlight for the first time that the presence of a uracil residue immediately upstream from the stop codon is a major determinant of the response to gentamicin. Moreover, this effect was mediated by the nucleotide itself, rather than by the amino-acid or tRNA corresponding to the −1 codon. Finally, we point out that a uracil at this position associated with a cytosine at +4 results in an optimal gentamicin-induced readthrough, which is the therapeutically relevant variable. PMID:22479203

  17. Inferring epidemiological dynamics of infectious diseases using Tajima's D statistic on nucleotide sequences of pathogens.

    PubMed

    Kim, Kiyeon; Omori, Ryosuke; Ito, Kimihito

    2017-12-01

    The estimation of the basic reproduction number is essential to understand epidemic dynamics, and time series data of infected individuals are usually used for the estimation. However, such data are not always available. Methods to estimate the basic reproduction number using genealogy constructed from nucleotide sequences of pathogens have been proposed so far. Here, we propose a new method to estimate epidemiological parameters of outbreaks using the time series change of Tajima's D statistic on the nucleotide sequences of pathogens. To relate the time evolution of Tajima's D to the number of infected individuals, we constructed a parsimonious mathematical model describing both the transmission process of pathogens among hosts and the evolutionary process of the pathogens. As a case study we applied this method to the field data of nucleotide sequences of pandemic influenza A (H1N1) 2009 viruses collected in Argentina. The Tajima's D-based method estimated basic reproduction number to be 1.55 with 95% highest posterior density (HPD) between 1.31 and 2.05, and the date of epidemic peak to be 10th July with 95% HPD between 22nd June and 9th August. The estimated basic reproduction number was consistent with estimation by birth-death skyline plot and estimation using the time series of the number of infected individuals. These results suggested that Tajima's D statistic on nucleotide sequences of pathogens could be useful to estimate epidemiological parameters of outbreaks. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  18. Genome Survey Sequencing of Luffa Cylindrica L. and Microsatellite High Resolution Melting (SSR-HRM) Analysis for Genetic Relationship of Luffa Genotypes

    PubMed Central

    An, Jianyu; Yin, Mengqi; Zhang, Qin; Gong, Dongting; Jia, Xiaowen; Guan, Yajing; Hu, Jin

    2017-01-01

    Luffa cylindrica (L.) Roem. is an economically important vegetable crop in China. However, the genomic information on this species is currently unknown. In this study, for the first time, a genome survey of L. cylindrica was carried out using next-generation sequencing (NGS) technology. In total, 43.40 Gb sequence data of L. cylindrica, about 54.94× coverage of the estimated genome size of 789.97 Mb, were obtained from HiSeq 2500 sequencing, in which the guanine plus cytosine (GC) content was calculated to be 37.90%. The heterozygosity of genome sequences was only 0.24%. In total, 1,913,731 contigs (>200 bp) with 525 bp N50 length and 1,410,117 scaffolds (>200 bp) with 885.01 Mb total length were obtained. From the initial assembled L. cylindrica genome, 431,234 microsatellites (SSRs) (≥5 repeats) were identified. The motif types of SSR repeats included 62.88% di-nucleotide, 31.03% tri-nucleotide, 4.59% tetra-nucleotide, 0.96% penta-nucleotide and 0.54% hexa-nucleotide. Eighty genomic SSR markers were developed, and 51/80 primers could be used in both “Zheda 23” and “Zheda 83”. Nineteen SSRs were used to investigate the genetic diversity among 32 accessions through SSR-HRM analysis. The unweighted pair group method analysis (UPGMA) dendrogram tree was built by calculating the SSR-HRM raw data. SSR-HRM could be effectively used for genotype relationship analysis of Luffa species. PMID:28891982

  19. Pstl repeat: a family of short interspersed nucleotide element (SINE)-like sequences in the genomes of cattle, goat, and buffalo.

    PubMed

    Sheikh, Faruk G; Mukhopadhyay, Sudit S; Gupta, Prabhakar

    2002-02-01

    The PstI family of elements are short, highly repetitive DNA sequences interspersed throughout the genome of the Bovidae. We have cloned and sequenced some members of the PstI family from cattle, goat, and buffalo. These elements are approximately 500 bp, have a copy number of 2 x 10(5) - 4 x 10(5), and comprise about 4% of the haploid genome. Studies of nucleotide sequence homology indicate that the buffalo and goat PstI repeats (type II) are similar types of short interspersed nucleotide element (SINE) sequences, but the cattle PstI repeat (type I) is considerably more divergent. Additionally, the goat PstI sequence showed significant sequence homology with bovine serine tRNA, and is therefore likely derived from serine tRNA. Interestingly, Southern hybridization suggests that both types of SINEs (I and II) are present in all the species of Bovidae. Dendrogram analysis indicates that cattle PstI SINE is similar to bovine Alu-like SINEs. Goat and buffalo SINEs formed a separate cluster, suggesting that these two types of SINEs evolved separately in the genome of the Bovidae.

  20. DNA sequencing using polymerase substrate-binding kinetics

    PubMed Central

    Previte, Michael John Robert; Zhou, Chunhong; Kellinger, Matthew; Pantoja, Rigo; Chen, Cheng-Yao; Shi, Jin; Wang, BeiBei; Kia, Amirali; Etchin, Sergey; Vieceli, John; Nikoomanzar, Ali; Bomati, Erin; Gloeckner, Christian; Ronaghi, Mostafa; He, Molly Min

    2015-01-01

    Next-generation sequencing (NGS) has transformed genomic research by decreasing the cost of sequencing. However, whole-genome sequencing is still costly and complex for diagnostics purposes. In the clinical space, targeted sequencing has the advantage of allowing researchers to focus on specific genes of interest. Routine clinical use of targeted NGS mandates inexpensive instruments, fast turnaround time and an integrated and robust workflow. Here we demonstrate a version of the Sequencing by Synthesis (SBS) chemistry that potentially can become a preferred targeted sequencing method in the clinical space. This sequencing chemistry uses natural nucleotides and is based on real-time recording of the differential polymerase/DNA-binding kinetics in the presence of correct or mismatch nucleotides. This ensemble SBS chemistry has been implemented on an existing Illumina sequencing platform with integrated cluster amplification. We discuss the advantages of this sequencing chemistry for targeted sequencing as well as its limitations for other applications. PMID:25612848

  1. VarDetect: a nucleotide sequence variation exploratory tool

    PubMed Central

    Ngamphiw, Chumpol; Kulawonganunchai, Supasak; Assawamakin, Anunchai; Jenwitheesuk, Ekachai; Tongsima, Sissades

    2008-01-01

    Background Single nucleotide polymorphisms (SNPs) are the most commonly studied units of genetic variation. The discovery of such variation may help to identify causative gene mutations in monogenic diseases and SNPs associated with predisposing genes in complex diseases. Accurate detection of SNPs requires software that can correctly interpret chromatogram signals to nucleotides. Results We present VarDetect, a stand-alone nucleotide variation exploratory tool that automatically detects nucleotide variation from fluorescence based chromatogram traces. Accurate SNP base-calling is achieved using pre-calculated peak content ratios, and is enhanced by rules which account for common sequence reading artifacts. The proposed software tool is benchmarked against four other well-known SNP discovery software tools (PolyPhred, novoSNP, Genalys and Mutation Surveyor) using fluorescence based chromatograms from 15 human genes. These chromatograms were obtained from sequencing 16 two-pooled DNA samples; a total of 32 individual DNA samples. In this comparison of automatic SNP detection tools, VarDetect achieved the highest detection efficiency. Availability VarDetect is compatible with most major operating systems such as Microsoft Windows, Linux, and Mac OSX. The current version of VarDetect is freely available at . PMID:19091032

  2. A Laboratory Exercise for Genotyping Two Human Single Nucleotide Polymorphisms

    ERIC Educational Resources Information Center

    Fernando, James; Carlson, Bradley; LeBard, Timothy; McCarthy, Michael; Umali, Finianne; Ashton, Bryce; Rose, Ferrill F., Jr.

    2016-01-01

    The dramatic decrease in the cost of sequencing a human genome is leading to an era in which a wide range of students will benefit from having an understanding of human genetic variation. Since over 90% of sequence variation between humans is in the form of single nucleotide polymorphisms (SNPs), a laboratory exercise has been devised in order to…

  3. OrthoANI: An improved algorithm and software for calculating average nucleotide identity.

    PubMed

    Lee, Imchang; Ouk Kim, Yeong; Park, Sang-Cheol; Chun, Jongsik

    2016-02-01

    Species demarcation in Bacteria and Archaea is mainly based on overall genome relatedness, which serves a framework for modern microbiology. Current practice for obtaining these measures between two strains is shifting from experimentally determined similarity obtained by DNA-DNA hybridization (DDH) to genome-sequence-based similarity. Average nucleotide identity (ANI) is a simple algorithm that mimics DDH. Like DDH, ANI values between two genome sequences may be different from each other when reciprocal calculations are compared. We compared 63 690 pairs of genome sequences and found that the differences in reciprocal ANI values are significantly high, exceeding 1 % in some cases. To resolve this problem of not being symmetrical, a new algorithm, named OrthoANI, was developed to accommodate the concept of orthology for which both genome sequences were fragmented and only orthologous fragment pairs taken into consideration for calculating nucleotide identities. OrthoANI is highly correlated with ANI (using BLASTn) and the former showed approximately 0.1 % higher values than the latter. In conclusion, OrthoANI provides a more robust and faster means of calculating average nucleotide identity for taxonomic purposes. The standalone software tools are freely available at http://www.ezbiocloud.net/sw/oat.

  4. Resolving the tips of the tree of life: How much mitochondrialdata doe we need?

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bonett, Ronald M.; Macey, J. Robert; Boore, Jeffrey L.

    2005-04-29

    Mitochondrial (mt) DNA sequences are used extensively to reconstruct evolutionary relationships among recently diverged animals,and have constituted the most widely used markers for species- and generic-level relationships for the last decade or more. However, most studies to date have employed relatively small portions of the mt-genome. In contrast, complete mt-genomes primarily have been used to investigate deep divergences, including several studies of the amount of mt sequence necessary to recover ancient relationships. We sequenced and analyzed 24 complete mt-genomes from a group of salamander species exhibiting divergences typical of those in many species-level studies. We present the first comprehensive investigationmore » of the amount of mt sequence data necessary to consistently recover the mt-genome tree at this level, using parsimony and Bayesian methods. Both methods of phylogenetic analysis revealed extremely similar results. A surprising number of well supported, yet conflicting, relationships were found in trees based on fragments less than {approx}2000 nucleotides (nt), typical of the vast majority of the thousands of mt-based studies published to date. Large amounts of data (11,500+ nt) were necessary to consistently recover the whole mt-genome tree. Some relationships consistently were recovered with fragments of all sizes, but many nodes required the majority of the mt-genome to stabilize, particularly those associated with short internal branches. Although moderate amounts of data (2000-3000 nt) were adequate to recover mt-based relationships for which most nodes were congruent with the whole mt-genome tree, many thousands of nucleotides were necessary to resolve rapid bursts of evolution. Recent advances in genomics are making collection of large amounts of sequence data highly feasible, and our results provide the basis for comparative studies of other closely related groups to optimize mt sequence sampling and phylogenetic resolution at the ''tips'' of the Tree of Life.« less

  5. DNA polymerase preference determines PCR priming efficiency.

    PubMed

    Pan, Wenjing; Byrne-Steele, Miranda; Wang, Chunlin; Lu, Stanley; Clemmons, Scott; Zahorchak, Robert J; Han, Jian

    2014-01-30

    Polymerase chain reaction (PCR) is one of the most important developments in modern biotechnology. However, PCR is known to introduce biases, especially during multiplex reactions. Recent studies have implicated the DNA polymerase as the primary source of bias, particularly initiation of polymerization on the template strand. In our study, amplification from a synthetic library containing a 12 nucleotide random portion was used to provide an in-depth characterization of DNA polymerase priming bias. The synthetic library was amplified with three commercially available DNA polymerases using an anchored primer with a random 3' hexamer end. After normalization, the next generation sequencing (NGS) results of the amplified libraries were directly compared to the unamplified synthetic library. Here, high throughput sequencing was used to systematically demonstrate and characterize DNA polymerase priming bias. We demonstrate that certain sequence motifs are preferred over others as primers where the six nucleotide sequences at the 3' end of the primer, as well as the sequences four base pairs downstream of the priming site, may influence priming efficiencies. DNA polymerases in the same family from two different commercial vendors prefer similar motifs, while another commercially available enzyme from a different DNA polymerase family prefers different motifs. Furthermore, the preferred priming motifs are GC-rich. The DNA polymerase preference for certain sequence motifs was verified by amplification from single-primer templates. We incorporated the observed DNA polymerase preference into a primer-design program that guides the placement of the primer to an optimal location on the template. DNA polymerase priming bias was characterized using a synthetic library amplification system and NGS. The characterization of DNA polymerase priming bias was then utilized to guide the primer-design process and demonstrate varying amplification efficiencies among three commercially available DNA polymerases. The results suggest that the interaction of the DNA polymerase with the primer:template junction during the initiation of DNA polymerization is very important in terms of overall amplification bias and has broader implications for both the primer design process and multiplex PCR.

  6. Expression of a recombinant human sperm-agglutinating mini-antibody in tobacco (Nicotiana tabacum L.).

    PubMed

    Xu, Bingfang; Copolla, Michael; Herr, John C; Timko, Michael P

    2007-01-01

    The murine monoclonal antibody (mAB) S19 recognizes an N-linked carbohydrate antigen designated sperm agglutination antigen-1 (SAGA1) located on the membrane protein CD52. This antigen is added to the sperm surface during epididymal maturation. Binding of the S19 mAB to SAGA-1 causes the rapid agglutination of sperm and blocks pre-fertilization events. Previous studies indicated that the S19 mAB may be a potential specific spermicidal agent (termed a spermistatic) capable of replacing current spermicidal products that contain harsh detergents with harmful side effects. The nucleotide sequences encoding the heavy (H) and light (L) chains of the S19 antibody were cloned. A chimeric gene was constructed using the nucleotide sequences encoding the variable regions of both the H and L chains, and this gene (scFv1 9) was expressed in transgenic tobacco (Nicotiana tabacum L.) to produce a recombinant anti-sperm antibody (RASA). Highest levels of RASA expression were observed in BY-2 plant cell suspension cultures and regenerated N. tabacum cv. Xanthi plants transformant in which the RASA coding sequences were expressed under the control of the Cauliflower Mosaic Virus 35S promoter containing a double-enhancer sequence (2X CaMV 35S). Subsequent modifications of the transgene including the addition of a 5'-untranslated sequence from the tobacco etch virus (TEV leader sequence), N-terminal fusion of the coding region with an endoplasmic reticulum targeting signal of patatin (pat) and C-terminal fusion with the endoplasmic reticulum retention signal peptide KDEL showed further enhancement of RASA expression. The plant-expressed RASA formed intrachain disulfide bonds and was primarily soluble in the cytoplasmic fraction of the cells. Introduction of a poly-histidine (6xHIS) tag in the recombinant RASA protein allowed for rapid purification of the recombinant protein using Ni-NTA chromatography. Optimization of scale-up production and purification of this plant-derived recombinant protein should provide large quantities of an inexpensive spermistatic plantibody.

  7. The role of heterologous chloroplast sequence elements in transgene integration and expression.

    PubMed

    Ruhlman, Tracey; Verma, Dheeraj; Samson, Nalapalli; Daniell, Henry

    2010-04-01

    Heterologous regulatory elements and flanking sequences have been used in chloroplast transformation of several crop species, but their roles and mechanisms have not yet been investigated. Nucleotide sequence identity in the photosystem II protein D1 (psbA) upstream region is 59% across all taxa; similar variation was consistent across all genes and taxa examined. Secondary structure and predicted Gibbs free energy values of the psbA 5' untranslated region (UTR) among different families reflected this variation. Therefore, chloroplast transformation vectors were made for tobacco (Nicotiana tabacum) and lettuce (Lactuca sativa), with endogenous (Nt-Nt, Ls-Ls) or heterologous (Nt-Ls, Ls-Nt) psbA promoter, 5' UTR and 3' UTR, regulating expression of the anthrax protective antigen (PA) or human proinsulin (Pins) fused with the cholera toxin B-subunit (CTB). Unique lettuce flanking sequences were completely eliminated during homologous recombination in the transplastomic tobacco genomes but not unique tobacco sequences. Nt-Ls or Ls-Nt transplastomic lines showed reduction of 80% PA and 97% CTB-Pins expression when compared with endogenous psbA regulatory elements, which accumulated up to 29.6% total soluble protein PA and 72.0% total leaf protein CTB-Pins, 2-fold higher than Rubisco. Transgene transcripts were reduced by 84% in Ls-Nt-CTB-Pins and by 72% in Nt-Ls-PA lines. Transcripts containing endogenous 5' UTR were stabilized in nonpolysomal fractions. Stromal RNA-binding proteins were preferentially associated with endogenous psbA 5' UTR. A rapid and reproducible regeneration system was developed for lettuce commercial cultivars by optimizing plant growth regulators. These findings underscore the need for sequencing complete crop chloroplast genomes, utilization of endogenous regulatory elements and flanking sequences, as well as optimization of plant growth regulators for efficient chloroplast transformation.

  8. The Role of Heterologous Chloroplast Sequence Elements in Transgene Integration and Expression1[W][OA

    PubMed Central

    Ruhlman, Tracey; Verma, Dheeraj; Samson, Nalapalli; Daniell, Henry

    2010-01-01

    Heterologous regulatory elements and flanking sequences have been used in chloroplast transformation of several crop species, but their roles and mechanisms have not yet been investigated. Nucleotide sequence identity in the photosystem II protein D1 (psbA) upstream region is 59% across all taxa; similar variation was consistent across all genes and taxa examined. Secondary structure and predicted Gibbs free energy values of the psbA 5′ untranslated region (UTR) among different families reflected this variation. Therefore, chloroplast transformation vectors were made for tobacco (Nicotiana tabacum) and lettuce (Lactuca sativa), with endogenous (Nt-Nt, Ls-Ls) or heterologous (Nt-Ls, Ls-Nt) psbA promoter, 5′ UTR and 3′ UTR, regulating expression of the anthrax protective antigen (PA) or human proinsulin (Pins) fused with the cholera toxin B-subunit (CTB). Unique lettuce flanking sequences were completely eliminated during homologous recombination in the transplastomic tobacco genomes but not unique tobacco sequences. Nt-Ls or Ls-Nt transplastomic lines showed reduction of 80% PA and 97% CTB-Pins expression when compared with endogenous psbA regulatory elements, which accumulated up to 29.6% total soluble protein PA and 72.0% total leaf protein CTB-Pins, 2-fold higher than Rubisco. Transgene transcripts were reduced by 84% in Ls-Nt-CTB-Pins and by 72% in Nt-Ls-PA lines. Transcripts containing endogenous 5′ UTR were stabilized in nonpolysomal fractions. Stromal RNA-binding proteins were preferentially associated with endogenous psbA 5′ UTR. A rapid and reproducible regeneration system was developed for lettuce commercial cultivars by optimizing plant growth regulators. These findings underscore the need for sequencing complete crop chloroplast genomes, utilization of endogenous regulatory elements and flanking sequences, as well as optimization of plant growth regulators for efficient chloroplast transformation. PMID:20130101

  9. Recombination-dependent replication and gene conversion homogenize repeat sequences and diversify plastid genome structure.

    PubMed

    Ruhlman, Tracey A; Zhang, Jin; Blazier, John C; Sabir, Jamal S M; Jansen, Robert K

    2017-04-01

    There is a misinterpretation in the literature regarding the variable orientation of the small single copy region of plastid genomes (plastomes). The common phenomenon of small and large single copy inversion, hypothesized to occur through intramolecular recombination between inverted repeats (IR) in a circular, single unit-genome, in fact, more likely occurs through recombination-dependent replication (RDR) of linear plastome templates. If RDR can be primed through both intra- and intermolecular recombination, then this mechanism could not only create inversion isomers of so-called single copy regions, but also an array of alternative sequence arrangements. We used Illumina paired-end and PacBio single-molecule real-time (SMRT) sequences to characterize repeat structure in the plastome of Monsonia emarginata (Geraniaceae). We used OrgConv and inspected nucleotide alignments to infer ancestral nucleotides and identify gene conversion among repeats and mapped long (>1 kb) SMRT reads against the unit-genome assembly to identify alternative sequence arrangements. Although M. emarginata lacks the canonical IR, we found that large repeats (>1 kilobase; kb) represent ∼22% of the plastome nucleotide content. Among the largest repeats (>2 kb), we identified GC-biased gene conversion and mapping filtered, long SMRT reads to the M. emarginata unit-genome assembly revealed alternative, substoichiometric sequence arrangements. We offer a model based on RDR and gene conversion between long repeated sequences in the M. emarginata plastome and provide support that both intra-and intermolecular recombination between large repeats, particularly in repeat-rich plastomes, varies unit-genome structure while homogenizing the nucleotide sequence of repeats. © 2017 Botanical Society of America.

  10. Genetic Characteristics of Coronaviruses from Korean Bats in 2016.

    PubMed

    Lee, Saemi; Jo, Seong-Deok; Son, Kidong; An, Injung; Jeong, Jipseol; Wang, Seung-Jun; Kim, Yongkwan; Jheong, Weonhwa; Oem, Jae-Ku

    2018-01-01

    Bats have increasingly been recognized as the natural reservoir of severe acute respiratory syndrome (SARS), coronavirus, and other coronaviruses found in mammals. However, little research has been conducted on bat coronaviruses in South Korea. In this study, bat samples (332 oral swabs, 245 fecal samples, 38 urine samples, and 57 bat carcasses) were collected at 33 natural bat habitat sites in South Korea. RT-PCR and sequencing were performed for specific coronavirus genes to identify the bat coronaviruses in different bat samples. Coronaviruses were detected in 2.7% (18/672) of the samples: 13 oral swabs from one species of the family Rhinolophidae, and four fecal samples and one carcass (intestine) from three species of the family Vespertiliodae. To determine the genetic relationships of the 18 sequences obtained in this study and previously known coronaviruses, the nucleotide sequences of a 392-nt region of the RNA-dependent RNA polymerase (RdRp) gene were analyzed phylogenetically. Thirteen sequences belonging to SARS-like betacoronaviruses showed the highest nucleotide identity (97.1-99.7%) with Bat-CoV-JTMC15 reported in China. The other five sequences were most similar to MERS-like betacoronaviruses. Four nucleotide sequences displayed the highest identity (94.1-95.1%) with Bat-CoV-HKU5 from Hong Kong. The one sequence from a carcass showed the highest nucleotide identity (99%) with Bat-CoV-SC2013 from China. These results suggest that careful surveillance of coronaviruses from bats should be continued, because animal and human infections may result from the genetic variants present in bat coronavirus reservoirs.

  11. Optimization of antifungal production by an alkaliphilic and halotolerant actinomycete, Streptomyces sp. SY-BS5, using response surface methodology.

    PubMed

    Souagui, Y; Tritsch, D; Grosdemange-Billiard, C; Kecha, M

    2015-06-01

    Optimization of medium components and physicochemical parameters for antifungal production by an alkaliphilic and salt-tolerant actinomycete designated Streptomyces sp. SY-BS5; isolated from an arid region in south of Algeria. The strain showed broad-spectrum activity against pathogenic and toxinogenic fungi. Identification of the actinomycete strain was realized on the basis of 16S rRNA gene sequencing. Antifungal production was optimized following one-factor-at-a-time (OFAT) and response surface methodology (RSM) approaches. The most suitable medium for growth and antifungal production was found using one-factor-at-a-time methodology. The individual and interaction effects of three nutritional variables, carbon source (glucose), nitrogen source (yeast extract) and sodium chloride (NaCl) were optimized by Box-Behnken design. Finally, culture conditions for the antifungal production, pH and temperature were studied and determined. Analysis of the 16S rRNA gene sequence (1454 nucleotides) assigned this strain to Streptomyces genus with 99% similarity with Streptomyces cyaneofuscatus JCM4364(T), the most closely related. The results of the optimization study show that concentrations 3.476g/L of glucose, 3.876g/L of yeast extract and 41.140g/L of NaCl are responsible for the enhancement of antifungal production by Streptomyces sp. SY-BS5. The preferable culture conditions for antifungal production were pH 10, temperature 30°C for 09 days. This study proved that RSM is usual and powerful tool for the optimization of antifungal production from actinomycetes. Copyright © 2015 Elsevier Masson SAS. All rights reserved.

  12. Nucleotide sequence of the gag gene and gag-pol junction of feline leukemia virus.

    PubMed Central

    Laprevotte, I; Hampe, A; Sherr, C J; Galibert, F

    1984-01-01

    The nucleotide sequence of the gag gene of feline leukemia virus and its flanking sequences were determined and compared with the corresponding sequences of two strains of feline sarcoma virus and with that of the Moloney strain of murine leukemia virus. A high degree of nucleotide sequence homology between the feline leukemia virus and murine leukemia virus gag genes was observed, suggesting that retroviruses of domestic cats and laboratory mice have a common, proximal evolutionary progenitor. The predicted structure of the complete feline leukemia virus gag gene precursor suggests that the translation of nonglycosylated and glycosylated gag gene polypeptides is initiated at two different AUG codons. These initiator codons fall in the same reading frame and are separated by a 222-base-pair segment which encodes an amino terminal signal peptide. The nucleotide sequence predicts the order of amino acids in each of the individual gag-coded proteins (p15, p12, p30, p10), all of which derive from the gag gene precursor. Stable stem-and-loop secondary structures are proposed for two regions of viral RNA. The first falls within sequences at the 5' end of the viral genome, together with adjacent palindromic sequences which may play a role in dimer linkage of RNA subunits. The second includes coding sequences at the gag-pol junction and is proposed to be involved in translation of the pol gene product. Sequence analysis of the latter region shows that the gag and pol genes are translated in different reading frames. Classical consensus splice donor and acceptor sequences could not be localized to regions which would permit synthesis of the expected gag-pol precursor protein. Alternatively, we suggest that the pol gene product (RNA-dependent DNA polymerase) could be translated by a frameshift suppressing mechanism which could involve cleavage modification of stems and loops in a manner similar to that observed in tRNA processing. PMID:6328019

  13. Evaluation of targeted exome sequencing for 28 protein-based blood group systems, including the homologous gene systems, for blood group genotyping.

    PubMed

    Schoeman, Elizna M; Lopez, Genghis H; McGowan, Eunike C; Millard, Glenda M; O'Brien, Helen; Roulis, Eileen V; Liew, Yew-Wah; Martin, Jacqueline R; McGrath, Kelli A; Powley, Tanya; Flower, Robert L; Hyland, Catherine A

    2017-04-01

    Blood group single nucleotide polymorphism genotyping probes for a limited range of polymorphisms. This study investigated whether massively parallel sequencing (also known as next-generation sequencing), with a targeted exome strategy, provides an extended blood group genotype and the extent to which massively parallel sequencing correctly genotypes in homologous gene systems, such as RH and MNS. Donor samples (n = 28) that were extensively phenotyped and genotyped using single nucleotide polymorphism typing, were analyzed using the TruSight One Sequencing Panel and MiSeq platform. Genes for 28 protein-based blood group systems, GATA1, and KLF1 were analyzed. Copy number variation analysis was used to characterize complex structural variants in the GYPC and RH systems. The average sequencing depth per target region was 66.2 ± 39.8. Each sample harbored on average 43 ± 9 variants, of which 10 ± 3 were used for genotyping. For the 28 samples, massively parallel sequencing variant sequences correctly matched expected sequences based on single nucleotide polymorphism genotyping data. Copy number variation analysis defined the Rh C/c alleles and complex RHD hybrids. Hybrid RHD*D-CE-D variants were correctly identified, but copy number variation analysis did not confidently distinguish between D and CE exon deletion versus rearrangement. The targeted exome sequencing strategy employed extended the range of blood group genotypes detected compared with single nucleotide polymorphism typing. This single-test format included detection of complex MNS hybrid cases and, with copy number variation analysis, defined RH hybrid genes along with the RHCE*C allele hitherto difficult to resolve by variant detection. The approach is economical compared with whole-genome sequencing and is suitable for a red blood cell reference laboratory setting. © 2017 AABB.

  14. Identification and nucleotide sequence analysis of the repetitive DNA element in the genome of fish lymphocystis disease virus.

    PubMed

    Schnitzler, P; Delius, H; Scholz, J; Touray, M; Orth, E; Darai, G

    1987-12-01

    The genome of the fish lymphocystis disease virus (FLDV) was screened for the existence of repetitive DNA sequences using a defined and complete gene library of the viral genome (98 kbp) by DNA-DNA hybridization, heteroduplex analysis, and restriction fine mapping. A repetitive DNA sequence was detected at the coordinates 0.034 to 0.057 and 0.718 to 0.736 map units (m.u.) of the FLDV genome. The first region (0.034 to 0.057 m.u.) corresponds to the 5' terminus of the EcoRI FLDV DNA fragment B (0.034 to 0.165 m.u.) and the second region (0.718 to 0.736 m.u.) is identical to the EcoRI DNA fragment M of the viral genome. The DNA nucleotide sequence of the EcoRI FLDV DNA fragment M was determined. This analysis revealed the presence of many short direct and inverted repetitions, e.g., a 18-mer direct repetition (TTTAAAATTTAATTAA) that started at nucleotide positions 812 and 942 and a 14-mer inverted repeat (TTAAATTTAAATTT) at nucleotide positions 820 and 959. Only short open reading frames were detected within this region. The DNA repetitions are discussed as sequences that play a possible regulatory role for virus replication. Furthermore, hybridization experiments revealed that the repetitive DNA sequences are conserved in the genome of different strains of fish lymphocystis disease virus isolated from two species of Pleuronectidae (flounder and dab).

  15. RNAcode: Robust discrimination of coding and noncoding regions in comparative sequence data

    PubMed Central

    Washietl, Stefan; Findeiß, Sven; Müller, Stephan A.; Kalkhof, Stefan; von Bergen, Martin; Hofacker, Ivo L.; Stadler, Peter F.; Goldman, Nick

    2011-01-01

    With the availability of genome-wide transcription data and massive comparative sequencing, the discrimination of coding from noncoding RNAs and the assessment of coding potential in evolutionarily conserved regions arose as a core analysis task. Here we present RNAcode, a program to detect coding regions in multiple sequence alignments that is optimized for emerging applications not covered by current protein gene-finding software. Our algorithm combines information from nucleotide substitution and gap patterns in a unified framework and also deals with real-life issues such as alignment and sequencing errors. It uses an explicit statistical model with no machine learning component and can therefore be applied “out of the box,” without any training, to data from all domains of life. We describe the RNAcode method and apply it in combination with mass spectrometry experiments to predict and confirm seven novel short peptides in Escherichia coli and to analyze the coding potential of RNAs previously annotated as “noncoding.” RNAcode is open source software and available for all major platforms at http://wash.github.com/rnacode. PMID:21357752

  16. Detection of Splice Sites Using Support Vector Machine

    NASA Astrophysics Data System (ADS)

    Varadwaj, Pritish; Purohit, Neetesh; Arora, Bhumika

    Automatic identification and annotation of exon and intron region of gene, from DNA sequences has been an important research area in field of computational biology. Several approaches viz. Hidden Markov Model (HMM), Artificial Intelligence (AI) based machine learning and Digital Signal Processing (DSP) techniques have extensively and independently been used by various researchers to cater this challenging task. In this work, we propose a Support Vector Machine based kernel learning approach for detection of splice sites (the exon-intron boundary) in a gene. Electron-Ion Interaction Potential (EIIP) values of nucleotides have been used for mapping character sequences to corresponding numeric sequences. Radial Basis Function (RBF) SVM kernel is trained using EIIP numeric sequences. Furthermore this was tested on test gene dataset for detection of splice site by window (of 12 residues) shifting. Optimum values of window size, various important parameters of SVM kernel have been optimized for a better accuracy. Receiver Operating Characteristic (ROC) curves have been utilized for displaying the sensitivity rate of the classifier and results showed 94.82% accuracy for splice site detection on test dataset.

  17. RNAcode: robust discrimination of coding and noncoding regions in comparative sequence data.

    PubMed

    Washietl, Stefan; Findeiss, Sven; Müller, Stephan A; Kalkhof, Stefan; von Bergen, Martin; Hofacker, Ivo L; Stadler, Peter F; Goldman, Nick

    2011-04-01

    With the availability of genome-wide transcription data and massive comparative sequencing, the discrimination of coding from noncoding RNAs and the assessment of coding potential in evolutionarily conserved regions arose as a core analysis task. Here we present RNAcode, a program to detect coding regions in multiple sequence alignments that is optimized for emerging applications not covered by current protein gene-finding software. Our algorithm combines information from nucleotide substitution and gap patterns in a unified framework and also deals with real-life issues such as alignment and sequencing errors. It uses an explicit statistical model with no machine learning component and can therefore be applied "out of the box," without any training, to data from all domains of life. We describe the RNAcode method and apply it in combination with mass spectrometry experiments to predict and confirm seven novel short peptides in Escherichia coli and to analyze the coding potential of RNAs previously annotated as "noncoding." RNAcode is open source software and available for all major platforms at http://wash.github.com/rnacode.

  18. High-resolution mapping, characterization, and optimization of autonomously replicating sequences in yeast

    PubMed Central

    Liachko, Ivan; Youngblood, Rachel A.; Keich, Uri; Dunham, Maitreya J.

    2013-01-01

    DNA replication origins are necessary for the duplication of genomes. In addition, plasmid-based expression systems require DNA replication origins to maintain plasmids efficiently. The yeast autonomously replicating sequence (ARS) assay has been a valuable tool in dissecting replication origin structure and function. However, the dearth of information on origins in diverse yeasts limits the availability of efficient replication origin modules to only a handful of species and restricts our understanding of origin function and evolution. To enable rapid study of origins, we have developed a sequencing-based suite of methods for comprehensively mapping and characterizing ARSs within a yeast genome. Our approach finely maps genomic inserts capable of supporting plasmid replication and uses massively parallel deep mutational scanning to define molecular determinants of ARS function with single-nucleotide resolution. In addition to providing unprecedented detail into origin structure, our data have allowed us to design short, synthetic DNA sequences that retain maximal ARS function. These methods can be readily applied to understand and modulate ARS function in diverse systems. PMID:23241746

  19. Methods of DNA sequencing by hybridization based on optimizing concentration of matrix-bound oligonucleotide and device for carrying out same

    DOEpatents

    Khrapko, Konstantin R [Moscow, RU; Khorlin, Alexandr A [Moscow, RU; Ivanov, Igor B [Moskovskaya, RU; Ershov, Gennady M [Moscow, RU; Lysov, Jury P [Moscow, RU; Florentiev, Vladimir L [Moscow, RU; Mirzabekov, Andrei D [Moscow, RU

    1996-09-03

    A method for sequencing DNA by hybridization that includes the following steps: forming an array of oligonucleotides at such concentrations that either ensure the same dissociation temperature for all fully complementary duplexes or allows hybridization and washing of such duplexes to be conducted at the same temperature; hybridizing said oligonucleotide array with labeled test DNA; washing in duplex dissociation conditions; identifying single-base substitutions in the test DNA by analyzing the distribution of the dissociation temperatures and reconstructing the DNA nucleotide sequence based on the above analysis. A device for carrying out the method comprises a solid substrate and a matrix rigidly bound to the substrate. The matrix contains the oligonucleotide array and consists of a multiplicity of gel portions. Each gel portion contains one oligonucleotide of desired length. The gel portions are separated from one another by interstices and have a thickness not exceeding 30 .mu.m.

  20. Formation of cis-diamminedichloroplatinum(II) 1,2-intrastrand cross-links on DNA is flanking-sequence independent.

    PubMed

    Burstyn, J N; Heiger-Bernays, W J; Cohen, S M; Lippard, S J

    2000-11-01

    Mapping of cis-diamminedichloroplatinum(II) (cis-DDP, cisplatin) DNA adducts over >3000 nucleotides was carried out using a replication blockage assay. The sites of inhibition of modified T4 DNA polymerase, also referred to as stop sites, were analyzed to determine the effects of local sequence context on the distribution of intrastrand cisplatin cross-links. In a 3120 base fragment from replicative form M13mp18 DNA containing 24.6% guanine, 25.5% thymine, 26.9% adenine and 23.0% cytosine, 166 individual stop sites were observed at a bound platinum/nucleotide ratio of 1-2 per thousand. The majority of stop sites (90%) occurred at G(n>2) sequences and the remainder were located at sites containing an AG dinucleotide. For all of the GG sites present in the mapped sequences, including those with Gn(>)2, 89% blocked replication, whereas for the AG sites only 17% blocked replication. These blockage sites were independent of flanking nucleotides in a sequence of N(1)G*G*N(2) where N(1), N(2) = A, C, G, T and G*G* indicates a 1,2-intrastrand platinum cross-link. The absence of long-range sequence dependence was confirmed by monitoring the reaction of cisplatin with a plasmid containing an 800 bp insert of the human telomere repeat sequence (TTAGGG)(n). Platination reactions monitored at several formal platinum/nucleotide ratios or as a function of time reveal that the telomere insert was not preferentially damaged by cisplatin. Both replication blockage and telomere-insert plasmid platination experiments indicate that cisplatin 1,2-intrastrand adducts do not form preferentially at G-rich sequences in vitro.

  1. Sequence analysis of porcine kobuvirus VP1 region detected in pigs in Japan and Thailand.

    PubMed

    Okitsu, Shoko; Khamrin, Pattara; Thongprachum, Aksara; Hidaka, Satoshi; Kongkaew, Sompreeya; Kongkaew, Apisek; Maneekarn, Niwat; Mizuguchi, Masashi; Hayakawa, Satoshi; Ushijima, Hiroshi

    2012-04-01

    Porcine kobuvirus is a new candidate species of the genus Kobuvirus in the family Picornaviridae, and information is still limited. The identification of porcine kobuvirus has been performed by the sequence analyses of the 3D region of the viruses. Therefore, the purpose of this study was to characterize the molecular properties of VP1 nucleotide sequences of the porcine kobuviruses isolated from porcine stool samples in Japan during 2009 and Thailand between 2006 and 2008. In addition, previous identification of a unique porcine kobuvirus; Japanese H023/2009/JP, which is a bovine kobuvirus-like strain based on sequence analysis of the 3D region, was also included in this study. All of the strains were amplified by the VP1-specific primer pair: the amplicons were subjected to direct sequencing and compared with the VP1 nucleotide sequences of reference strains. The VP1 sequences of strains from the GenBank database revealed high nucleotide sequence identity at 84.3-100%. On the other hand, the nucleotide identities among the 15 porcine kobuvirus strains analyzed in this study ranged from 78.8 to 99.8%. The results revealed that diversity of the strains in this study were higher than those of the strains in previous studies. Furthermore, it was found that the VP1 region of the bovine kobuvirus-like strain, H023/2009/JP, clustered with nine porcine kobuvirus strains that were isolated in Thailand and Japan. Since this strain was previously found to be closely related to bovine kobuviruses in the 3D gene region, it may be a natural recombinant.

  2. Terminator oligo blocking efficiently eliminates rRNA from Drosophila small RNA sequencing libraries.

    PubMed

    Wickersheim, Michelle L; Blumenstiel, Justin P

    2013-11-01

    A large number of methods are available to deplete ribosomal RNA reads from high-throughput RNA sequencing experiments. Such methods are critical for sequencing Drosophila small RNAs between 20 and 30 nucleotides because size selection is not typically sufficient to exclude the highly abundant class of 30 nucleotide 2S rRNA. Here we demonstrate that pre-annealing terminator oligos complimentary to Drosophila 2S rRNA prior to 5' adapter ligation and reverse transcription efficiently depletes 2S rRNA sequences from the sequencing reaction in a simple and inexpensive way. This depletion is highly specific and is achieved with minimal perturbation of miRNA and piRNA profiles.

  3. The ura5 gene of the ascomycete Sordaria macrospora: molecular cloning, characterization and expression in Escherichia coli.

    PubMed

    Le Chevanton, L; Leblon, G

    1989-04-15

    We cloned the ura5 gene coding for the orotate phosphoribosyl transferase from the ascomycete Sordaria macrospora by heterologous probing of a Sordaria genomic DNA library with the corresponding Podospora anserina sequence. The Sordaria gene was expressed in an Escherichia coli pyrE mutant strain defective for the same enzyme, and expression was shown to be promoted by plasmid sequences. The nucleotide sequence of the 1246-bp DNA fragment encompassing the region of homology with the Podospora gene has been determined. This sequence contains an open reading frame of 699 nucleotides. The deduced amino acid sequence shows 72% similarity with the corresponding Podospora protein.

  4. Quantum-Sequencing: Fast electronic single DNA molecule sequencing

    NASA Astrophysics Data System (ADS)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free, high-throughput and cost-effective, single-molecule sequencing method. Here, we present the first demonstration of unique ``electronic fingerprint'' of all nucleotides (A, G, T, C), with single-molecule DNA sequencing, using Quantum-tunneling Sequencing (Q-Seq) at room temperature. We show that the electronic state of the nucleobases shift depending on the pH, with most distinct states identified at acidic pH. We also demonstrate identification of single nucleotide modifications (methylation here). Using these unique electronic fingerprints (or tunneling data), we report a partial sequence of beta lactamase (bla) gene, which encodes resistance to beta-lactam antibiotics, with over 95% success rate. These results highlight the potential of Q-Seq as a robust technique for next-generation sequencing.

  5. LISTA, a comprehensive compilation of nucleotide sequences encoding proteins from the yeast Saccharomyces.

    PubMed Central

    Linder, P; Dölz, R; Mossé, M O; Lazowska, J; Slonimski, P P

    1993-01-01

    The amount of nucleotide sequence data is increasing exponentially. We therefore made an effort to make a comprehensive database (LISTA) for the yeast Saccharomyces cerevisiae. Each sequence has been attributed a single genetic name and in the case of allelic duplicated sequences, synonyms are given, if necessary. For the nomenclature we have introduced a standard principle for naming gene sequences based on priority rules. We have also applied a simple method to distinguish duplicated sequences of one and the same gene from non-allelic sequences of duplicated genes. By using these principles we have sorted out a lot of confusion in the literature and databanks. Along with the genetic name, the mnemonic from the EMBL databank, the codon bias, reference of the publication of the sequence and the EMBL accession numbers are included in each entry. PMID:8332521

  6. A computational proposal for designing structured RNA pools for in vitro selection of RNAs.

    PubMed

    Kim, Namhee; Gan, Hin Hark; Schlick, Tamar

    2007-04-01

    Although in vitro selection technology is a versatile experimental tool for discovering novel synthetic RNA molecules, finding complex RNA molecules is difficult because most RNAs identified from random sequence pools are simple motifs, consistent with recent computational analysis of such sequence pools. Thus, enriching in vitro selection pools with complex structures could increase the probability of discovering novel RNAs. Here we develop an approach for engineering sequence pools that links RNA sequence space regions with corresponding structural distributions via a "mixing matrix" approach combined with a graph theory analysis. We define five classes of mixing matrices motivated by covariance mutations in RNA; these constructs define nucleotide transition rates and are applied to chosen starting sequences to yield specific nonrandom pools. We examine the coverage of sequence space as a function of the mixing matrix and starting sequence via clustering analysis. We show that, in contrast to random sequences, which are associated only with a local region of sequence space, our designed pools, including a structured pool for GTP aptamers, can target specific motifs. It follows that experimental synthesis of designed pools can benefit from using optimized starting sequences, mixing matrices, and pool fractions associated with each of our constructed pools as a guide. Automation of our approach could provide practical tools for pool design applications for in vitro selection of RNAs and related problems.

  7. The human myelin oligodendrocyte glycoprotein (MOG) gene: Complete nucleotide sequence and structural characterization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Paule Roth, M.; Malfroy, L.; Offer, C.

    1995-07-20

    Human myelin oligodendrocyte glycoprotein (MOG), a myelin component of the central nervous system, is a candidate target antigen for autoimmune-mediated demyelination. We have isolated and sequenced part of a cosmid clone that contains the entire human MOG gene. The primary nuclear transcript, extending from the putative start of transcription to the site of poly(A) addition, is 15,561 nucleotides in length. The human MOG gene contains 8 exons, separated by 7 introns; canonical intron/exon boundary sites are observed at each junction. The introns vary in size from 242 to 6484 bp and contain numerous repetitive DNA elements, including 14 Alu sequencesmore » within 3 introns. Another Alu element is located in the 3{prime}-untranslated region of the gene. Alu sequences were classified with respect to subfamily assignment. Seven hundred sixty-three nucleotides 5{prime} of the transcription start and 1214 nucleotides 3{prime} of the poly(A) addition sites were also sequenced. The 5{prime}-flanking region revealed the presence of several consensus sequences that could be relevant in the transcription of the MOG gene, in particular binding sites in common with other myelin gene promoters. Two polymorphic intragenic dinucleotide (CA){sub n} and tetranucleotide (TAAA){sub n} repeats were identified and may provide genetic marker tools for association and linkage studies. 50 refs., 3 figs., 3 tabs.« less

  8. cDNA cloning of the human peroxisomal enoyl-CoA hydratase: 3-Hydroxyacyl-CoA dehydrogenase bifunctional enzyme and localization to chromosome 3q26. 3-3q28: A free left Alu arm is inserted in the 3[prime] noncoding region

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hoefler, G.; Forstner, M.; Hulla, W.

    1994-01-01

    Enoyl-CoA hydratase:3-hydroxyacyl-CoA dehydrogenase bifunctional enzyme is one of the four enzymes of the peroxisomal, [beta]-oxidation pathway. Here, the authors report the full-length human cDNA sequence and the localization of the corresponding gene on chromosome 3q26.3-3q28. The cDNA sequence spans 3779 nucleotides with an open reading frame of 2169 nucleotides. The tripeptide SKL at the carboxy terminus, known to serve as a peroxisomal targeting signal, is present. DNA sequence comparison of the coding region showed an 80% homology between human and rat bifunctional enzyme cDNA. The 3[prime] noncoding sequence contains 117 nucleotides homologous to an Alu repeat. Based on sequence comparison,more » they propose that these nucleotides are a free left Alu arm with 86% homology to the Alu-J family. RNA analysis shows one band with highest intensity in liver and kidney. This cDNA will allow in-depth studies of molecular defects in patients with defective peroxisomal bifunctional enzyme. Moreover, it will also provide a means for studying the regulation of peroxisomal [beta]-oxidation in humans. 33 refs., 5 figs.« less

  9. Phylogenetic Analysis of Myobia musculi (Schranck, 1781) by Using the 18S Small Ribosomal Subunit Sequence

    PubMed Central

    Feldman, Sanford H; Ntenda, Abraham M

    2011-01-01

    We used high-fidelity PCR to amplify 2 overlapping regions of the ribosomal gene complex from the rodent fur mite Myobia musculi. The amplicons encompassed a large portion of the mite's ribosomal gene complex spanning 3128 nucleotides containing the entire 18S rRNA, internal transcribed spacer (ITS) 1, 5.8S rRNA, ITS2, and a portion of the 5′-end of the 28S rRNA. M. musculi’s 179-nucleotide 5.8S rRNA nucleotide sequence was not conserved, so this region was identified by conservation of rRNA secondary structure. Maximum likelihood and Bayesian inference phylogenetic analyses were performed by using multiple sequence alignment consisting of 1524 nucleotides of M. musculi 18S rRNA and homologous sequences from 42 prostigmatid mites and the tick Dermacentor andersoni. The phylograms produced by both methods were in agreement regarding terminal, secondary, and some tertiary phylogenetic relationships among mites. Bayesian inference discriminated most infraordinal relationships between Eleutherengona and Parasitengona mites in the suborder Anystina. Basal relationships between suborders Anystina and Eupodina historically determined by comparing differences in anatomic characteristics were less well-supported by our molecular analysis. Our results recapitulated similar 18S rRNA sequence analyses recently reported. Our study supports M. musculi as belonging to the suborder Anystina, infraorder Eleutherenona, and superfamily Cheyletoidea. PMID:22330574

  10. Mapping RNA Structure In Vitro with SHAPE Chemistry and Next-Generation Sequencing (SHAPE-Seq).

    PubMed

    Watters, Kyle E; Lucks, Julius B

    2016-01-01

    Mapping RNA structure with selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) chemistry has proven to be a versatile method for characterizing RNA structure in a variety of contexts. SHAPE reagents covalently modify RNAs in a structure-dependent manner to create adducts at the 2'-OH group of the ribose backbone at nucleotides that are structurally flexible. The positions of these adducts are detected using reverse transcriptase (RT) primer extension, which stops one nucleotide before the modification, to create a pool of cDNAs whose lengths reflect the location of SHAPE modification. Quantification of the cDNA pools is used to estimate the "reactivity" of each nucleotide in an RNA molecule to the SHAPE reagent. High reactivities indicate nucleotides that are structurally flexible, while low reactivities indicate nucleotides that are inflexible. These SHAPE reactivities can then be used to infer RNA structures by restraining RNA structure prediction algorithms. Here, we provide a state-of-the-art protocol describing how to perform in vitro RNA structure probing with SHAPE chemistry using next-generation sequencing to quantify cDNA pools and estimate reactivities (SHAPE-Seq). The use of next-generation sequencing allows for higher throughput, more consistent data analysis, and multiplexing capabilities. The technique described herein, SHAPE-Seq v2.0, uses a universal reverse transcription priming site that is ligated to the RNA after SHAPE modification. The introduced priming site allows for the structural analysis of an RNA independent of its sequence.

  11. Transcript-specific, single-nucleotide polymorphism discovery and linkage analysis in hexaploid bread wheat (Triticum aestivum L.).

    PubMed

    Allen, Alexandra M; Barker, Gary L A; Berry, Simon T; Coghill, Jane A; Gwilliam, Rhian; Kirby, Susan; Robinson, Phil; Brenchley, Rachel C; D'Amore, Rosalinda; McKenzie, Neil; Waite, Darren; Hall, Anthony; Bevan, Michael; Hall, Neil; Edwards, Keith J

    2011-12-01

    Food security is a global concern and substantial yield increases in cereal crops are required to feed the growing world population. Wheat is one of the three most important crops for human and livestock feed. However, the complexity of the genome coupled with a decline in genetic diversity within modern elite cultivars has hindered the application of marker-assisted selection (MAS) in breeding programmes. A crucial step in the successful application of MAS in breeding programmes is the development of cheap and easy to use molecular markers, such as single-nucleotide polymorphisms. To mine selected elite wheat germplasm for intervarietal single-nucleotide polymorphisms, we have used expressed sequence tags derived from public sequencing programmes and next-generation sequencing of normalized wheat complementary DNA libraries, in combination with a novel sequence alignment and assembly approach. Here, we describe the development and validation of a panel of 1114 single-nucleotide polymorphisms in hexaploid bread wheat using competitive allele-specific polymerase chain reaction genotyping technology. We report the genotyping results of these markers on 23 wheat varieties, selected to represent a broad cross-section of wheat germplasm including a number of elite UK varieties. Finally, we show that, using relatively simple technology, it is possible to rapidly generate a linkage map containing several hundred single-nucleotide polymorphism markers in the doubled haploid mapping population of Avalon × Cadenza. © 2011 The Authors. Plant Biotechnology Journal © 2011 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd.

  12. Natural History of Human Respiratory Syncytial Virus Inferred from Phylogenetic Analysis of the Attachment (G) Glycoprotein with a 60-Nucleotide Duplication

    PubMed Central

    Trento, Alfonsina; Viegas, Mariana; Galiano, Mónica; Videla, Cristina; Carballal, Guadalupe; Mistchenko, Alicia S.; Melero, José A.

    2006-01-01

    A total of 47 clinical samples were identified during an active surveillance program of respiratory infections in Buenos Aires (BA) (1999 to 2004) that contained sequences of human respiratory syncytial virus (HRSV) with a 60-nucleotide duplication in the attachment (G) protein gene. This duplication was analogous to that previously described for other three viruses also isolated in Buenos Aires in 1999 (A. Trento et al., J. Gen. Virol. 84:3115-3120, 2003). Phylogenetic analysis indicated that BA sequences with that duplication shared a common ancestor (dated about 1998) with other HRSV G sequences reported worldwide after 1999. The duplicated nucleotide sequence was an exact copy of the preceding 60 nucleotides in early viruses, but both copies of the duplicated segment accumulated nucleotide substitutions in more recent viruses at a rate apparently higher than in other regions of the G protein gene. The evolution of the viruses with the duplicated G segment apparently followed the overall evolutionary pattern previously described for HRSV, and this genotype has replaced other prevailing antigenic group B genotypes in Buenos Aires and other places. Thus, the duplicated segment represents a natural tag that can be used to track the dissemination and evolution of HRSV in an unprecedented setting. We have taken advantage of this situation to reexamine the molecular epidemiology of HRSV and to explore the natural history of this important human pathogen. PMID:16378999

  13. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms

    PubMed Central

    Bertolini, Francesca; Scimone, Concetta; Geraci, Claudia; Schiavo, Giuseppina; Utzeri, Valerio Joe; Chiofalo, Vincenzo; Fontanesi, Luca

    2015-01-01

    Few studies investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared obtained sequence information with the available donkey draft genome (and its Illumina reads from which it was originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than those aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~ 0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionally important in equids. Comparing Y-chromosome regions we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million of single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources. PMID:26151450

  14. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms.

    PubMed

    Bertolini, Francesca; Scimone, Concetta; Geraci, Claudia; Schiavo, Giuseppina; Utzeri, Valerio Joe; Chiofalo, Vincenzo; Fontanesi, Luca

    2015-01-01

    Few studies investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared obtained sequence information with the available donkey draft genome (and its Illumina reads from which it was originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than those aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~ 0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionally important in equids. Comparing Y-chromosome regions we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million of single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources.

  15. Molecular cloning and nucleotide sequence of a transforming gene detected by transfection of chicken B-cell lymphoma DNA

    NASA Astrophysics Data System (ADS)

    Goubin, Gerard; Goldman, Debra S.; Luce, Judith; Neiman, Paul E.; Cooper, Geoffrey M.

    1983-03-01

    A transforming gene detected by transfection of chicken B-cell lymphoma DNA has been isolated by molecular cloning. It is homologous to a conserved family of sequences present in normal chicken and human DNAs but is not related to transforming genes of acutely transforming retroviruses. The nucleotide sequence of the cloned transforming gene suggests that it encodes a protein that is partially homologous to the amino terminus of transferrin and related proteins although only about one tenth the size of transferrin.

  16. Nucleotide Sequence Analysis of RNA Synthesized from Rabbit Globin Complementary DNA

    PubMed Central

    Poon, Raymond; Paddock, Gary V.; Heindell, Howard; Whitcome, Philip; Salser, Winston; Kacian, Dan; Bank, Arthur; Gambino, Roberto; Ramirez, Francesco

    1974-01-01

    Rabbit globin complementary DNA made with RNA-dependent DNA polymerase (reverse transcriptase) was used as template for in vitro synthesis of 32P-labeled RNA. The sequences of the nucleotides in most of the fragments resulting from combined ribonuclease T1 and alkaline phosphatase digestion have been determined. Several fragments were long enough to fit uniquely with the α or β globin amino-acid sequences. These data demonstrate that the cDNA was copied from globin mRNA and contained no detectable contaminants. Images PMID:4139714

  17. Nucleotide Sequence of the blaRTG-2 (CARB-5) Gene and Phylogeny of a New Group of Carbenicillinases

    PubMed Central

    Choury, Daniele; Szajnert, Marie-France; Joly-Guillou, Marie-Laure; Azibi, Kemal; Delpech, Marc; Paul, Gérard

    2000-01-01

    We determined the nucleotide sequence of the bla gene for the Acinetobacter calcoaceticus β-lactamase previously described as CARB-5. Alignment of the deduced amino acid sequence with those of known β-lactamases revealed that CARB-5 possesses an RTG triad in box VII, as described for the Proteus mirabilis GN79 enzyme, instead of the RSG consensus characteristic of the other carbenicillinases. Phylogenetic studies showed that these RTG enzymes constitute a new, separate group, possibly ancestors of the carbenicillinase family. PMID:10722515

  18. Mosaic organization of DNA nucleotides

    NASA Technical Reports Server (NTRS)

    Peng, C. K.; Buldyrev, S. V.; Havlin, S.; Simons, M.; Stanley, H. E.; Goldberger, A. L.

    1994-01-01

    Long-range power-law correlations have been reported recently for DNA sequences containing noncoding regions. We address the question of whether such correlations may be a trivial consequence of the known mosaic structure ("patchiness") of DNA. We analyze two classes of controls consisting of patchy nucleotide sequences generated by different algorithms--one without and one with long-range power-law correlations. Although both types of sequences are highly heterogenous, they are quantitatively distinguishable by an alternative fluctuation analysis method that differentiates local patchiness from long-range correlations. Application of this analysis to selected DNA sequences demonstrates that patchiness is not sufficient to account for long-range correlation properties.

  19. Asystasia mosaic Madagascar virus: a novel bipartite begomovirus infecting the weed Asystasia gangetica in Madagascar.

    PubMed

    De Bruyn, Alexandre; Harimalala, Mireille; Hoareau, Murielle; Ranomenjanahary, Sahondramalala; Reynaud, Bernard; Lefeuvre, Pierre; Lett, Jean-Michel

    2015-06-01

    Here, we describe for the first time the complete genome sequence of a new bipartite begomovirus in Madagascar isolated from the weed Asystasia gangetica (Acanthaceae), for which we propose the tentative name asystasia mosaic Madagascar virus (AMMGV). DNA-A and -B nucleotide sequences of AMMGV were only distantly related to known begomovirus sequence and shared highest nucleotide sequence identity of 72.9 % (DNA-A) and 66.9 % (DNA-B) with a recently described bipartite begomovirus infecting Asystasia sp. in West Africa. Phylogenetic analysis demonstrated that this novel virus from Madagascar belongs to a new lineage of Old World bipartite begomoviruses.

  20. Nucleotide sequence of a chickpea chlorotic stunt virus relative that infects pea and faba bean in China.

    PubMed

    Zhou, Cui-Ji; Xiang, Hai-Ying; Zhuo, Tao; Li, Da-Wei; Yu, Jia-Lin; Han, Cheng-Gui

    2012-07-01

    We determined the genome sequence of a new polerovirus that infects field pea and faba bean in China. Its entire nucleotide sequence (6021 nt) was most closely related (83.3% identity) to that of an Ethiopian isolate of chickpea chlorotic stunt virus (CpCSV-Eth). With the exception of the coat protein (encoded by ORF3), amino acid sequence identities of all gene products of this virus to those of CpCSV-Eth and other poleroviruses were <90%. This suggests that it is a new member of the genus Polerovirus, and the name pea mild chlorosis virus is proposed.

  1. FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation.

    PubMed

    Bolleman, Jerven T; Mungall, Christopher J; Strozzi, Francesco; Baran, Joachim; Dumontier, Michel; Bonnal, Raoul J P; Buels, Robert; Hoehndorf, Robert; Fujisawa, Takatomo; Katayama, Toshiaki; Cock, Peter J A

    2016-06-13

    Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. Using Semantic Web technologies to query biological annotations, there was no standard that described this potentially complex location information as subject-predicate-object triples. We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned "omics" areas. Using the same data format to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Our ontology allows users to uniformly describe - and potentially merge - sequence annotations from multiple sources. Data sources using FALDO can prospectively be retrieved using federalised SPARQL queries against public SPARQL endpoints and/or local private triple stores.

  2. Nucleotide sequences of immunoglobulin eta genes of chimpanzee and orangutan: DNA molecular clock and hominoid evolution

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sakoyama, Y.; Hong, K.J.; Byun, S.M.

    To determine the phylogenetic relationships among hominoids and the dates of their divergence, the complete nucleotide sequences of the constant region of the immunoglobulin eta-chain (C/sub eta1/) genes from chimpanzee and orangutan have been determined. These sequences were compared with the human eta-chain constant-region sequence. A molecular clock (silent molecular clock), measured by the degree of sequence divergence at the synonymous (silent) positions of protein-encoding regions, was introduced for the present study. From the comparison of nucleotide sequences of ..cap alpha../sub 1/-antitrypsin and ..beta..- and delta-globulin genes between humans and Old World monkeys, the silent molecular clock was calibrated: themore » mean evolutionary rate of silent substitution was determined to be 1.56 x 10/sup -9/ substitutions per site per year. Using the silent molecular clock, the mean divergence dates of chimpanzee and orangutan from the human lineage were estimated as 6.4 +/- 2.6 million years and 17.3 +/- 4.5 million years, respectively. It was also shown that the evolutionary rate of primate genes is considerably slower than those of other mammalian genes.« less

  3. FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation

    DOE PAGES

    Bolleman, Jerven T.; Mungall, Christopher J.; Strozzi, Francesco; ...

    2016-06-13

    Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. Using Semantic Web technologies to query biological annotations, there was no standard that described this potentially complex location information as subject-predicate-object triples. In this paper, we have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned “omics” areas. Using the same data formatmore » to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Our ontology allows users to uniformly describe – and potentially merge – sequence annotations from multiple sources. Finally, data sources using FALDO can prospectively be retrieved using federalised SPARQL queries against public SPARQL endpoints and/or local private triple stores.« less

  4. Discovery, genotyping and characterization of structural variation and novel sequence at single nucleotide resolution from de novo genome assemblies on a population scale.

    PubMed

    Liu, Siyang; Huang, Shujia; Rao, Junhua; Ye, Weijian; Krogh, Anders; Wang, Jun

    2015-01-01

    Comprehensive recognition of genomic variation in one individual is important for understanding disease and developing personalized medication and treatment. Many tools based on DNA re-sequencing exist for identification of single nucleotide polymorphisms, small insertions and deletions (indels) as well as large deletions. However, these approaches consistently display a substantial bias against the recovery of complex structural variants and novel sequence in individual genomes and do not provide interpretation information such as the annotation of ancestral state and formation mechanism. We present a novel approach implemented in a single software package, AsmVar, to discover, genotype and characterize different forms of structural variation and novel sequence from population-scale de novo genome assemblies up to nucleotide resolution. Application of AsmVar to several human de novo genome assemblies captures a wide spectrum of structural variants and novel sequences present in the human population in high sensitivity and specificity. Our method provides a direct solution for investigating structural variants and novel sequences from de novo genome assemblies, facilitating the construction of population-scale pan-genomes. Our study also highlights the usefulness of the de novo assembly strategy for definition of genome structure.

  5. FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bolleman, Jerven T.; Mungall, Christopher J.; Strozzi, Francesco

    Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. Using Semantic Web technologies to query biological annotations, there was no standard that described this potentially complex location information as subject-predicate-object triples. In this paper, we have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned “omics” areas. Using the same data formatmore » to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Our ontology allows users to uniformly describe – and potentially merge – sequence annotations from multiple sources. Finally, data sources using FALDO can prospectively be retrieved using federalised SPARQL queries against public SPARQL endpoints and/or local private triple stores.« less

  6. NullSeq: A Tool for Generating Random Coding Sequences with Desired Amino Acid and GC Contents.

    PubMed

    Liu, Sophia S; Hockenberry, Adam J; Lancichinetti, Andrea; Jewett, Michael C; Amaral, Luís A N

    2016-11-01

    The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. In order to accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to generate accurate null models that are appropriate for the sequences under study. While many tools have been developed to create random nucleotide sequences, protein coding sequences are subject to a unique set of constraints that complicates the process of generating appropriate null models. There are currently no tools available that allow users to create random coding sequences with specified amino acid composition and GC content for the purpose of hypothesis testing. Using the principle of maximum entropy, we developed a method that generates unbiased random sequences with pre-specified amino acid and GC content, which we have developed into a python package. Our method is the simplest way to obtain maximally unbiased random sequences that are subject to GC usage and primary amino acid sequence constraints. Furthermore, this approach can easily be expanded to create unbiased random sequences that incorporate more complicated constraints such as individual nucleotide usage or even di-nucleotide frequencies. The ability to generate correctly specified null models will allow researchers to accurately identify sequence motifs which will lead to a better understanding of biological processes as well as more effective engineering of biological systems.

  7. Assessing Diversity of DNA Structure-Related Sequence Features in Prokaryotic Genomes

    PubMed Central

    Huang, Yongjie; Mrázek, Jan

    2014-01-01

    Prokaryotic genomes are diverse in terms of their nucleotide and oligonucleotide composition as well as presence of various sequence features that can affect physical properties of the DNA molecule. We present a survey of local sequence patterns which have a potential to promote non-canonical DNA conformations (i.e. different from standard B-DNA double helix) and interpret the results in terms of relationships with organisms' habitats, phylogenetic classifications, and other characteristics. Our present work differs from earlier similar surveys not only by investigating a wider range of sequence patterns in a large number of genomes but also by using a more realistic null model to assess significant deviations. Our results show that simple sequence repeats and Z-DNA-promoting patterns are generally suppressed in prokaryotic genomes, whereas palindromes and inverted repeats are over-represented. Representation of patterns that promote Z-DNA and intrinsic DNA curvature increases with increasing optimal growth temperature (OGT), and decreases with increasing oxygen requirement. Additionally, representations of close direct repeats, palindromes and inverted repeats exhibit clear negative trends with increasing OGT. The observed relationships with environmental characteristics, particularly OGT, suggest possible evolutionary scenarios of structural adaptation of DNA to particular environmental niches. PMID:24408877

  8. Predicting protein-binding regions in RNA using nucleotide profiles and compositions.

    PubMed

    Choi, Daesik; Park, Byungkyu; Chae, Hanju; Lee, Wook; Han, Kyungsook

    2017-03-14

    Motivated by the increased amount of data on protein-RNA interactions and the availability of complete genome sequences of several organisms, many computational methods have been proposed to predict binding sites in protein-RNA interactions. However, most computational methods are limited to finding RNA-binding sites in proteins instead of protein-binding sites in RNAs. Predicting protein-binding sites in RNA is more challenging than predicting RNA-binding sites in proteins. Recent computational methods for finding protein-binding sites in RNAs have several drawbacks for practical use. We developed a new support vector machine (SVM) model for predicting protein-binding regions in mRNA sequences. The model uses sequence profiles constructed from log-odds scores of mono- and di-nucleotides and nucleotide compositions. The model was evaluated by standard 10-fold cross validation, leave-one-protein-out (LOPO) cross validation and independent testing. Since actual mRNA sequences have more non-binding regions than protein-binding regions, we tested the model on several datasets with different ratios of protein-binding regions to non-binding regions. The best performance of the model was obtained in a balanced dataset of positive and negative instances. 10-fold cross validation with a balanced dataset achieved a sensitivity of 91.6%, a specificity of 92.4%, an accuracy of 92.0%, a positive predictive value (PPV) of 91.7%, a negative predictive value (NPV) of 92.3% and a Matthews correlation coefficient (MCC) of 0.840. LOPO cross validation showed a lower performance than the 10-fold cross validation, but the performance remains high (87.6% accuracy and 0.752 MCC). In testing the model on independent datasets, it achieved an accuracy of 82.2% and an MCC of 0.656. Testing of our model and other state-of-the-art methods on a same dataset showed that our model is better than the others. Sequence profiles of log-odds scores of mono- and di-nucleotides were much more powerful features than nucleotide compositions in finding protein-binding regions in RNA sequences. But, a slight performance gain was obtained when using the sequence profiles along with nucleotide compositions. These are preliminary results of ongoing research, but demonstrate the potential of our approach as a powerful predictor of protein-binding regions in RNA. The program and supporting data are available at http://bclab.inha.ac.kr/RBPbinding .

  9. Nucleic acids encoding antifungal polypeptides and uses thereof

    DOEpatents

    Altier, Daniel J.; Ellanskaya, I. A.; Gilliam, Jacob T.; Hunter-Cevera, Jennie; Presnail, James K; Schepers, Eric; Simmons, Carl R.; Torok, Tamas; Yalpani, Nasser

    2010-11-02

    Compositions and methods for protecting a plant from a pathogen, particularly a fungal pathogen, are provided. Compositions include an amino acid sequence, and variants and fragments thereof, for an antipathogenic polypeptide that was isolated from a fungal fermentation broth. Nucleic acid molecules that encode the antipathogenic polypeptides of the invention, and antipathogenic domains thereof, are also provided. A method for inducing pathogen resistance in a plant using the nucleotide sequences disclosed herein is further provided. The method comprises introducing into a plant an expression cassette comprising a promoter operably linked to a nucleotide sequence that encodes an antipathogenic polypeptide of the invention. Compositions comprising an antipathogenic polypeptide or a transformed microorganism comprising a nucleic acid of the invention in combination with a carrier and methods of using these compositions to protect a plant from a pathogen are further provided. Transformed plants, plant cells, seeds, and microorganisms comprising a nucleotide sequence that encodes an antipathogenic polypeptide of the invention are also disclosed.

  10. RNA Editing in Plant Mitochondria

    NASA Astrophysics Data System (ADS)

    Hiesel, Rudolf; Wissinger, Bernd; Schuster, Wolfgang; Brennicke, Axel

    1989-12-01

    Comparative sequence analysis of genomic and complementary DNA clones from several mitochondrial genes in the higher plant Oenothera revealed nucleotide sequence divergences between the genomic and the messenger RNA-derived sequences. These sequence alterations could be most easily explained by specific post-transcriptional nucleotide modifications. Most of the nucleotide exchanges in coding regions lead to altered codons in the mRNA that specify amino acids better conserved in evolution than those encoded by the genomic DNA. Several instances show that the genomic arginine codon CGG is edited in the mRNA to the tryptophan codon TGG in amino acid positions that are highly conserved as tryptophan in the homologous proteins of other species. This editing suggests that the standard genetic code is used in plant mitochondria and resolves the frequent coincidence of CGG codons and tryptophan in different plant species. The apparently frequent and non-species-specific equivalency of CGG and TGG codons in particular suggests that RNA editing is a common feature of all higher plant mitochondria.

  11. Simian immunodeficiency viruses from African green monkeys display unusual genetic diversity.

    PubMed Central

    Johnson, P R; Fomsgaard, A; Allan, J; Gravell, M; London, W T; Olmsted, R A; Hirsch, V M

    1990-01-01

    African green monkeys are asymptomatic carriers of simian immunodeficiency viruses (SIV), commonly called SIVagm. As many as 50% of African green monkeys in the wild may be SIV seropositive. This high seroprevalence rate and the potential for genetic variation of lentiviruses suggested to us that African green monkeys may harbor widely differing genotypes of SIVagm. To investigate this hypothesis, we determined the entire nucleotide sequence of an infectious proviral molecular clone of SIVagm (155-4) and partial sequences (long terminal repeat and Gag) of three other distinct SIVagm isolates (90, gri-1, and ver-1). Comparisons among the SIVagm isolates revealed extreme diversity at the nucleotide and amino acid levels. Long terminal repeat nucleotide sequences varied up to 35% and Gag protein sequences varied up to 30%. The variability among SIVagm isolates exceeded the variability among any other group of primate lentiviruses. Our data suggest that SIVagm has been in the African green monkey population for a long time and may be the oldest primate lentivirus group in existence. PMID:2304139

  12. Molecular characterization of chikungunya virus from Andhra Pradesh, India & phylogenetic relationship with Central African isolates.

    PubMed

    M Naresh Kumar, C V; Anthony Johnson, A M; R Sai Gopal, D V

    2007-12-01

    Chikungunya virus has caused numerous large outbreaks in India. Suspected blood samples from the epidemic were collected and characterized for the identification of the responsible causative from Rayalaseema region of Andhra Pradesh. RT-PCR was used for screening of suspected blood samples. Primers were designed to amplify partial E1 gene and the amplified fragment was cloned and sequenced. The sequence was analyzed and compared with other geographical isolates to find the phylogenetic relationship. The sequence was submitted to the Gen bank DNA database (accession DQ888620). Comparative nucleotide homology analysis of the AP Ra-CTR isolate with the other isolates revealed 94.7+/-3.6 per cent of homology of CHIKAPRa-CTR with other isolates of Chikungunya virus at nucleotide level and 96.8+/-3.2 per cent of homology at amino acid level. The current epidemic was caused by the Central African genotype of CHIKV, grouped in Central Africa cluster in phylogenetic trees generated based on nucleotide and amino acid sequences.

  13. 3D RNA and functional interactions from evolutionary couplings

    PubMed Central

    Weinreb, Caleb; Riesselman, Adam; Ingraham, John B.; Gross, Torsten; Sander, Chris; Marks, Debora S.

    2016-01-01

    Summary Non-coding RNAs are ubiquitous, but the discovery of new RNA gene sequences far outpaces research on their structure and functional interactions. We mine the evolutionary sequence record to derive precise information about function and structure of RNAs and RNA-protein complexes. As in protein structure prediction, we use maximum entropy global probability models of sequence co-variation to infer evolutionarily constrained nucleotide-nucleotide interactions within RNA molecules, and nucleotide-amino acid interactions in RNA-protein complexes. The predicted contacts allow all-atom blinded 3D structure prediction at good accuracy for several known RNA structures and RNA-protein complexes. For unknown structures, we predict contacts in 160 non-coding RNA families. Beyond 3D structure prediction, evolutionary couplings help identify important functional interactions, e.g., at switch points in riboswitches and at a complex nucleation site in HIV. Aided by accelerating sequence accumulation, evolutionary coupling analysis can accelerate the discovery of functional interactions and 3D structures involving RNA. PMID:27087444

  14. Telling apart Felidae and Ursidae from the distribution of nucleotides in mitochondrial DNA

    NASA Astrophysics Data System (ADS)

    Rovenchak, Andrij

    2018-02-01

    Rank-frequency distributions of nucleotide sequences in mitochondrial DNA are defined in a way analogous to the linguistic approach, with the highest-frequent nucleobase serving as a whitespace. For such sequences, entropy and mean length are calculated. These parameters are shown to discriminate the species of the Felidae (cats) and Ursidae (bears) families. From purely numerical values we are able to see in particular that giant pandas are bears while koalas are not. The observed linear relation between the parameters is explained using a simple probabilistic model. The approach based on the non-additive generalization of the Bose distribution is used to analyze the frequency spectra of the nucleotide sequences. In this case, the separation of families is not very sharp. Nevertheless, the distributions for Felidae have on average longer tails comparing to Ursidae.

  15. MSuPDA: A Memory Efficient Algorithm for Sequence Alignment.

    PubMed

    Khan, Mohammad Ibrahim; Kamal, Md Sarwar; Chowdhury, Linkon

    2016-03-01

    Space complexity is a million dollar question in DNA sequence alignments. In this regard, memory saving under pushdown automata can help to reduce the occupied spaces in computer memory. Our proposed process is that anchor seed (AS) will be selected from given data set of nucleotide base pairs for local sequence alignment. Quick splitting techniques will separate the AS from all the DNA genome segments. Selected AS will be placed to pushdown automata's (PDA) input unit. Whole DNA genome segments will be placed into PDA's stack. AS from input unit will be matched with the DNA genome segments from stack of PDA. Match, mismatch and indel of nucleotides will be popped from the stack under the control unit of pushdown automata. During the POP operation on stack, it will free the memory cell occupied by the nucleotide base pair.

  16. OligArch: A software tool to allow artificially expanded genetic information systems (AEGIS) to guide the autonomous self-assembly of long DNA constructs from multiple DNA single strands.

    PubMed

    Bradley, Kevin M; Benner, Steven A

    2014-01-01

    Synthetic biologists wishing to self-assemble large DNA (L-DNA) constructs from small DNA fragments made by automated synthesis need fragments that hybridize predictably. Such predictability is difficult to obtain with nucleotides built from just the four standard nucleotides. Natural DNA's peculiar combination of strong and weak G:C and A:T pairs, the context-dependence of the strengths of those pairs, unimolecular strand folding that competes with desired interstrand hybridization, and non-Watson-Crick interactions available to standard DNA, all contribute to this unpredictability. In principle, adding extra nucleotides to the genetic alphabet can improve the predictability and reliability of autonomous DNA self-assembly, simply by increasing the information density of oligonucleotide sequences. These extra nucleotides are now available as parts of artificially expanded genetic information systems (AEGIS), and tools are now available to generate entirely standard DNA from AEGIS DNA during PCR amplification. Here, we describe the OligArch (for "oligonucleotide architecting") software, an application that permits synthetic biologists to engineer optimally self-assembling DNA constructs from both six- and eight-letter AEGIS alphabets. This software has been used to design oligonucleotides that self-assemble to form complete genes from 20 or more single-stranded synthetic oligonucleotides. OligArch is therefore a key element of a scalable and integrated infrastructure for the rapid and designed engineering of biology.

  17. A Sequence-Independent Strategy for Detection and Cloning of Circular DNA Virus Genomes by Using Multiply Primed Rolling-Circle Amplification

    PubMed Central

    Rector, Annabel; Tachezy, Ruth; Van Ranst, Marc

    2004-01-01

    The discovery of novel viruses has often been accomplished by using hybridization-based methods that necessitate the availability of a previously characterized virus genome probe or knowledge of the viral nucleotide sequence to construct consensus or degenerate PCR primers. In their natural replication cycle, certain viruses employ a rolling-circle mechanism to propagate their circular genomes, and multiply primed rolling-circle amplification (RCA) with φ29 DNA polymerase has recently been applied in the amplification of circular plasmid vectors used in cloning. We employed an isothermal RCA protocol that uses random hexamer primers to amplify the complete genomes of papillomaviruses without the need for prior knowledge of their DNA sequences. We optimized this RCA technique with extracted human papillomavirus type 16 (HPV-16) DNA from W12 cells, using a real-time quantitative PCR assay to determine amplification efficiency, and obtained a 2.4 × 104-fold increase in HPV-16 DNA concentration. We were able to clone the complete HPV-16 genome from this multiply primed RCA product. The optimized protocol was subsequently applied to a bovine fibropapillomatous wart tissue sample. Whereas no papillomavirus DNA could be detected by restriction enzyme digestion of the original sample, multiply primed RCA enabled us to obtain a sufficient amount of papillomavirus DNA for restriction enzyme analysis, cloning, and subsequent sequencing of a novel variant of bovine papillomavirus type 1. The multiply primed RCA method allows the discovery of previously unknown papillomaviruses, and possibly also other circular DNA viruses, without a priori sequence information. PMID:15113879

  18. High Degree of Interlaboratory Reproducibility of Human Immunodeficiency Virus Type 1 Protease and Reverse Transcriptase Sequencing of Plasma Samples from Heavily Treated Patients

    PubMed Central

    Shafer, Robert W.; Hertogs, Kurt; Zolopa, Andrew R.; Warford, Ann; Bloor, Stuart; Betts, Bradley J.; Merigan, Thomas C.; Harrigan, Richard; Larder, Brendon A.

    2001-01-01

    We assessed the reproducibility of human immunodeficiency virus type 1 (HIV-1) reverse transcriptase (RT) and protease sequencing using cryopreserved plasma aliquots obtained from 46 heavily treated HIV-1-infected individuals in two laboratories using dideoxynucleotide sequencing. The rates of complete sequence concordance between the two laboratories were 99.1% for the protease sequence and 99.0% for the RT sequence. Approximately 90% of the discordances were partial, defined as one laboratory detecting a mixture and the second laboratory detecting only one of the mixture's components. Only 0.1% of the nucleotides were completely discordant between the two laboratories, and these were significantly more likely to occur in plasma samples with lower plasma HIV-1 RNA levels. Nucleotide mixtures were detected at approximately 1% of the nucleotide positions, and in every case in which one laboratory detected a mixture, the second laboratory either detected the same mixture or detected one of the mixture's components. The high rate of concordance in detecting mixtures and the fact that most discordances between the two laboratories were partial suggest that most discordances were caused by variation in sampling of the HIV-1 quasispecies by PCR rather than by technical errors in the sequencing process itself. PMID:11283081

  19. Molecular cloning and nucleotide sequence of the alpha and beta subunits of allophycocyanin from the cyanelle genome of Cyanophora paradoxa.

    PubMed Central

    Bryant, D A; de Lorimier, R; Lambert, D H; Dubbs, J M; Stirewalt, V L; Stevens, S E; Porter, R D; Tam, J; Jay, E

    1985-01-01

    The genes for the alpha- and beta-subunit apoproteins of allophycocyanin (AP) were isolated from the cyanelle genome of Cyanophora paradoxa and subjected to nucleotide sequence analysis. The AP beta-subunit apoprotein gene was localized to a 7.8-kilobase-pair Pst I restriction fragment from cyanelle DNA by hybridization with a tetradecameric oligonucleotide probe. Sequence analysis using that oligonucleotide and its complement as primers for the dideoxy chain-termination sequencing method confirmed the presence of both AP alpha- and beta-subunit genes on this restriction fragment. Additional oligonucleotide primers were synthesized as sequencing progressed and were used to determine rapidly the nucleotide sequence of a 1336-base-pair region of this cloned fragment. This strategy allowed the sequencing to be completed without a detailed restriction map and without extensive and time-consuming subcloning. The sequenced region contains two open reading frames whose deduced amino acid sequences are 81-85% homologous to cyanobacterial and red algal AP subunits whose amino acid sequences have been determined. The two open reading frames are in the same orientation and are separated by 39 base pairs. AP alpha is 5' to AP beta and both coding sequences are preceded by a polypurine, Shine-Dalgarno-type sequence. Sequences upstream from AP alpha closely resemble the Escherichia coli consensus promoter sequences and also show considerable homology to promoter sequences for several chloroplast-encoded psbA genes. A 56-base-pair palindromic sequence downstream from the AP beta gene could play a role in the termination of transcription or translation. The allophycocyanin apoprotein subunit genes are located on the large single-copy region of the cyanelle genome. PMID:2987916

  20. Begomoviruses infecting weeds in Cuba: increased host range and a novel virus infecting Sida rhombifolia.

    PubMed

    Fiallo-Olivé, Elvira; Navas-Castillo, Jesús; Moriones, Enrique; Martínez-Zubiaur, Yamila

    2012-01-01

    As a result of surveys conducted during the last few years to search for wild reservoirs of begomoviruses in Cuba, we detected a novel bipartite begomovirus, sida yellow mottle virus (SiYMoV), infecting Sida rhombifolia plants. The complete genome sequence was obtained, showing that DNA-A was 2622 nucleotides (nt) in length and that it was most closely related (87.6% nucleotide identity) to DNA-A of an isolate of sida golden mosaic virus (SiGMV) that infects snap beans (Phaseolus vulgaris) in Florida. The DNA-B sequence was 2600 nt in length and shared the highest nucleotide identity (75.1%) with corchorus yellow spot virus (CoYSV). Phylogenetic relationship analysis showed that both DNA components of SiYMoV were grouped in the Abutilon clade, along with begomoviruses from Florida and the Caribbean islands. We also present here the complete nucleotide sequence of a novel strain of sida yellow vein virus found infecting Malvastrum coromandelianum and an isolate of euphorbia mosaic virus that was found for the first time infecting Euphorbia heterophylla in Cuba.

  1. [Determination of genetic bases of auxotrophy in Yersinia pestis ssp. caucasica strains].

    PubMed

    Odinokov, G N; Eroshenko, G A; Kukleva, L M; Shavina, N Iu; Krasnov, Ia M; Kutyrev, V V

    2012-04-01

    Based on the results of computer analysis of nucleotide sequences in strains Yersinia pestis and Y. pseudotuberculosis recorded in the files of NCBI GenBank database, differences between genes argA, aroG, aroF, thiH, and thiG of strain Pestoides F (subspecies caucasica) were found, compared to other strains of plaque agent and pseudotuberculosis microbe. Using PCR with calculated primers and the method of sequence analysis, the structure of variable regions of these genes was studied in 96 natural Y. pestis and Y. pseudotuberculosis strains. It was shown that all examined strains of subspecies caucasica, unlike strains of plague-causing agent of other subspecies and pseudotubercolosis microbe, had identical mutations in genes argA (integration of the insertion sequence IS100), aroG (insertion of ten nucleotides), aroF (inserion of IS100), thiH (insertion of nucleotide T), and thiG (deletion of 13 nucleotides). These mutations are the reason for the absence in strains belonging to this subspecies of the ability to synthesize arginine, phenylalanine, tyrosine, and vitamin B1 (thiamine), and cause their auxotrophy for these growth factors.

  2. Comprehensive analysis of the T-cell receptor beta chain gene in rhesus monkey by high throughput sequencing

    PubMed Central

    Li, Zhoufang; Liu, Guangjie; Tong, Yin; Zhang, Meng; Xu, Ying; Qin, Li; Wang, Zhanhui; Chen, Xiaoping; He, Jiankui

    2015-01-01

    Profiling immune repertoires by high throughput sequencing enhances our understanding of immune system complexity and immune-related diseases in humans. Previously, cloning and Sanger sequencing identified limited numbers of T cell receptor (TCR) nucleotide sequences in rhesus monkeys, thus their full immune repertoire is unknown. We applied multiplex PCR and Illumina high throughput sequencing to study the TCRβ of rhesus monkeys. We identified 1.26 million TCRβ sequences corresponding to 643,570 unique TCRβ sequences and 270,557 unique complementarity-determining region 3 (CDR3) gene sequences. Precise measurements of CDR3 length distribution, CDR3 amino acid distribution, length distribution of N nucleotide of junctional region, and TCRV and TCRJ gene usage preferences were performed. A comprehensive profile of rhesus monkey immune repertoire might aid human infectious disease studies using rhesus monkeys. PMID:25961410

  3. [The genetical evolution of the full length genes of 5 EV 71 strains from 5 Shenzhen patients with hand-food-mouth disease associated with EV71 infection].

    PubMed

    Liu, Wei-long; Yang, Gui-lin; Wei, Qing; Zhang, Ming-xia; Chen, Xin-chun; Liu, Ying-xia; Gao, Yang; Zhou, Bo-ping

    2011-02-01

    To investigate the characteristics of molecular epidemiology and molecular evolution of 5 EV 71 (enterovirus 71, EV71) strains from 5 Shenzhen patients with hand-food-mouth disease associated with EV 71 infection. 5 EV 71 strains were isolated, and sequenced to analyzed the full length gene sequences in order to compare nucleotide and amino acid homology with other EV71 strains from other regions and countries as well as previous strains across the world through bioinformatics software. 5 strains of EV 71 belonged to sub-genotype C4 by analysis of nucleotide sequences of VP1 and VP4 of EV 71. The differences of nucleotide and amino acid sequences were much small with nucleotide homology of 93% and amino acid homology of 98% among these 5 strains. A phylogenetic tree analysis indicated that 2008 Shenzhen epidemic strains were the most close to 2004 Shenzhen circulating strains, and also much close to 1998 Shenzhen epidemic strains and 2008 Fuyang Anhui strains. The dead strain was very close to 2008 Fuyang Anhui epidemic strains. It can be speculated that this epidemic strains of EV 71 probably originate from the same ancient strain in the history, may from 1998 Shenzhen strain.

  4. TFBSshape: a motif database for DNA shape features of transcription factor binding sites.

    PubMed

    Yang, Lin; Zhou, Tianyin; Dror, Iris; Mathelier, Anthony; Wasserman, Wyeth W; Gordân, Raluca; Rohs, Remo

    2014-01-01

    Transcription factor binding sites (TFBSs) are most commonly characterized by the nucleotide preferences at each position of the DNA target. Whereas these sequence motifs are quite accurate descriptions of DNA binding specificities of transcription factors (TFs), proteins recognize DNA as a three-dimensional object. DNA structural features refine the description of TF binding specificities and provide mechanistic insights into protein-DNA recognition. Existing motif databases contain extensive nucleotide sequences identified in binding experiments based on their selection by a TF. To utilize DNA shape information when analysing the DNA binding specificities of TFs, we developed a new tool, the TFBSshape database (available at http://rohslab.cmb.usc.edu/TFBSshape/), for calculating DNA structural features from nucleotide sequences provided by motif databases. The TFBSshape database can be used to generate heat maps and quantitative data for DNA structural features (i.e., minor groove width, roll, propeller twist and helix twist) for 739 TF datasets from 23 different species derived from the motif databases JASPAR and UniPROBE. As demonstrated for the basic helix-loop-helix and homeodomain TF families, our TFBSshape database can be used to compare, qualitatively and quantitatively, the DNA binding specificities of closely related TFs and, thus, uncover differential DNA binding specificities that are not apparent from nucleotide sequence alone.

  5. TFBSshape: a motif database for DNA shape features of transcription factor binding sites

    PubMed Central

    Yang, Lin; Zhou, Tianyin; Dror, Iris; Mathelier, Anthony; Wasserman, Wyeth W.; Gordân, Raluca; Rohs, Remo

    2014-01-01

    Transcription factor binding sites (TFBSs) are most commonly characterized by the nucleotide preferences at each position of the DNA target. Whereas these sequence motifs are quite accurate descriptions of DNA binding specificities of transcription factors (TFs), proteins recognize DNA as a three-dimensional object. DNA structural features refine the description of TF binding specificities and provide mechanistic insights into protein–DNA recognition. Existing motif databases contain extensive nucleotide sequences identified in binding experiments based on their selection by a TF. To utilize DNA shape information when analysing the DNA binding specificities of TFs, we developed a new tool, the TFBSshape database (available at http://rohslab.cmb.usc.edu/TFBSshape/), for calculating DNA structural features from nucleotide sequences provided by motif databases. The TFBSshape database can be used to generate heat maps and quantitative data for DNA structural features (i.e., minor groove width, roll, propeller twist and helix twist) for 739 TF datasets from 23 different species derived from the motif databases JASPAR and UniPROBE. As demonstrated for the basic helix-loop-helix and homeodomain TF families, our TFBSshape database can be used to compare, qualitatively and quantitatively, the DNA binding specificities of closely related TFs and, thus, uncover differential DNA binding specificities that are not apparent from nucleotide sequence alone. PMID:24214955

  6. SNPGenie: estimating evolutionary parameters to detect natural selection using pooled next-generation sequencing data.

    PubMed

    Nelson, Chase W; Moncla, Louise H; Hughes, Austin L

    2015-11-15

    New applications of next-generation sequencing technologies use pools of DNA from multiple individuals to estimate population genetic parameters. However, no publicly available tools exist to analyse single-nucleotide polymorphism (SNP) calling results directly for evolutionary parameters important in detecting natural selection, including nucleotide diversity and gene diversity. We have developed SNPGenie to fill this gap. The user submits a FASTA reference sequence(s), a Gene Transfer Format (.GTF) file with CDS information and a SNP report(s) in an increasing selection of formats. The program estimates nucleotide diversity, distance from the reference and gene diversity. Sites are flagged for multiple overlapping reading frames, and are categorized by polymorphism type: nonsynonymous, synonymous, or ambiguous. The results allow single nucleotide, single codon, sliding window, whole gene and whole genome/population analyses that aid in the detection of positive and purifying natural selection in the source population. SNPGenie version 1.2 is a Perl program with no additional dependencies. It is free, open-source, and available for download at https://github.com/hugheslab/snpgenie. nelsoncw@email.sc.edu or austin@biol.sc.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  7. Altering the spectrum of immunoglobulin V gene somatic hypermutation by modifying the active site of AID.

    PubMed

    Wang, Meng; Rada, Cristina; Neuberger, Michael S

    2010-01-18

    High-affinity antibodies are generated by somatic hypermutation with nucleotide substitutions introduced into the IgV in a semirandom fashion, but with intrinsic mutational hotspots strategically located to optimize antibody affinity maturation. The process is dependent on activation-induced deaminase (AID), an enzyme that can deaminate deoxycytidine in DNA in vitro, where its activity is sensitive to the identity of the 5'-flanking nucleotide. As a critical test of whether such DNA deamination activity underpins antibody diversification and to gain insight into the extent to which the antibody mutation spectrum is dependent on the intrinsic substrate specificity of AID, we investigated whether it is possible to change the IgV mutation spectrum by altering AID's active site such that it prefers a pyrimidine (rather than a purine) flanking the targeted deoxycytidine. Consistent with the DNA deamination mechanism, B cells expressing the modified AID proteins yield altered IgV mutation spectra (exhibiting a purine-->pyrimidine shift in flanking nucleotide preference) and altered hotspots. However, AID-catalyzed deamination of IgV targets in vitro does not yield the same degree of hotspot dominance to that observed in vivo, indicating the importance of features beyond AID's active site and DNA local sequence environment in determining in vivo hotspot dominance.

  8. Genetic Code Optimization for Cotranslational Protein Folding: Codon Directional Asymmetry Correlates with Antiparallel Betasheets, tRNA Synthetase Classes.

    PubMed

    Seligmann, Hervé; Warthi, Ganesh

    2017-01-01

    A new codon property, codon directional asymmetry in nucleotide content (CDA), reveals a biologically meaningful genetic code dimension: palindromic codons (first and last nucleotides identical, codon structure XZX) are symmetric (CDA = 0), codons with structures ZXX/XXZ are 5'/3' asymmetric (CDA = - 1/1; CDA = - 0.5/0.5 if Z and X are both purines or both pyrimidines, assigning negative/positive (-/+) signs is an arbitrary convention). Negative/positive CDAs associate with (a) Fujimoto's tetrahedral codon stereo-table; (b) tRNA synthetase class I/II (aminoacylate the 2'/3' hydroxyl group of the tRNA's last ribose, respectively); and (c) high/low antiparallel (not parallel) betasheet conformation parameters. Preliminary results suggest CDA-whole organism associations (body temperature, developmental stability, lifespan). Presumably, CDA impacts spatial kinetics of codon-anticodon interactions, affecting cotranslational protein folding. Some synonymous codons have opposite CDA sign (alanine, leucine, serine, and valine), putatively explaining how synonymous mutations sometimes affect protein function. Correlations between CDA and tRNA synthetase classes are weaker than between CDA and antiparallel betasheet conformation parameters. This effect is stronger for mitochondrial genetic codes, and potentially drives mitochondrial codon-amino acid reassignments. CDA reveals information ruling nucleotide-protein relations embedded in reversed (not reverse-complement) sequences (5'-ZXX-3'/5'-XXZ-3').

  9. Characterization of a tandemly repeated DNA sequence family originally derived by retroposition of tRNA(Glu) in the newt.

    PubMed

    Nagahashi, S; Endoh, H; Suzuki, Y; Okada, N

    1991-11-20

    A previous report from this laboratory showed that in vitro transcription of total genomic DNA of the newt Cynopus pyrrhogaster resulted in a discrete sized 8 S RNA, which represented highly repetitive and transcribable sequences with a glutamic acid tRNA-like structure in the newt genome. We isolated four independent clones from a newt genomic library and determined the complete sequences of three 2000 to 2400 base-pair PstI fragments spanning the 8 S RNA gene. The glutamic acid tRNA-related segment in the 8 S RNA gene contains the CCA sequence expected as the 3' terminus of a tRNA molecule. Further, the 11 nucleotides located 13 nucleotides upstream from one of the two transcription initiation sites of the 8 S RNA were found to be repeated in the region upstream from the termination site, suggesting that the original unit, which is shorter than the 8 S RNA, was retrotransposed via cDNA intermediates from the PolIII transcript. In the upstream region of the 8 S RNA gene, a 360 nucleotide unit containing the glutamic acid tRNA-related segment was found to be duplicated (clones NE1 and NE10) or triplicated (clone NE3). Except for the difference in the number of the 360 nucleotide unit, the three sequences of the 2000 to 2400 base-pair PstI fragment were essentially the same with only a few mutations and minor deletions. Inverse polymerase chain reaction and sequence determination of the products, together with a Southern hybridization experiment, demonstrated that the family consists of a tandemly repeated unit of 3300, 3700 or 4100 base-pairs. Thus during evolution, this family in the newt was created by retroposition via cDNA intermediates, followed by duplication or triplication of the 360 nucleotide unit and multiplication of the 3300 to 4100 base-pair region at the DNA level.

  10. The complete nucleotide sequence of RNA 3 of a peach isolate of Prunus necrotic ringspot virus.

    PubMed

    Hammond, R W; Crosslin, J M

    1995-04-01

    The complete nucleotide sequence of RNA 3 of the PE-5 peach isolate of Prunus necrotic ringspot ilarvirus (PNRSV) was obtained from cloned cDNA. The RNA sequence is 1941 nucleotides and contains two open reading frames (ORFs). ORF 1 consisted of 284 amino acids with a calculated molecular weight of 31,729 Da and ORF 2 contained 224 amino acids with a calculated molecular weight of 25,018 Da. ORF 2 corresponds to the coat protein gene. Expression of ORF 2 engineered into a pTrcHis vector in Escherichia coli results in a fusion polypeptide of approximately 28 kDa which cross-reacts with PNRSV polyclonal antiserum. Analysis of the coat protein amino acid sequence reveals a putative "zinc-finger" domain at the amino-terminal portion of the protein. Two tetranucleotide AUGC motifs occur in the 3'-UTR of the RNA and may function in coat protein binding and genome activation. ORF 1 homologies to other ilarviruses and alfalfa mosaic virus are confined to limited regions of conserved amino acids. The translated amino acid sequence of the coat protein gene shows 92% similarity to one isolate of apple mosaic virus, a closely related member of the ilarvirus group of plant viruses, but only 66% similarity to the amino acid sequence of the coat protein gene of a second isolate. These relationships are also reflected at the nucleotide sequence level. These results in one instance confirm the close similarities observed at the biophysical and serological levels between these two viruses, but on the other hand call into question the nomenclature used to describe these viruses.

  11. An extended sequence specificity for UV-induced DNA damage.

    PubMed

    Chung, Long H; Murray, Vincent

    2018-01-01

    The sequence specificity of UV-induced DNA damage was determined with a higher precision and accuracy than previously reported. UV light induces two major damage adducts: cyclobutane pyrimidine dimers (CPDs) and pyrimidine(6-4)pyrimidone photoproducts (6-4PPs). Employing capillary electrophoresis with laser-induced fluorescence and taking advantages of the distinct properties of the CPDs and 6-4PPs, we studied the sequence specificity of UV-induced DNA damage in a purified DNA sequence using two approaches: end-labelling and a polymerase stop/linear amplification assay. A mitochondrial DNA sequence that contained a random nucleotide composition was employed as the target DNA sequence. With previous methodology, the UV sequence specificity was determined at a dinucleotide or trinucleotide level; however, in this paper, we have extended the UV sequence specificity to a hexanucleotide level. With the end-labelling technique (for 6-4PPs), the consensus sequence was found to be 5'-GCTC*AC (where C* is the breakage site); while with the linear amplification procedure, it was 5'-TCTT*AC. With end-labelling, the dinucleotide frequency of occurrence was highest for 5'-TC*, 5'-TT* and 5'-CC*; whereas it was 5'-TT* for linear amplification. The influence of neighbouring nucleotides on the degree of UV-induced DNA damage was also examined. The core sequences consisted of pyrimidine nucleotides 5'-CTC* and 5'-CTT* while an A at position "1" and C at position "2" enhanced UV-induced DNA damage. Crown Copyright © 2017. Published by Elsevier B.V. All rights reserved.

  12. A new family of satellite DNA sequences as a major component of centromeric heterochromatin in owls (Strigiformes).

    PubMed

    Yamada, Kazuhiko; Nishida-Umehara, Chizuko; Matsuda, Yoichi

    2004-03-01

    We isolated a new family of satellite DNA sequences from HaeIII- and EcoRI-digested genomic DNA of the Blakiston's fish owl ( Ketupa blakistoni). The repetitive sequences were organized in tandem arrays of the 174 bp element, and localized to the centromeric regions of all macrochromosomes, including the Z and W chromosomes, and microchromosomes. This hybridization pattern was consistent with the distribution of C-band-positive centromeric heterochromatin, and the satellite DNA sequences occupied 10% of the total genome as a major component of centromeric heterochromatin. The sequences were homogenized between macro- and microchromosomes in this species, and therefore intraspecific divergence of the nucleotide sequences was low. The 174 bp element cross-hybridized to the genomic DNA of six other Strigidae species, but not to that of the Tytonidae, suggesting that the satellite DNA sequences are conserved in the same family but fairly divergent between the different families in the Strigiformes. Secondly, the centromeric satellite DNAs were cloned from eight Strigidae species, and the nucleotide sequences of 41 monomer fragments were compared within and between species. Molecular phylogenetic relationships of the nucleotide sequences were highly correlated with both the taxonomy based on morphological traits and the phylogenetic tree constructed by DNA-DNA hybridization. These results suggest that the satellite DNA sequence has evolved by concerted evolution in the Strigidae and that it is a good taxonomic and phylogenetic marker to examine genetic diversity between Strigiformes species.

  13. Amino acid and nucleotide recurrence in aligned sequences: synonymous substitution patterns in association with global and local base compositions.

    PubMed

    Nishizawa, M; Nishizawa, K

    2000-10-01

    The tendency for repetitiveness of nucleotides in DNA sequences has been reported for a variety of organisms. We show that the tendency for repetitive use of amino acids is widespread and is observed even for segments conserved between human and Drosophila melanogaster at the level of >50% amino acid identity. This indicates that repetitiveness influences not only the weakly constrained segments but also those sequence segments conserved among phyla. Not only glutamine (Q) but also many of the 20 amino acids show a comparable level of repetitiveness. Repetitiveness in bases at codon position 3 is stronger for human than for D.melanogaster, whereas local repetitiveness in intron sequences is similar between the two organisms. While genes for immune system-specific proteins, but not ancient human genes (i.e. human homologs of Escherichia coli genes), have repetitiveness at codon bases 1 and 2, repetitiveness at codon base 3 for these groups is similar, suggesting that the human genome has at least two mechanisms generating local repetitiveness. Neither amino acid nor nucleotide repetitiveness is observed beyond the exon boundary, denying the possibility that such repetitiveness could mainly stem from natural selection on mRNA or protein sequences. Analyses of mammalian sequence alignments show that while the 'between gene' GC content heterogeneity, which is linked to 'isochores', is a principal factor associated with the bias in substitution patterns in human, 'within gene' heterogeneity in nucleotide composition is also associated with such bias on a more local scale. The relationship amongst the various types of repetitiveness is discussed.

  14. Amino acid and nucleotide recurrence in aligned sequences: synonymous substitution patterns in association with global and local base compositions

    PubMed Central

    Nishizawa, Manami; Nishizawa, Kazuhisa

    2000-01-01

    The tendency for repetitiveness of nucleotides in DNA sequences has been reported for a variety of organisms. We show that the tendency for repetitive use of amino acids is widespread and is observed even for segments conserved between human and Drosophila melanogaster at the level of >50% amino acid identity. This indicates that repetitiveness influences not only the weakly constrained segments but also those sequence segments conserved among phyla. Not only glutamine (Q) but also many of the 20 amino acids show a comparable level of repetitiveness. Repetitiveness in bases at codon position 3 is stronger for human than for D.melanogaster, whereas local repetitiveness in intron sequences is similar between the two organisms. While genes for immune system-specific proteins, but not ancient human genes (i.e. human homologs of Escherichia coli genes), have repetitiveness at codon bases 1 and 2, repetitiveness at codon base 3 for these groups is similar, suggesting that the human genome has at least two mechanisms generating local repetitiveness. Neither amino acid nor nucleotide repetitiveness is observed beyond the exon boundary, denying the possibility that such repetitiveness could mainly stem from natural selection on mRNA or protein sequences. Analyses of mammalian sequence alignments show that while the ‘between gene’ GC content heterogeneity, which is linked to ‘isochores’, is a principal factor associated with the bias in substitution patterns in human, ‘within gene’ heterogeneity in nucleotide composition is also associated with such bias on a more local scale. The relationship amongst the various types of repetitiveness is discussed. PMID:11000273

  15. Novel Mutations in pncA Gene of Pyrazinamide Resistant Clinical Isolates of Mycobacterium tuberculosis.

    PubMed

    Kahbazi, Manijeh; Sarmadian, Hossein; Ahmadi, Azam; Didgar, Farshideh; Sadrnia, Maryam; Poolad, Toktam; Arjomandzadegan, Mohammad

    2018-04-16

    In clinical isolates of Mycobacterium tuberculosis (MTB), resistance to pyrazinamide occurs by mutations in any positions of the pncA gene (NC_000962.3) especially in nucleotides 359 and 374. In this study we examined the pncA gene sequence in clinical isolates of MTB. Genomic DNA of 33 clinical isolates of MTB was extracted by the Chelex100 method. The polymerase chain reactions (PCR) were performed using specific primers for amplification of 744 bp amplicon comprising the coding sequences (CDS) of the pncA gene. PCR products were sequenced by an automated sequencing Bioscience system. Additionally, semi Nested-allele specific (sNASP) and polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) methods were carried out for verification of probable mutations in nucleotides 359 and 374. Sequencing results showed that from 33 MTB clinical isolates, nine pyrazinamide-resistant isolates have mutations. Furthermore, no mutation was detected in 24 susceptible strains in the entire 561 bp of the pncA gene. Moreover, new mutations of G→A at position 3 of the pncA gene were identified in some of the resistant isolates. Results showed that the sNASP method could detect mutations in nucleotide 359 and 374 of the pncA gene, but the PCR-RFLP method by the SacII enzyme could not detect these mutations. In conclusion, the identification of new mutations in the pncA gene confirmed the probable occurrence of mutations in any nucleotides of the pncA gene sequence in resistant isolates of MTB.

  16. 37 CFR 41.37 - Appeal brief.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... limitation is described in the specification as filed. If there is a drawing or amino acid or nucleotide material sequence, and at least one limitation is illustrated in a drawing or amino acid or nucleotide...

  17. 37 CFR 41.37 - Appeal brief.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... limitation is described in the specification as filed. If there is a drawing or amino acid or nucleotide material sequence, and at least one limitation is illustrated in a drawing or amino acid or nucleotide...

  18. Genetic characterization of L-Zagreb mumps vaccine strain.

    PubMed

    Ivancic, Jelena; Gulija, Tanja Kosutic; Forcic, Dubravko; Baricevic, Marijana; Jug, Renata; Mesko-Prejac, Majda; Mazuran, Renata

    2005-04-01

    Eleven mumps vaccine strains, all containing live attenuated virus, have been used throughout the world. Although L-Zagreb mumps vaccine has been licensed since 1972, only its partial nucleotide sequence was previously determined (accession numbers , and ). Therefore, we sequenced the entire genome of L-Zagreb vaccine strain (Institute of Immunology Inc., Zagreb, Croatia). In order to investigate the genetic stability of the vaccine, sequences of both L-Zagreb master seed and currently produced vaccine batch were determined and no difference between them was observed. A phylogenetic analysis based on SH gene sequence has shown that L-Zagreb strain does not belong to any of established mumps genotypes and that it is most similar to old, laboratory preserved European strains (1950s-1970s). L-Zagreb nucleotide and deduced protein sequences were compared with other mumps virus sequences obtained from the GenBank. Emphasis was put on functionally important protein regions and known antigenic epitopes. The extensive comparisons of nucleotide and deduced protein sequences between L-Zagreb vaccine strain and other previously determined mumps virus sequences have shown that while the functional regions of HN, V, and L proteins are well conserved among various mumps strains, there can be a substantial amino acid difference in antigenic epitopes of all proteins and in functional regions of F protein. No molecular pattern was identified that can be used as a distinction marker between virulent and attenuated strains.

  19. Isolation of laccase gene-specific sequences from white rot and brown rot fungi by PCR.

    PubMed Central

    D'Souza, T M; Boominathan, K; Reddy, C A

    1996-01-01

    Degenerate primers corresponding to the consensus sequences of the copper-binding regions in the N-terminal domains of known basidiomycete laccases were used to isolate laccase gene-specific sequences from strains representing nine genera of wood rot fungi. All except three gave the expected PCR product of about 200 bp. Computer searches of the databases identified the sequence of each of the PCR products analyzed as a laccase gene sequence, suggesting the specificity of the primers. PCR products of the white rot fungi Ganoderma lucidum, Phlebia brevispora, and Trametes versicolor showed 65 to 74% nucleotide sequence similarity to each other; the similarity in deduced amino acid sequences was 83 to 91%. The PCR products of Lentinula edodes and Lentinus tigrinus, on the other hand, showed relatively low nucleotide and amino acid similarities (58 to 64 and 62 to 81%, respectively); however, these similarities were still much higher than when compared with the corresponding regions in the laccases of the ascomycete fungi Aspergillus nidulans and Neurospora crassa. A few of the white rot fungi, as well as Gloeophyllum trabeum, a brown rot fungus, gave a 144-bp PCR fragment which had a nucleotide sequence similarity of 60 to 71%. Demonstration of laccase activity in G. trabeum and several other brown rot fungi was of particular interest because these organisms were not previously shown to produce laccases. PMID:8837429

  20. Species composition of the genus Saprolegnia in fin fish aquaculture environments, as determined by nucleotide sequence analysis of the nuclear rDNA ITS regions.

    PubMed

    de la Bastide, Paul Y; Leung, Wai Lam; Hintz, William E

    2015-01-01

    The ITS region of the rDNA gene was compared for Saprolegnia spp. in order to improve our understanding of nucleotide sequence variability within and between species of this genus, determine species composition in Canadian fin fish aquaculture facilities, and to assess the utility of ITS sequence variability in genetic marker development. From a collection of more than 400 field isolates, ITS region nucleotide sequences were studied and it was determined that there was sufficient consistent inter-specific variation to support the designation of species identity based on ITS sequence data. This non-subjective approach to species identification does not rely upon transient morphological features. Phylogenetic analyses comparing our ITS sequences and species designations with data from previous studies generally supported the clade scheme of Diéguez-Uribeondo et al. (2007) and found agreement with the molecular taxonomic cluster system of Sandoval-Sierra et al. (2014). Our Canadian ITS sequence collection will thus contribute to the public database and assist the clarification of Saprolegnia spp. taxonomy. The analysis of ITS region sequence variability facilitated genus- and species-level identification of unknown samples from aquaculture facilities and provided useful information on species composition. A unique ITS-RFLP for the identification of S. parasitica was also described. Copyright © 2014 The British Mycological Society. Published by Elsevier Ltd. All rights reserved.

  1. Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification.

    PubMed

    Sinclair, Robert M; Ravantti, Janne J; Bamford, Dennis H

    2017-04-15

    Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allow them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids. Copyright © 2017 Sinclair et al.

  2. Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification

    PubMed Central

    Sinclair, Robert M.; Ravantti, Janne J.

    2017-01-01

    ABSTRACT Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allow them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. Importantly, we detected similarity at the nucleotide level between capsid protein-coding regions from viruses infecting cells belonging to all three domains of life, reproducing a previously established structure-based classification of icosahedral viral capsids. PMID:28122979

  3. Dynamics of actin evolution in dinoflagellates.

    PubMed

    Kim, Sunju; Bachvaroff, Tsvetan R; Handy, Sara M; Delwiche, Charles F

    2011-04-01

    Dinoflagellates have unique nuclei and intriguing genome characteristics with very high DNA content making complete genome sequencing difficult. In dinoflagellates, many genes are found in multicopy gene families, but the processes involved in the establishment and maintenance of these gene families are poorly understood. Understanding the dynamics of gene family evolution in dinoflagellates requires comparisons at different evolutionary scales. Studies of closely related species provide fine-scale information relative to species divergence, whereas comparisons of more distantly related species provides broad context. We selected the actin gene family as a highly expressed conserved gene previously studied in dinoflagellates. Of the 142 sequences determined in this study, 103 were from the two closely related species, Dinophysis acuminata and D. caudata, including full length and partial cDNA sequences as well as partial genomic amplicons. For these two Dinophysis species, at least three types of sequences could be identified. Most copies (79%) were relatively similar and in nucleotide trees, the sequences formed two bushy clades corresponding to the two species. In comparisons within species, only eight to ten nucleotide differences were found between these copies. The two remaining types formed clades containing sequences from both species. One type included the most similar sequences in between-species comparisons with as few as 12 nucleotide differences between species. The second type included the most divergent sequences in comparisons between and within species with up to 93 nucleotide differences between sequences. In all the sequences, most variation occurred in synonymous sites or the 5' UnTranslated Region (UTR), although there was still limited amino acid variation between most sequences. Several potential pseudogenes were found (approximately 10% of all sequences depending on species) with incomplete open reading frames due to frameshifts or early stop codons. Overall, variation in the actin gene family fits best with the "birth and death" model of evolution based on recent duplications, pseudogenes, and incomplete lineage sorting. Divergence between species was similar to variation within species, so that actin may be too conserved to be useful for phylogenetic estimation of closely related species.

  4. Structured oligonucleotides for target indexing to allow single-vessel PCR amplification and solid support microarray hybridization.

    PubMed

    Girard, Laurie D; Boissinot, Karel; Peytavi, Régis; Boissinot, Maurice; Bergeron, Michel G

    2015-02-07

    The combination of molecular diagnostic technologies is increasingly used to overcome limitations on sensitivity, specificity or multiplexing capabilities, and provide efficient lab-on-chip devices. Two such techniques, PCR amplification and microarray hybridization are used serially to take advantage of the high sensitivity and specificity of the former combined with high multiplexing capacities of the latter. These methods are usually performed in different buffers and reaction chambers. However, these elaborate methods have high complexity and cost related to reagent requirements, liquid storage and the number of reaction chambers to integrate into automated devices. Furthermore, microarray hybridizations have a sequence dependent efficiency not always predictable. In this work, we have developed the concept of a structured oligonucleotide probe which is activated by cleavage from polymerase exonuclease activity. This technology is called SCISSOHR for Structured Cleavage Induced Single-Stranded Oligonucleotide Hybridization Reaction. The SCISSOHR probes enable indexing the target sequence to a tag sequence. The SCISSOHR technology also allows the combination of nucleic acid amplification and microarray hybridization in a single vessel in presence of the PCR buffer only. The SCISSOHR technology uses an amplification probe that is irreversibly modified in presence of the target, releasing a single-stranded DNA tag for microarray hybridization. Each tag is composed of a 3-nucleotide sequence-dependent segment and a unique "target sequence-independent" 14-nucleotide segment allowing for optimal hybridization with minimal cross-hybridization. We evaluated the performance of five (5) PCR buffers to support microarray hybridization, compared to a conventional hybridization buffer. Finally, as a proof of concept, we developed a multiplexed assay for the amplification, detection, and identification of three (3) DNA targets. This new technology will facilitate the design of lab-on-chip microfluidic devices, while also reducing consumable costs. At term, it will allow the cost-effective automation of highly multiplexed assays for detection and identification of genetic targets.

  5. Rapid molecular diagnostics of severe primary immunodeficiency determined by using targeted next-generation sequencing.

    PubMed

    Yu, Hui; Zhang, Victor Wei; Stray-Pedersen, Asbjørg; Hanson, Imelda Celine; Forbes, Lisa R; de la Morena, M Teresa; Chinn, Ivan K; Gorman, Elizabeth; Mendelsohn, Nancy J; Pozos, Tamara; Wiszniewski, Wojciech; Nicholas, Sarah K; Yates, Anne B; Moore, Lindsey E; Berge, Knut Erik; Sorte, Hanne; Bayer, Diana K; ALZahrani, Daifulah; Geha, Raif S; Feng, Yanming; Wang, Guoli; Orange, Jordan S; Lupski, James R; Wang, Jing; Wong, Lee-Jun

    2016-10-01

    Primary immunodeficiency diseases (PIDDs) are inherited disorders of the immune system. The most severe form, severe combined immunodeficiency (SCID), presents with profound deficiencies of T cells, B cells, or both at birth. If not treated promptly, affected patients usually do not live beyond infancy because of infections. Genetic heterogeneity of SCID frequently delays the diagnosis; a specific diagnosis is crucial for life-saving treatment and optimal management. We developed a next-generation sequencing (NGS)-based multigene-targeted panel for SCID and other severe PIDDs requiring rapid therapeutic actions in a clinical laboratory setting. The target gene capture/NGS assay provides an average read depth of approximately 1000×. The deep coverage facilitates simultaneous detection of single nucleotide variants and exonic copy number variants in one comprehensive assessment. Exons with insufficient coverage (<20× read depth) or high sequence homology (pseudogenes) are complemented by amplicon-based sequencing with specific primers to ensure 100% coverage of all targeted regions. Analysis of 20 patient samples with low T-cell receptor excision circle numbers on newborn screening or a positive family history or clinical suspicion of SCID or other severe PIDD identified deleterious mutations in 14 of them. Identified pathogenic variants included both single nucleotide variants and exonic copy number variants, such as hemizygous nonsense, frameshift, and missense changes in IL2RG; compound heterozygous changes in ATM, RAG1, and CIITA; homozygous changes in DCLRE1C and IL7R; and a heterozygous nonsense mutation in CHD7. High-throughput deep sequencing analysis with complete clinical validation greatly increases the diagnostic yield of severe primary immunodeficiency. Establishing a molecular diagnosis enables early immune reconstitution through prompt therapeutic intervention and guides management for improved long-term quality of life. Copyright © 2016 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.

  6. Continuously tunable nucleic acid hybridization probes.

    PubMed

    Wu, Lucia R; Wang, Juexiao Sherry; Fang, John Z; Evans, Emily R; Pinto, Alessandro; Pekker, Irena; Boykin, Richard; Ngouenet, Celine; Webster, Philippa J; Beechem, Joseph; Zhang, David Yu

    2015-12-01

    In silico-designed nucleic acid probes and primers often do not achieve favorable specificity and sensitivity tradeoffs on the first try, and iterative empirical sequence-based optimization is needed, particularly in multiplexed assays. We present a novel, on-the-fly method of tuning probe affinity and selectivity by adjusting the stoichiometry of auxiliary species, which allows for independent and decoupled adjustment of the hybridization yield for different probes in multiplexed assays. Using this method, we achieved near-continuous tuning of probe effective free energy. To demonstrate our approach, we enforced uniform capture efficiency of 31 DNA molecules (GC content, 0-100%), maximized the signal difference for 11 pairs of single-nucleotide variants and performed tunable hybrid capture of mRNA from total RNA. Using the Nanostring nCounter platform, we applied stoichiometric tuning to simultaneously adjust yields for a 24-plex assay, and we show multiplexed quantitation of RNA sequences and variants from formalin-fixed, paraffin-embedded samples.

  7. Complete nucleotide sequence of pig (Sus scrofa) mitochondrial genome and dating evolutionary divergence within Artiodactyla.

    PubMed

    Lin, C S; Sun, Y L; Liu, C Y; Yang, P C; Chang, L C; Cheng, I C; Mao, S J; Huang, M C

    1999-08-05

    The complete nucleotide sequence of the pig (Sus scrofa) mitochondrial genome, containing 16613bp, is presented in this report. The genome is not a specific length because of the presence of the variable numbers of tandem repeats, 5'-CGTGCGTACA in the displacement loop (D-loop). Genes responsible for 12S and 16S rRNAs, 22 tRNAs, and 13 protein-coding regions are found. The genome carries very few intergenic nucleotides with several instances of overlap between protein-coding or tRNA genes, except in the D-loop region. For evaluating the possible evolutionary relationships between Artiodactyla and Cetacea, the nucleotide substitutions and amino acid sequences of 13 protein-coding genes were aligned by pairwise comparisons of the pig, cow, and fin whale. By comparing these sequences, we suggest that there is a closer relationship between the pig and cow than that between either of these species and fin whale. In addition, the accumulation of transversions and gaps in pig 12S and 16S rRNA genes was compared with that in other eutherian species, including cow, fin whale, human, horse, and harbor seal. The results also reveal a close phylogenetic relationship between pig and cow, as compared to fin whale and others. Thus, according to the sequence differences of mitochondrial rRNA genes in eutherian species, the evolutionary separation of pig and cow occurred about 53-60 million years ago.

  8. Structural analysis of the human U3 ribonucleoprotein particle reveal a conserved sequence available for base pairing with pre-rRNA.

    PubMed Central

    Parker, K A; Steitz, J A

    1987-01-01

    The human U3 ribonucleoprotein (RNP) has been analyzed to determine its protein constituents, sites of protein-RNA interaction, and RNA secondary structure. By using anti-U3 RNP antibodies and extracts prepared from HeLa cells labeled in vivo, the RNP was found to contain four nonphosphorylated proteins of 36, 30, 13, and 12.5 kilodaltons and two phosphorylated proteins of 74 and 59 kilodaltons. U3 nucleotides 72-90, 106-121, 154-166, and 190-217 must contain sites that interact with proteins since these regions are immunoprecipitated after treatment of the RNP with RNase A or T1. The secondary structure was probed with specific nucleases and by chemical modification with single-strand-specific reagents that block subsequent reverse transcription. Regions that are single stranded (and therefore potentially able to interact with a substrate RNA) include an evolutionarily conserved sequence at nucleotides 104-112 and nonconserved sequences at nucleotides 65-74, 80-84, and 88-93. Nucleotides 159-168 do not appear to be highly accessible, thus making it unlikely that this U3 sequence base pairs with sequences near the 5.8S rRNA-internal transcribed spacer II junction, as previously proposed. Alternative functions of the U3 RNP are discussed, including the possibility that U3 may participate in a processing event near the 3' end of 28S rRNA. Images PMID:2959855

  9. Characterization of durum wheat high molecular weight glutenin subunits Bx20 and By20 sequences by a molecular and proteomic approach.

    PubMed

    Santagati, Vito Davide; Sestili, Francesco; Lafiandra, Domenico; D'Ovidio, Renato; Rogniaux, Helene; Masci, Stefania

    2016-07-01

    Wheat high molecular weight glutenin subunit variation is important because of its great influence on glutenin polymer structure, that is related to dough technological properties. Among the different subunits, the pair Bx20 and By20 is known to have a negative effect on quality, but the reasons are not clear: Bx20 has two cysteines, which theoretically make this subunit a chain extender of the glutenin polymer, just like the other Bx subunits, showing four cysteines, two of which should be involved in intra-molecular disulfide bonds. By20 has never been characterized so far at molecular level. Here we report the nucleotide sequences of Bx20 and By20 genes isolated from the durum wheat cultivar 'Lira 45' and the validation of the corresponding deduced amino acid sequences by using MALDI-TOF and LC-MS/MS. Four nucleotide differences were identified in the Bx20 gene with respect to the deduced sequence present in NCBI, causing two amino acid substitutions. For the By20 subunit, nucleotide and amino acid sequences revealed a great similarity to By15, both at gene and protein levels, showing five nucleotide changes generating two amino acid differences. No evidence of post-translational modifications has been found. Hypotheses are formulated in regard to relationships with technological quality. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  10. Open reading frames in a 4556 nucleotide sequence within MDV-1 BamHI-D DNA fragment: evidence for splicing of mRNA from a new viral glycoprotein gene.

    PubMed

    Becker, Y; Asher, Y; Tabor, E; Davidson, I; Malkinson, M

    1994-01-01

    A DNA segment of the MDV-1 BamHI-D fragment was sequenced, and the open reading frames (ORFs) present in the 4556 nucleotide fragment were analyzed by computer programs. Computer analysis identified 19 putative ORFs in the sequence ranging from a coding capacity of 37 amino acids (aa) (ORF-1a) to 684aa (ORF-1). The special properties of four ORFs (1a, 1, 2, and 3) were investigated. Two adjacent ORFs, ORF-1a and ORF-1, were found by computer analysis to have the properties of two introns encoding a glycoprotein: ORF-1a encodes an aa sequence with the properties of a signal peptide, and ORF-1 encodes a polypeptide with a membrane anchor domain and putative N-glycosylation sites in the aa sequence. ORF-1a and ORF-1 were found to be transcribed in MDV-1-infected cells. Two RNA transcripts were detected: a precursor RNA and its spliced form. Both are transcribed from a promoter located 5' to ORF-1a, and splice donor and acceptor sites are used to splice the mRNA after cleavage of a 71-nucleotide sequence. This finding suggest that ORF-1a and ORF-1 are two introns of a new MDV-1 glycoprotein gene. The DNA sequence containing ORF-1 was transiently expressed in COS-1 cells, and the viral protein produced in these cells was found to react with anti-MDV serotype-1 Antigen B-specific monoclonal antibodies. These studies indicate that the protein encoded by ORF-1 has antigenic properties resembling Antigen B of MDV-1. A gene homologous to ORF-1 was detected in the genome of both MDV-2(SB1) and MDV-3(HVT), which serve as commercial vaccine strains. Two additional ORFs were noted in the 4556 nucleotide sequence: ORF-2, which encodes a 333 aa polypeptide initiating in the UL and terminating in the TRL prior to the putative origin of replication, and ORF-3, which encodes a 155 aa polypeptide that is partly homologous to the phosphoprotein pp38 encoded by the BamHI-H sequence. The 65 N-terminal aa of the two gene products are identical, both being derived from the nucleotide sequences in the TRL and IRL, respectively. Additional homologous aa sequences are the hydrophobic aa domain in the middle of both proteins. The functions of ORF-2, ORF-3, and additional ORFs are under study.

  11. A Novel Method to Predict Highly Expressed Genes Based on Radius Clustering and Relative Synonymous Codon Usage.

    PubMed

    Tran, Tuan-Anh; Vo, Nam Tri; Nguyen, Hoang Duc; Pham, Bao The

    2015-12-01

    Recombinant proteins play an important role in many aspects of life and have generated a huge income, notably in the industrial enzyme business. A gene is introduced into a vector and expressed in a host organism-for example, E. coli-to obtain a high productivity of target protein. However, transferred genes from particular organisms are not usually compatible with the host's expression system because of various reasons, for example, codon usage bias, GC content, repetitive sequences, and secondary structure. The solution is developing programs to optimize for designing a nucleotide sequence whose origin is from peptide sequences using properties of highly expressed genes (HEGs) of the host organism. Existing data of HEGs determined by practical and computer-based methods do not satisfy for qualifying and quantifying. Therefore, the demand for developing a new HEG prediction method is critical. We proposed a new method for predicting HEGs and criteria to evaluate gene optimization. Codon usage bias was weighted by amplifying the difference between HEGs and non-highly expressed genes (non-HEGs). The number of predicted HEGs is 5% of the genome. In comparison with Puigbò's method, the result is twice as good as Puigbò's one, in kernel ratio and kernel sensitivity. Concerning transcription/translation factor proteins (TF), the proposed method gives low TF sensitivity, while Puigbò's method gives moderate one. In summary, the results indicated that the proposed method can be a good optional applying method to predict optimized genes for particular organisms, and we generated an HEG database for further researches in gene design.

  12. On the phylogenetic placement of human T cell leukemia virus type 1 sequences associated with an Andean mummy.

    PubMed

    Coulthart, Michael B; Posada, David; Crandall, Keith A; Dekaban, Gregory A

    2006-03-01

    Recently, the putative finding of ancient human T cell leukemia virus type 1 (HTLV-1) long terminal repeat (LTR) DNA sequences in association with a 1500-year-old Chilean mummy has stirred vigorous debate. The debate is based partly on the inherent uncertainties associated with phylogenetic reconstruction when only short sequences of closely related genotypes are available. However, a full analysis of what phylogenetic information is present in the mummy data has not previously been published, leaving open the question of what precisely is the range of admissible interpretation. To fulfill this need, we re-analyzed the mummy data in a new way. We first performed phylogenetic analysis of 188 published LTR DNA sequences from extant strains belonging to the HTLV-1 Cosmopolitan clade, using the method of statistical parsimony which is designed both to optimize phylogenetic resolution among sequences with little evolutionary divergence, and to permit precise mapping of individual sequence mutations onto branches of a divergence network. We then deduced possible phylogenetic positions for the two main categories of published Chilean mummy sequences, based on their published 157-nucleotide LTR sequences. The possible phylogenetic placements for one of the mummy sequence categories are consistent with a modern origin. However, one of these placements for the other mummy sequence category falls very close to the root of the Cosmopolitan clade, consistent with an ancient origin for both this mummy sequence and the Cosmopolitan clade.

  13. Large scale DNA microsequencing device

    DOEpatents

    Foote, Robert S.

    1997-01-01

    A microminiature sequencing apparatus and method provide means for simultaneously obtaining sequences of plural polynucleotide strands. The apparatus comprises a microchip into which plural channels have been etched using standard lithographic procedures and chemical wet etching. The channels include a reaction well and a separating section. Enclosing the channels is accomplished by bonding a transparent cover plate over the apparatus. A first oligonucleotide strand is chemically affixed to the apparatus through an alkyl chain. Subsequent nucleotides are selected by complementary base pair bonding. A target nucleotide strand is used to produce a family of labelled sequencing strands in each channel which are separated in the separating section. During or following separation the sequences are determined using appropriate detection means.

  14. Large scale DNA microsequencing device

    DOEpatents

    Foote, Robert S.

    1999-01-01

    A microminiature sequencing apparatus and method provide means for simultaneously obtaining sequences of plural polynucleotide strands. The apparatus comprises a microchip into which plural channels have been etched using standard lithographic procedures and chemical wet etching. The channels include a reaction well and a separating section. Enclosing the channels is accomplished by bonding a transparent cover plate over the apparatus. A first oligonucleotide strand is chemically affixed to the apparatus through an alkyl chain. Subsequent nucleotides are selected by complementary base pair bonding. A target nucleotide strand is used to produce a family of labelled sequencing strands in each channel which are separated in the separating section. During or following separation the sequences are determined using appropriate detection means.

  15. Complete sequence analysis reveals two distinct poleroviruses infecting cucurbits in China.

    PubMed

    Xiang, Hai-ying; Shang, Qiao-xia; Han, Cheng-gui; Li, Da-wei; Yu, Jia-lin

    2008-01-01

    The complete RNA genomes of a Chinese isolate of cucurbit aphid-borne yellows virus (CABYV-CHN) and a new polerovirus tentatively referred to as melon aphid-borne yellows virus (MABYV) were determined. The entire genome of CABYV-CHN shared 89.0% nucleotide sequence identity with the French CABYV isolate. In contrast, nucleotide sequence identities between MABYV and CABYV and other poleroviruses were in the range of 50.7-74.2%, with amino acid sequence identities ranging from 24.8 to 82.9% for individual gene products. We propose that CABYV-CHN is a strain of CABYV and that MABYV is a member of a tentative distinct species within the genus Polerovirus.

  16. The complete genomic sequence of a tentative new polerovirus identified in barley in South Korea.

    PubMed

    Zhao, Fumei; Lim, Seungmo; Yoo, Ran Hee; Igori, Davaajargal; Kim, Sang-Min; Kwak, Do Yeon; Kim, Sun Lim; Lee, Bong Choon; Moon, Jae Sun

    2016-07-01

    The complete nucleotide sequence of a new barley polerovirus, tentatively named barley virus G (BVG), which was isolated in Gimje, South Korea, has been determined using an RNA sequencing technique combined with polymerase chain reaction methods. The viral genomic RNA of BVG is 5,620 nucleotides long and contains six typical open reading frames commonly observed in other poleroviruses. Sequence comparisons revealed that BVG is most closely related to maize yellow dwarf virus-RMV, with the highest amino acid identities being less than 90 % for all of the corresponding proteins. These results suggested that BVG is a member of a new species in the genus Polerovirus.

  17. Complete nucleotide sequence of spring beauty latent virus, a bromovirus infectious to Arabidopsis thaliana.

    PubMed

    Fujisaki, K; Hagihara, F; Kaido, M; Mise, K; Okuno, T

    2003-01-01

    Spring beauty latent virus (SBLV), a bromovirus, systemically and efficiently infected Arabidopsis thaliana, whereas the well-studied bromoviruses brome mosaic virus (BMV) and cowpea chlorotic mottle virus (CCMV) did not infect and poorly infected A. thaliana, respectively. We constructed biologically active cDNA clones of SBLV genomic RNAs and determined their complete nucleotide sequences. Interestingly, SBLV RNA3 contains both the box B motif in the intercistronic region, as does BMV, and the subgenomic promoter-like sequence in the 5' noncoding region, as does CCMV. Sequence comparisons of SBLV, BMV, CCMV, and broad bean mottle virus demonstrated that SBLV is closely related to BMV and CCMV.

  18. Large scale DNA microsequencing device

    DOEpatents

    Foote, R.S.

    1999-08-31

    A microminiature sequencing apparatus and method provide means for simultaneously obtaining sequences of plural polynucleotide strands. The apparatus comprises a microchip into which plural channels have been etched using standard lithographic procedures and chemical wet etching. The channels include a reaction well and a separating section. Enclosing the channels is accomplished by bonding a transparent cover plate over the apparatus. A first oligonucleotide strand is chemically affixed to the apparatus through an alkyl chain. Subsequent nucleotides are selected by complementary base pair bonding. A target nucleotide strand is used to produce a family of labelled sequencing strands in each channel which are separated in the separating section. During or following separation the sequences are determined using appropriate detection means. 11 figs.

  19. Nucleotide sequence of a cluster of early and late genes in a conserved segment of the vaccinia virus genome.

    PubMed Central

    Plucienniczak, A; Schroeder, E; Zettlmeissl, G; Streeck, R E

    1985-01-01

    The nucleotide sequence of a 7.6 kb vaccinia DNA segment from a genomic region conserved among different orthopox virus has been determined. This segment contains a tight cluster of 12 partly overlapping open reading frames most of which can be correlated with previously identified early and late proteins and mRNAs. Regulatory signals used by vaccinia virus have been studied. Presumptive promoter regions are rich in A, T and carry the consensus sequences TATA and AATAA spaced at 20-24 base pairs. Tandem repeats of a CTATTC consensus sequence are proposed to be involved in the termination of early transcription. PMID:2987815

  20. A three-nucleotide helix I is sufficient for full activity of a hammerhead ribozyme: advantages of an asymmetric design.

    PubMed Central

    Tabler, M; Homann, M; Tzortzakaki, S; Sczakiel, G

    1994-01-01

    Trans-cleaving hammerhead ribozymes with long target-specific antisense sequences flanking the catalytic domain share some features with conventional antisense RNA and are therefore termed 'catalytic antisense RNAs'. Sequences 5' to the catalytic domain form helix I and sequences 3' to it form helix III when complexed with the target RNA. A catalytic antisense RNA of more than 400 nucleotides, and specific for the human immunodeficiency virus type 1 (HIV-1), was systematically truncated within the arm that constituted originally a helix I of 128 base pairs. The resulting ribozymes formed helices I of 13, 8, 5, 3, 2, 1 and 0 nucleotides, respectively, and a helix III of about 280 nucleotides. When their in vitro cleavage activity was compared with the original catalytic antisense RNA, it was found that a helix I of as little as three nucleotides was sufficient for full endonucleolytic activity. The catalytically active constructs inhibited HIV-1 replication about four-fold more effectively than the inactive ones when tested in human cells. A conventional hammerhead ribozyme having helices of just 8 nucleotides on either side failed to cleave the target RNA in vitro when tested under the conditions for catalytic antisense RNA. Cleavage activity could only be detected after heat-treatment of the ribozyme substrate mixture which indicates that hammerhead ribozymes with short arms do not associate as efficiently to the target RNA as catalytic antisense RNA. The requirement of just a three-nucleotide helix I allows simple PCR-based generation strategies for asymmetric hammerhead ribozymes. Advantages of an asymmetric design will be discussed. Images PMID:7937118

  1. The Coding of Biological Information: From Nucleotide Sequence to Protein Recognition

    NASA Astrophysics Data System (ADS)

    Štambuk, Nikola

    The paper reviews the classic results of Swanson, Dayhoff, Grantham, Blalock and Root-Bernstein, which link genetic code nucleotide patterns to the protein structure, evolution and molecular recognition. Symbolic representation of the binary addresses defining particular nucleotide and amino acid properties is discussed, with consideration of: structure and metric of the code, direct correspondence between amino acid and nucleotide information, and molecular recognition of the interacting protein motifs coded by the complementary DNA and RNA strands.

  2. Long-term excretion of vaccine-derived poliovirus by a healthy child.

    PubMed

    Martín, Javier; Odoom, Kofi; Tuite, Gráinne; Dunn, Glynis; Hopewell, Nicola; Cooper, Gill; Fitzharris, Catherine; Butler, Karina; Hall, William W; Minor, Philip D

    2004-12-01

    A child was found to be excreting type 1 vaccine-derived poliovirus (VDPV) with a 1.1% sequence drift from Sabin type 1 vaccine strain in the VP1 coding region 6 months after he was immunized with oral live polio vaccine. Seventeen type 1 poliovirus isolates were recovered from stools taken from this child during the following 4 months. Contrary to expectation, the child was not deficient in humoral immunity and showed high levels of serum neutralization against poliovirus. Selected virus isolates were characterized in terms of their antigenic properties, virulence in transgenic mice, sensitivity for growth at high temperatures, and differences in nucleotide sequence from the Sabin type 1 strain. The VDPV isolates showed mutations at key nucleotide positions that correlated with the observed reversion to biological properties typical of wild polioviruses. A number of capsid mutations mapped at known antigenic sites leading to changes in the viral antigenic structure. Estimates of sequence evolution based on the accumulation of nucleotide changes in the VP1 coding region detected a "defective" molecular clock running at an apparent faster speed of 2.05% nucleotide changes per year versus 1% shown in previous studies. Remarkably, when compared to several type 1 VDPV strains of different origins, isolates from this child showed a much higher proportion of nonsynonymous versus synonymous nucleotide changes in the capsid coding region. This anomaly could explain the high VP1 sequence drift found and the ability of these virus strains to replicate in the gut for a longer period than expected.

  3. The emergence and evolution of life in a "fatty acid world" based on quantum mechanics.

    PubMed

    Tamulis, Arvydas; Grigalavicius, Mantas

    2011-02-01

    Quantum mechanical based electron correlation interactions among molecules are the source of the weak hydrogen and Van der Waals bonds that are critical to the self-assembly of artificial fatty acid micelles. Life on Earth or elsewhere could have emerged in the form of self-reproducing photoactive fatty acid micelles, which gradually evolved into nucleotide-containing micelles due to the enhanced ability of nucleotide-coupled sensitizer molecules to absorb visible light. Comparison of the calculated absorption spectra of micelles with and without nucleotides confirmed this idea and supports the idea of the emergence and evolution of nucleotides in minimal cells of a so-called Fatty Acid World. Furthermore, the nucleotide-caused wavelength shift and broadening of the absorption pattern potentially gives these molecules an additional valuable role, other than a purely genetic one in the early stages of the development of life. From the information theory point of view, the nucleotide sequences in such micelles carry positional information providing better electron transport along the nucleotide-sensitizer chain and, in addition, providing complimentary copies of that information for the next generation. Nucleotide sequences, which in the first period of evolution of fatty acid molecules were useful just for better absorbance of the light in the longer wavelength region, later in the PNA or RNA World, took on the role of genetic information storage.

  4. Isolation of a full-length CC-NBS-LRR resistance gene analog candidate from sugar pine showing low nucleotide diversity.

    Treesearch

    K.D. Jermstad; L.A. Sheppard; B.B. Kinloch; A. Delfino-Mix; E.S. Ersoz; K.V. Krutovsky; D.B Neale

    2006-01-01

    The nucleotide-binding-site and leucine-rich-repeat (NBS–LRR) class of R proteins is abundant and widely distributed in plants. By using degenerate primers designed on the NBS domain in lettuce, we amplified sequences in sugar pine that shared sequence identity with many of the NBS–LRR class resistance genes catalogued in GenBank. The polymerase chain reaction products...

  5. Identification of largemouth bass virus in the introduced Northern Snakehead inhabiting the Chesapeake Bay watershed.

    PubMed

    Iwanowicz, L; Densmore, C; Hahn, C; McAllister, P; Odenkirk, J

    2013-09-01

    The Northern Snakehead Channa argus is an introduced species that now inhabits the Chesapeake Bay. During a preliminary survey for introduced pathogens possibly harbored by these fish in Virginia waters, a filterable agent was isolated from five specimens that produced cytopathic effects in BF-2 cells. Based on PCR amplification and partial sequencing of the major capsid protein (MCP), DNA polymerase (DNApol), and DNA methyltransferase (Mtase) genes, the isolates were identified as Largemouth Bass virus (LMBV). Nucleotide sequences of the MCP (492 bp) and DNApol (419 pb) genes were 100% identical to those of LMBV. The nucleotide sequence of the Mtase (206 bp) gene was 99.5% identical to that of LMBV, and the single nucleotide substitution did not lead to a predicted amino acid coding change. This is the first report of LMBV from the Northern Snakehead, and provides evidence that noncentrarchid fishes may be susceptible to this virus.

  6. Nucleotide-Specific Contrast for DNA Sequencing by Electron Spectroscopy.

    PubMed

    Mankos, Marian; Persson, Henrik H J; N'Diaye, Alpha T; Shadman, Khashayar; Schmid, Andreas K; Davis, Ronald W

    2016-01-01

    DNA sequencing by imaging in an electron microscope is an approach that holds promise to deliver long reads with low error rates and without the need for amplification. Earlier work using transmission electron microscopes, which use high electron energies on the order of 100 keV, has shown that low contrast and radiation damage necessitates the use of heavy atom labeling of individual nucleotides, which increases the read error rates. Other prior work using scattering electrons with much lower energy has shown to suppress beam damage on DNA. Here we explore possibilities to increase contrast by employing two methods, X-ray photoelectron and Auger electron spectroscopy. Using bulk DNA samples with monomers of each base, both methods are shown to provide contrast mechanisms that can distinguish individual nucleotides without labels. Both spectroscopic techniques can be readily implemented in a low energy electron microscope, which may enable label-free DNA sequencing by direct imaging.

  7. Identification of largemouth bass virus in the introduced Northern snakehead inhabiting the Cheasapeake Bay watershed

    USGS Publications Warehouse

    Iwanowicz, Luke R.; Densmore, Christine L.; Hahn, Cassidy M.; McAllister, Phillip; Odenkirk, John

    2013-01-01

    The Northern Snakehead Channa argus is an introduced species that now inhabits the Chesapeake Bay. During a preliminary survey for introduced pathogens possibly harbored by these fish in Virginia waters, a filterable agent was isolated from five specimens that produced cytopathic effects in BF-2 cells. Based on PCR amplification and partial sequencing of the major capsid protein (MCP), DNA polymerase (DNApol), and DNA methyltransferase (Mtase) genes, the isolates were identified as Largemouth Bass virus (LMBV). Nucleotide sequences of the MCP (492 bp) and DNApol (419 pb) genes were 100% identical to those of LMBV. The nucleotide sequence of the Mtase (206 bp) gene was 99.5% identical to that of LMBV, and the single nucleotide substitution did not lead to a predicted amino acid coding change. This is the first report of LMBV from the Northern Snakehead, and provides evidence that noncentrarchid fishes may be susceptible to this virus.

  8. Systematically frameshifting by deletion of every 4th or 4th and 5th nucleotides during mitochondrial transcription: RNA self-hybridization regulates delRNA expression.

    PubMed

    Seligmann, Hervé

    2016-01-01

    In mitochondria, secondary structures punctuate post-transcriptional RNA processing. Recently described transcripts match the human mitogenome after systematic deletions of every 4th, respectively every 4th and 5th nucleotides, called delRNAs. Here I explore predicted stem-loop hairpin formation by delRNAs, and their associations with delRNA transcription and detected peptides matching their translation. Despite missing 25, respectively 40% of the nucleotides in the original sequence, del-transformed sequences form significantly more secondary structures than corresponding randomly shuffled sequences, indicating biological function, independently of, and in combination with, previously detected delRNA and thereof translated peptides. Self-hybridization decreases delRNA abundances, indicating downregulation. Systematic deletions of the human mitogenome reveal new, unsuspected coding and structural informations. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  9. MSuPDA: A memory efficient algorithm for sequence alignment.

    PubMed

    Khan, Mohammad Ibrahim; Kamal, Md Sarwar; Chowdhury, Linkon

    2015-01-16

    Space complexity is a million dollar question in DNA sequence alignments. In this regards, MSuPDA (Memory Saving under Pushdown Automata) can help to reduce the occupied spaces in computer memory. Our proposed process is that Anchor Seed (AS) will be selected from given data set of Nucleotides base pairs for local sequence alignment. Quick Splitting (QS) techniques will separate the Anchor Seed from all the DNA genome segments. Selected Anchor Seed will be placed to pushdown Automata's (PDA) input unit. Whole DNA genome segments will be placed into PDA's stack. Anchor Seed from input unit will be matched with the DNA genome segments from stack of PDA. Whatever matches, mismatches or Indel, of Nucleotides will be POP from the stack under the control of control unit of Pushdown Automata. During the POP operation on stack it will free the memory cell occupied by the Nucleotide base pair.

  10. Nucleotide sequence of the ribosomal RNA gene of Physarum polycephalum: intron 2 and its flanking regions of the 26S rRNA gene.

    PubMed Central

    Nomiyama, H; Kuhara, S; Kukita, T; Otsuka, T; Sakaki, Y

    1981-01-01

    The 26S ribosomal RNA gene of Physarum polycephalum is interrupted by two introns, and we have previously determined the sequence of one of them (intron 1) (Nomiyama et al. Proc.Natl.Acad.Sci.USA 78, 1376-1380, 1981). In this study we sequenced the second intron (intron 2) of about 0.5 kb length and its flanking regions, and found that one nucleotide at each junction is identical in intron 1 and intron 2, though the junction regions share no other sequence homology. Comparison of the flanking exon sequences to E. coli 23S rRNA sequences shows that conserved sequences are interspersed with tracts having little homology. In particular, the region encompassing the intron 2 interruption site is highly conserved. The E. coli ribosomal protein L1 binding region is also conserved. Images PMID:6171776

  11. Nucleotide sequence of a complementary DNA encoding pea cytosolic copper/zinc superoxide dismutase. [Pisum sativum L

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    White, D.A.; Zilinskas, B.A.

    1991-08-01

    The authors now report the nucleotide sequence of the cytosolic Cu/Zn SOD cloned from a {lambda}gt11 cDNA library constructed from mRNA extracted from leaves of 7- to 10-d pea seedlings (Pisum sativum L.). The clone was isolated using a 22-base synthetic oligonucleotide complementary to the amino acid sequence CGIIGLQG. This sequence, found at the protein's carboxy terminus, is highly conserved among plant cytosolic Cu/Zn SODs but not chloroplastic Cu/Zn SODs. The 738-base pair sequence contains an open reading frame specifying 152 codons and a predicted M{sub r} of 18,024 D. The deduced amino acid sequence is highly homologous (79-82% identity)more » with the sequences of other known plant cytosolic Cu/Zn SODs but less highly conserved (63-65%) when compared with several chloroplastic Cu/Zn SODs including pea (10).« less

  12. Single-cell mRNA sequencing identifies subclonal heterogeneity in anti-cancer drug responses of lung adenocarcinoma cells.

    PubMed

    Kim, Kyu-Tae; Lee, Hye Won; Lee, Hae-Ock; Kim, Sang Cheol; Seo, Yun Jee; Chung, Woosung; Eum, Hye Hyeon; Nam, Do-Hyun; Kim, Junhyong; Joo, Kyeung Min; Park, Woong-Yang

    2015-06-19

    Intra-tumoral genetic and functional heterogeneity correlates with cancer clinical prognoses. However, the mechanisms by which intra-tumoral heterogeneity impacts therapeutic outcome remain poorly understood. RNA sequencing (RNA-seq) of single tumor cells can provide comprehensive information about gene expression and single-nucleotide variations in individual tumor cells, which may allow for the translation of heterogeneous tumor cell functional responses into customized anti-cancer treatments. We isolated 34 patient-derived xenograft (PDX) tumor cells from a lung adenocarcinoma patient tumor xenograft. Individual tumor cells were subjected to single cell RNA-seq for gene expression profiling and expressed mutation profiling. Fifty tumor-specific single-nucleotide variations, including KRAS(G12D), were observed to be heterogeneous in individual PDX cells. Semi-supervised clustering, based on KRAS(G12D) mutant expression and a risk score representing expression of 69 lung adenocarcinoma-prognostic genes, classified PDX cells into four groups. PDX cells that survived in vitro anti-cancer drug treatment displayed transcriptome signatures consistent with the group characterized by KRAS(G12D) and low risk score. Single-cell RNA-seq on viable PDX cells identified a candidate tumor cell subgroup associated with anti-cancer drug resistance. Thus, single-cell RNA-seq is a powerful approach for identifying unique tumor cell-specific gene expression profiles which could facilitate the development of optimized clinical anti-cancer strategies.

  13. A novel HLA-B allele, B*5214, detected in a Taiwanese volunteer bone marrow donor using a sequence-based typing method.

    PubMed

    Chen, M J; Chu, C C; Shyr, M H; Lin, C L; Lin, P Y; Yang, K L

    2010-02-01

    HLA-B*5214, a novel rare allele of HLA-B*52 variant, was found in a Taiwanese volunteer bone marrow donor by sequence-based typing method. The sequence of B*5214 is identical to that of B*520101 in exon 2 but differs from B*520101 in exon 3 at nucleotide positions 419 A-->T and 435 A-->G. Alteration of these two nucleotides resulted an amino acid substitution at amino acid residue 116 Y-->F ( TAC-->TTC) and a silent exchange at residue 121 K-->K (AAA-->AAG).

  14. Nucleotide sequencing and characterization of the genes encoding benzene oxidation enzymes of Pseudomonas putida.

    PubMed Central

    Irie, S; Doi, S; Yorifuji, T; Takagi, M; Yano, K

    1987-01-01

    The nucleotide sequence of the genes from Pseudomonas putida encoding oxidation of benzene to catechol was determined. Five open reading frames were found in the sequence. Four corresponding protein molecules were detected by a DNA-directed in vitro translation system. Escherichia coli cells containing the fragment with the four open reading frames transformed benzene to cis-benzene glycol, which is an intermediate of the oxidation of benzene to catechol. The relation between the product of each cistron and the components of the benzene oxidation enzyme system is discussed. Images PMID:3667527

  15. Nucleotide sequence of the gene determining plasmid-mediated citrate utilization.

    PubMed Central

    Ishiguro, N; Sato, G

    1985-01-01

    The citrate utilization determinant from transposon Tn3411 has been cloned and sequenced, and its polypeptide products have been characterized in minicell experiments. The nucleotide sequence was determined for a 2,047-base-pair BglII restriction endonuclease fragment that includes the citrate determinant. This region contains an open reading frame that would encode a 431-amino-acid very hydrophobic polypeptide and which is preceded by a reasonable ribosomal binding site. However, the single polypeptide found in minicell experiments had an apparent molecular weight of 35,000 on sodium dodecyl sulfate-polyacrylamide gel electrophoresis. Images PMID:2999087

  16. Chromosome specific repetitive DNA sequences

    DOEpatents

    Moyzis, Robert K.; Meyne, Julianne

    1991-01-01

    A method is provided for determining specific nucleotide sequences useful in forming a probe which can identify specific chromosomes, preferably through in situ hybridization within the cell itself. In one embodiment, chromosome preferential nucleotide sequences are first determined from a library of recombinant DNA clones having families of repetitive sequences. Library clones are identified with a low homology with a sequence of repetitive DNA families to which the first clones respectively belong and variant sequences are then identified by selecting clones having a pattern of hybridization with genomic DNA dissimilar to the hybridization pattern shown by the respective families. In another embodiment, variant sequences are selected from a sequence of a known repetitive DNA family. The selected variant sequence is classified as chromosome specific, chromosome preferential, or chromosome nonspecific. Sequences which are classified as chromosome preferential are further sequenced and regions are identified having a low homology with other regions of the chromosome preferential sequence or with known sequences of other family me This invention is the result of a contract with the Department of Energy (Contract No. W-7405-ENG-36).

  17. Identification of two allelic IgG1 C(H) coding regions (Cgamma1) of cat.

    PubMed

    Kanai, T H; Ueda, S; Nakamura, T

    2000-01-31

    Two types of cDNA encoding IgG1 heavy chain (gamma1) were isolated from a single domestic short-hair cat. Sequence analysis indicated a higher level of similarity of these Cgamma1 sequences to human Cgamma1 sequence (76.9 and 77.0%) than to mouse sequence (70.0 and 69.7%) at the nucleotide level. Predicted primary structures of both the feline Cgamma1 genes, designated as Cgamma1a and Cgamma1b, were similar to that of human Cgamma1 gene, for instance, as to the size of constant domains, the presence of six conserved cysteine residues involved in formation of the domain structure, and the location of a conserved N-linked glycosylation site. Sequence comparison between the two alleles showed that 7 out of 10 nucleotide differences were within the C(H)3 domain coding region, all leading to nonsynonymous changes in amino acid residues. Partial sequence analysis of genomic clones showed three nucleotide substitutions between the two Cgamma1 alleles in the intron between the CH2 and C(H)3 domain coding regions. In 12 domestic short-hair cats used in this study, the frequency of Cgamma1a allele (62.5%) was higher than that of the Cgamma1b allele (37.5%).

  18. Human somatostatin I: sequence of the cDNA.

    PubMed Central

    Shen, L P; Pictet, R L; Rutter, W J

    1982-01-01

    RNA has been isolated from a human pancreatic somatostatinoma and used to prepare a cDNA library. After prescreening, clones containing somatostatin I sequences were identified by hybridization with an anglerfish somatostatin I-cloned cDNA probe. From the nucleotide sequence of two of these clones, we have deduced an essentially full-length mRNA sequence, including the preprosomatostatin coding region, 105 nucleotides from the 5' untranslated region and the complete 150-nucleotide 3' untranslated region. The coding region predicts a 116-amino acid precursor protein (Mr, 12.727) that contains somatostatin-14 and -28 at its COOH terminus. The predicted amino acid sequence of human somatostatin-28 is identical to that of somatostatin-28 isolated from the porcine and ovine species. A comparison of the amino acid sequences of human and anglerfish preprosomatostatin I indicated that the COOH-terminal region encoding somatostatin-14 and the adjacent 6 amino acids are highly conserved, whereas the remainder of the molecule, including the signal peptide region, is more divergent. However, many of the amino acid differences found in the pro region of the human and anglerfish proteins are conservative changes. This suggests that the propeptides have a similar secondary structure, which in turn may imply a biological function for this region of the molecule. Images PMID:6126875

  19. Scanning the Effects of Ethyl Methanesulfonate on the Whole Genome of Lotus japonicus Using Second-Generation Sequencing Analysis

    PubMed Central

    Mohd-Yusoff, Nur Fatihah; Ruperao, Pradeep; Tomoyoshi, Nurain Emylia; Edwards, David; Gresshoff, Peter M.; Biswas, Bandana; Batley, Jacqueline

    2015-01-01

    Genetic structure can be altered by chemical mutagenesis, which is a common method applied in molecular biology and genetics. Second-generation sequencing provides a platform to reveal base alterations occurring in the whole genome due to mutagenesis. A model legume, Lotus japonicus ecotype Miyakojima, was chemically mutated with alkylating ethyl methanesulfonate (EMS) for the scanning of DNA lesions throughout the genome. Using second-generation sequencing, two individually mutated third-generation progeny (M3, named AM and AS) were sequenced and analyzed to identify single nucleotide polymorphisms and reveal the effects of EMS on nucleotide sequences in these mutant genomes. Single-nucleotide polymorphisms were found in every 208 kb (AS) and 202 kb (AM) with a bias mutation of G/C-to-A/T changes at low percentage. Most mutations were intergenic. The mutation spectrum of the genomes was comparable in their individual chromosomes; however, each mutated genome has unique alterations, which are useful to identify causal mutations for their phenotypic changes. The data obtained demonstrate that whole genomic sequencing is applicable as a high-throughput tool to investigate genomic changes due to mutagenesis. The identification of these single-point mutations will facilitate the identification of phenotypically causative mutations in EMS-mutated germplasm. PMID:25660167

  20. SENCA: A Multilayered Codon Model to Study the Origins and Dynamics of Codon Usage

    PubMed Central

    Pouyet, Fanny; Bailly-Bechet, Marc; Mouchiroud, Dominique; Guéguen, Laurent

    2016-01-01

    Gene sequences are the target of evolution operating at different levels, including the nucleotide, codon, and amino acid levels. Disentangling the impact of those different levels on gene sequences requires developing a probabilistic model with three layers. Here we present SENCA (site evolution of nucleotides, codons, and amino acids), a codon substitution model that separately describes 1) nucleotide processes which apply on all sites of a sequence such as the mutational bias, 2) preferences between synonymous codons, and 3) preferences among amino acids. We argue that most synonymous substitutions are not neutral and that SENCA provides more accurate estimates of selection compared with more classical codon sequence models. We study the forces that drive the genomic content evolution, intraspecifically in the core genome of 21 prokaryotes and interspecifically for five Enterobacteria. We retrieve the existence of a universal mutational bias toward AT, and that taking into account selection on synonymous codon usage has consequences on the measurement of selection on nonsynonymous substitutions. We also confirm that codon usage bias is mostly driven by selection on preferred codons. We propose new summary statistics to measure the relative importance of the different evolutionary processes acting on sequences. PMID:27401173

  1. Complete genome sequence of a new begomovirus associated with yellow mosaic disease of Hemidesmus indicus in India.

    PubMed

    Reddy, M Sreekanth; Kanakala, S; Srinivas, K P; Hema, M; Malathi, V G; Sreenivasulu, P

    2014-05-01

    The complete DNA A genome of a virus isolate associated with yellow mosaic disease of a medicinal plant, Hemidesmus indicus, from India was cloned and sequenced. The length of DNA A was 2825 nucleotides, 35 nucleotides longer than the unit genome of monopartite begomoviruses. Comparison of the nucleotide sequence of DNA A of the virus isolate with those of other begomoviruses showed maximum sequence identity of 69 % to DNA A of ageratum yellow vein China virus (AYVCNV; AJ558120) and 68 % with tomato yellow leaf curl virus- LBa4 (TYLCV; EF185318), and it formed a distinct clade in phylogenetic analysis. The genome organization of the present virus isolate was found to be similar to that of Old World monopartite begomoviruses. The genome was considered to be monopartite, because association of DNA B and β satellite DNA components was not detected. Based on its sequence identity (<70 %) to all other begomoviruses known to date and ICTV (International Committee on Taxonomy of Viruses) species demarcating criteria (<89 % identity), it is considered a member of a novel begomovirus species, and the tentative name "Hemidesmus yellow mosaic virus" (HeYMV) is proposed.

  2. INFO-RNA--a server for fast inverse RNA folding satisfying sequence constraints.

    PubMed

    Busch, Anke; Backofen, Rolf

    2007-07-01

    INFO-RNA is a new web server for designing RNA sequences that fold into a user given secondary structure. Furthermore, constraints on the sequence can be specified, e.g. one can restrict sequence positions to a fixed nucleotide or to a set of nucleotides. Moreover, the user can allow violations of the constraints at some positions, which can be advantageous in complicated cases. The INFO-RNA web server allows biologists to design RNA sequences in an automatic manner. It is clearly and intuitively arranged and easy to use. The procedure is fast, as most applications are completed within seconds and it proceeds better and faster than other existing tools. The INFO-RNA web server is freely available at http://www.bioinf.uni-freiburg.de/Software/INFO-RNA/

  3. INFO-RNA—a server for fast inverse RNA folding satisfying sequence constraints

    PubMed Central

    Busch, Anke; Backofen, Rolf

    2007-01-01

    INFO-RNA is a new web server for designing RNA sequences that fold into a user given secondary structure. Furthermore, constraints on the sequence can be specified, e.g. one can restrict sequence positions to a fixed nucleotide or to a set of nucleotides. Moreover, the user can allow violations of the constraints at some positions, which can be advantageous in complicated cases. The INFO-RNA web server allows biologists to design RNA sequences in an automatic manner. It is clearly and intuitively arranged and easy to use. The procedure is fast, as most applications are completed within seconds and it proceeds better and faster than other existing tools. The INFO-RNA web server is freely available at http://www.bioinf.uni-freiburg.de/Software/INFO-RNA/ PMID:17452349

  4. alpha-Amylase gene of Streptomyces limosus: nucleotide sequence, expression motifs, and amino acid sequence homology to mammalian and invertebrate alpha-amylases.

    PubMed Central

    Long, C M; Virolle, M J; Chang, S Y; Chang, S; Bibb, M J

    1987-01-01

    The nucleotide sequence of the coding and regulatory regions of the alpha-amylase gene (aml) of Streptomyces limosus was determined. High-resolution S1 mapping was used to locate the 5' end of the transcript and demonstrated that the gene is transcribed from a unique promoter. The predicted amino acid sequence has considerable identity to mammalian and invertebrate alpha-amylases, but not to those of plant, fungal, or eubacterial origin. Consistent with this is the susceptibility of the enzyme to an inhibitor of mammalian alpha-amylases. The amino-terminal sequence of the extracellular enzyme was determined, revealing the presence of a typical signal peptide preceding the mature form of the alpha-amylase. Images PMID:3500166

  5. Complete genome analysis of jasmine virus T from Jasminum sambac in China.

    PubMed

    Tang, Yajun; Gao, Fangluan; Yang, Zhen; Wu, Zujian; Yang, Liang

    2016-07-01

    The genome of a potyvirus (isolate JaVT_FZ) recovered from jasmine (Jasminum sambac L.) showing yellow ringspot symptoms in Fuzhou, China, was sequenced. JaVT_FZ is closely related to seven other potyviruses with completely sequenced genomes, with which it shares 66-70 % nucleotide and 52-56 % amino acid sequence identity. However, the coat protein (CP) gene shares 82-92 % nucleotide and 90-97 % amino acid sequence identity with those of two partially sequenced potyviruses, named jasmine potyvirus T (JaVT-jasmine) and jasmine yellow mosaic potyvirus (JaYMV-India), respectively. This suggests that JaVT_FZ, JaVT-jasmine and JaYMV-India should be regarded as members of a single potyvirus species, for which the name "Jasmine virus T" has priority.

  6. Nucleotide sequence of the phosphoglycerate kinase gene from the extreme thermophile Thermus thermophilus. Comparison of the deduced amino acid sequence with that of the mesophilic yeast phosphoglycerate kinase.

    PubMed Central

    Bowen, D; Littlechild, J A; Fothergill, J E; Watson, H C; Hall, L

    1988-01-01

    Using oligonucleotide probes derived from amino acid sequencing information, the structural gene for phosphoglycerate kinase from the extreme thermophile, Thermus thermophilus, was cloned in Escherichia coli and its complete nucleotide sequence determined. The gene consists of an open reading frame corresponding to a protein of 390 amino acid residues (calculated Mr 41,791) with an extreme bias for G or C (93.1%) in the codon third base position. Comparison of the deduced amino acid sequence with that of the corresponding mesophilic yeast enzyme indicated a number of significant differences. These are discussed in terms of the unusual codon bias and their possible role in enhanced protein thermal stability. Images Fig. 1. PMID:3052437

  7. Parsimony and Model-Based Analyses of Indels in Avian Nuclear Genes Reveal Congruent and Incongruent Phylogenetic Signals

    PubMed Central

    Yuri, Tamaki; Kimball, Rebecca T.; Harshman, John; Bowie, Rauri C. K.; Braun, Michael J.; Chojnowski, Jena L.; Han, Kin-Lan; Hackett, Shannon J.; Huddleston, Christopher J.; Moore, William S.; Reddy, Sushma; Sheldon, Frederick H.; Steadman, David W.; Witt, Christopher C.; Braun, Edward L.

    2013-01-01

    Insertion/deletion (indel) mutations, which are represented by gaps in multiple sequence alignments, have been used to examine phylogenetic hypotheses for some time. However, most analyses combine gap data with the nucleotide sequences in which they are embedded, probably because most phylogenetic datasets include few gap characters. Here, we report analyses of 12,030 gap characters from an alignment of avian nuclear genes using maximum parsimony (MP) and a simple maximum likelihood (ML) framework. Both trees were similar, and they exhibited almost all of the strongly supported relationships in the nucleotide tree, although neither gap tree supported many relationships that have proven difficult to recover in previous studies. Moreover, independent lines of evidence typically corroborated the nucleotide topology instead of the gap topology when they disagreed, although the number of conflicting nodes with high bootstrap support was limited. Filtering to remove short indels did not substantially reduce homoplasy or reduce conflict. Combined analyses of nucleotides and gaps resulted in the nucleotide topology, but with increased support, suggesting that gap data may prove most useful when analyzed in combination with nucleotide substitutions. PMID:24832669

  8. Genetic diversity and classification of Tibetan yak populations based on the mtDNA COIII gene.

    PubMed

    Song, Q Q; Chai, Z X; Xin, J W; Zhao, S J; Ji, Q M; Zhang, C F; Ma, Z J; Zhong, J C

    2015-03-13

    To determine the level of genetic diversity and phylogenetic relationships among Tibetan yak populations, the mitochondrial DNA cytochrome c oxidase subunit 3 (COIII) genes of 378 yak individuals from 16 populations were analyzed in this study. The results showed that the length of cytochrome c oxidase subunit 3 gene sequences was 781 bp, with nucleotide frequencies of 29.2, 29.4, 26.1, and 15.2% for T, C, A, and G, respectively. A total of 26 haplotypes were identified, with 69 polymorphic sites, including 11 parsimony-informative sites and 58 single-nucleotide polymorphism sites. No deletions/insertions were found in sequence comparison, indicating that nucleotide mutation types were transitions and transversions. Haplotype and nucleotide diversities were 0.562 and 0.00138, respectively, indicating a high level of genetic diversity in Tibetan yak populations. Phylogenetic relationship analysis indicated that Tibetan yak populations are divided into 2 groups.

  9. A weighted sampling algorithm for the design of RNA sequences with targeted secondary structure and nucleotide distribution.

    PubMed

    Reinharz, Vladimir; Ponty, Yann; Waldispühl, Jérôme

    2013-07-01

    The design of RNA sequences folding into predefined secondary structures is a milestone for many synthetic biology and gene therapy studies. Most of the current software uses similar local search strategies (i.e. a random seed is progressively adapted to acquire the desired folding properties) and more importantly do not allow the user to control explicitly the nucleotide distribution such as the GC-content in their sequences. However, the latter is an important criterion for large-scale applications as it could presumably be used to design sequences with better transcription rates and/or structural plasticity. In this article, we introduce IncaRNAtion, a novel algorithm to design RNA sequences folding into target secondary structures with a predefined nucleotide distribution. IncaRNAtion uses a global sampling approach and weighted sampling techniques. We show that our approach is fast (i.e. running time comparable or better than local search methods), seedless (we remove the bias of the seed in local search heuristics) and successfully generates high-quality sequences (i.e. thermodynamically stable) for any GC-content. To complete this study, we develop a hybrid method combining our global sampling approach with local search strategies. Remarkably, our glocal methodology overcomes both local and global approaches for sampling sequences with a specific GC-content and target structure. IncaRNAtion is available at csb.cs.mcgill.ca/incarnation/. Supplementary data are available at Bioinformatics online.

  10. Three closely related herpesviruses are associated with fibropapillomatosis in marine turtles

    USGS Publications Warehouse

    Quackenbush, S.L.; Work, Thierry M.; Balazs, George H.; Casey, Rufina N.; Rovnak, J.; Chaves, A.; duToit, L.; Baines, J.D.; Parrish, C.R.; Bowser, Paul R.; Casey, James W.

    1998-01-01

    Green turtle fibropapillomatosis is a neoplastic disease of increasingly significant threat to the survivability of this species. Degenerate PCR primers that target highly conserved regions of genes encoding herpesvirus DNA polymerases were used to amplify a DNA sequence from fibropapillomas and fibromas from Hawaiian and Florida green turtles. All of the tumors tested (n= 23) were found to harbor viral DNA, whereas no viral DNA was detected in skin biopsies from tumor-negative turtles. The tissue distribution of the green turtle herpesvirus appears to be generally limited to tumors where viral DNA was found to accumulate at approximately two to five copies per cell and is occasionally detected, only by PCR, in some tissues normally associated with tumor development. In addition, herpesviral DNA was detected in fibropapillomas from two loggerhead and four olive ridley turtles. Nucleotide sequencing of a 483-bp fragment of the turtle herpesvirus DNA polymerase gene determined that the Florida green turtle and loggerhead turtle sequences are identical and differ from the Hawaiian green turtle sequence by five nucleotide changes, which results in two amino acid substitutions. The olive ridley sequence differs from the Florida and Hawaiian green turtle sequences by 15 and 16 nucleotide changes, respectively, resulting in four amino acid substitutions, three of which are unique to the olive ridley sequence. Our data suggest that these closely related turtle herpesviruses are intimately involved in the genesis of fibropapillomatosis.

  11. Detection and characterization of hepatitis A virus circulating in Egypt.

    PubMed

    Hamza, Hazem; Abd-Elshafy, Dina Nadeem; Fayed, Sayed A; Bahgat, Mahmoud Mohamed; El-Esnawy, Nagwa Abass; Abdel-Mobdy, Emam

    2017-07-01

    Hepatitis A virus (HAV) still poses a considerable problem worldwide. In the current study, hepatitis A virus was recovered from wastewater samples collected from three wastewater treatment plants over one year. Using RT-PCR, HAV was detected in 43 out of 68 samples (63.2%) representing both inlet and outlet. Eleven positive samples were subjected to sequencing targeting the VP1-2A junction region. Phylogenetic analysis revealed that all samples belonged to subgenotype IB with few substitutions at the amino acid level. The complete sequence of one isolate (HAV/Egy/BI-11/2015) showed that the similarity at the amino acid level was not reflected at the nucleotide level. However, the deduced amino acid sequence derived from the complete nucleotide sequence showed distinct substitutions in the 2B, 2C, and 3A regions. Recombination analysis revealed a recombination event between X75215 (subgenotype IA) and AF268396 (subgenotype IB) involving a portion of the 2B nonstructural protein coding region (nucleotides 3757-3868) assuming the herein characterized sequence an actual recombinant. Despite the role of recombination in picornaviruses evolution, its involvement in HAV evolution has rarely been reported, and this may be due to the limited available complete HAV sequences. To our knowledge, this represents the first characterized complete sequence of an Egyptian isolate and the described recombination event provides an important update on the circulating HAV strains in Egypt.

  12. Sequence dependence of electron-induced DNA strand breakage revealed by DNA nanoarrays

    PubMed Central

    Keller, Adrian; Rackwitz, Jenny; Cauët, Emilie; Liévin, Jacques; Körzdörfer, Thomas; Rotaru, Alexandru; Gothelf, Kurt V.; Besenbacher, Flemming; Bald, Ilko

    2014-01-01

    The electronic structure of DNA is determined by its nucleotide sequence, which is for instance exploited in molecular electronics. Here we demonstrate that also the DNA strand breakage induced by low-energy electrons (18 eV) depends on the nucleotide sequence. To determine the absolute cross sections for electron induced single strand breaks in specific 13 mer oligonucleotides we used atomic force microscopy analysis of DNA origami based DNA nanoarrays. We investigated the DNA sequences 5′-TT(XYX)3TT with X = A, G, C and Y = T, BrU 5-bromouracil and found absolute strand break cross sections between 2.66 · 10−14 cm2 and 7.06 · 10−14 cm2. The highest cross section was found for 5′-TT(ATA)3TT and 5′-TT(ABrUA)3TT, respectively. BrU is a radiosensitizer, which was discussed to be used in cancer radiation therapy. The replacement of T by BrU into the investigated DNA sequences leads to a slight increase of the absolute strand break cross sections resulting in sequence-dependent enhancement factors between 1.14 and 1.66. Nevertheless, the variation of strand break cross sections due to the specific nucleotide sequence is considerably higher. Thus, the present results suggest the development of targeted radiosensitizers for cancer radiation therapy. PMID:25487346

  13. Designing, optimization and validation of tetra-primer ARMS PCR protocol for genotyping mutations in caprine Fec genes

    PubMed Central

    Ahlawat, Sonika; Sharma, Rekha; Maitra, A.; Roy, Manoranjan; Tantia, M.S.

    2014-01-01

    New, quick, and inexpensive methods for genotyping novel caprine Fec gene polymorphisms through tetra-primer ARMS PCR were developed in the present investigation. Single nucleotide polymorphism (SNP) genotyping needs to be attempted to establish association between the identified mutations and traits of economic importance. In the current study, we have successfully genotyped three new SNPs identified in caprine fecundity genes viz. T(-242)C (BMPR1B), G1189A (GDF9) and G735A (BMP15). Tetra-primer ARMS PCR protocol was optimized and validated for these SNPs with short turn-around time and costs. The optimized techniques were tested on 158 random samples of Black Bengal goat breed. Samples with known genotypes for the described genes, previously tested in duplicate using the sequencing methods, were employed for validation of the assay. Upon validation, complete concordance was observed between the tetra-primer ARMS PCR assays and the sequencing results. These results highlight the ability of tetra-primer ARMS PCR in genotyping of mutations in Fec genes. Any associated SNP could be used to accelerate the improvement of goat reproductive traits by identifying high prolific animals at an early stage of life. Our results provide direct evidence that tetra-primer ARMS-PCR is a rapid, reliable, and cost-effective method for SNP genotyping of mutations in caprine Fec genes. PMID:25606428

  14. Variation in the Nucleotide Sequence of Cottontail Rabbit Papillomavirus a and b Subtypes Affects Wart Regression and Malignant Transformation and Level of Viral Replication in Domestic Rabbits

    PubMed Central

    Salmon, Jérôme; Nonnenmacher, Mathieu; Cazé, Sandrine; Flamant, Patricia; Croissant, Odile; Orth, Gérard; Breitburd, Françoise

    2000-01-01

    We previously reported the partial characterization of two cottontail rabbit papillomavirus (CRPV) subtypes with strikingly divergent E6 and E7 oncoproteins. We report now the complete nucleotide sequences of these subtypes, referred to as CRPVa4 (7,868 nucleotides) and CRPVb (7,867 nucleotides). The CRPVa4 and CRPVb genomes differed at 238 (3%) nucleotide positions, whereas CRPVa4 and the prototype CRPV differed by only 5 nucleotides. The most variable region (7% nucleotide divergence) included the long regulatory region (LRR) and the E6 and E7 genes. A mutation in the stop codon resulted in an 8-amino-acid-longer CRPVb E4 protein, and a nucleotide deletion reduced the coding capacity of the E5 gene from 101 to 25 amino acids. In domestic rabbits homozygous for a specific haplotype of the DRA and DQA genes of the major histocompatibility complex, warts induced by CRPVb DNA or a chimeric genome containing the CRPVb LRR/E6/E7 region showed an early regression, whereas warts induced by CRPVa4 or a chimeric genome containing the CRPVa4 LRR/E6/E7 region persisted and evolved into carcinomas. In contrast, most CRPVa, CRPVb, and chimeric CRPV DNA-induced warts showed no early regression in rabbits homozygous for another DRA-DQA haplotype. Little, if any, viral replication is usually observed in domestic rabbit warts. When warts induced by CRPVa and CRPVb virions and DNA were compared, the number of cells positive for viral DNA or capsid antigens was found to be greater by 1 order of magnitude for specimens induced by CRPVb. Thus, both sequence variation in the LRR/E6/E7 region and the genetic constitution of the host influence the expression of the oncogenic potential of CRPV. Furthermore, intratype variation may overcome to some extent the host restriction of CRPV replication in domestic rabbits. PMID:11044121

  15. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing.

    PubMed

    Binladen, Jonas; Gilbert, M Thomas P; Bollback, Jonathan P; Panitz, Frank; Bendixen, Christian; Nielsen, Rasmus; Willerslev, Eske

    2007-02-14

    The invention of the Genome Sequence 20 DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, as homologous sequences cannot be subsequently assigned to their original sources. We use conventional PCR with 5'-nucleotide tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing through the high-throughput Genome Sequence 20 DNA Sequencing System (GS20, Roche/454 Life Sciences). Each DNA sequence is subsequently traced back to its individual source through 5'tag-analysis. We demonstrate that this new approach enables the assignment of virtually all the generated DNA sequences to the correct source once sequencing anomalies are accounted for (miss-assignment rate<0.4%). Therefore, the method enables accurate sequencing and assignment of homologous DNA sequences from multiple sources in single high-throughput GS20 run. We observe a bias in the distribution of the differently tagged primers that is dependent on the 5' nucleotide of the tag. In particular, primers 5' labelled with a cytosine are heavily overrepresented among the final sequences, while those 5' labelled with a thymine are strongly underrepresented. A weaker bias also exists with regards to the distribution of the sequences as sorted by the second nucleotide of the dinucleotide tags. As the results are based on a single GS20 run, the general applicability of the approach requires confirmation. However, our experiments demonstrate that 5'primer tagging is a useful method in which the sequencing power of the GS20 can be applied to PCR-based assays of multiple homologous PCR products. The new approach will be of value to a broad range of research areas, such as those of comparative genomics, complete mitochondrial analyses, population genetics, and phylogenetics.

  16. Bellerophon: A program to detect chimeric sequences in multiple sequence alignments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huber, Thomas; Faulkner, Geoffrey; Hugenholtz, Philip

    2003-12-23

    Bellerophon is a program for detecting chimeric sequences in multiple sequence datasets by an adaption of partial treeing analysis. Bellerophon was specifically developed to detect 16S rRNA gene chimeras in PCR-clone libraries of environmental samples but can be applied to other nucleotide sequence alignments.

  17. Transcripts of the NADH-dehydrogenase subunit 3 gene are differentially edited in Oenothera mitochondria.

    PubMed Central

    Schuster, W; Wissinger, B; Unseld, M; Brennicke, A

    1990-01-01

    A number of cytosines are altered to be recognized as uridines in transcripts of the nad3 locus in mitochondria of the higher plant Oenothera. Such nucleotide modifications can be found at 16 different sites within the nad3 coding region. Most of these alterations in the mRNA sequence change codon identities to specify amino acids better conserved in evolution. Individual cDNA clones differ in their degree of editing at five nucleotide positions, three of which are silent, while two lead to codon alterations specifying different amino acids. None of the cDNA clones analysed is maximally edited at all possible sites, suggesting slow processing or lowered stringency of editing at these nucleotides. Differentially edited transcripts could be editing intermediates or could code for differing polypeptides. Two edited nucleotides in an open reading frame located upstream of nad3 change two amino acids in the deduced polypeptide. Part of the well-conserved ribosomal protein gene rps12 also encoded downstream of nad3 in other plants, is lost in Oenothera mitochondria by recombination events. The functional rps12 protein must be imported from the cytoplasm since the deleted sequences of this gene are not found in the Oenothera mitochondrial genome. The pseudogene sequence is not edited at any nucleotide position. Images Fig. 3. Fig. 4. Fig. 7. PMID:1688531

  18. Nucleotide sequence and further characterization of the Synechococcus sp. strain PCC 7002 recA gene: complementation of a cyanobacterial recA mutation by the Escherichia coli recA gene.

    PubMed Central

    Murphy, R C; Gasparich, G E; Bryant, D A; Porter, R D

    1990-01-01

    The nucleotide sequence and transcript initiation site of the Synechococcus sp. strain PCC 7002 recA gene have been determined. The deduced amino acid sequence of the RecA protein of this cyanobacterium is 56% identical and 73% similar to the Escherichia coli RecA protein. Northern (RNA) blot analysis indicates that the Synechococcus strain PCC 7002 recA gene is transcribed as a monocistronic transcript 1,200 bases in length. The 5' endpoint of the recA mRNA was mapped by primer extension by using synthetic oligonucleotides of 17 and 27 nucleotides as primers. The nucleotide sequence 5' to the mapped endpoint contained sequence motifs bearing a striking resemblance to the heat shock (sigma 32-specific) promoters of E. coli but did not contain sequences similar to the E. coli SOS operator recognized by the LexA repressor. An insertion mutation introduced into the recA locus of Synechococcus strain PCC 7002 via homologous recombination resulted in the formation of diploids carrying both mutant and wild-type recA alleles. A variety of growth regimens and transformation procedures failed to produce a recA Synechococcus strain PCC 7002 mutant. However, introduction into these diploid cells of the E. coli recA gene in trans on a biphasic shuttle vector resulted in segregation of the cyanobacterial recA alleles, indicating that the E. coli recA gene was able to provide a function required for growth of recA Synechococcus strain PCC 7002 cells. This interpretation is supported by the observation that the E. coli recA gene is maintained in these cells when antibiotic selection for the shuttle vector is removed. Images FIG. 3 FIG. 4 FIG. 6 PMID:2105307

  19. Identification of phylogenetic position in the Chlamydiaceae family for Chlamydia strains released from monkeys and humans with chlamydial pathology.

    PubMed

    Karaulov, Alexander; Aleshkin, Vladimir; Slobodenyuk, Vladimir; Grechishnikova, Olga; Afanasyev, Stanislav; Lapin, Boris; Dzhikidze, Eteri; Nesvizhsky, Yuriy; Evsegneeva, Irina; Voropayeva, Elena; Afanasyev, Maxim; Aleshkin, Andrei; Metelskaya, Valeria; Yegorova, Ekaterina; Bayrakova, Alexandra

    2010-01-01

    Based on the results of the comparative analysis concerning relatedness and evolutional difference of the 16S-23S nucleotide sequences of the middle ribosomal cluster and 23S rRNA I domain, and based on identification of phylogenetic position for Chlamydophila pneumoniae and Chlamydia trichomatis strains released from monkeys, relatedness of the above stated isolates with similar strains released from humans and with strains having nucleotide sequences presented in the GenBank electronic database has been detected for the first time ever. Position of these isolates in the Chlamydiaceae family phylogenetic tree has been identified. The evolutional position of the investigated original Chlamydia and Chlamydophila strains close to analogous strains from the Gen-Bank electronic database has been demonstrated. Differences in the 16S-23S nucleotide sequence of the middle ribosomal cluster and 23S rRNA I domain of plasmid and nonplasmid Chlamydia trachomatis strains released from humans and monkeys relative to different genotype groups (group B-B, Ba, D, Da, E, L1, L2, L2a; intermediate group-F, G, Ga) have been revealed for the first time ever. Abnormality in incA chromosomal gene expression resulting in Chlamydia life development cycle disorder, and decrease of Chlamydia virulence can be related to probable changes in the nucleotide sequence of the gene under consideration.

  20. Complete genome sequence of maize yellow striate virus, a new cytorhabdovirus infecting maize and wheat crops in Argentina.

    PubMed

    Maurino, Fernanda; Dumón, Analía D; Llauger, Gabriela; Alemandri, Vanina; de Haro, Luis A; Mattio, M Fernanda; Del Vas, Mariana; Laguna, Irma Graciela; Giménez Pecci, María de la Paz

    2018-01-01

    A rhabdovirus infecting maize and wheat crops in Argentina was molecularly characterized. Through next-generation sequencing (NGS) of symptomatic leaf samples, the complete genome was obtained of two isolates of maize yellow striate virus (MYSV), a putative new rhabdovirus, differing by only 0.4% at the nucleotide level. The MYSV genome consists of 12,654 nucleotides for maize and wheat virus isolates, and shares 71% nucleotide sequence identity with the complete genome of barley yellow striate mosaic virus (BYSMV, NC028244). Ten open reading frames (ORFs) were predicted in the MYSV genome from the antigenomic strand and were compared with their BYSMV counterparts. The highest amino acid sequence identity of the MYSV and BYSMV proteins was 80% between the L proteins, and the lowest was 37% between the proteins 4. Phylogenetic analysis suggested that the MYSV isolates are new members of the genus Cytorhabdovirus, family Rhabdoviridae. Yellow striate, affecting maize and wheat crops in Argentina, is an emergent disease that presents a potential economic risk for these widely distributed crops.

  1. [A new variant of the simian T-lymphotropic retrovirus type I (STLV-IF) in the Sukhumi colony of hamadryas baboons].

    PubMed

    Chikobaeva, M G; Schatzl, H; Rose, D; Bush, U; Iakovleva, L A; Deinhardt, F; Helm, K; Lapin, B A

    1993-01-01

    Polymerase chain reaction (PCR) was developed for the detection of simian T-lymphotropic virus type 1 (STLV-1) infection of P. hamadryas and direct sequencing using oligo-nucleotide primer pairs specific for the tax and env regions of the related human T-lymphotropic virus type 1 (HTLV-1). Excellent specificity was shown in the detection of STLV-1 provirus in infected baboons by PCR using HTLV-1-derived primers. The nucleotide sequences of env 467bp and tax 159bp of the proviral genome (env position 5700-6137, tax position 7373-7498 HTLV-1, according to Seiki et al., 1983) derived from STLV-1-infected P. hamadryas were analysed using PCR and direct sequencing techniques. Two STLV-1 isolates from different sources (Sukhumi main-SuTLV-1 and forest stocks-STLV-1F) were compared. Two variants of STLV-1 among P. hamadryas with different level of homology to HTLV-1 were wound (83.8% and 95.2%, respectively). A possible role of nucleotide changes in env and tax sequenced fragments and oncogenicity of STLV-1 variants is discussed.

  2. Reference genotype and exome data from an Australian Aboriginal population for health-based research

    PubMed Central

    Tang, Dave; Anderson, Denise; Francis, Richard W.; Syn, Genevieve; Jamieson, Sarra E.; Lassmann, Timo; Blackwell, Jenefer M.

    2016-01-01

    Genetic analyses, including genome-wide association studies and whole exome sequencing (WES), provide powerful tools for the analysis of complex and rare genetic diseases. To date there are no reference data for Aboriginal Australians to underpin the translation of health-based genomic research. Here we provide a catalogue of variants called after sequencing the exomes of 72 Aboriginal individuals to a depth of 20X coverage in ∼80% of the sequenced nucleotides. We determined 320,976 single nucleotide variants (SNVs) and 47,313 insertions/deletions using the Genome Analysis Toolkit. We had previously genotyped a subset of the Aboriginal individuals (70/72) using the Illumina Omni2.5 BeadChip platform and found ~99% concordance at overlapping sites, which suggests high quality genotyping. Finally, we compared our SNVs to six publicly available variant databases, such as dbSNP and the Exome Sequencing Project, and 70,115 of our SNVs did not overlap any of the single nucleotide polymorphic sites in all the databases. Our data set provides a useful reference point for genomic studies on Aboriginal Australians. PMID:27070114

  3. Reference genotype and exome data from an Australian Aboriginal population for health-based research.

    PubMed

    Tang, Dave; Anderson, Denise; Francis, Richard W; Syn, Genevieve; Jamieson, Sarra E; Lassmann, Timo; Blackwell, Jenefer M

    2016-04-12

    Genetic analyses, including genome-wide association studies and whole exome sequencing (WES), provide powerful tools for the analysis of complex and rare genetic diseases. To date there are no reference data for Aboriginal Australians to underpin the translation of health-based genomic research. Here we provide a catalogue of variants called after sequencing the exomes of 72 Aboriginal individuals to a depth of 20X coverage in ∼80% of the sequenced nucleotides. We determined 320,976 single nucleotide variants (SNVs) and 47,313 insertions/deletions using the Genome Analysis Toolkit. We had previously genotyped a subset of the Aboriginal individuals (70/72) using the Illumina Omni2.5 BeadChip platform and found ~99% concordance at overlapping sites, which suggests high quality genotyping. Finally, we compared our SNVs to six publicly available variant databases, such as dbSNP and the Exome Sequencing Project, and 70,115 of our SNVs did not overlap any of the single nucleotide polymorphic sites in all the databases. Our data set provides a useful reference point for genomic studies on Aboriginal Australians.

  4. Nonneutral mitochondrial DNA variation in humans and chimpanzees

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nachman, M.W.; Aquadro, C.F.; Brown, W.M.

    1996-03-01

    We sequenced the NADH dehydrogenase subunit 3 (ND3) gene from a sample of 61 humans, five common chimpanzees, and one gorilla to test whether patterns of mitochondrial DNA (mtDNA) variation are consistent with a neutral model of molecular evolution. Within humans and within chimpanzees, the ratio of replacement to silent nucleotide substitutions was higher than observed in comparisons between species, contrary to neutral expectations. To test the generality of this result, we reanalyzed published human RFLP data from the entire mitochondrial genome. Gains of restriction sites relative to a known human mtDNA sequence were used to infer unambiguous nucleotide substitutions.more » We also compared the complete mtDNA sequences of three humans. Both the RFLP data and the sequence data reveal a higher ratio of replacement to silent nucleotide substitutions within humans than is seen between species. This pattern is observed at most or all human mitochondrial genes and is inconsistent with a strictly neutral model. These data suggest that many mitochondrial protein polymorphisms are slightly deleterious, consistent with studies of human mitochondrial diseases. 59 refs., 2 figs., 8 tabs.« less

  5. Analysis of the genome sequence of the pathogenic Muscovy duck parvovirus strain YY reveals a 14-nucleotide-pair deletion in the inverted terminal repeats.

    PubMed

    Wang, Jianye; Huang, Yu; Zhou, Mingxu; Zhu, Guoqiang

    2016-09-01

    Genomic information about Muscovy duck parvovirus is still limited. In this study, the genome of the pathogenic MDPV strain YY was sequenced. The full-length genome of YY is 5075 nucleotides (nt) long, 57 nt shorter than that of strain FM. Sequence alignment indicates that the 5' and 3' inverted terminal repeats (ITR) of strain YY contain a 14-nucleotide-pair deletion in the stem of the palindromic hairpin structure in comparison to strain FM and FZ91-30. The deleted region contains one "E-box" site and one repeated motif with the sequence "TTCCGGT" or "ACCGGAA". Phylogenetic trees constructed based the protein coding genes concordantly showed that YY, together with nine other MDPV isolates from various places, clustered in a separate branch, distinct from the branch formed by goose parvovirus (GPV) strains. These results demonstrate that, despite the distinctive deletion, the YY strain still belongs to the classical MDPV group. Moreover, the deletion of ITR may contribute to the genome evolution of MDPV under immunization pressure.

  6. Absence of ancient DNA in sub-fossil insect inclusions preserved in 'Anthropocene' Colombian copal.

    PubMed

    Penney, David; Wadsworth, Caroline; Fox, Graeme; Kennedy, Sandra L; Preziosi, Richard F; Brown, Terence A

    2013-01-01

    Insects preserved in copal, the sub-fossilized resin precursor of amber, have potential value in molecular ecological studies of recently-extinct species and of extant species that have never been collected as living specimens. The objective of the work reported in this paper was therefore to determine if ancient DNA is present in insects preserved in copal. We prepared DNA libraries from two stingless bees (Apidae: Meliponini: Trigonisca ameliae) preserved in 'Anthropocene' Colombian copal, dated to 'post-Bomb' and 10,612±62 cal yr BP, respectively, and obtained sequence reads using the GS Junior 454 System. Read numbers were low, but were significantly higher for DNA extracts prepared from crushed insects compared with extracts obtained by a non-destructive method. The younger specimen yielded sequence reads up to 535 nucleotides in length, but searches of these sequences against the nucleotide database revealed very few significant matches. None of these hits was to stingless bees though one read of 97 nucleotides aligned with two non-contiguous segments of the mitochondrial cytochrome oxidase subunit I gene of the East Asia bumblebee Bombus hypocrita. The most significant hit was for 452 nucleotides of a 470-nucleotide read that aligned with part of the genome of the root-nodulating bacterium Bradyrhizobium japonicum. The other significant hits were to proteobacteria and an actinomycete. Searches directed specifically at Apidae nucleotide sequences only gave short and insignificant alignments. All of the reads from the older specimen appeared to be artefacts. We were therefore unable to obtain any convincing evidence for the preservation of ancient DNA in either of the two copal inclusions that we studied, and conclude that DNA is not preserved in this type of material. Our results raise further doubts about claims of DNA extraction from fossil insects in amber, many millions of years older than copal.

  7. Absence of Ancient DNA in Sub-Fossil Insect Inclusions Preserved in ‘Anthropocene’ Colombian Copal

    PubMed Central

    Penney, David; Wadsworth, Caroline; Fox, Graeme; Kennedy, Sandra L.; Preziosi, Richard F.; Brown, Terence A.

    2013-01-01

    Insects preserved in copal, the sub-fossilized resin precursor of amber, have potential value in molecular ecological studies of recently-extinct species and of extant species that have never been collected as living specimens. The objective of the work reported in this paper was therefore to determine if ancient DNA is present in insects preserved in copal. We prepared DNA libraries from two stingless bees (Apidae: Meliponini: Trigonisca ameliae) preserved in ‘Anthropocene’ Colombian copal, dated to ‘post-Bomb’ and 10,612±62 cal yr BP, respectively, and obtained sequence reads using the GS Junior 454 System. Read numbers were low, but were significantly higher for DNA extracts prepared from crushed insects compared with extracts obtained by a non-destructive method. The younger specimen yielded sequence reads up to 535 nucleotides in length, but searches of these sequences against the nucleotide database revealed very few significant matches. None of these hits was to stingless bees though one read of 97 nucleotides aligned with two non-contiguous segments of the mitochondrial cytochrome oxidase subunit I gene of the East Asia bumblebee Bombus hypocrita. The most significant hit was for 452 nucleotides of a 470-nucleotide read that aligned with part of the genome of the root-nodulating bacterium Bradyrhizobium japonicum. The other significant hits were to proteobacteria and an actinomycete. Searches directed specifically at Apidae nucleotide sequences only gave short and insignificant alignments. All of the reads from the older specimen appeared to be artefacts. We were therefore unable to obtain any convincing evidence for the preservation of ancient DNA in either of the two copal inclusions that we studied, and conclude that DNA is not preserved in this type of material. Our results raise further doubts about claims of DNA extraction from fossil insects in amber, many millions of years older than copal. PMID:24039876

  8. Phylogenetic Network for European mtDNA

    PubMed Central

    Finnilä, Saara; Lehtonen, Mervi S.; Majamaa, Kari

    2001-01-01

    The sequence in the first hypervariable segment (HVS-I) of the control region has been used as a source of evolutionary information in most phylogenetic analyses of mtDNA. Population genetic inference would benefit from a better understanding of the variation in the mtDNA coding region, but, thus far, complete mtDNA sequences have been rare. We determined the nucleotide sequence in the coding region of mtDNA from 121 Finns, by conformation-sensitive gel electrophoresis and subsequent sequencing and by direct sequencing of the D loop. Furthermore, 71 sequences from our previous reports were included, so that the samples represented all the mtDNA haplogroups present in the Finnish population. We found a total of 297 variable sites in the coding region, which allowed the compilation of unambiguous phylogenetic networks. The D loop harbored 104 variable sites, and, in most cases, these could be localized within the coding-region networks, without discrepancies. Interestingly, many homoplasies were detected in the coding region. Nucleotide variation in the rRNA and tRNA genes was 6%, and that in the third nucleotide positions of structural genes amounted to 22% of that in the HVS-I. The complete networks enabled the relationships between the mtDNA haplogroups to be analyzed. Phylogenetic networks based on the entire coding-region sequence in mtDNA provide a rich source for further population genetic studies, and complete sequences make it easier to differentiate between disease-causing mutations and rare polymorphisms. PMID:11349229

  9. HIV Resistance Prediction to Reverse Transcriptase Inhibitors: Focus on Open Data.

    PubMed

    Tarasova, Olga; Poroikov, Vladimir

    2018-04-19

    Research and development of new antiretroviral agents are in great demand due to issues with safety and efficacy of the antiretroviral drugs. HIV reverse transcriptase (RT) is an important target for HIV treatment. RT inhibitors targeting early stages of the virus-host interaction are of great interest for researchers. There are a lot of clinical and biochemical data on relationships between the occurring of the single point mutations and their combinations in the pol gene of HIV and resistance of the particular variants of HIV to nucleoside and non-nucleoside reverse transcriptase inhibitors. The experimental data stored in the databases of HIV sequences can be used for development of methods that are able to predict HIV resistance based on amino acid or nucleotide sequences. The data on HIV sequences resistance can be further used for (1) development of new antiretroviral agents with high potential for HIV inhibition and elimination and (2) optimization of antiretroviral therapy. In our communication, we focus on the data on the RT sequences and HIV resistance, which are available on the Internet. The experimental methods, which are applied to produce the data on HIV-1 resistance, the known data on their concordance, are also discussed.

  10. Human jagged polypeptide, encoding nucleic acids and methods of use

    DOEpatents

    Li, Linheng; Hood, Leroy

    2000-01-01

    The present invention provides an isolated polypeptide exhibiting substantially the same amino acid sequence as JAGGED, or an active fragment thereof, provided that the polypeptide does not have the amino acid sequence of SEQ ID NO:5 or SEQ ID NO:6. The invention further provides an isolated nucleic acid molecule containing a nucleotide sequence encoding substantially the same amino acid sequence as JAGGED, or an active fragment thereof, provided that the nucleotide sequence does not encode the amino acid sequence of SEQ ID NO:5 or SEQ ID NO:6. Also provided herein is a method of inhibiting differentiation of hematopoietic progenitor cells by contacting the progenitor cells with an isolated JAGGED polypeptide, or active fragment thereof. The invention additionally provides a method of diagnosing Alagille Syndrome in an individual. The method consists of detecting an Alagille Syndrome disease-associated mutation linked to a JAGGED locus.

  11. Methods of diagnosing alagille syndrome

    DOEpatents

    Li, Linheng; Hood, Leroy; Krantz, Ian D.; Spinner, Nancy B.

    2004-03-09

    The present invention provides an isolated polypeptide exhibiting substantially the same amino acid sequence as JAGGED, or an active fragment thereof, provided that the polypeptide does not have the amino acid sequence of SEQ ID NO:5 or SEQ ID NO:6. The invention further provides an isolated nucleic acid molecule containing a nucleotide sequence encoding substantially the same amino acid sequence as JAGGED, or an active fragment thereof, provided that the nucleotide sequence does not encode the amino acid sequence of SEQ ID NO:5 or SEQ ID NO:6. Also provided herein is a method of inhibiting differentiation of hematopoietic progenitor cells by contacting the progenitor cells with an isolated JAGGED polypeptide, or active fragment thereof. The invention additionally provides a method of diagnosing Alagille Syndrome in an individual. The method consists of detecting an Alagille Syndrome disease-associated mutation linked to a JAGGED locus.

  12. Quantum-Sequencing: Biophysics of quantum tunneling through nucleic acids

    NASA Astrophysics Data System (ADS)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    Tunneling microscopy and spectroscopy has extensively been used in physical surface sciences to study quantum tunneling to measure electronic local density of states of nanomaterials and to characterize adsorbed species. Quantum-Sequencing (Q-Seq) is a new method based on tunneling microscopy for electronic sequencing of single molecule of nucleic acids. A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free single-molecule sequencing method. Here, we present the unique ``electronic fingerprints'' for all nucleotides on DNA and RNA using Q-Seq along their intrinsic biophysical parameters. We have analyzed tunneling spectra for the nucleotides at different pH conditions and analyzed the HOMO, LUMO and energy gap for all of them. In addition we show a number of biophysical parameters to further characterize all nucleobases (electron and hole transition voltage and energy barriers). These results highlight the robustness of Q-Seq as a technique for next-generation sequencing.

  13. [Identification and phylogenetic application of unique nucleotide sequence of nad7 intron2 in Rhodiola (Crassulaceae) species].

    PubMed

    Deng, Ke-Jun; Yang, Zu-Jun; Liu, Cheng; Zhao, Wei; Liu, Chang; Feng, Juan; Ren, Zheng-Long

    2007-03-01

    Genetic characterization of 9 populations of Rhodiola crenulata, R. fastigiata and R. sachalinensis (Crassulaceae) species from Sichuan and Jilin Provinces of China, was investigated using the conserved primer of nad7 intron 2. All PCR products about 800 bp long were shorter than other Crassulaceae plants, which were used as molecular markers to identify the Rhodiola species. The sequence of the products indicated that total exon of 53 bp and intron of 738 bp exhibit only 9 nucleotide variations. Blasting the nad7 sequences to GenBank and the phylogenetic analysis showed that the sequence of Rhodiola species was clusted independently, and the length was smaller than all the registered sequences of higher plants. The result suggests that the Rhiodola species had a unique sequence in this gene region, which might be related to the special growth condition.

  14. Single nucleotide polymorphisms from Theobroma cacao expressed sequence tags associated with witches' broom disease in cacao.

    PubMed

    Lima, L S; Gramacho, K P; Carels, N; Novais, R; Gaiotto, F A; Lopes, U V; Gesteira, A S; Zaidan, H A; Cascardo, J C M; Pires, J L; Micheli, F

    2009-07-14

    In order to increase the efficiency of cacao tree resistance to witches' broom disease, which is caused by Moniliophthora perniciosa (Tricholomataceae), we looked for molecular markers that could help in the selection of resistant cacao genotypes. Among the different markers useful for developing marker-assisted selection, single nucleotide polymorphisms (SNPs) constitute the most common type of sequence difference between alleles and can be easily detected by in silico analysis from expressed sequence tag libraries. We report the first detection and analysis of SNPs from cacao-M. perniciosa interaction expressed sequence tags, using bioinformatics. Selection based on analysis of these SNPs should be useful for developing cacao varieties resistant to this devastating disease.

  15. Routine HLA-B genotyping with PCR-sequence-specific oligonucleotides detects a B*52 variant (B*5206).

    PubMed

    Hoelsch, K; Lenggeler, I; Pfannes, W; Knabe, H; Klein, H-G; Woelpl, A

    2005-05-01

    A new human leukocyte antigen (HLA)-B allele was found during routine typing of samples for a German unrelated bone marrow donor registry, the "Aktion Knochenmarkspende Bayern". After first interpretation of data of two independent low-resolution sequence-specific oligonucleotide typing tests, a B*51 variant was suggested. Further analysis via sequence-based typing identified the sequence as new B*52 allele. This new allele officially assigned as B*5206 differs from HLA-B*520102 by one nucleotide exchange in exon 2. The mutation is located at nucleotide position 274, at which a cytosine is substituted by a thymine leading to an amino acid change at protein position 67 from serine (TCC) to phenylalanine (TTC).

  16. Large scale DNA microsequencing device

    DOEpatents

    Foote, R.S.

    1997-08-26

    A microminiature sequencing apparatus and method provide a means for simultaneously obtaining sequences of plural polynucleotide strands. The apparatus cosists of a microchip into which plural channels have been etched using standard lithographic procedures and chemical wet etching. The channels include a reaction well and a separating section. Enclosing the channels is accomplished by bonding a transparent cover plate over the apparatus. A first oligonucleotide strand is chemically affixed to the apparatus through an alkyl chain. Subsequent nucleotides are selected by complementary base pair bonding. A target nucleotide strand is used to produce a family of labelled sequencing strands in each channel which are separated in the separating section. During or following separation the sequences are determined using appropriate detection means. 17 figs.

  17. Isolates of viral hemorrhagic septicemia virus from North America and Europe can be detected and distinguished by DNA probes

    USGS Publications Warehouse

    Batts, W.N.; Arakawa, C.K.; Bernard, J.; Winton, J.R.

    1993-01-01

    Biotinylated DNA probes were constructed to hybndize with speclfic sequences within the messenger RNA (mRNA) of the nucleoprotein (N) gene of vlral hemorrhagic septicemia virus (VHSV) reference strains from Europe (07-71) and North Arnenca (Makah) Probes were synthesized that were complementary to (1) a 29-nucleotide sequence near the center of the N gene conlmon to both the 07-71 and Makah reference strains of the virus (2) a unique 28- nucleotide sequence that followed the open readng frame of the Makah N gene mRNA most of which was absent In the 07-71 strain, and (3) a 22-nucleobde sequence wthin the 07-71 N gene that had 6 nllsmatches \

  18. [Analysis of Conformational Features of Watson-Crick Duplex Fragments by Molecular Mechanics and Quantum Mechanics Methods].

    PubMed

    Poltev, V I; Anisimov, V M; Sanchez, C; Deriabina, A; Gonzalez, E; Garcia, D; Rivas, F; Polteva, N A

    2016-01-01

    It is generally accepted that the important characteristic features of the Watson-Crick duplex originate from the molecular structure of its subunits. However, it still remains to elucidate what properties of each subunit are responsible for the significant characteristic features of the DNA structure. The computations of desoxydinucleoside monophosphates complexes with Na-ions using density functional theory revealed a pivotal role of DNA conformational properties of single-chain minimal fragments in the development of unique features of the Watson-Crick duplex. We found that directionality of the sugar-phosphate backbone and the preferable ranges of its torsion angles, combined with the difference between purines and pyrimidines. in ring bases, define the dependence of three-dimensional structure of the Watson-Crick duplex on nucleotide base sequence. In this work, we extended these density functional theory computations to the minimal' fragments of DNA duplex, complementary desoxydinucleoside monophosphates complexes with Na-ions. Using several computational methods and various functionals, we performed a search for energy minima of BI-conformation for complementary desoxydinucleoside monophosphates complexes with different nucleoside sequences. Two sequences are optimized using ab initio method at the MP2/6-31++G** level of theory. The analysis of torsion angles, sugar ring puckering and mutual base positions of optimized structures demonstrates that the conformational characteristic features of complementary desoxydinucleoside monophosphates complexes with Na-ions remain within BI ranges and become closer to the corresponding characteristic features of the Watson-Crick duplex crystals. Qualitatively, the main characteristic features of each studied complementary desoxydinucleoside monophosphates complex remain invariant when different computational methods are used, although the quantitative values of some conformational parameters could vary lying within the limits typical for the corresponding family. We observe that popular functionals in density functional theory calculations lead to the overestimated distances between base pairs, while MP2 computations and the newer complex functionals produce the structures that have too close atom-atom contacts. A detailed study of some complementary desoxydinucleoside monophosphate complexes with Na-ions highlights the existence of several energy minima corresponding to BI-conformations, in other words, the complexity of the relief pattern of the potential energy surface of complementary desoxydinucleoside monophosphate complexes. This accounts for variability of conformational parameters of duplex fragments with the same base sequence. Popular molecular mechanics force fields AMBER and CHARMM reproduce most of the conformational characteristics of desoxydinucleoside monophosphates and their complementary complexes with Na-ions but fail to reproduce some details of the dependence of the Watson-Crick duplex conformation on the nucleotide sequence.

  19. Evaluation of an Optimal Epidemiological Typing Scheme for Legionella pneumophila with Whole-Genome Sequence Data Using Validation Guidelines

    PubMed Central

    Mentasti, Massimo; Tewolde, Rediat; Aslett, Martin; Harris, Simon R.; Afshar, Baharak; Underwood, Anthony; Harrison, Timothy G.

    2016-01-01

    Sequence-based typing (SBT), analogous to multilocus sequence typing (MLST), is the current “gold standard” typing method for investigation of legionellosis outbreaks caused by Legionella pneumophila. However, as common sequence types (STs) cause many infections, some investigations remain unresolved. In this study, various whole-genome sequencing (WGS)-based methods were evaluated according to published guidelines, including (i) a single nucleotide polymorphism (SNP)-based method, (ii) extended MLST using different numbers of genes, (iii) determination of gene presence or absence, and (iv) a kmer-based method. L. pneumophila serogroup 1 isolates (n = 106) from the standard “typing panel,” previously used by the European Society for Clinical Microbiology Study Group on Legionella Infections (ESGLI), were tested together with another 229 isolates. Over 98% of isolates were considered typeable using the SNP- and kmer-based methods. Percentages of isolates with complete extended MLST profiles ranged from 99.1% (50 genes) to 86.8% (1,455 genes), while only 41.5% produced a full profile with the gene presence/absence scheme. Replicates demonstrated that all methods offer 100% reproducibility. Indices of discrimination range from 0.972 (ribosomal MLST) to 0.999 (SNP based), and all values were higher than that achieved with SBT (0.940). Epidemiological concordance is generally inversely related to discriminatory power. We propose that an extended MLST scheme with ∼50 genes provides optimal epidemiological concordance while substantially improving the discrimination offered by SBT and can be used as part of a hierarchical typing scheme that should maintain backwards compatibility and increase discrimination where necessary. This analysis will be useful for the ESGLI to design a scheme that has the potential to become the new gold standard typing method for L. pneumophila. PMID:27280420

  20. Evaluation of an Optimal Epidemiological Typing Scheme for Legionella pneumophila with Whole-Genome Sequence Data Using Validation Guidelines.

    PubMed

    David, Sophia; Mentasti, Massimo; Tewolde, Rediat; Aslett, Martin; Harris, Simon R; Afshar, Baharak; Underwood, Anthony; Fry, Norman K; Parkhill, Julian; Harrison, Timothy G

    2016-08-01

    Sequence-based typing (SBT), analogous to multilocus sequence typing (MLST), is the current "gold standard" typing method for investigation of legionellosis outbreaks caused by Legionella pneumophila However, as common sequence types (STs) cause many infections, some investigations remain unresolved. In this study, various whole-genome sequencing (WGS)-based methods were evaluated according to published guidelines, including (i) a single nucleotide polymorphism (SNP)-based method, (ii) extended MLST using different numbers of genes, (iii) determination of gene presence or absence, and (iv) a kmer-based method. L. pneumophila serogroup 1 isolates (n = 106) from the standard "typing panel," previously used by the European Society for Clinical Microbiology Study Group on Legionella Infections (ESGLI), were tested together with another 229 isolates. Over 98% of isolates were considered typeable using the SNP- and kmer-based methods. Percentages of isolates with complete extended MLST profiles ranged from 99.1% (50 genes) to 86.8% (1,455 genes), while only 41.5% produced a full profile with the gene presence/absence scheme. Replicates demonstrated that all methods offer 100% reproducibility. Indices of discrimination range from 0.972 (ribosomal MLST) to 0.999 (SNP based), and all values were higher than that achieved with SBT (0.940). Epidemiological concordance is generally inversely related to discriminatory power. We propose that an extended MLST scheme with ∼50 genes provides optimal epidemiological concordance while substantially improving the discrimination offered by SBT and can be used as part of a hierarchical typing scheme that should maintain backwards compatibility and increase discrimination where necessary. This analysis will be useful for the ESGLI to design a scheme that has the potential to become the new gold standard typing method for L. pneumophila. Copyright © 2016 David et al.

  1. Molecular identification based on ITS sequences for Kappaphycus and Eucheuma cultivated in China

    NASA Astrophysics Data System (ADS)

    Zhao, Sufen; He, Peimin

    2011-11-01

    The systematic classification of the Eucheumatoideae is difficult because of their variable morphology and interpretation of reproductive structures. Kappaphycus and Eucheuma specimens cultivated on the Hainan and Fujian coast of China were introduced from Vietnam, the Philippines and Indonesia. Combined with morphological characteristics, all Kappaphycus and Eucheuma cultivated strains were identified by internal transcribed spacer (ITS) sequences. The phylogenetic tree was constructed using neighbor-joining and maximum likelihood methods. The results indicate that different ITS sequence lengths occurred in the different genera and species. An obvious difference in morphology could be found in the protuberance shape between Kappaphycus and Eucheuma. The protuberance in Eucheuma was thorn-like and in Kappaphycus was wartlike or papillate. Their ITS sequence lengths differed significantly in nucleotide variation rates up to 58.55%-63.90%. All nucleotide variations occurred in the ITS1 and ITS2 regions except for five nucleotide transversions in the 5.8S rDNA region. In addition, the difference was at the branches among congeneric species. Kappaphycus sp. had branches with small buds, while K. alvarezii did not have such a feature. The nucleotide variation rates varied from 7.02% to 7.48% among species; within the same species of the clades it was <1.20%. Eucheumatoideae algae cultivated in China consisted of three clades, K. alvarezii, Kappaphycus sp., and E. denticulatum. The results indicate that ITS sequence analysis was an effective way for identification of interspecies and intraspecies phylogenetic relationships and might provide a clue for molecular identification of algal Eucheumatoideae.

  2. Reflecting on Earlier Experiences with Unsolicited Findings: Points to Consider for Next-Generation Sequencing and Informed Consent in Diagnostics

    PubMed Central

    Rigter, Tessel; Henneman, Lidewij; Kristoffersson, Ulf; Hall, Alison; Yntema, Helger G; Borry, Pascal; Tönnies, Holger; Waisfisz, Quinten; Elting, Mariet W; Dondorp, Wybo J; Cornel, Martina C

    2013-01-01

    High-throughput nucleotide sequencing (often referred to as next-generation sequencing; NGS) is increasingly being chosen as a diagnostic tool for cases of expected but unresolved genetic origin. When exploring a higher number of genetic variants, there is a higher chance of detecting unsolicited findings. The consequential increased need for decisions on disclosure of these unsolicited findings poses a challenge for the informed consent procedure. This article discusses the ethical and practical dilemmas encountered when contemplating informed consent for NGS in diagnostics from a multidisciplinary point of view. By exploring recent similar experiences with unsolicited findings in other settings, an attempt is made to describe what can be learned so far for implementing NGS in standard genetic diagnostics. The article concludes with a set of points to consider in order to guide decision-making on the extent of return of results in relation to the mode of informed consent. We hereby aim to provide a sound basis for developing guidelines for optimizing the informed consent procedure. PMID:23784691

  3. Spontaneous Mutation Rate in the Smallest Photosynthetic Eukaryotes

    PubMed Central

    Krasovec, Marc; Eyre-Walker, Adam; Sanchez-Ferandin, Sophie

    2017-01-01

    Abstract Mutation is the ultimate source of genetic variation, and knowledge of mutation rates is fundamental for our understanding of all evolutionary processes. High throughput sequencing of mutation accumulation lines has provided genome wide spontaneous mutation rates in a dozen model species, but estimates from nonmodel organisms from much of the diversity of life are very limited. Here, we report mutation rates in four haploid marine bacterial-sized photosynthetic eukaryotic algae; Bathycoccus prasinos, Ostreococcus tauri, Ostreococcus mediterraneus, and Micromonas pusilla. The spontaneous mutation rate between species varies from μ = 4.4 × 10−10 to 9.8 × 10−10 mutations per nucleotide per generation. Within genomes, there is a two-fold increase of the mutation rate in intergenic regions, consistent with an optimization of mismatch and transcription-coupled DNA repair in coding sequences. Additionally, we show that deviation from the equilibrium GC content increases the mutation rate by ∼2% to ∼12% because of a GC bias in coding sequences. More generally, the difference between the observed and equilibrium GC content of genomes explains some of the inter-specific variation in mutation rates. PMID:28379581

  4. Zn-metalloprotease sequences in extremophiles

    NASA Astrophysics Data System (ADS)

    Holden, T.; Dehipawala, S.; Golebiewska, U.; Cheung, E.; Tremberger, G., Jr.; Williams, E.; Schneider, P.; Gadura, N.; Lieberman, D.; Cheung, T.

    2010-09-01

    The Zn-metalloprotease family contains conserved amino acid structures such that the nucleotide fluctuation at the DNA level would exhibit correlated randomness as described by fractal dimension. A nucleotide sequence fractal dimension can be calculated from a numerical series consisting of the atomic numbers of each nucleotide. The structure's vibration modes can also be studied using a Gaussian Network Model. The vibration measure and fractal dimension values form a two-dimensional plot with a standard vector metric that can be used for comparison of structures. The preference for amino acid usage in extremophiles may suppress nucleotide fluctuations that could be analyzed in terms of fractal dimension and Shannon entropy. A protein level cold adaptation study of the thermolysin Zn-metalloprotease family using molecular dynamics simulation was reported recently and our results show that the associated nucleotide fluctuation suppression is consistent with a regression pattern generated from the sequences's fractal dimension and entropy values (R-square { 0.98, N =5). It was observed that cold adaptation selected for high entropy and low fractal dimension values. Extension to the Archaemetzincin M54 family in extremophiles reveals a similar regression pattern (R-square = 0.98, N = 6). It was observed that the metalloprotease sequences of extremely halophilic organisms possess high fractal dimension and low entropy values as compared with non-halophiles. The zinc atom is usually bonded to the histidine residue, which shows limited levels of vibration in the Gaussian Network Model. The variability of the fractal dimension and entropy for a given protein structure suggests that extremophiles would have evolved after mesophiles, consistent with the bias usage of non-prebiotic amino acids by extremophiles. It may be argued that extremophiles have the capacity to offer extinction protection during drastic changes in astrobiological environments.

  5. Identification of Forensically Important Calliphoridae and Sarcophagidae Species Collected in Korea Using SNaPshot Multiplex System Targeting the Cytochrome c Oxidase Subunit I Gene

    PubMed Central

    Park, Ji Hye

    2018-01-01

    Estimation of postmortem interval (PMI) is paramount in modern forensic investigation. After the disappearance of the early postmortem phenomena conventionally used to estimate PMI, entomologic evidence provides important indicators for PMI estimation. The age of the oldest fly larvae or pupae can be estimated to pinpoint the time of oviposition, which is considered the minimum PMI (PMImin). The development rate of insects is usually temperature dependent and species specific. Therefore, species identification is mandatory for PMImin estimation using entomological evidence. The classical morphological identification method cannot be applied when specimens are damaged or have not yet matured. To overcome this limitation, some investigators employ molecular identification using mitochondrial cytochrome c oxidase subunit I (COI) nucleotide sequences. The molecular identification method commonly uses Sanger's nucleotide sequencing and molecular phylogeny, which are complex and time consuming and constitute another obstacle for forensic investigators. In this study, instead of using conventional Sanger's nucleotide sequencing, single-nucleotide polymorphisms (SNPs) in the COI gene region, which are unique between fly species, were selected and targeted for single-base extension (SBE) technology. These SNPs were genotyped using a SNaPshot® kit. Eleven Calliphoridae and seven Sarcophagidae species were covered. To validate this genotyping, fly DNA samples (103 adults, 84 larvae, and 4 pupae) previously confirmed by DNA barcoding were used. This method worked quickly with minimal DNA, providing a potential alternative to conventional DNA barcoding. Consisting of only a few simple electropherogram peaks, the results were more straightforward compared with those of the conventional DNA barcoding produced by Sanger's nucleotide sequencing. PMID:29682531

  6. The complete genome sequence of a virus associated with cotton blue disease, cotton leafroll dwarf virus, confirms that it is a new member of the genus Polerovirus.

    PubMed

    Distéfano, Ana J; Bonacic Kresic, Ivan; Hopp, H Esteban

    2010-11-01

    Cotton blue disease is the most important virus disease of cotton in the southern part of America. The complete nucleotide sequence of the ssRNA genome of the cotton blue disease-associated virus was determined for the first time. It comprised 5,866 nucleotides, and the deduced genomic organization resembled that of members of the genus Polerovirus. Sequence homology comparison and phylogenetic analysis confirm that this virus (previous proposed name cotton leafroll dwarf virus) is a member of a new species within the genus Polerovirus.

  7. Sequence specificity of the hammerhead ribozyme revisited; the NHH rule.

    PubMed Central

    Kore, A R; Vaish, N K; Kutzke, U; Eckstein, F

    1998-01-01

    The sequence specificity of hammerhead ribozyme cleavage has been re-evaluated with respect to the NUH rule. Contrary to previous reports it was found that substrates with GAC triplets were also cleaved. This was established in three different sequence contexts. The rate of cleavage under single turnover conditions was between 3 and 7% that of cleavage 3' of GUC. Specificity of cleavage of substrates containing a central A in the cleavable triplet can be described as NAH, where N can be any nucleotide and H any nucleotide but G. As cleavage 3' of NCH triplets has recently been described, the NUH rule can be reformulated to NHH. PMID:9722629

  8. Adenovirus sequences required for replication in vivo.

    PubMed Central

    Wang, K; Pearson, G D

    1985-01-01

    We have studied the in vivo replication properties of plasmids carrying deletion mutations within cloned adenovirus terminal sequences. Deletion mapping located the adenovirus DNA replication origin entirely within the first 67 bp of the adenovirus inverted terminal repeat. This region could be further subdivided into two functional domains: a minimal replication origin and an adjacent auxillary region which boosted the efficiency of replication by more than 100-fold. The minimal origin occupies the first 18 to 21 bp and includes sequences conserved between all adenovirus serotypes. The adjacent auxillary region extends past nucleotide 36 but not past nucleotide 67 and contains the binding site for nuclear factor I. Images PMID:2991857

  9. Evolutionary conservation of regulatory elements in vertebrate HOX gene clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Santini, Simona; Boore, Jeffrey L.; Meyer, Axel

    2003-12-31

    Due to their high degree of conservation, comparisons of DNA sequences among evolutionarily distantly-related genomes permit to identify functional regions in noncoding DNA. Hox genes are optimal candidate sequences for comparative genome analyses, because they are extremely conserved in vertebrates and occur in clusters. We aligned (Pipmaker) the nucleotide sequences of HoxA clusters of tilapia, pufferfish, striped bass, zebrafish, horn shark, human and mouse (over 500 million years of evolutionary distance). We identified several highly conserved intergenic sequences, likely to be important in gene regulation. Only a few of these putative regulatory elements have been previously described as being involvedmore » in the regulation of Hox genes, while several others are new elements that might have regulatory functions. The majority of these newly identified putative regulatory elements contain short fragments that are almost completely conserved and are identical to known binding sites for regulatory proteins (Transfac). The conserved intergenic regions located between the most rostrally expressed genes in the developing embryo are longer and better retained through evolution. We document that presumed regulatory sequences are retained differentially in either A or A clusters resulting from a genome duplication in the fish lineage. This observation supports both the hypothesis that the conserved elements are involved in gene regulation and the Duplication-Deletion-Complementation model.« less

  10. Abundance and Genetic Diversity of Microbial Polygalacturonase and Pectate Lyase in the Sheep Rumen Ecosystem

    PubMed Central

    Wang, Yaru; Luo, Huiying; Huang, Huoqing; Shi, Pengjun; Bai, Yingguo; Yang, Peilong; Yao, Bin

    2012-01-01

    Background Efficient degradation of pectin in the rumen is necessary for plant-based feed utilization. The objective of this study was to characterize the diversity, abundance, and functions of pectinases from microorganisms in the sheep rumen. Methodology/Principal Findings A total of 103 unique fragments of polygalacturonase (PF00295) and pectate lyase (PF00544 and PF09492) genes were retrieved from microbial DNA in the rumen of a Small Tail Han sheep, and 66% of the sequences of these fragments had low identities (<65%) with known sequences. Phylogenetic tree building separated the PF00295, PF00544, and PF09492 sequences into five, three, and three clades, respectively. Cellulolytic and noncellulolytic Butyrivibrio, Prevotella, and Fibrobacter species were the major sources of the pectinases. The two most abundant pectate lyase genes were cloned, and their protein products, expressed in Escherichia coli, were characterized. Both enzymes probably act extracellularly as their nucleotide sequences contained signal sequences, and they had optimal activities at the ruminal physiological temperature and complementary pH-dependent activity profiles. Conclusion/Significance This study reveals the specificity, diversity, and abundance of pectinases in the rumen ecosystem and provides two additional ruminal pectinases for potential industrial use under physiological conditions. PMID:22815874

  11. Isolation of laccase gene-specific sequences from white rot and brown rot fungi by PCR

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    D`Souza, T.M.; Boominathan, K.; Reddy, C.A.

    1996-10-01

    Degenerate primers corresponding to the consensus sequences of the copper-binding regions in the N-terminal domains of known basidiomycete laccases were used to isolate laccase gene-specific sequences from strains representing nine genera of wood rot fungi. All except three gave the expected PCR product of about 200 bp. Computer searches of the databases identified the sequences of each of the PCR product of about 200 bp. Computer searches of the databases identified the sequence of each of the PCR products analyzed as a laccase gene sequence, suggesting the specificity of the primers. PCR products of the white rot fungi Ganoderma lucidum,more » Phlebia brevispora, and Trametes versicolor showed 65 to 74% nucleotide sequence similarity to each other; the similarity in deduced amino acid sequences was 83 to 91%. The PCR products of Lentinula edodes and Lentinus tigrinus, on the other hand, showed relatively low nucleotide and amino acid similarities (58 to 64 and 62 to 81%, respectively); however, these similarities were still much higher than when compared with the corresponding regions in the laccases of the ascomycete fungi Aspergillus nidulans and Neurospora crassa. A few of the white rot fungi, as well as Gloeophyllum trabeum, a brown rot fungus, gave a 144-bp PCR fragment which had a nucleotide sequence similarity of 60 to 71%. Demonstration of laccase activity in G. trabeum and several other brown rot fungi was of particular interest because these organisms were not previously shown to produce laccases. 36 refs., 6 figs., 2 tabs.« less

  12. “Shovel-ready” Sequences as a Stimulus for the Next Generation of Life Scientists

    PubMed Central

    Boyle, Michael D.

    2010-01-01

    Genomics and bioinformatics are dynamic fields well-suited for capturing the imagination of undergraduates in both research laboratories and classrooms. Currently, raw nucleotide sequence is being provided, as part of several genomics research initiatives, for undergraduate research and teaching. These initiatives could be easily extended and much more effective if the source of the sequenced material and the subsequent focus of the data analysis were aligned with the research interests of individual faculty at undergraduate institutions. By judicious use of surplus capacity in existing nucleotide sequencing cores, raw sequence data could be generated to support ongoing research efforts involving undergraduates. This would allow these students to participate actively in discovery research, with a goal of making novel contributions to their field through original research while nurturing the next generation of talented research scientists. PMID:23653696

  13. "Shovel-ready" Sequences as a Stimulus for the Next Generation of Life Scientists.

    PubMed

    Boyle, Michael D

    2010-01-01

    Genomics and bioinformatics are dynamic fields well-suited for capturing the imagination of undergraduates in both research laboratories and classrooms. Currently, raw nucleotide sequence is being provided, as part of several genomics research initiatives, for undergraduate research and teaching. These initiatives could be easily extended and much more effective if the source of the sequenced material and the subsequent focus of the data analysis were aligned with the research interests of individual faculty at undergraduate institutions. By judicious use of surplus capacity in existing nucleotide sequencing cores, raw sequence data could be generated to support ongoing research efforts involving undergraduates. This would allow these students to participate actively in discovery research, with a goal of making novel contributions to their field through original research while nurturing the next generation of talented research scientists.

  14. Molecular characterization of a novel Nucleorhabdovirus from black currant identified by high-throughput sequencing

    USDA-ARS?s Scientific Manuscript database

    Contigs with sequence similarities to several nucleorhabdoviruses were identified by high-throughput sequencing analysis from a black currant (Ribes nigrum L.) cultivar. The complete genomic sequence of this new nucleorhabdovirus is 14,432 nucleotides. Its genomic organization is typical of nucleorh...

  15. SNP discovery through de novo deep sequencing using the next generation of DNA sequencers

    USDA-ARS?s Scientific Manuscript database

    The production of high volumes of DNA sequence data using new technologies has permitted more efficient identification of single nucleotide polymorphisms in vertebrate genomes. This chapter presented practical methodology for production and analysis of DNA sequence data for SNP discovery....

  16. Inhibition of Escherichia coli viability by external guide sequences complementary to two essential genes

    PubMed Central

    McKinney, Jeffrey; Guerrier-Takada, Cecilia; Wesolowski, Donna; Altman, Sidney

    2001-01-01

    Narrow spectrum antimicrobial activity has been designed to reduce the expression of two essential genes, one coding for the protein subunit of RNase P (C5 protein) and one for gyrase (gyrase A). In both cases, external guide sequences (EGS) have been designed to complex with either mRNA. Using the EGS technology, the level of microbial viability is reduced to less than 10% of the wild-type strain. The EGSs are additive when used together and depend on the number of nucleotides paired when attacking gyrase A mRNA. In the case of gyrase A, three nucleotides unpaired out of a 15-mer EGS still favor complete inhibition by the EGS but five unpaired nucleotides do not. PMID:11381134

  17. LISTA, LISTA-HOP and LISTA-HON: a comprehensive compilation of protein encoding sequences and its associated homology databases from the yeast Saccharomyces.

    PubMed Central

    Dölz, R; Mossé, M O; Slonimski, P P; Bairoch, A; Linder, P

    1994-01-01

    We continued our effort to make a comprehensive database (LISTA) for the yeast Saccharomyces cerevisiae. In this database each sequence has been attributed a single genetic name. In the case of duplicated sequences a simple method has been applied to distinguish between sequences of one and the same gene from non-allelic sequences of duplicated genes. If necessary, synonyms are given in the case of allelic duplicated sequences. Thus sequences can be found either by the name or by synonyms given in LISTA. Each entry contains the genetic name, the mnemonic from the EMBL data bank, the codon bias, reference of the publication of the sequence, Chromosomal location as far as known, Swissprot and EMBL accession numbers. To obtain more information on the included sequences, each entry has been screened against non-redundant nucleotide and protein data bank collections resulting in LISTA-HON and LISTA-HOP. The LISTA data base can be linked to the associated data sets or to nucleotide and protein banks by the Sequence Retrieval System (SRS). PMID:7937046

  18. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gelb, Bruce D; Tartaglia, Marco; Pennacchio, Len

    Diagnostic and therapeutic applications for Noonan Syndrome are described. The diagnostic and therapeutic applications are based on certain mutations in a RAS-specific guanine nucleotide exchange factor gene SOS1 or its expression product. The diagnostic and therapeutic applications are also based on certain mutations in a serine/threonine protein kinase gene RAF1 or its expression product thereof. Also described are nucleotide sequences, amino acid sequences, probes, and primers related to RAF1 or SOS1, and variants thereof, as well as host cells expressing such variants.

  19. Variants of beta-glucosidase

    DOEpatents

    Fidantsef, Ana; Lamsa, Michael; Gorre-Clancy, Brian

    2015-07-14

    The present invention relates to variants of a parent beta-glucosidase, comprising a substitution at one or more positions corresponding to positions 142, 183, 266, and 703 of amino acids 1 to 842 of SEQ ID NO: 2 or corresponding to positions 142, 183, 266, and 705 of amino acids 1 to 844 of SEQ ID NO: 70, wherein the variant has beta-glucosidase activity. The present invention also relates to nucleotide sequences encoding the variant beta-glucosidases and to nucleic acid constructs, vectors, and host cells comprising the nucleotide sequences.

  20. Human ribosomal protein L37 has motifs predicting serine/threonine phosphorylation and a zinc-finger domain.

    PubMed

    Barnard, G F; Staniunas, R J; Puder, M; Steele, G D; Chen, L B

    1994-08-02

    Ribosomal protein L37 mRNA is overexpressed in colon cancer. The nucleotide sequences of human L37 from several tumor and normal, colon and liver cDNA sources were determined to be identical. L37 mRNA was approximately 375 nucleotides long encoding 97 amino acids with M(r) = 11,070, pI = 12.6, multiple potential serine/threonine phosphorylation sites and a zinc-finger domain. The human sequence is compared to other species.

  1. Molecular detection of kobuviruses in European roe deer (Capreolus capreolus) in Italy.

    PubMed

    Di Martino, Barbara; Di Profio, Federica; Melegari, Irene; Di Felice, Elisabetta; Robetto, Serena; Guidetti, Cristina; Orusa, Riccardo; Martella, Vito; Marsilio, Fulvio

    2015-08-01

    Kobuvirus RNA was found in 6.6 % (13/198) of stool specimens from roe deer (Capreolus capreolus) captured during the regular hunting season. Upon sequence analysis of a fragment of the 3D gene, nine strains displayed the highest nucleotide sequence identity (91.2-97.4 %) to bovine kobuviruses previously detected in either diarrhoeic or asymptomatic calves. Interestingly, four strains were genetically related to the newly discovered caprine kobuviruses (84.2-87.6 % nucleotide identity) identified in black goats in Korea.

  2. Variants of beta-glucosidases

    DOEpatents

    Fidantsef, Ana; Lamsa, Michael; Gorre-Clancy, Brian

    2014-10-07

    The present invention relates to variants of a parent beta-glucosidase, comprising a substitution at one or more positions corresponding to positions 142, 183, 266, and 703 of amino acids 1 to 842 of SEQ ID NO: 2 or corresponding to positions 142, 183, 266, and 705 of amino acids 1 to 844 of SEQ ID NO: 70, wherein the variant has beta-glucosidase activity. The present invention also relates to nucleotide sequences encoding the variant beta-glucosidases and to nucleic acid constructs, vectors, and host cells comprising the nucleotide sequences.

  3. Variants of beta-glucosidase

    DOEpatents

    Fidantsef, Ana [Davis, CA; Lamsa, Michael [Davis, CA; Gorre-Clancy, Brian [Elk Grove, CA

    2009-12-29

    The present invention relates to variants of a parent beta-glucosidase, comprising a substitution at one or more positions corresponding to positions 142, 183, 266, and 703 of amino acids 1 to 842 of SEQ ID NO: 2 or corresponding to positions 142, 183, 266, and 705 of amino acids 1 to 844 of SEQ ID NO: 70, wherein the variant has beta-glucosidase activity. The present invention also relates to nucleotide sequences encoding the variant beta-glucosidases and to nucleic acid constructs, vectors, and host cells comprising the nucleotide sequences.

  4. Weak Negative and Positive Selection and the Drift Load at Splice Sites

    PubMed Central

    Denisov, Stepan V.; Bazykin, Georgii A.; Sutormin, Roman; Favorov, Alexander V.; Mironov, Andrey A.; Gelfand, Mikhail S.; Kondrashov, Alexey S.

    2014-01-01

    Splice sites (SSs) are short sequences that are crucial for proper mRNA splicing in eukaryotic cells, and therefore can be expected to be shaped by strong selection. Nevertheless, in mammals and in other intron-rich organisms, many of the SSs often involve nonconsensus (Nc), rather than consensus (Cn), nucleotides, and beyond the two critical nucleotides, the SSs are not perfectly conserved between species. Here, we compare the SS sequences between primates, and between Drosophila fruit flies, to reveal the pattern of selection acting at SSs. Cn-to-Nc substitutions are less frequent, and Nc-to-Cn substitutions are more frequent, than neutrally expected, indicating, respectively, negative and positive selection. This selection is relatively weak (1 < |4Nes| < 4), and has a similar efficiency in primates and in Drosophila. Within some nucleotide positions, the positive selection in favor of Nc-to-Cn substitutions is weaker than the negative selection maintaining already established Cn nucleotides; this difference is due to site-specific negative selection favoring current Nc nucleotides. In general, however, the strength of negative selection protecting the Cn alleles is similar in magnitude to the strength of positive selection favoring replacement of Nc alleles, as expected under the simple nearly neutral turnover. In summary, although a fraction of the Nc nucleotides within SSs is maintained by selection, the abundance of deleterious nucleotides in this class suggests a substantial genome-wide drift load. PMID:24966225

  5. [Sequencing and analysis of complete genome of rabies viruses isolated from Chinese Ferret-Badger and dog in Zhejiang province].

    PubMed

    Lei, Yong-Liang; Wang, Xiao-Guang; Tao, Xiao-Yan; Li, Hao; Meng, Sheng-Li; Chen, Xiu-Ying; Liu, Fu-Ming; Ye, Bi-Feng; Tang, Qing

    2010-01-01

    Based on sequencing the full-length genomes of four Chinese Ferret-Badger and dog, we analyze the properties of rabies viruses genetic variation in molecular level, get the information about rabies viruses prevalence and variation in Zhejiang, and enrich the genome database of rabies viruses street strains isolated from China. Rabies viruses in suckling mice were isolated, overlapped fragments were amplified by RT-PCR and full-length genomes were assembled to analyze the nucleotide and deduced protein similarities and phylogenetic analyses from Chinese Ferret-Badger, dog, sika deer, vole, used vaccine strain were determined. The four full-length genomes were sequenced completely and had the same genetic structure with the length of 11, 923 nts or 11, 925 nts including 58 nts-Leader, 1353 nts-NP, 894 nts-PP, 609 nts-MP, 1575 nts-GP, 6386 nts-LP, and 2, 5, 5 nts- intergenic regions(IGRs), 423 nts-Pseudogene-like sequence (psi), 70 nts-Trailer. The four full-length genomes were in accordance with the properties of Rhabdoviridae Lyssa virus by BLAST and multi-sequence alignment. The nucleotide and amino acid sequences among Chinese strains had the highest similarity, especially among animals of the same species. Of the four full-length genomes, the similarity in amino acid level was dramatically higher than that in nucleotide level, so the nucleotide mutations happened in these four genomes were most synonymous mutations. Compared with the reference rabies viruses, the lengths of the five protein coding regions had no change, no recombination, only with a few point mutations. It was evident that the five proteins appeared to be stable. The variation sites and types of the four genomes were similar to the reference vaccine or street strains. And the four strains were genotype 1 according to the multi-sequence and phylogenetic analyses, which possessed the distinct district characteristics of China. Therefore, these four rabies viruses are likely to be street viruses already existing in the natural world.

  6. Evidence for a Complex Class of Nonadenylated mRNA in Drosophila

    PubMed Central

    Zimmerman, J. Lynn; Fouts, David L.; Manning, Jerry E.

    1980-01-01

    The amount, by mass, of poly(A+) mRNA present in the polyribosomes of third-instar larvae of Drosophila melanogaster, and the relative contribution of the poly(A+) mRNA to the sequence complexity of total polysomal RNA, has been determined. Selective removal of poly(A+) mRNA from total polysomal RNA by use of either oligo-dT-cellulose, or poly(U)-sepharose affinity chromatography, revealed that only 0.15% of the mass of the polysomal RNA was present as poly(A+) mRNA. The present study shows that this RNA hybridized at saturation with 3.3% of the single-copy DNA in the Drosophila genome. After correction for asymmetric transcription and reactability of the DNA, 7.4% of the single-copy DNA in the Drosophila genome is represented in larval poly(A+) mRNA. This corresponds to 6.73 x 106 nucleotides of mRNA coding sequences, or approximately 5,384 diverse RNA sequences of average size 1,250 nucleotides. However, total polysomal RNA hybridizes at saturation to 10.9% of the single-copy DNA sequences. After correcting this value for asymmetric transcription and tracer DNA reactability, 24% of the single-copy DNA in Drosophila is represented in total polysomal RNA. This corresponds to 2.18 x 107 nucleotides of RNA coding sequences or 17,440 diverse RNA molecules of size 1,250 nucleotides. This value is 3.2 times greater than that observed for poly(A+) mRNA, and indicates that ≃69% of the polysomal RNA sequence complexity is contributed by nonadenylated RNA. Furthermore, if the number of different structural genes represented in total polysomal RNA is ≃1.7 x 104, then the number of genes expressed in third-instar larvae exceeds the number of chromomeres in Drosophila by about a factor of three. This numerology indicates that the number of chromomeres observed in polytene chromosomes does not reflect the number of structural gene sequences in the Drosophila genome. PMID:6777246

  7. Ancient diversity and geographical sub-structuring in African buffalo Theileria parva populations revealed through metagenetic analysis of antigen-encoding loci.

    PubMed

    Hemmink, Johanneke D; Sitt, Tatjana; Pelle, Roger; de Klerk-Lorist, Lin-Mari; Shiels, Brian; Toye, Philip G; Morrison, W Ivan; Weir, William

    2018-03-01

    An infection and treatment protocol involving infection with a mixture of three parasite isolates and simultaneous treatment with oxytetracycline is currently used to vaccinate cattle against Theileria parva. While vaccination results in high levels of protection in some regions, little or no protection is observed in areas where animals are challenged predominantly by parasites of buffalo origin. A previous study involving sequencing of two antigen-encoding genes from a series of parasite isolates indicated that this is associated with greater antigenic diversity in buffalo-derived T. parva. The current study set out to extend these analyses by applying high-throughput sequencing to ex vivo samples from naturally infected buffalo to determine the extent of diversity in a set of antigen-encoding genes. Samples from two populations of buffalo, one in Kenya and the other in South Africa, were examined to investigate the effect of geographical distance on the nature of sequence diversity. The results revealed a number of significant findings. First, there was a variable degree of nucleotide sequence diversity in all gene segments examined, with the percentage of polymorphic nucleotides ranging from 10% to 69%. Second, large numbers of allelic variants of each gene were found in individual animals, indicating multiple infection events. Third, despite the observed diversity in nucleotide sequences, several of the gene products had highly conserved amino acid sequences, and thus represent potential candidates for vaccine development. Fourth, although compelling evidence for population differentiation between the Kenyan and South African T. parva parasites was identified, analysis of molecular variance for each gene revealed that the majority of the underlying nucleotide sequence polymorphism was common to both areas, indicating that much of this aspect of genetic variation in the parasite population arose prior to geographic separation. Copyright © 2018 The Authors. Published by Elsevier Ltd.. All rights reserved.

  8. Construction of nested genetic core collections to optimize the exploitation of natural diversity in Vitis vinifera L. subsp. sativa

    PubMed Central

    Le Cunff, Loïc; Fournier-Level, Alexandre; Laucou, Valérie; Vezzulli, Silvia; Lacombe, Thierry; Adam-Blondon, Anne-Françoise; Boursiquot, Jean-Michel; This, Patrice

    2008-01-01

    Background The first high quality draft of the grape genome sequence has just been published. This is a critical step in accessing all the genes of this species and increases the chances of exploiting the natural genetic diversity through association genetics. However, our basic knowledge of the extent of allelic variation within the species is still not sufficient. Towards this goal, we constructed nested genetic core collections (G-cores) to capture the simple sequence repeat (SSR) diversity of the grape cultivated compartment (Vitis vinifera L. subsp. sativa) from the world's largest germplasm collection (Domaine de Vassal, INRA Hérault, France), containing 2262 unique genotypes. Results Sub-samples of 12, 24, 48 and 92 varieties of V. vinifera L. were selected based on their genotypes for 20 SSR markers using the M-strategy. They represent respectively 58%, 73%, 83% and 100% of total SSR diversity. The capture of allelic diversity was analyzed by sequencing three genes scattered throughout the genome on 233 individuals: 41 single nucleotide polymorphisms (SNPs) were identified using the G-92 core (one SNP for every 49 nucleotides) while only 25 were observed using a larger sample of 141 individuals selected on the basis of 50 morphological traits, thus demonstrating the reliability of the approach. Conclusion The G-12 and G-24 core-collections displayed respectively 78% and 88% of the SNPs respectively, and are therefore of great interest for SNP discovery studies. Furthermore, the nested genetic core collections satisfactorily reflected the geographic and the genetic diversity of grape, which are also of great interest for the study of gene evolution in this species. PMID:18384667

  9. Adaptive microclimatic structural and expressional dehydrin 1 evolution in wild barley, Hordeum spontaneum, at 'Evolution Canyon', Mount Carmel, Israel.

    PubMed

    Yang, Zujun; Zhang, Tao; Bolshoy, Alexander; Beharav, Alexander; Nevo, Eviatar

    2009-05-01

    'Evolution Canyon' (ECI) at Lower Nahal Oren, Mount Carmel, Israel, is an optimal natural microscale model for unravelling evolution in action highlighting the twin evolutionary processes of adaptation and speciation. A major model organism in ECI is wild barley, Hordeum spontaneum, the progenitor of cultivated barley, which displays dramatic interslope adaptive and speciational divergence on the 'African' dry slope (AS) and the 'European' humid slope (ES), separated on average by 200 m. Here we examined interslope single nucleotide polymorphism (SNP) sequences and the expression diversity of the drought resistant dehydrin 1 gene (Dhn1) between the opposite slopes. We analysed 47 plants (genotypes), 4-10 individuals in each of seven stations (populations) in an area of 7000 m(2), for Dhn1 sequence diversity located in the 5' upstream flanking region of the gene. We found significant levels of Dhn1 genic diversity represented by 29 haplotypes, derived from 45 SNPs in a total of 708 bp sites. Most of the haplotypes, 25 out of 29 (= 86.2%), were represented by one genotype; hence, unique to one population. Only a single haplotype was common to both slopes. Genetic divergence of sequence and haplotype diversity was generally and significantly different among the populations and slopes. Nucleotide diversity was higher on the AS, whereas haplotype diversity was higher on the ES. Interslope divergence was significantly higher than intraslope divergence. The applied Tajima D rejected neutrality of the SNP diversity. The Dhn1 expression under dehydration indicated interslope divergent expression between AS and ES genotypes, reinforcing Dhn1 associated with drought resistance of wild barley at 'Evolution Canyon'. These results are inexplicable by mutation, gene flow, or chance effects, and support adaptive natural microclimatic selection as the major evolutionary divergent driving force.

  10. Identification of a common single nucleotide polymorphism at the primer binding site of D2S1360 that causes heterozygote peak imbalance when using the Investigator HDplex Kit.

    PubMed

    Inokuchi, Shota; Yamashita, Yasuhiro; Nishimura, Kazuma; Nakanishi, Hiroaki; Saito, Kazuyuki

    2017-11-01

    Phenomena known as null alleles and peak imbalance can occur because of mutations in the primer binding sites used for DNA typing. In these cases, an accurate statistical evaluation of DNA typing is difficult. The estimated likelihood ratio is incorrectly calculated because of the null allele and allele dropout caused by mutation-induced peak imbalance. Although a number of studies have attempted to uncover examples of these phenomena, few reports are available on the human identification kit manufactured by Qiagen. In this study, 196 Japanese individuals who were heterozygous at D2S1360 were genotyped using an Investigator HDplex Kit with optimal amounts of DNA. A peak imbalance was frequently observed at the D2S1360 locus. We performed a sequencing analysis of the area surrounding the D2S1360 repeat motif to identify the cause for peak imbalance. A point mutation (G>A transition) 136 nucleotides upstream from the D2S1360 repeat motif was discovered in a number of samples. The allele frequency of the mutation was 0.0566 in the Japanese population. Therefore, human identification or kinship testing using the Investigator HDplex Kit requires caution because of the higher frequency of single nucleotide polymorphisms at the primer binding site of D2S1360 locus in the Japanese population.

  11. Systematic and stochastic influences on the performance of the MinION nanopore sequencer across a range of nucleotide bias

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Krishnakumar, Raga; Sinha, Anupama; Bird, Sara W.

    Emerging sequencing technologies are allowing us to characterize environmental, clinical and laboratory samples with increasing speed and detail, including real-time analysis and interpretation of data. One example of this is being able to rapidly and accurately detect a wide range of pathogenic organisms, both in the clinic and the field. Genomes can have radically different GC content however, such that accurate sequence analysis can be challenging depending upon the technology used. Here, we have characterized the performance of the Oxford MinION nanopore sequencer for detection and evaluation of organisms with a range of genomic nucleotide bias. We have diagnosed themore » quality of base-calling across individual reads and discovered that the position within the read affects base-calling and quality scores. Finally, we have evaluated the performance of the current state-of-the-art neural network-based MinION basecaller, characterizing its behavior with respect to systemic errors as well as context- and sequence-specific errors. Overall, we present a detailed characterization the capabilities of the MinION in terms of generating high-accuracy sequence data from genomes with a wide range of nucleotide content. This study provides a framework for designing the appropriate experiments that are the likely to lead to accurate and rapid field-forward diagnostics.« less

  12. Systematic and stochastic influences on the performance of the MinION nanopore sequencer across a range of nucleotide bias

    DOE PAGES

    Krishnakumar, Raga; Sinha, Anupama; Bird, Sara W.; ...

    2018-02-16

    Emerging sequencing technologies are allowing us to characterize environmental, clinical and laboratory samples with increasing speed and detail, including real-time analysis and interpretation of data. One example of this is being able to rapidly and accurately detect a wide range of pathogenic organisms, both in the clinic and the field. Genomes can have radically different GC content however, such that accurate sequence analysis can be challenging depending upon the technology used. Here, we have characterized the performance of the Oxford MinION nanopore sequencer for detection and evaluation of organisms with a range of genomic nucleotide bias. We have diagnosed themore » quality of base-calling across individual reads and discovered that the position within the read affects base-calling and quality scores. Finally, we have evaluated the performance of the current state-of-the-art neural network-based MinION basecaller, characterizing its behavior with respect to systemic errors as well as context- and sequence-specific errors. Overall, we present a detailed characterization the capabilities of the MinION in terms of generating high-accuracy sequence data from genomes with a wide range of nucleotide content. This study provides a framework for designing the appropriate experiments that are the likely to lead to accurate and rapid field-forward diagnostics.« less

  13. Protein structure and the sequential structure of mRNA: alpha-helix and beta-sheet signals at the nucleotide level.

    PubMed

    Brunak, S; Engelbrecht, J

    1996-06-01

    A direct comparison of experimentally determined protein structures and their corresponding protein coding mRNA sequences has been performed. We examine whether real world data support the hypothesis that clusters of rare codons correlate with the location of structural units in the resulting protein. The degeneracy of the genetic code allows for a biased selection of codons which may control the translational rate of the ribosome, and may thus in vivo have a catalyzing effect on the folding of the polypeptide chain. A complete search for GenBank nucleotide sequences coding for structural entries in the Brookhaven Protein Data Bank produced 719 protein chains with matching mRNA sequence, amino acid sequence, and secondary structure assignment. By neural network analysis, we found strong signals in mRNA sequence regions surrounding helices and sheets. These signals do not originate from the clustering of rare codons, but from the similarity of codons coding for very abundant amino acid residues at the N- and C-termini of helices and sheets. No correlation between the positioning of rare codons and the location of structural units was found. The mRNA signals were also compared with conserved nucleotide features of 16S-like ribosomal RNA sequences and related to mechanisms for maintaining the correct reading frame by the ribosome.

  14. Molecular variability analysis of five new complete cacao swollen shoot virus genomic sequences.

    PubMed

    Muller, E; Sackey, S

    2005-01-01

    Cacao swollen shoot virus (CSSV), a member of the family Caulimovi-ridae, genus Badnavirus occurs in all the main cacao-growing areas of West Africa. We amplified, cloned and sequenced complete genomes of five new isolates, two originating from Togo and three originating from Ghana. The genome of these five newly sequenced isolates all contain the five putative open reading frames I, II, III, X and Y described for the first sequenced CSSV isolate, Agou1 originating from Togo. Their genomes have been aligned with the genome of Agou1. The nucleotide and amino acid sequence identities between isolates have been calculated and a phylogenetic analysis has been made including other pararetroviruses. Maximum nucleotide sequence variability between complete genomes of CSSV isolates was 29.4%. Geographical differentiation between isolates appears more important than differentiation between mild and severe isolates. ORF X differs greatly in size and sequence between the Togolese isolates Nyongbo2 and Agou1, and the four other isolates, its functional role is therefore clearly questionable.

  15. Validation of Skeletal Muscle cis-Regulatory Module Predictions Reveals Nucleotide Composition Bias in Functional Enhancers

    PubMed Central

    Kwon, Andrew T.; Chou, Alice Yi; Arenillas, David J.; Wasserman, Wyeth W.

    2011-01-01

    We performed a genome-wide scan for muscle-specific cis-regulatory modules (CRMs) using three computational prediction programs. Based on the predictions, 339 candidate CRMs were tested in cell culture with NIH3T3 fibroblasts and C2C12 myoblasts for capacity to direct selective reporter gene expression to differentiated C2C12 myotubes. A subset of 19 CRMs validated as functional in the assay. The rate of predictive success reveals striking limitations of computational regulatory sequence analysis methods for CRM discovery. Motif-based methods performed no better than predictions based only on sequence conservation. Analysis of the properties of the functional sequences relative to inactive sequences identifies nucleotide sequence composition can be an important characteristic to incorporate in future methods for improved predictive specificity. Muscle-related TFBSs predicted within the functional sequences display greater sequence conservation than non-TFBS flanking regions. Comparison with recent MyoD and histone modification ChIP-Seq data supports the validity of the functional regions. PMID:22144875

  16. Molecular characterization of a novel Luteovirus from peach identified by high-throughput sequencing

    USDA-ARS?s Scientific Manuscript database

    Contigs with sequence homologies to Cherry-associated luteovirus were identified by high-throughput sequencing analysis of two peach accessions undergoing quarantine testing. The complete genomic sequences of the two isolates of this virus are 5,819 and 5,814 nucleotides. Their genome organization i...

  17. Aspergillus and Penicillium identification using DNA sequences: Barcode or MLST?

    USDA-ARS?s Scientific Manuscript database

    Current methods in DNA technology can detect single nucleotide polymorphisms with measurable accuracy using several different approaches appropriate for different uses. If there are even single nucleotide differences that are invariant markers of the species, we can accomplish identification through...

  18. Identification and characterization of Theileria ovis surface protein (ToSp) resembled TaSp in Theileria annulata.

    PubMed

    Shayan, P; Jafari, S; Fattahi, R; Ebrahimzade, E; Amininia, N; Changizi, E

    2016-05-01

    Ovine theileriosis is an important hemoprotozoal disease of sheep and goats in tropical and subtropical regions which caused high economic loses in the livestock industry. Theileria annulata surface protein (TaSp) was used previously as a tool for serological analysis in livestock. Since the amino acid sequences of TaSp is, at least, in part very conserved in T. annulata, Theileria lestoquardi and Theileria china I and II, it is very important to determine the amino acid sequence of this protein in Theileria ovis as well, to avoid false interpretation of serological data based on this protein in small animal. In the present study, the nucleotide sequence and amino acid sequence of T. ovis surface protein (ToSp) were determined. The comparison of the nucleotide sequence of ToSp showed 96, 96, 99, and 86 % homology to the corresponding nucleotide sequence of TaSp genes by T. annulata, T. China I, T. China II and T. lestoquardi, previously registered in GenBank under accession nos. AJ316260.1, AY274329.1, DQ120058.1, and EF092924.1 respectively. The amino acid sequence analysis showed 95, 81, 98 and 70 % homology to the corresponding amino acid sequence of T. annulata, T chinaI, T china II and T. lestoquardi, registered in GenBank under accession nos. CAC87478.1, AAP36993.1, AAZ30365.1 and AAP36999.11, respectively. Interestingly, in contrast to the C terminus, a significant difference in amino acid sequence in the N teminus of the ToSp protein could be determined compared to the other known corresponding TaSp sequences, which make this region attractive for designing of a suitable tool for serological diagnosis.

  19. Repeated sequence sets in mitochondrial DNA molecules of root knot nematodes (Meloidogyne): nucleotide sequences, genome location and potential for host-race identification.

    PubMed Central

    Okimoto, R; Chamberlin, H M; Macfarlane, J L; Wolstenholme, D R

    1991-01-01

    Within a 7 kb segment of the mtDNA molecule of the root knot nematode, Meloidogyne javanica, that lacks standard mitochondrial genes, are three sets of strictly tandemly arranged, direct repeat sequences: approximately 36 copies of a 102 ntp sequence that contains a TaqI site; 11 copies of a 63 ntp sequence, and 5 copies of an 8 ntp sequence. The 7 kb repeat-containing segment is bounded by putative tRNAasp and tRNAf-met genes and the arrangement of sequences within this segment is: the tRNAasp gene; a unique 1,528 ntp segment that contains two highly stable hairpin-forming sequences; the 102 ntp repeat set; the 8 ntp repeat set; a unique 1,068 ntp segment; the 63 ntp repeat set; and the tRNAf-met gene. The nucleotide sequences of the 102 ntp copies and the 63 ntp copies have been conserved among the species examined. Data from Southern hybridization experiments indicate that 102 ntp and 63 ntp repeats occur in the mtDNAs of three, two and two races of M.incognita, M.hapla and M.arenaria, respectively. Nucleotide sequences of the M.incognita Race-3 102 ntp repeat were found to be either identical or highly similar to those of the M.javanica 102 ntp repeat. Differences in migration distance and number of 102 ntp repeat-containing bands seen in Southern hybridization autoradiographs of restriction-digested mtDNAs of M.javanica and the different host races of M.incognita, M.hapla and M.arenaria are sufficient to distinguish the different host races of each species. Images PMID:2027769

  20. Completion of full length genome sequence of novel avian paramyxovirus strain APMV/Shimane67 isolated from migratory wild geese in Japan.

    PubMed

    Yamamoto, Eiji; Ito, Toshihiro; Ito, Hiroshi

    2016-11-01

    The nucleotide sequences of nucleocapsid protein (N); phosphoprotein (P); matrix protein (M); hemagglutinin-neuraminidase (HN); and large polymerase protein (L) genes, 3'-end leader, 5'-end trailer and intergenic regions of the avian paramyxovirus (APMV) strain goose/Shimane/67/2000 (APMV/Shimane67) were determined. Together with previously reported data on fusion protein (F) gene sequence [46], the determination of the genome sequence of APMV/Shimane67 has been completed in this study. The genome of APMV/Shimane67 comprised 16,146 nucleotides in length and contains six genes in the order of 3'-N-P-M-F-HN-L-5'. The features of the APMV/Shimane67 genome (e.g., nucleotide length of whole genome and each of the six genes, and predicted amino acid length of each of the six genes) were distinct from those of other APMV serotypes. Phylogenetic analysis indicated that although APMV/Shimane67 was grouped with APMV-1, -9 and -12, the evolutionary distance between APMV/Shimane67 and these viruses was longer than that observed between intra-serotype viruses. These results show that the genome sequence of APMV/Shimane67 contains specific characteristics and is distinguishable from other types of APMV.

Top