Science.gov

Sample records for aligned nucleotide sequences

  1. Nucleotide sequence alignment using sparse coding and belief propagation.

    PubMed

    Roozgard, Aminmohammad; Barzigar, Nafise; Wang, Shuang; Jiang, Xiaoqian; Ohno-Machado, Lucila; Cheng, Samuel

    2013-01-01

    Advances in DNA information extraction techniques have produced a rapidly growing number of sequenced genomes from organisms spanning the tree of life. This increasing amount of genomic information requires tools for comparing nucleotide sequences. In this paper, we propose a novel nucleotide sequence alignment method based on sparse coding and belief propagation to compare the similarity of nucleotide sequences. We used the neighbors of each nucleotide as features and then employed sparse coding to find a set of candidate nucleotides. To select optimum matches, belief propagation was subsequently applied to these candidate nucleotides. Experimental results show that the proposed approach is able to robustly align nucleotide sequences and is competitive with SOAPaligner [1] and BWA [2].

  2. [Tabular excel editor for analysis of aligned nucleotide sequences].

    PubMed

    Demkin, V V

    2010-01-01

    The Excel platform was used to convert the results of multiple alignments of nucleotide sequences, obtained with the BLAST network service, into a form suitable for visual analysis and editing. Two macros were written for MS Excel 2007. The array of aligned sequences, once transformed into an Excel table and processed with these macros, is easier to analyze than the original HTML output.

  3. TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations.

    PubMed

    Abascal, Federico; Zardoya, Rafael; Telford, Maximilian J

    2010-07-01

    We present TranslatorX, a web server designed to align protein-coding nucleotide sequences based on their corresponding amino acid translations. Many comparisons between biological sequences (nucleic acids and proteins) involve the construction of multiple alignments. Alignments represent a statement regarding the homology between individual nucleotides or amino acids within homologous genes. As protein-coding DNA sequences evolve as triplets of nucleotides (codons) and it is known that sequence similarity degrades more rapidly at the DNA than at the amino acid level, alignments are generally more accurate when based on amino acids than on their corresponding nucleotides. TranslatorX novelties include: (i) use of all documented genetic codes and the possibility of assigning different genetic codes for each sequence; (ii) a battery of different multiple alignment programs; (iii) translation of ambiguous codons when possible; (iv) an innovative criterion to clean nucleotide alignments with GBlocks based on protein information; and (v) a rich output, including Jalview-powered graphical visualization of the alignments, codon-based alignments coloured according to the corresponding amino acids, measures of compositional bias and first, second and third codon position specific alignments. The TranslatorX server is freely available at http://translatorx.co.uk.
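    TranslatorX's central idea is that an amino acid alignment can be mapped back onto the coding DNA so that gaps always fall between codons. The sketch below shows only that back-translation step for a single row. It assumes the protein alignment already exists and that the coding sequence has no frameshifts; the function name `protein_to_codon_alignment` is hypothetical and is not part of TranslatorX.

```python
def protein_to_codon_alignment(aligned_protein: str, cds: str) -> str:
    """Back-translate one row of a protein alignment into a codon alignment.

    Each residue in the aligned protein consumes the next codon of the
    unaligned coding sequence (cds); each gap '-' becomes a codon gap '---'.
    Translation is NOT checked here: the caller must supply a cds whose
    codons really encode the ungapped protein. Trailing partial codons
    are ignored.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    out = []
    next_codon = 0
    for residue in aligned_protein:
        if residue == "-":
            out.append("---")
            continue
        if next_codon >= len(codons):
            raise ValueError("protein row has more residues than the CDS has codons")
        out.append(codons[next_codon])
        next_codon += 1
    return "".join(out)


# Two rows of a (hypothetical) protein alignment and their unaligned CDSs.
print(protein_to_codon_alignment("MK-L", "ATGAAACTG"))     # ATGAAA---CTG
print(protein_to_codon_alignment("MKAL", "ATGAAAGCGCTG"))  # ATGAAAGCGCTG
```

    Because every protein gap becomes exactly three nucleotide gaps, the resulting nucleotide rows stay in frame and have equal lengths whenever the protein rows do.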

  4. PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences.

    PubMed

    Mirarab, Siavash; Nguyen, Nam; Guo, Sheng; Wang, Li-San; Kim, Junhyong; Warnow, Tandy

    2015-05-01

    We introduce PASTA, a new multiple sequence alignment algorithm. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy and scalability of the leading alignment methods (including SATé). We also show that trees estimated on PASTA alignments are highly accurate: slightly better than SATé trees, but with substantial improvements relative to other methods. Finally, PASTA is faster than SATé, highly parallelizable, and requires relatively little memory.

  5. Nucleotide sequence alignment of hdcA from Gram-positive bacteria

    PubMed Central

    Diaz, Maria; Ladero, Victor; Redruello, Begoña; Sanchez-Llana, Esther; del Rio, Beatriz; Fernandez, Maria; Martin, Maria Cruz; Alvarez, Miguel A.

    2016-01-01

    The decarboxylation of histidine, carried out mainly by some Gram-positive bacteria, yields the toxic dietary biogenic amine histamine (Ladero et al. 2010 〈10.2174/157340110791233256〉 [1], Linares et al. 2016 〈http://dx.doi.org/10.1016/j.foodchem.2015.11.013〉 [2]). The reaction is catalyzed by a pyruvoyl-dependent histidine decarboxylase (Linares et al. 2011 〈10.1080/10408398.2011.582813〉 [3]), which is encoded by the gene hdcA. To locate conserved regions in the hdcA gene of Gram-positive bacteria, this article provides a nucleotide sequence alignment of all the hdcA sequences from Gram-positive bacteria present in databases. For further utility and discussion, see 〈http://dx.doi.org/10.1016/j.foodcont.2015.11.035〉 [4]. PMID:26958625

  6. FOGSAA: Fast Optimal Global Sequence Alignment Algorithm

    NASA Astrophysics Data System (ADS)

    Chakraborty, Angana; Bandyopadhyay, Sanghamitra

    2013-04-01

    In this article we propose a Fast Optimal Global Sequence Alignment Algorithm, FOGSAA, which aligns a pair of nucleotide/protein sequences faster than any optimal global alignment method, including the widely used Needleman-Wunsch (NW) algorithm. FOGSAA is applicable to all types of sequences, with any scoring scheme, and with or without an affine gap penalty. Compared to NW, FOGSAA achieves a time gain of 70-90% for highly similar nucleotide sequences (> 80% similarity) and 54-70% for sequences with 30-80% similarity. For other sequences, it terminates with an approximate score. For protein sequences, the average time gain is between 25% and 40%. Compared to three heuristic global alignment methods, the quality of alignment is improved by about 23-53%. FOGSAA is, in general, suitable for aligning any two sequences defined over a finite alphabet where the quality of the global alignment is of paramount importance.
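    FOGSAA's time gains are measured against Needleman-Wunsch, so a plain NW score computation is a useful reference point. The following is a minimal sketch of the standard NW recurrence with a linear gap penalty; the match, mismatch and gap values are illustrative assumptions. It is not FOGSAA itself, whose speedup comes from avoiding much of this full table computation.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score of a and b (linear gap penalty).

    Fills the (len(a)+1) x (len(b)+1) dynamic-programming table row by row,
    keeping only the previous row in memory.
    """
    # Row 0: aligning a prefix of b against nothing costs one gap per base.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag,               # align a[i-1] with b[j-1]
                          prev[j] + gap,      # gap in b
                          curr[j - 1] + gap)  # gap in a
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("ACGT", "AGT"))  # 1: three matches, one gap
```

    The table has O(mn) cells, each depending on its left, upper and diagonal neighbours; this full fill is the cost an optimal-but-faster method such as FOGSAA tries to reduce.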

  7. Pairwise Sequence Alignment Library

    SciTech Connect

    Jeff Daily, PNNL

    2015-05-20

    Vector extensions, such as SSE, have been part of x86 CPUs since the 1990s, with uses in graphics, signal processing, and scientific computing. Although many algorithms and applications naturally benefit from automatic vectorization techniques, many others remain difficult to vectorize because of their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. The trend of widening vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. Therefore, a novel SIMD implementation of a parallel scan-based sequence alignment algorithm that can better exploit wider SIMD units was implemented as part of the Pairwise Sequence Alignment Library (parasail). Parasail features: reference implementations of all known vectorized sequence alignment approaches; implementations of the Smith-Waterman (SW), semi-global (SG), and Needleman-Wunsch (NW) sequence alignment algorithms; implementations across all modern CPU instruction sets, including AVX2 and KNC; and language interfaces for C/C++ and Python.
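    To make the Smith-Waterman (SW) recurrence that parasail vectorizes concrete, here is a scalar, non-SIMD sketch with a linear gap penalty and illustrative scores. It does not reflect parasail's API, its striped or scan data layouts, or its affine gap model; it only shows the cell-by-cell dependencies (left, up, diagonal) that make this computation hard to vectorize.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (scalar reference, linear gaps)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Each cell needs its diagonal, upper, and left neighbours:
            # curr[j - 1] is the dependency that serializes a row.
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTAA", "GGACGTCC"))  # 8: local hit "ACGT"
```

    Striped and scan-based layouts exist precisely to break the left-neighbour dependency inside a row so that many cells can be computed in one SIMD instruction.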

  8. ALIGN_MTX--an optimal pairwise textual sequence alignment program, adapted for using in sequence-structure alignment.

    PubMed

    Vishnepolsky, Boris; Pirtskhalava, Malak

    2009-06-01

    The presented program ALIGN_MTX aligns two textual sequences, allowing sequence elements to be designated by several characters and supporting arbitrary user-defined substitution matrices. It can be used not only to align amino acid and nucleotide sequences but also for sequence-structure alignment in threading, for amino acid sequence alignment using a previously computed PSSM matrix, and in other cases where biological or non-biological textual sequences must be aligned. This distinguishes it from most similar alignment programs, which as a rule align only amino acid or nucleotide sequences represented as strings of single alphabetic characters. ALIGN_MTX is distributed as a downloadable zip archive at http://www.imbbp.org/software/ALIGN_MTX/ and is free to use. As an application of the program, different types of substitution matrices were compared for alignment quality on sets of distantly related protein pairs. The threading matrix SORDIS, based on side-chain orientation relative to hydrophobic core centers, the evolutionary substitution matrix BLOSUM, and position-specific score matrices (PSSM) derived from multiple sequence alignments were tested for alignment accuracy. The PSSM matrix showed the best performance, but on the reduced set with lower sequence similarity the threading matrix SORDIS performed equally well, and a combined potential of SORDIS and PSSM was shown to improve alignment quality for evolutionarily distantly related protein pairs.
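    ALIGN_MTX's distinguishing feature is that a sequence element may be a multi-character symbol scored by an arbitrary user matrix. A minimal sketch of that idea is a global alignment over lists of tokens with traceback. It assumes the sequences are already split into tokens and that the matrix is a Python dict keyed by token pairs; the tokens and scores below are hypothetical, and this is not ALIGN_MTX's input format or code.

```python
from typing import Dict, List, Optional, Tuple

Token = str
Matrix = Dict[Tuple[Token, Token], int]
Column = Tuple[Optional[Token], Optional[Token]]  # None marks a gap


def align_tokens(a: List[Token], b: List[Token], subst: Matrix,
                 gap: int = -4) -> Tuple[int, List[Column]]:
    """Global alignment of two token lists under a user substitution matrix."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + subst[(a[i - 1], b[j - 1])],
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Traceback from the bottom-right cell, preferring diagonal moves.
    columns: List[Column] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + subst[(a[i - 1], b[j - 1])]:
            columns.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            columns.append((a[i - 1], None))
            i -= 1
        else:
            columns.append((None, b[j - 1]))
            j -= 1
    columns.reverse()
    return score[n][m], columns


# Hypothetical two-character structural-state tokens.
alphabet = ["H1", "H2", "E1"]
subst = {(x, y): (3 if x == y else -1) for x in alphabet for y in alphabet}
print(align_tokens(["H1", "H2", "E1"], ["H1", "E1"], subst, gap=-2))
# (4, [('H1', 'H1'), ('H2', None), ('E1', 'E1')])
```

    Because the matrix is looked up by ordered token pair, an asymmetric matrix (for example, sequence tokens against structure tokens in threading) works without changes.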

  9. Alignment method for spectrograms of DNA sequences.

    PubMed

    Bucur, Anca; van Leeuwen, Jasper; Dimitrova, Nevenka; Mittal, Chetan

    2010-01-01

    DNA spectrograms express the periodicities of each of the four nucleotides A, T, C, and G in one or several genomic sequences to be analyzed. DNA spectral analysis can be applied to systematically investigate DNA patterns, which may correspond to relevant biological features. As opposed to looking at nucleotide sequences, spectrogram analysis may detect structural characteristics in very long sequences that are not identifiable by sequence alignment. Alignment of DNA spectrograms can be used to facilitate analysis of very long sequences or entire genomes at different resolutions. Standard clustering algorithms have been used in spectral analysis to find strong patterns in spectra. However, as they use a global distance metric, these algorithms can only detect strong patterns coexisting in several frequencies. In this paper, we propose a new method and several algorithms for aligning spectra suitable for efficient spectral analysis and allowing for the easy detection of strong patterns in both single frequencies and multiple frequencies.
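    The spectra behind a DNA spectrogram are commonly computed by turning each nucleotide into a 0/1 indicator sequence and taking the squared magnitude of its discrete Fourier transform; a spectrogram repeats this over sliding windows. The sketch below computes the spectrum of a single window with a naive O(n²) DFT for clarity (a real implementation would use an FFT). It is an assumption that this matches the authors' preprocessing, and it does not implement their spectrogram alignment algorithms.

```python
import cmath
from typing import Dict, List


def nucleotide_power_spectra(window: str) -> Dict[str, List[float]]:
    """Power spectrum |U_b(k)|^2, k = 0..n-1, of each base's indicator sequence."""
    n = len(window)
    seq = window.upper()
    spectra: Dict[str, List[float]] = {}
    for base in "ACGT":
        u = [1.0 if ch == base else 0.0 for ch in seq]  # indicator sequence
        power = []
        for k in range(n):
            coeff = sum(u[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            power.append(abs(coeff) ** 2)
        spectra[base] = power
    return spectra


spectra = nucleotide_power_spectra("AGG" * 4)  # A recurs with period 3
print(round(spectra["A"][4], 6))  # 16.0: peak at frequency k = n/3
print(round(spectra["A"][1], 6))  # 0.0
```

    A period-3 signal appears as a peak at k = n/3; stacking such spectra for consecutive windows gives the time-frequency picture that spectrogram alignment works on.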

  10. HIV-1 and HIV-2 LTR nucleotide sequences: assessment of the alignment by N-block presentation, "retroviral signatures" of overrepeated oligonucleotides, and a probable important role of scrambled stepwise duplications/deletions in molecular evolution.

    PubMed

    Laprevotte, I; Pupin, M; Coward, E; Didier, G; Terzian, C; Devauchelle, C; Hénaut, A

    2001-07-01

    Previous analyses of retroviral nucleotide sequences suggest a so-called "scrambled duplicative stepwise molecular evolution" (many sectors with successive duplications/deletions of short and longer motifs) that could have stemmed from one or several starter tandemly repeated short sequence(s). In the present report, we tested this hypothesis by focusing on the long terminal repeats (LTRs) (and flanking sequences) of 24 human and 3 simian immunodeficiency viruses. By using a calculation strategy applicable to short sequences, we found consensus overrepresented motifs (often containing CTG or CAG) that were congruent with the previously defined "retroviral signature." We also show many local repetition patterns that are significant when compared with simply shuffled sequences. First- and second-order Markov chain analyses demonstrate that a major portion of the overrepresented oligonucleotides can be predicted from the dinucleotide compositions of the sequences, but by no means can biological mechanisms be deduced from these results: some of the listed local repetitions remain significant against dinucleotide-conserving shuffled sequences; together with previous results, this suggests that interspersed and/or local mononucleotide and oligonucleotide repetitions could have biased the dinucleotide compositions of the sequences. We searched for suggestive evolutionary patterns by scrutinizing a reliable multiple alignment of the 27 sequences. A manually constructed alignment based on homology blocks was in good agreement with the polypeptide alignment in the coding sectors and has been exhaustively assessed by using a multiplied alphabet obtained by the promising mathematical strategy called the N-block presentation (taking into account the environment of each nucleotide in a sequence). Sector by sector, we hypothesize many successive duplication/deletion scenarios that fit our previous evolutionary hypotheses. This suggests an important duplication/deletion role for

  11. Automated Identification of Nucleotide Sequences

    NASA Technical Reports Server (NTRS)

    Osman, Shariff; Venkateswaran, Kasthuri; Fox, George; Zhu, Dian-Hui

    2007-01-01

    STITCH is a computer program that processes raw nucleotide-sequence data to automatically remove unwanted vector information, perform reverse-complement comparison, stitch shorter sequences together to make longer ones to which the shorter ones presumably belong, and search against the user's choice of private and Internet-accessible public 16S rRNA databases. ["16S rRNA" denotes a ribosomal ribonucleic acid (rRNA) sequence that is common to all organisms.] In STITCH, a template 16S rRNA sequence is used to position forward and reverse reads. STITCH then automatically searches known 16S rRNA sequences in the user's chosen database(s) to find the sequence most similar to (the sequence that lies at the smallest edit distance from) each spliced sequence. The result of processing by STITCH is the identification of the most similar well-described bacterium. Whereas previously available commercial software for analyzing genetic sequences operates on one sequence at a time, STITCH can manipulate multiple sequences simultaneously to perform the aforementioned operations. A typical analysis of several dozen sequences (lengths on the order of 10³ base pairs) by use of STITCH is completed in a few minutes, whereas such an analysis performed with prior software takes hours or days.
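    Two of the operations described for STITCH, reverse-complement comparison and choosing the reference at the smallest edit distance, can be sketched in a few lines. This is a minimal illustration with hypothetical names: it uses plain Levenshtein distance over whole sequences (a real pipeline would more likely use local or semi-global alignment, since reads are shorter than full 16S genes), and it does not reproduce STITCH's vector trimming or read stitching.

```python
from typing import Dict, Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A<->T, C<->G, N kept)."""
    return seq.translate(_COMPLEMENT)[::-1]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of substitutions, insertions, deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]


def closest_reference(read: str, references: Dict[str, str]) -> Tuple[Optional[str], Optional[int]]:
    """Reference name with the smallest edit distance to the read or its reverse complement."""
    best_name, best_dist = None, None
    for name, ref in references.items():
        d = min(edit_distance(read, ref), edit_distance(reverse_complement(read), ref))
        if best_dist is None or d < best_dist:
            best_name, best_dist = name, d
    return best_name, best_dist


print(closest_reference("CGTT", {"ref1": "AACG", "ref2": "GGGG"}))  # ('ref1', 0)
```

    Checking both orientations matters because a read from the reverse strand would otherwise look distant from a reference that it in fact matches exactly.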

  12. Nucleotide sequences encoding a thermostable alkaline protease

    DOEpatents

    Wilson, David B.; Lao, Guifang

    1998-01-01

    Nucleotide sequences, derived from a thermophilic actinomycete microorganism, which encode a thermostable alkaline protease are disclosed. Also disclosed are variants of the nucleotide sequences which encode a polypeptide having thermostable alkaline proteolytic activity. Recombinant thermostable alkaline protease or recombinant polypeptide may be obtained by culturing in a medium a host cell genetically engineered to contain and express a nucleotide sequence according to the present invention, and recovering the recombinant thermostable alkaline protease or recombinant polypeptide from the culture medium.

  13. Nucleotide sequences encoding a thermostable alkaline protease

    DOEpatents

    Wilson, D.B.; Lao, G.

    1998-01-06

    Nucleotide sequences, derived from a thermophilic actinomycete microorganism, which encode a thermostable alkaline protease are disclosed. Also disclosed are variants of the nucleotide sequences which encode a polypeptide having thermostable alkaline proteolytic activity. Recombinant thermostable alkaline protease or recombinant polypeptide may be obtained by culturing in a medium a host cell genetically engineered to contain and express a nucleotide sequence according to the present invention, and recovering the recombinant thermostable alkaline protease or recombinant polypeptide from the culture medium. 3 figs.

  14. Long-range correlations in nucleotide sequences

    NASA Technical Reports Server (NTRS)

    Peng, C. K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Sciortino, F.; Simons, M.; Stanley, H. E.

    1992-01-01

    DNA sequences have been analysed using models, such as an n-step Markov chain, that incorporate the possibility of short-range nucleotide correlations. We propose here a method for studying the stochastic properties of nucleotide sequences by constructing a 1:1 map of the nucleotide sequence onto a walk, which we term a 'DNA walk'. We then use the mapping to provide a quantitative measure of the correlation between nucleotides over long distances along the DNA chain. Thus we uncover in the nucleotide sequence a remarkably long-range power law correlation that implies a new scale-invariant property of DNA. We find such long-range correlations in intron-containing genes and in nontranscribed regulatory DNA sequences, but not in complementary DNA sequences or intron-less genes.

  15. Long-range correlations in nucleotide sequences

    NASA Astrophysics Data System (ADS)

    Peng, C.-K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Sciortino, F.; Simons, M.; Stanley, H. E.

    1992-03-01

    DNA sequences have been analysed using models, such as an n-step Markov chain, that incorporate the possibility of short-range nucleotide correlations [1]. We propose here a method for studying the stochastic properties of nucleotide sequences by constructing a 1:1 map of the nucleotide sequence onto a walk, which we term a 'DNA walk'. We then use the mapping to provide a quantitative measure of the correlation between nucleotides over long distances along the DNA chain. Thus we uncover in the nucleotide sequence a remarkably long-range power law correlation that implies a new scale-invariant property of DNA. We find such long-range correlations in intron-containing genes and in nontranscribed regulatory DNA sequences, but not in complementary DNA sequences or intron-less genes.
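    A DNA walk can be sketched as a cumulative sum of ±1 steps. The step rule below (+1 for a pyrimidine, -1 for a purine) is one common convention and is an assumption here; flipping the sign does not change the fluctuation analysis. The fluctuation F(l) is the root-mean-square spread of the walk's displacement over windows of length l; for an uncorrelated sequence F(l) grows like l^0.5, and a different power-law exponent signals long-range correlation. Function names are hypothetical.

```python
import math
from itertools import accumulate
from typing import List


def dna_walk(seq: str) -> List[int]:
    """Cumulative DNA walk: +1 for a pyrimidine (C, T), -1 for a purine (A, G)."""
    steps = []
    for ch in seq.upper():
        if ch in "CT":
            steps.append(1)
        elif ch in "AG":
            steps.append(-1)
        else:
            raise ValueError(f"unsupported symbol {ch!r}")
    return [0] + list(accumulate(steps))


def fluctuation(walk: List[int], window: int) -> float:
    """RMS fluctuation F(window) = sqrt(<dy^2> - <dy>^2) of displacements dy."""
    diffs = [walk[i + window] - walk[i] for i in range(len(walk) - window)]
    if not diffs:
        raise ValueError("window longer than the walk")
    mean = sum(diffs) / len(diffs)
    var = sum(d * d for d in diffs) / len(diffs) - mean * mean
    return math.sqrt(max(var, 0.0))  # clamp tiny negative rounding error


print(dna_walk("CTAG"))                      # [0, 1, 2, 1, 0]
print(fluctuation(dna_walk("CA" * 50), 2))   # 0.0: strictly periodic
```

    In practice one computes F(l) over a range of window lengths and fits the slope of log F(l) against log l to estimate the exponent.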

  16. Multiple sequence alignment with hierarchical clustering.

    PubMed Central

    Corpet, F

    1988-01-01

    An algorithm is presented for the multiple alignment of sequences, either proteins or nucleic acids, that is both accurate and easy to use on microcomputers. The approach is based on the conventional dynamic-programming method of pairwise alignment. Initially, a hierarchical clustering of the sequences is performed using the matrix of the pairwise alignment scores. The closest sequences are aligned creating groups of aligned sequences. Then close groups are aligned until all sequences are aligned in one group. The pairwise alignments included in the multiple alignment form a new matrix that is used to produce a hierarchical clustering. If it is different from the first one, iteration of the process can be performed. The method is illustrated by an example: a global alignment of 39 sequences of cytochrome c. PMID:2849754
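    The clustering step that decides which sequences and groups are aligned first can be sketched as agglomerative clustering on the pairwise-score matrix. The abstract does not state the linkage criterion, so average linkage on similarity scores is an assumption here, and the profile alignment of the merged groups is omitted; the sketch only returns the order in which groups would be joined.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Group = FrozenSet[str]


def merge_order(names: List[str],
                scores: Dict[Tuple[str, str], float]) -> List[Tuple[Group, Group]]:
    """Order in which groups are merged, joining the most similar pair each time.

    scores holds pairwise alignment scores (higher = more similar) keyed by
    (name, name) in either order; group similarity is the average pairwise score.
    """
    def score(x: str, y: str) -> float:
        return scores[(x, y)] if (x, y) in scores else scores[(y, x)]

    groups: List[Group] = [frozenset([name]) for name in names]
    merges: List[Tuple[Group, Group]] = []
    while len(groups) > 1:
        best: Optional[Tuple[float, int, int]] = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                total = sum(score(x, y) for x in groups[i] for y in groups[j])
                avg = total / (len(groups[i]) * len(groups[j]))
                if best is None or avg > best[0]:
                    best = (avg, i, j)
        _, i, j = best
        merges.append((groups[i], groups[j]))  # these two groups get aligned next
        merged = groups[i] | groups[j]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return merges


print(merge_order(["A", "B", "C"], {("A", "B"): 10, ("A", "C"): 2, ("B", "C"): 3}))
```

    The iteration described in the abstract would rescore the pairwise alignments implied by the finished multiple alignment, rerun this clustering, and realign if the merge order changes.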

  17. An efficient method for multiple sequence alignment

    SciTech Connect

    Kim, J.; Pramanik, S.

    1994-12-31

    Multiple sequence alignment has been a useful method in the study of molecular evolution and sequence-structure relationships. This paper presents a new method for multiple sequence alignment based on the simulated annealing technique. Dynamic programming has been widely used to find an optimal alignment. However, dynamic programming has several limitations in obtaining an optimal alignment: it requires long computation times and cannot accommodate certain types of cost functions. We describe the detailed mechanisms of simulated annealing for the multiple sequence alignment problem. It is shown that simulated annealing can be an effective approach to overcome the limitations of dynamic programming in the multiple sequence alignment problem.
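    The abstract does not spell out the move set or cost function, so the following is only an illustration of simulated annealing applied to a multiple alignment under assumed choices: a sum-of-pairs objective with illustrative scores, a move that slides one gap by one column within a row (so residue order is preserved), geometric cooling, and the Metropolis acceptance rule. It keeps the best alignment seen. Because moves never add or delete columns, the alignment length stays fixed. All names are hypothetical.

```python
import math
import random
from itertools import combinations
from typing import List, Tuple


def sum_of_pairs(rows: List[str], match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Sum-of-pairs score of equal-length rows; gap-gap pairs score 0."""
    total = 0
    for col in zip(*rows):
        for x, y in combinations(col, 2):
            if x == "-" and y == "-":
                continue
            if x == "-" or y == "-":
                total += gap
            else:
                total += match if x == y else mismatch
    return total


def anneal(rows: List[str], steps: int = 5000, t_start: float = 2.0,
           t_end: float = 0.01, seed: int = 0) -> Tuple[List[str], int]:
    """Improve an alignment by randomly sliding gaps, accepting moves by Metropolis."""
    rng = random.Random(seed)
    current = list(rows)
    cur_score = sum_of_pairs(current)
    best, best_score = list(current), cur_score
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / max(1, steps - 1))  # geometric cooling
        r = rng.randrange(len(current))
        row = current[r]
        gaps = [i for i, ch in enumerate(row) if ch == "-"]
        if not gaps:
            continue
        g = rng.choice(gaps)
        nb = g + rng.choice((-1, 1))
        if not 0 <= nb < len(row):
            continue
        chars = list(row)
        chars[g], chars[nb] = chars[nb], chars[g]  # slide the gap one column
        candidate = current[:r] + ["".join(chars)] + current[r + 1:]
        new_score = sum_of_pairs(candidate)
        delta = new_score - cur_score
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, cur_score = candidate, new_score
            if cur_score > best_score:
                best, best_score = list(current), cur_score
    return best, best_score


best, score = anneal(["AC-GT", "ACG-T", "-ACGT"], steps=2000, seed=1)
print(best, score)
```

    Accepting some worse moves at high temperature lets the search escape local optima; as the temperature falls it behaves increasingly like greedy hill climbing.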

  18. Nucleotide capacitance calculation for DNA sequencing

    SciTech Connect

    Lu, Jun-Qiang; Zhang, Xiaoguang

    2008-01-01

    Using a first-principles linear response theory, the capacitances of the DNA nucleotides adenine, cytosine, guanine and thymine are calculated. The difference in capacitance between the nucleotides is studied with respect to conformational distortion. The results suggest that although an alternating current capacitance measurement of a single-stranded DNA chain threaded through nano-gap electrodes may not be sufficient as a stand-alone method for rapid DNA sequencing, the capacitance of the nucleotides should be taken into consideration in any GHz-frequency electric measurements and may also serve as an additional criterion for identifying the DNA sequence.

  19. R3D Align web server for global nucleotide to nucleotide alignments of RNA 3D structures

    PubMed Central

    Rahrig, Ryan R.; Petrov, Anton I.; Leontis, Neocles B.; Zirbel, Craig L.

    2013-01-01

    The R3D Align web server provides online access to ‘RNA 3D Align’ (R3D Align), a method for producing accurate nucleotide-level structural alignments of RNA 3D structures. The web server provides a streamlined and intuitive interface, input data validation and output that is more extensive and easier to read and interpret than related servers. The R3D Align web server offers a unique Gallery of Featured Alignments, providing immediate access to pre-computed alignments of large RNA 3D structures, including all ribosomal RNAs, as well as guidance on effective use of the server and interpretation of the output. By accessing the non-redundant lists of RNA 3D structures provided by the Bowling Green State University RNA group, R3D Align connects users to structure files in the same equivalence class and the best-modeled representative structure from each group. The R3D Align web server is freely accessible at http://rna.bgsu.edu/r3dalign/. PMID:23716643

  20. Nucleotide sequencing and identification of some wild mushrooms.

    PubMed

    Das, Sudip Kumar; Mandal, Aninda; Datta, Animesh K; Gupta, Sudha; Paul, Rita; Saha, Aditi; Sengupta, Sonali; Dubey, Priyanka Kumari

    2013-01-01

    The rDNA-ITS (Ribosomal DNA Internal Transcribed Spacers) fragment of the genomic DNA of 8 wild edible mushrooms (collected from the Eastern Chota Nagpur Plateau of West Bengal, India) was amplified using ITS1 (Internal Transcribed Spacer 1) and ITS2 primers and sequenced to identify the mushrooms. The sequences were aligned using the ClustalW software program. The aligned sequences revealed identity (homology percentage from the GenBank database) with Amanita hemibapha [CN (Chota Nagpur) 1, % identity 99 (JX844716.1)], Amanita sp. [CN 2, % identity 98 (JX844763.1)], Astraeus hygrometricus [CN 3, % identity 87 (FJ536664.1)], Termitomyces sp. [CN 4, % identity 90 (JF746992.1)], Termitomyces sp. [CN 5, % identity 99 (GU001667.1)], T. microcarpus [CN 6, % identity 82 (EF421077.1)], Termitomyces sp. [CN 7, % identity 76 (JF746993.1)], and Volvariella volvacea [CN 8, % identity 100 (JN086680.1)]. Although only 4 of the 8 mushrooms could be identified to species level, the nucleotide sequences of the rest may be relevant for further characterization. A phylogenetic tree was constructed using the Neighbor-Joining method to show the interrelationships among the mushrooms. The determined nucleotide sequences may enrich the GenBank database, aid molecular taxonomy, and facilitate the domestication and characterization of these mushrooms for human benefit.

  1. Nucleotide Sequencing and Identification of Some Wild Mushrooms

    PubMed Central

    Das, Sudip Kumar; Mandal, Aninda; Datta, Animesh K.; Gupta, Sudha; Paul, Rita; Saha, Aditi; Sengupta, Sonali; Dubey, Priyanka Kumari

    2013-01-01

    The rDNA-ITS (Ribosomal DNA Internal Transcribed Spacers) fragment of the genomic DNA of 8 wild edible mushrooms (collected from the Eastern Chota Nagpur Plateau of West Bengal, India) was amplified using ITS1 (Internal Transcribed Spacer 1) and ITS2 primers and sequenced to identify the mushrooms. The sequences were aligned using the ClustalW software program. The aligned sequences revealed identity (homology percentage from the GenBank database) with Amanita hemibapha [CN (Chota Nagpur) 1, % identity 99 (JX844716.1)], Amanita sp. [CN 2, % identity 98 (JX844763.1)], Astraeus hygrometricus [CN 3, % identity 87 (FJ536664.1)], Termitomyces sp. [CN 4, % identity 90 (JF746992.1)], Termitomyces sp. [CN 5, % identity 99 (GU001667.1)], T. microcarpus [CN 6, % identity 82 (EF421077.1)], Termitomyces sp. [CN 7, % identity 76 (JF746993.1)], and Volvariella volvacea [CN 8, % identity 100 (JN086680.1)]. Although only 4 of the 8 mushrooms could be identified to species level, the nucleotide sequences of the rest may be relevant for further characterization. A phylogenetic tree was constructed using the Neighbor-Joining method to show the interrelationships among the mushrooms. The determined nucleotide sequences may enrich the GenBank database, aid molecular taxonomy, and facilitate the domestication and characterization of these mushrooms for human benefit. PMID:24489501
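    The "% identity" figures above depend on how gaps are counted. The sketch below uses one common convention, assumed here for illustration: identical residue columns divided by all columns that are not gaps in both rows. It is not necessarily how the reported GenBank percentages were computed, and the function name is hypothetical.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percent identity of two aligned rows of equal length.

    Identical non-gap columns / columns that are not gaps in both rows.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    columns = [(x, y) for x, y in zip(row_a.upper(), row_b.upper())
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    identical = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * identical / len(columns)


print(round(percent_identity("ACGT-A", "ACCTTA"), 2))  # 66.67: 4 of 6 columns
```

    Excluding gap columns from the denominator instead would give 80% for the same pair, which is why identity values are only comparable when computed the same way.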

  2. Progressive multiple sequence alignments from triplets

    PubMed Central

    Kruspe, Matthias; Stadler, Peter F

    2007-01-01

    Background The quality of progressive sequence alignments strongly depends on the accuracy of the individual pairwise alignment steps, since gaps that are introduced at one step cannot be removed at later aggregation steps. Adjacent insertions and deletions necessarily appear in arbitrary order in pairwise alignments and hence form an unavoidable source of errors. Research Here we present a modified variant of progressive sequence alignment that addresses both issues. Instead of pairwise alignments, we use exact dynamic programming to align sequence or profile triples. This avoids a large fraction of the ambiguities arising in pairwise alignments. In the subsequent aggregation steps we follow the logic of the Neighbor-Net algorithm, which constructs a phylogenetic network by stepwise replacing triples by pairs instead of combining pairs to singletons. To this end, the three-way alignments are subdivided into two partial alignments, at which stage all-gap columns are naturally removed. This alleviates the "once a gap, always a gap" problem of progressive alignment procedures. Conclusion The three-way Neighbor-Net based alignment program aln3nn is shown to compare favorably with other progressive alignment tools on both protein sequences and nucleic acid sequences. In the latter case, one can easily include scoring terms that consider secondary structure features. Overall, the quality of the resulting alignments generally exceeds that of ClustalW or other multiple alignment tools, even though our software does not include heuristics for context-dependent (mis)match scores. PMID:17631683
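    Exact dynamic programming over three sequences fills a three-dimensional table in which each cell has seven predecessors, one for every non-empty combination of advancing in each sequence. The sketch below computes the optimal sum-of-pairs score for three plain sequences with a linear gap penalty and illustrative scores; it is an assumption-laden illustration of the idea, not aln3nn, which also aligns profiles and uses its own scoring.

```python
from itertools import product


def pair_score(x: str, y: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score of one residue pair inside a column; '-' is a gap."""
    if x == "-" and y == "-":
        return 0
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def three_way_score(a: str, b: str, c: str) -> float:
    """Optimal sum-of-pairs score of a three-way global alignment (O(n^3) DP)."""
    la, lb, lc = len(a), len(b), len(c)
    neg = float("-inf")
    d = [[[neg] * (lc + 1) for _ in range(lb + 1)] for _ in range(la + 1)]
    d[0][0][0] = 0
    for i in range(la + 1):
        for j in range(lb + 1):
            for k in range(lc + 1):
                if i == j == k == 0:
                    continue
                best = neg
                # Each (di, dj, dk) says which sequences contribute a residue to this column.
                for di, dj, dk in product((0, 1), repeat=3):
                    if (di, dj, dk) == (0, 0, 0):
                        continue
                    pi, pj, pk = i - di, j - dj, k - dk
                    if pi < 0 or pj < 0 or pk < 0:
                        continue
                    x = a[pi] if di else "-"
                    y = b[pj] if dj else "-"
                    z = c[pk] if dk else "-"
                    column = pair_score(x, y) + pair_score(x, z) + pair_score(y, z)
                    best = max(best, d[pi][pj][pk] + column)
                d[i][j][k] = best
    return d[la][lb][lc]


print(three_way_score("ACG", "ACG", "ACG"))  # 9: three columns x three pairs
```

    Considering all three sequences at once is what lets the placement of adjacent insertions and deletions be decided jointly instead of being fixed by an earlier pairwise step.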

  3. Improving pairwise sequence alignment accuracy using near-optimal protein sequence alignments

    PubMed Central

    2010-01-01

    Background While the pairwise alignments produced by sequence similarity searches are a powerful tool for identifying homologous proteins (proteins that share a common ancestor and a similar structure), pairwise sequence alignments often fail to represent accurately the structural alignments inferred from three-dimensional coordinates. Since sequence alignment algorithms produce optimal alignments, the best structural alignments must reflect suboptimal sequence alignment scores. Thus, we have examined a range of suboptimal sequence alignments and a range of scoring parameters to understand better which sequence alignments are likely to be more structurally accurate. Results We compared near-optimal protein sequence alignments produced by the Zuker algorithm and a set of probabilistic alignments produced by the probA program with structural alignments produced by four different structure alignment algorithms. There is significant overlap between the solution spaces of structural alignments and both the near-optimal sequence alignments produced by commonly used scoring parameters for sequences that share significant sequence similarity (E-values < 10⁻⁵) and the ensemble of probA alignments. We constructed a logistic regression model incorporating three input variables derived from sets of near-optimal alignments: robustness, edge frequency, and maximum bits-per-position. A ROC analysis shows that this model classifies amino acid pairs (edges in the alignment path graph) according to the likelihood of appearance in structural alignments more accurately than the robustness score alone. We investigated various trimming protocols for removing incorrect edges from the optimal sequence alignment; the most effective protocol is to remove matches from the semi-global optimal alignment that are outside the boundaries of the local alignment, although trimming according to the model-generated probabilities achieves a similar level of improvement.
The model can also be used to

  4. The International Nucleotide Sequence Database Collaboration.

    PubMed

    Nakamura, Yasukazu; Cochrane, Guy; Karsch-Mizrachi, Ilene

    2013-01-01

    The International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org), one of the longest-standing global alliances of biological data archives, captures, preserves and provides comprehensive public-domain nucleotide sequence information. The three partners of the INSDC work in cooperation to establish formats for data and metadata, and protocols that facilitate reliable data submission to their databases and support continual data exchange around the world. In this article, the current status of the INSDC and its updates for 2012 are presented. Among the items discussed at the 2012 international collaboration meeting, the BioSample database and changes in submission are described.

  5. The complete nucleotide sequence of bean yellow mosaic potyvirus RNA.

    PubMed

    Guyatt, K J; Proll, D F; Menssen, A; Davidson, A D

    1996-01-01

    The complete nucleotide sequence of an Australian strain of bean yellow mosaic virus (BYMV-S) has been determined from cloned viral cDNAs. The BYMV-S genome is 9 547 nucleotides in length excluding a poly(A) tail. Computer analysis of the sequence revealed a single long open reading frame (ORF) of 9168 nucleotides, commencing at position 206 and terminating with UAG at position 9374-6. The ORF potentially encodes a polyprotein of 3056 amino acids with a deduced Mr of 347 409. The 5' and 3' untranslated regions are 205 and 174 nucleotides in length respectively. Alignment of the amino acid sequence of the BYMV-S polyprotein with those of other potyviruses identified nine putative proteolytic cleavage sites. The predicted consensus cleavage site of the BYMV NIa protease was found to differ from that described for other potyviruses. Processing of the BYMV polyprotein at the designated proteolytic cleavage sites would result in a typical potyviral genome arrangement. The amino acid sequences of the putative BYMV encoded proteins were compared to the homologous gene products of twelve individual potyviruses to identify overall and specific regions of amino acid sequence homology.
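    The genome figures in this abstract can be cross-checked with a little arithmetic on 1-based positions. If the ORF begins at position 206 and the UAG stop codon occupies positions 9374-9376, the sense codons span positions 206-9373. The helper below (a hypothetical name, written only for this check) assumes the reported ORF length excludes the stop codon, which is the reading under which the numbers agree.

```python
from typing import Tuple


def orf_figures(start: int, stop_codon_start: int) -> Tuple[int, int, int]:
    """(5' UTR length, ORF length without stop codon, number of codons).

    Positions are 1-based: start is the first base of the start codon,
    stop_codon_start the first base of the stop codon.
    """
    orf_nt = stop_codon_start - start  # bases from start codon up to the stop codon
    if orf_nt <= 0 or orf_nt % 3 != 0:
        raise ValueError("positions do not define an in-frame ORF")
    return start - 1, orf_nt, orf_nt // 3


print(orf_figures(206, 9374))  # (205, 9168, 3056)
```

    With the values from the abstract this gives a 205-nucleotide 5' untranslated region, a 9168-nucleotide ORF, and 3056 codons, matching the reported 3056-amino-acid polyprotein.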

  6. Alignment of Helical Membrane Protein Sequences Using AlignMe

    PubMed Central

    Khafizov, Kamil; Forrest, Lucy R.

    2013-01-01

    Few sequence alignment methods have been designed specifically for integral membrane proteins, even though these important proteins have distinct evolutionary and structural properties that might affect their alignments. Existing approaches typically consider membrane-related information either by using membrane-specific substitution matrices or by assigning distinct penalties for gap creation in transmembrane and non-transmembrane regions. Here, we ask whether favoring matching of predicted transmembrane segments within a standard dynamic programming algorithm can improve the accuracy of pairwise membrane protein sequence alignments. We tested various strategies using a specifically designed program called AlignMe. An updated set of homologous membrane protein structures, called HOMEP2, was used as a reference for optimizing the gap penalties. The best of the membrane-protein optimized approaches were then tested on an independent reference set of membrane protein sequence alignments from the BAliBASE collection. When secondary structure (S) matching was combined with evolutionary information (using a position-specific substitution matrix (P)), in an approach we called AlignMePS, the resultant pairwise alignments were typically among the most accurate over a broad range of sequence similarities when compared to available methods. Matching transmembrane predictions (T), in addition to evolutionary information, and secondary-structure predictions, in an approach called AlignMePST, generally reduces the accuracy of the alignments of closely-related proteins in the BAliBASE set relative to AlignMePS, but may be useful in cases of extremely distantly related proteins for which sequence information is less informative. The open source AlignMe code is available at https://sourceforge.net/projects/alignme/, and at http://www.forrestlab.org, along with an online server and the HOMEP2 data set. PMID:23469223

  7. The International Nucleotide Sequence Database Collaboration

    PubMed Central

    Cochrane, Guy; Karsch-Mizrachi, Ilene; Takagi, Toshihisa; International Nucleotide Sequence Database Collaboration

    2016-01-01

    The International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org) comprises three global partners committed to capturing, preserving and providing comprehensive public-domain nucleotide sequence information. The INSDC establishes standards, formats and protocols for data and metadata to make it easier for individuals and organisations to submit their nucleotide data reliably to public archives. This work enables the continuous, global exchange of information about living things. Here we present an update of the INSDC in 2015, including data growth and diversification, new standards and requirements by publishers for authors to submit their data to the public archives. The INSDC serves as a model for data sharing in the life sciences. PMID:26657633

  8. Aligning Two Genomic Sequences That Contain Duplications

    NASA Astrophysics Data System (ADS)

    Hou, Minmei; Riemer, Cathy; Berman, Piotr; Hardison, Ross C.; Miller, Webb

    It is difficult to properly align genomic sequences that contain intra-species duplications. With this goal in mind, we have developed a tool, called TOAST (two-way orthologous alignment selection tool), for predicting whether two aligned regions from different species are orthologous, i.e., separated by a speciation event, as opposed to a duplication event. The advantage of restricting alignment to orthologous pairs is that they constitute the aligning regions that are most likely to share the same biological function, and most easily analyzed for evidence of selection. We evaluate TOAST on 12 human/mouse gene clusters.

  9. Estimation of evolutionary distances between nucleotide sequences.

    PubMed

    Zharkikh, A

    1994-09-01

    A formal mathematical analysis of the substitution process in nucleotide sequence evolution was carried out in terms of a Markov process. Using matrix algebra, the theoretical foundation of the methods of Barry and Hartigan (Stat. Sci. 2:191-210, 1987) and Lanave et al. (J. Mol. Evol. 20:86-93, 1984) was provided. Extensive computer simulation was used to compare the accuracy and effectiveness of various methods for estimating the evolutionary distance between two nucleotide sequences. The multiparameter methods of Lanave et al. (J. Mol. Evol. 20:86-93, 1984), Gojobori et al. (J. Mol. Evol. 18:414-422, 1982), and Barry and Hartigan (Stat. Sci. 2:191-210, 1987) were shown to be preferable to others for phylogenetic analysis when the sequences are long. However, when sequences are short and the evolutionary distance is large, the method of Tajima and Nei (Mol. Biol. Evol. 1:269-285, 1984) is superior to the others.
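
    The simplest Markov-model distance gives a point of reference. The one-parameter Jukes-Cantor correction converts the observed proportion p of differing sites into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). It is not one of the multiparameter methods compared in the study; it is shown only as the baseline that such models generalize. The sketch assumes gap-free, pre-aligned input, and the function names are illustrative.

```python
import math

def p_distance(s1, s2):
    """Proportion of differing sites between two aligned, gap-free sequences."""
    if len(s1) != len(s2) or not s1:
        raise ValueError("sequences must be aligned and non-empty")
    diffs = sum(1 for x, y in zip(s1, s2) if x != y)
    return diffs / len(s1)

def jukes_cantor(s1, s2):
    """Jukes-Cantor distance d = -(3/4) ln(1 - 4p/3), substitutions per site."""
    p = p_distance(s1, s2)
    if p >= 0.75:
        # The correction is undefined at or beyond saturation (p = 3/4).
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)
```

    For one difference in four sites, p = 0.25 and d is about 0.304. That exceeds p because the correction accounts for multiple substitutions hidden at the same site.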

  10. Should nucleotide sequence analyzing computer algorithms always extend homologies by extending homologies?

    PubMed

    Burnett, L; Basten, A; Hensley, W J

    1986-01-10

    Most computer algorithms used for comparing or aligning nucleotide sequences rely on the premise that the best way to extend a homology between the two sequences is to select a match rather than a mismatch. We have tested this assumption and found that it is not always valid.

  11. Two Hybrid Algorithms for Multiple Sequence Alignment

    NASA Astrophysics Data System (ADS)

    Naznin, Farhana; Sarker, Ruhul; Essam, Daryl

    2010-01-01

    To design life-saving drugs, such as cancer drugs, the design of protein or DNA structures has to be accurate. These structures depend on multiple sequence alignment (MSA), which is used to find the accurate structure of protein and DNA sequences from existing, approximately correct sequences. To overcome the overly greedy nature of the well-known global progressive alignment method for multiple sequence alignment, we propose two algorithms in this paper. The first combines an iterative approach with a progressive alignment method (PAMIM), and the second combines a genetic algorithm with a progressive alignment method (PAMGA). Both methods start from a k-mer distance table to generate a single guide tree. In the iterative approach, we introduce two new techniques: the first generates guide trees from randomly selected sequences, and the second shuffles the sequences inside that tree. The output of the tree is a multiple sequence alignment, which is evaluated with the Sum of Pairs Method (SPM) using real-valued scores from PAM250. In the second (GA) approach, these two techniques are used to generate an initial population, and two different approaches to the genetic operators are implemented for crossover and mutation. To test the performance of the two algorithms, we compared them with well-known existing methods, T-Coffee, MUSCLE, MAFFT and ProbCons, using the BAliBASE benchmarks. The experimental results show that the first algorithm works well in some situations where other existing methods have difficulty finding better solutions. The second method performs well compared with the existing methods in all situations and outperforms the first.
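
    The sum-of-pairs (SP) criterion used to evaluate each candidate alignment adds up a score for every pair of sequences in every column. The sketch below uses a toy pair score (match +1, mismatch -1, residue against gap -2, gap against gap 0). This scoring is an assumption that stands in for the PAM250 values the authors used, and the function names are hypothetical.

```python
from itertools import combinations

def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    """Toy column-pair score; a real scorer would look up PAM250 here."""
    if x == "-" and y == "-":
        return 0  # gap-gap pairs are conventionally ignored
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch

def sum_of_pairs(alignment):
    """Sum pair_score over all sequence pairs in every alignment column."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows of an alignment must have equal length")
    total = 0
    for col in range(length):
        column = [row[col] for row in alignment]
        for x, y in combinations(column, 2):
            total += pair_score(x, y)
    return total
```

    Because every column contributes k(k-1)/2 pair scores for k sequences, the cost of evaluating one alignment grows quadratically with the number of sequences.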

  12. Robust temporal alignment of multimodal cardiac sequences

    NASA Astrophysics Data System (ADS)

    Perissinotto, Andrea; Queirós, Sandro; Morais, Pedro; Baptista, Maria J.; Monaghan, Mark; Rodrigues, Nuno F.; D'hooge, Jan; Vilaça, João L.; Barbosa, Daniel

    2015-03-01

    Given the dynamic nature of cardiac function, correct temporal alignment of pre-operative models and intra-operative images is crucial for augmented reality in cardiac image-guided interventions. Accordingly, the current study focuses on the development of an image-based strategy for temporal alignment of multimodal cardiac imaging sequences, such as cine Magnetic Resonance Imaging (MRI) or 3D Ultrasound (US). First, we derive a robust, modality-independent signal from the image sequences, estimated by computing the normalized cross-correlation between each frame in the temporal sequence and the end-diastolic frame. This signal serves as a surrogate for the left-ventricle (LV) volume curve over time, whose variation indicates different temporal landmarks of the cardiac cycle. We then temporally align these surrogate signals, derived from MRI and US sequences of the same patient, through Dynamic Time Warping (DTW), which synchronizes both sequences. The proposed framework was evaluated in 98 patients who had undergone both 3D+t MRI and US scans. The end-systolic frame could be accurately estimated as the minimum of the image-derived surrogate signal, with a relative error of 1.6 ± 1.9% and 4.0 ± 4.2% for the MRI and US sequences, respectively, supporting its association with key temporal instants of the cardiac cycle. The use of DTW reduces the desynchronization of cardiac events between MRI and US sequences, allowing multimodal cardiac imaging sequences to be temporally aligned. Overall, a generic, fast and accurate method for temporal synchronization of MRI and US sequences of the same patient was introduced. This approach could be used directly for the correct temporal alignment of pre-operative MRI information and intra-operative US images.
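
    The two steps of the framework can be sketched in plain Python: a normalized cross-correlation (NCC) surrogate signal, followed by Dynamic Time Warping. This assumes each frame is already flattened to a list of intensities and that the end-diastolic frame is at index 0. The DTW uses absolute difference as its local cost, which may differ from the authors' choice, and the function names are illustrative.

```python
import math

def ncc(f, g):
    """Normalized cross-correlation between two equal-size flattened frames."""
    n = len(f)
    mf, mg = sum(f) / n, sum(g) / n
    num = sum((a - mf) * (b - mg) for a, b in zip(f, g))
    den = math.sqrt(sum((a - mf) ** 2 for a in f) * sum((b - mg) ** 2 for b in g))
    return num / den if den else 0.0

def surrogate_signal(frames, ed_index=0):
    """Correlate every frame with the (assumed) end-diastolic reference frame."""
    ref = frames[ed_index]
    return [ncc(frame, ref) for frame in frames]

def dtw(x, y):
    """Classic DTW: returns (total cost, warping path as index pairs)."""
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover which samples were matched.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
        if step == cost[i - 1][j - 1]:
            i, j = i - 1, j - 1
        elif step == cost[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return cost[n][m], path[::-1]
```

    Following the abstract, the end-systolic frame of a surrogate signal s would be the index of its minimum, min(range(len(s)), key=s.__getitem__). The DTW path then tells which MRI frame corresponds to which US frame.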

  13. Identifying subset errors in multiple sequence alignments.

    PubMed

    Roy, Aparna; Taddese, Bruck; Vohra, Shabana; Thimmaraju, Phani K; Illingworth, Christopher J R; Simpson, Lisa M; Mukherjee, Keya; Reynolds, Christopher A; Chintapalli, Sree V

    2014-01-01

    Multiple sequence alignment (MSA) accuracy is important, but there is no widely accepted method of judging the accuracy that different alignment algorithms give. We present a simple approach to detecting two types of error, namely block shifts and the misplacement of residues within a gap. Given an MSA, subsets of very similar sequences are generated through the use of a redundancy filter, typically using a 70-90% sequence identity cut-off. Subsets thus produced are typically small and degenerate, and errors can be easily detected even by manual examination. The errors, albeit minor, are inevitably associated with gaps in the alignment, so the procedure is particularly relevant to homology modelling of protein loop regions. The usefulness of the approach is illustrated in the context of the universal but little-known [K/R]KLH motif that occurs in intracellular loop 1 of G protein-coupled receptors (GPCRs); other issues relevant to GPCR modelling are also discussed.
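
    The subset-generation step can be sketched as a greedy redundancy filter. It computes pairwise percent identity over aligned columns and groups each sequence with the first representative it matches at or above the cut-off. Two choices here are illustrative assumptions, not the authors' code: identity is computed only over columns where neither row has a gap, and clustering is greedy in input order. The function names are hypothetical.

```python
def percent_identity(s1, s2):
    """Identity over alignment columns where neither row has a gap."""
    pairs = [(a, b) for a, b in zip(s1, s2) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)

def redundancy_subsets(alignment, cutoff=80.0):
    """Greedily group aligned rows (name -> sequence) into high-identity subsets."""
    subsets = []  # each subset is a list of names; the first is its representative
    for name, seq in alignment.items():
        for subset in subsets:
            rep = alignment[subset[0]]
            if percent_identity(seq, rep) >= cutoff:
                subset.append(name)
                break
        else:
            subsets.append([name])
    return subsets
```

    Each resulting subset can then be inspected on its own. Within a group of near-identical sequences, a block shift or a residue placed on the wrong side of a gap stands out clearly.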

  14. Image Correlation Method for DNA Sequence Alignment

    PubMed Central

    Curilem Saldías, Millaray; Villarroel Sassarini, Felipe; Muñoz Poblete, Carlos; Vargas Vásquez, Asticio; Maureira Butler, Iván

    2012-01-01

    The complexity of searches and the volume of genomic data make sequence alignment one of bioinformatics' most active research areas. New alignment approaches have incorporated digital signal processing techniques, and among these, correlation methods are highly sensitive. This paper proposes a novel sequence alignment method based on 2-dimensional images, in which each nucleic acid base is represented as a pixel of fixed gray intensity. Query and known database sequences are coded into their pixel representations, and sequence alignment is treated as the problem of recognizing an object in a scene: the query and the database become the object and the scene, respectively. An image correlation process is carried out to search for the best match between them. Because this procedure can be implemented in an optical correlator, the correlation could eventually be performed at the speed of light. This paper presents an initial research stage in which results were obtained “digitally” by simulating an optical correlation of DNA sequences represented as images. A total of 303 queries (lengths from 50 to 4500 base pairs) and 100 scenes, each represented by a 100 x 100 image (one million base pairs of database in total), were considered for the image correlation analysis. The correlations reached very high sensitivity (99.01%) and specificity (98.99%) and outperformed BLAST as the number of mutations increased. However, the digital correlation processes were a hundred times slower than BLAST. We are currently starting an initiative to evaluate the correlation speed of a real experimental optical correlator, through which we expect to fully exploit the light properties of optical correlation. Because the optical correlator works jointly with the computer, the digital algorithms should also be optimized. The results presented in this paper are encouraging and support the study of image correlation methods for sequence alignment. PMID:22761742
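
    A one-dimensional, digital simplification of the idea works as follows. Encode each base as a fixed gray level, slide the encoded query along the encoded scene, and keep the offset with the best correlation score. Raw cross-correlation q·w alone favors bright windows, so this sketch scores 2(q·w) - |w|^2. That quantity equals |q|^2 minus the squared difference between query and window, so it peaks at an exact match. The gray levels, the 1-D layout (the paper uses 2-D images and targets an optical correlator), this scoring correction and the function names are all assumptions.

```python
# Hypothetical gray levels: the paper assigns each base a fixed intensity,
# but these particular values are illustrative.
GRAY = {"A": 0.25, "C": 0.5, "G": 0.75, "T": 1.0}

def encode(seq):
    """Map a DNA string to a list of gray intensities."""
    return [GRAY[base] for base in seq.upper()]

def best_match(query, scene):
    """Slide the encoded query over the encoded scene; return (offset, score).

    score = 2 * (q . w) - |w|^2 = |q|^2 - |q - w|^2, so the maximum is reached
    where the window w equals the query q exactly.
    """
    q, s = encode(query), encode(scene)
    k = len(q)
    best_off, best_score = None, float("-inf")
    for off in range(len(s) - k + 1):
        w = s[off:off + k]
        corr = sum(a * b for a, b in zip(q, w))  # raw cross-correlation term
        energy = sum(b * b for b in w)           # window energy correction
        score = 2 * corr - energy
        if score > best_score:
            best_off, best_score = off, score
    return best_off, best_score
```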

  15. Image correlation method for DNA sequence alignment.

    PubMed

    Curilem Saldías, Millaray; Villarroel Sassarini, Felipe; Muñoz Poblete, Carlos; Vargas Vásquez, Asticio; Maureira Butler, Iván

    2012-01-01

    The complexity of searches and the volume of genomic data make sequence alignment one of bioinformatics' most active research areas. New alignment approaches have incorporated digital signal processing techniques, and among these, correlation methods are highly sensitive. This paper proposes a novel sequence alignment method based on 2-dimensional images, in which each nucleic acid base is represented as a pixel of fixed gray intensity. Query and known database sequences are coded into their pixel representations, and sequence alignment is treated as the problem of recognizing an object in a scene: the query and the database become the object and the scene, respectively. An image correlation process is carried out to search for the best match between them. Because this procedure can be implemented in an optical correlator, the correlation could eventually be performed at the speed of light. This paper presents an initial research stage in which results were obtained "digitally" by simulating an optical correlation of DNA sequences represented as images. A total of 303 queries (lengths from 50 to 4500 base pairs) and 100 scenes, each represented by a 100 x 100 image (one million base pairs of database in total), were considered for the image correlation analysis. The correlations reached very high sensitivity (99.01%) and specificity (98.99%) and outperformed BLAST as the number of mutations increased. However, the digital correlation processes were a hundred times slower than BLAST. We are currently starting an initiative to evaluate the correlation speed of a real experimental optical correlator, through which we expect to fully exploit the light properties of optical correlation. Because the optical correlator works jointly with the computer, the digital algorithms should also be optimized. The results presented in this paper are encouraging and support the study of image correlation methods for sequence alignment.

  16. Nucleotide Sequence of the Akv env Gene

    PubMed Central

    Lenz, Jack; Crowther, Robert; Straceski, Anthony; Haseltine, William

    1982-01-01

    The sequence of 2,191 nucleotides encoding the env gene of murine retrovirus Akv was determined by using a molecular clone of the Akv provirus. Deduction of the encoded amino acid sequence showed that a single open reading frame encodes a 638-amino acid precursor to gp70 and p15E. In addition, there is a typical leader sequence preceding the amino terminus of gp70. The locations of potential glycosylation sites and other structural features indicate that the entire gp70 molecule and most of p15E are located on the outer side of the membrane. Internal cleavage of the env precursor to generate gp70 and p15E occurs immediately adjacent to several basic amino acids at the carboxyl terminus of gp70. This cleavage generates a region of 42 uncharged, relatively hydrophobic amino acids at the amino terminus of p15E, which is located in a position analogous to the hydrophobic membrane fusion sequence of influenza virus hemagglutinin. The mature polypeptides are predicted to associate with the membrane via a region of 30 uncharged, mostly hydrophobic amino acids located near the carboxyl terminus of p15E. Distal to this membrane association region is a sequence of 35 amino acids at the carboxyl terminus of the env precursor, which is predicted to be located on the inner side of the membrane. By analogy to Moloney murine leukemia virus, a proteolytic cleavage in this region removes the terminal 19 amino acids, thus generating the carboxyl terminus of p15E. This leaves 15 amino acids at the carboxyl terminus of p15E on the inner side of the membrane in a position to interact with virion cores during budding. The precise location and order of the large RNase T1-resistant oligonucleotides in the env region were determined and compared with those from several leukemogenic viruses of AKR origin. This permitted a determination of how the differences in the leukemogenic viruses affect the primary structure of the env gene products. PMID:6283170

  17. Base sequence context effects on nucleotide excision repair.

    PubMed

    Cai, Yuqin; Patel, Dinshaw J; Broyde, Suse; Geacintov, Nicholas E

    2010-08-23

    Nucleotide excision repair (NER) plays a critical role in maintaining the integrity of the genome when it is damaged by bulky DNA lesions, since inefficient repair can cause mutations and human diseases, notably cancer. The structural properties of DNA lesions that determine their relative susceptibilities to NER are therefore of great interest. As a model system, we have investigated the major mutagenic lesion derived from the environmental carcinogen benzo[a]pyrene (B[a]P), 10S (+)-trans-anti-B[a]P-N(2)-dG, in six different sequence contexts that differ in how the lesion is positioned in relation to nearby guanine amino groups. We have obtained molecular structural data by NMR and MD simulations, bending properties from gel electrophoresis studies, and NER data from human HeLa cell extracts for the six investigated sequence contexts. This model system suggests that disturbed Watson-Crick base pairing is a better recognition signal than a flexible bend, and that the two can act in concert to provide an enhanced signal. Steric hindrance between the minor groove-aligned lesion and nearby guanine amino groups determines the exact nature of the disturbances. Both nearest-neighbor and more distant sequence contexts have an impact. Regardless of the exact distortions, we hypothesize that they provide a local thermodynamic destabilization signal for repair.

  18. Nucleotide sequence of the pyruvate decarboxylase gene from Zymomonas mobilis.

    PubMed

    Neale, A D; Scopes, R K; Wettenhall, R E; Hoogenraad, N J

    1987-02-25

    Pyruvate decarboxylase (EC 4.1.1.1), the penultimate enzyme in the alcoholic fermentation pathway of Zymomonas mobilis, converts pyruvate to acetaldehyde and carbon dioxide. The complete nucleotide sequence of the structural gene encoding pyruvate decarboxylase from Zymomonas mobilis has been determined. The coding region is 1704 nucleotides long and encodes a polypeptide of 567 amino acids with a calculated subunit mass of 60,790 daltons. The amino acid sequence was confirmed by comparison with the amino acid sequence of a selection of tryptic fragments of the enzyme. The amino acid composition obtained from the nucleotide sequence is in good agreement with that obtained experimentally.
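
    The figures reported above are consistent if the 1704 nucleotides include the stop codon: 1704 / 3 = 568 codons, that is, 567 amino acids plus a stop. A calculated subunit mass is commonly obtained by summing average residue masses and adding one water for the free termini. The sketch below assumes standard average residue masses in daltons. The enzyme's sequence is not reproduced here, so this is not a recomputation of the 60,790 Da value, and the function names are illustrative.

```python
# Average residue masses (Da) of the 20 standard amino acids, i.e. the free
# amino acid mass minus one water, as commonly tabulated.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # added once for the N- and C-termini of the chain

def codons_to_residues(coding_length_nt, has_stop=True):
    """Number of amino acids encoded by a coding region of the given length."""
    if coding_length_nt % 3:
        raise ValueError("coding region length must be a multiple of 3")
    codons = coding_length_nt // 3
    return codons - 1 if has_stop else codons

def polypeptide_mass(protein):
    """Average mass (Da) of an unmodified polypeptide in one-letter code."""
    return sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER
```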

  19. Regular Language Constrained Sequence Alignment Revisited

    NASA Astrophysics Data System (ADS)

    Kucherov, Gregory; Pinhas, Tamar; Ziv-Ukelson, Michal

    Imposing constraints in the form of a finite automaton or a regular expression is an effective way to incorporate additional a priori knowledge into sequence alignment procedures. With this motivation, Arslan [1] introduced the Regular Language Constrained Sequence Alignment Problem and proposed an O(n^2 t^4)-time and O(n^2 t^2)-space algorithm for solving it, where n is the length of the input strings and t is the number of states in the non-deterministic automaton, which is given as input. Chung et al. [2] proposed a faster O(n^2 t^3)-time algorithm for the same problem. In this paper, we further speed up the algorithms for Regular Language Constrained Sequence Alignment by reducing their worst-case time complexity bound to O(n^2 t^3 / log t). This is done by establishing an optimal bound on the size of Straight-Line Programs solving the maxima computation subproblem of the basic dynamic programming algorithm. We also study another solution based on a Steiner Tree computation. While it does not improve the run time complexity in the worst case, our simulations show that both approaches are efficient in practice, especially when the input automata are dense.

  20. Parallel sequence alignment in limited space.

    PubMed

    Grice, J A; Hughey, R; Speck, D

    1995-01-01

    Sequence comparison with affine gap costs is a problem that is readily parallelizable on simple single-instruction, multiple-data stream (SIMD) parallel processors using only constant space per processing element. Unfortunately, the twin problem of sequence alignment, finding the optimal character-by-character correspondence between two sequences, is more complicated. While the innovative O(n^2)-time and O(n)-space serial algorithm has been parallelized for multiple-instruction, multiple-data stream (MIMD) computers with only a communication-time slowdown, typically O(log n), it is not suitable for hardware-efficient SIMD parallel processors with only local communication. This paper proposes several methods of computing sequence alignments with limited memory per processing element. The algorithms are also well-suited to serial implementation. The simpler algorithms feature, for an arbitrary integer L, a factor of L slowdown in exchange for reducing space requirements from O(n) to O(L√n) per processing element. Using this result, we describe an O(n log n) parallel time algorithm that requires O(log n) space per processing element on O(n) SIMD processing elements with only a mesh or linear interconnection network.
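
    The contrast between sequence comparison (computing only the optimal score) and alignment (recovering the correspondence) shows up clearly in a serial sketch. With affine gap costs, the score needs only the previous row of each of three dynamic-programming matrices (Gotoh's recurrence), so memory is O(n), but the traceback information is discarded. This is a minimal sketch, not the paper's SIMD algorithm. It assumes a simple match/mismatch score and a gap of length L costing gap_open + gap_extend * (L - 1), and the function name is illustrative.

```python
NEG = float("-inf")

def affine_score(a, b, match=1, mismatch=-1, gap_open=3, gap_extend=1):
    """Optimal global alignment score with affine gaps in O(len(b)) memory.

    A gap of length L costs gap_open + gap_extend * (L - 1).
    M: a[i] aligned to b[j]; X: a[i] opposite a gap; Y: b[j] opposite a gap.
    """
    m = len(b)
    # Row 0: only leading gaps in `a` (state Y) are possible.
    M = [0.0] + [NEG] * m
    X = [NEG] * (m + 1)
    Y = [NEG] + [-(gap_open + gap_extend * (j - 1)) for j in range(1, m + 1)]
    for i in range(1, len(a) + 1):
        M2, X2, Y2 = [NEG] * (m + 1), [NEG] * (m + 1), [NEG] * (m + 1)
        X2[0] = -(gap_open + gap_extend * (i - 1))  # leading gap in `b`
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M2[j] = s + max(M[j - 1], X[j - 1], Y[j - 1])
            X2[j] = max(M[j] - gap_open, X[j] - gap_extend, Y[j] - gap_open)
            Y2[j] = max(M2[j - 1] - gap_open, Y2[j - 1] - gap_extend,
                        X2[j - 1] - gap_open)
        M, X, Y = M2, X2, Y2  # keep only the current row
    return max(M[m], X[m], Y[m])
```

    Recovering the alignment itself in linear space needs extra work, such as a divide-and-conquer recomputation. That is the harder twin problem the paper addresses for SIMD processors.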

  1. DNA Sequence Alignment during Homologous Recombination.

    PubMed

    Greene, Eric C

    2016-05-27

    Homologous recombination allows for the regulated exchange of genetic information between two different DNA molecules of identical or nearly identical sequence composition, and is a major pathway for the repair of double-stranded DNA breaks. A key facet of homologous recombination is the ability of recombination proteins to perfectly align the damaged DNA with homologous sequence located elsewhere in the genome. This reaction is referred to as the homology search and is akin to the target searches conducted by many different DNA-binding proteins. Here I briefly highlight early investigations into the homology search mechanism, and then describe more recent research. Based on these studies, I summarize a model that includes a combination of intersegmental transfer, short-distance one-dimensional sliding, and length-specific microhomology recognition to efficiently align DNA sequences during the homology search. I also suggest some future directions to help further our understanding of the homology search. Where appropriate, I direct the reader to other recent reviews describing various issues related to homologous recombination.

  2. Sequence Alignment to Predict Across Species Susceptibility ...

    EPA Pesticide Factsheets

    Conservation of a molecular target across species can be used as a line-of-evidence to predict the likelihood of chemical susceptibility. The web-based Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool was developed to simplify, streamline, and quantitatively assess protein sequence/structural similarity across taxonomic groups as a means to predict relative intrinsic susceptibility. The intent of the tool is to allow for evaluation of any potential protein target, so it is amenable to variable degrees of protein characterization, depending on available information about the chemical/protein interaction and the molecular target itself. To allow for flexibility in the analysis, a layered strategy was adopted for the tool. The first level of the SeqAPASS analysis compares primary amino acid sequences to a query sequence, calculating a metric for sequence similarity (including detection of candidate orthologs), the second level evaluates sequence similarity within selected domains (e.g., ligand-binding domain, DNA binding domain), and the third level of analysis compares individual amino acid residue positions identified as being of importance for protein conformation and/or ligand binding upon chemical perturbation. Each level of the SeqAPASS analysis provides increasing evidence to apply toward rapid, screening-level assessments of probable cross species susceptibility. Such analyses can support prioritization of chemicals for further ev

  3. Nucleotide sequence of papaya mosaic virus RNA.

    PubMed

    Sit, T L; Abouhaidar, M G; Holy, S

    1989-09-01

    The RNA genome of papaya mosaic virus is 6656 nucleotides long [excluding the poly(A) tail] with six open reading frames (ORFs) more than 200 nucleotides long. The four nearest the 5' end each overlap with adjacent ORFs and could code for proteins with Mr 176307, 26248, 11949 and 7224 (ORFs 1 to 4). The fifth ORF produces the capsid protein of Mr 23043 and the sixth ORF, located completely within ORF1, could code for a protein with Mr 14113. The translation products of ORFs 1 to 3 show strong similarity with those of other potexviruses but the ORF 4 protein has only limited similarity with the other potexvirus ORF 4 proteins of 7K to 11K.
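
    Counting open reading frames above a length threshold, as in the "more than 200 nucleotides" criterion above, can be sketched as a scan of the three forward frames for ATG-to-stop stretches. Several conventions here are assumptions, not taken from the paper: the sequence is given in DNA letters (U is converted to T), only the forward strand is scanned (as for a positive-strand RNA genome), and the standard stop codons apply. The function name is illustrative.

```python
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_len=200):
    """Return (start, end, frame) of ATG-initiated ORFs longer than min_len nt.

    Coordinates are 0-based, end-exclusive, and include the stop codon.
    Only the forward strand is scanned; each frame is scanned independently.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if i + 3 - start > min_len:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs
```

    Because frames are scanned independently, an ORF lying entirely within another ORF in a different frame, like the sixth ORF inside ORF1, is reported separately.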

  4. Viewing multiple sequence alignments with the JavaScript Sequence Alignment Viewer (JSAV).

    PubMed

    Martin, Andrew C R

    2014-01-01

    The JavaScript Sequence Alignment Viewer (JSAV) is designed as a simple-to-use JavaScript component for displaying sequence alignments on web pages. The display of sequences is highly configurable, with options for alternative coloring schemes, sorting of sequences and 'dotifying' repeated amino acids. An option is also available to submit selected sequences to another web site or to other JavaScript code. JSAV is implemented purely in JavaScript, using the jQuery and jQuery UI libraries. To help with browser compatibility, it does not use any HTML5-specific features. The code is documented using JSDoc and is available from http://www.bioinf.org.uk/software/jsav/.
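
    'Dotifying' is a display convention. Here it is assumed to mean that residues identical to those in the same column of a reference row are replaced by a dot, so that differences stand out. JSAV itself is JavaScript; this Python sketch only illustrates the transformation, assumes the first row is the reference, and is not JSAV's API.

```python
def dotify(alignment, reference_index=0):
    """Replace residues that match the reference row with '.'.

    The reference row itself is left intact, and gaps are never dotified.
    """
    ref = alignment[reference_index]
    out = []
    for k, row in enumerate(alignment):
        if k == reference_index:
            out.append(row)
            continue
        out.append("".join("." if r == q and r != "-" else r
                           for r, q in zip(row, ref)))
    return out
```

    For example, dotify(["ACDE", "ACWE", "-CDE"]) returns ["ACDE", "..W.", "-..."], leaving only the W substitution and the gap visible.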

  5. Reading biological processes from nucleotide sequences

    NASA Astrophysics Data System (ADS)

    Murugan, Anand

    Cellular processes have traditionally been investigated by techniques of imaging and biochemical analysis of the molecules involved. The recent rapid progress in our ability to manipulate and read nucleic acid sequences gives us direct access to the genetic information that directs and constrains biological processes. While sequence data is being used widely to investigate genotype-phenotype relationships and population structure, here we use sequencing to understand biophysical mechanisms. We present work on two different systems. First, in chapter 2, we characterize the stochastic genetic editing mechanism that produces diverse T-cell receptors in the human immune system. We do this by inferring statistical distributions of the underlying biochemical events that generate T-cell receptor coding sequences from the statistics of the observed sequences. This inferred model quantitatively describes the potential repertoire of T-cell receptors that can be produced by an individual, providing insight into its potential diversity and the probability of generation of any specific T-cell receptor. Then in chapter 3, we present work on understanding the functioning of regulatory DNA sequences in both prokaryotes and eukaryotes. Here we use experiments that measure the transcriptional activity of large libraries of mutagenized promoters and enhancers and infer models of the sequence-function relationship from this data. For the bacterial promoter, we infer a physically motivated 'thermodynamic' model of the interaction of DNA-binding proteins and RNA polymerase determining the transcription rate of the downstream gene. For the eukaryotic enhancers, we infer heuristic models of the sequence-function relationship and use these models to find synthetic enhancer sequences that optimize inducibility of expression. Both projects demonstrate the utility of sequence information in conjunction with sophisticated statistical inference techniques for dissecting underlying biophysical

  6. DUC-Curve, a highly compact 2D graphical representation of DNA sequences and its application in sequence alignment

    NASA Astrophysics Data System (ADS)

    Li, Yushuang; Liu, Qian; Zheng, Xiaoqi

    2016-08-01

    A highly compact and simple 2D graphical representation of DNA sequences, named DUC-Curve, is constructed by mapping the four nucleotides to a unit circle in a cyclic order. DUC-Curve can directly reveal nucleotide and dinucleotide composition and microsatellite structure in DNA sequences, and it can also be used for DNA sequence alignment. Taking the geometric center vectors of DUC-Curves as sequence descriptors, we perform similarity analysis on the first exons of β-globin genes from 11 species, the oncogene TP53 from 27 species, and twenty-four Influenza A viruses. The reasonable results obtained illustrate that the proposed method is very effective for sequence comparison problems and will at least play a complementary role in classification and clustering problems.
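
    The general recipe can be sketched as follows: map each nucleotide to a point on the unit circle, accumulate a curve, and summarize it by its geometric center. The specific construction here is an assumption for illustration and is not claimed to be the DUC-Curve definition from the paper. It places A, C, G and T at the four compass points in that cyclic order, builds a cumulative walk, and takes the mean of the visited points as the center.

```python
import math

# Hypothetical cyclic placement of the four nucleotides on the unit circle.
UNIT = {"A": (1, 0), "C": (0, 1), "G": (-1, 0), "T": (0, -1)}

def walk(seq):
    """Cumulative 2D walk: each nucleotide adds its unit vector."""
    x = y = 0
    points = []
    for base in seq.upper():
        dx, dy = UNIT[base]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points

def center(seq):
    """Geometric center (mean point) of the walk, used as a 2D descriptor."""
    pts = walk(seq)
    if not pts:
        raise ValueError("sequence must be non-empty")
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

def descriptor_distance(s1, s2):
    """Euclidean distance between the center vectors of two sequences."""
    (x1, y1), (x2, y2) = center(s1), center(s2)
    return math.hypot(x1 - x2, y1 - y2)
```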

  7. Nucleotide sequence of SHV-2 beta-lactamase gene

    SciTech Connect

    Garbarg-Chenon, A.; Godard, V.; Labia, R.; Nicolas, J.C.

    1990-07-01

    The nucleotide sequence of plasmid-mediated beta-lactamase SHV-2 from Salmonella typhimurium (SHV-2pHT1) was determined. The gene was very similar to chromosomally encoded beta-lactamase LEN-1 of Klebsiella pneumoniae. Compared with the sequence of the Escherichia coli SHV-2 enzyme (SHV-2E.coli) obtained by protein sequencing, the deduced amino acid sequence of SHV-2pHT1 differed by three amino acid substitutions.

  8. Nucleotide sequences important for translation initiation of enterovirus RNA.

    PubMed Central

    Iizuka, N; Yonekawa, H; Nomoto, A

    1991-01-01

    An infectious cDNA clone was constructed from the genome of coxsackievirus B1 strain. A number of RNA transcripts carrying mutations in the 5' noncoding region were synthesized in vitro from the modified cDNA clones and examined for their ability to act as mRNAs in a cell-free translation system prepared from HeLa S3 cells. RNAs lacking the nucleotide sequences at positions 568 to 726 and 565 to 726 were found to be less efficient and inactive mRNAs, respectively. To understand the biological significance of this region of the RNA, small deletions and point mutations were introduced into the nucleotide sequence between positions 538 and 601. Except for a nucleotide substitution at 592 (U→C) within the 7-base conserved sequence, mutations introduced downstream of position 568 had little, if any, effect on the ability of the RNA to act as mRNA. Except for a point mutation at 558 (C→U), mutations upstream of position 567 appeared to inactivate the mRNA. In the upstream region, a sequence of 21 nucleotides at positions 546 to 566 is perfectly conserved in the 5' noncoding regions of enterovirus and rhinovirus genomes. These results suggest that the 7-base conserved sequence functions to maintain the efficiency of translation initiation, and that the nucleotide sequence upstream of position 567, including the 21-base conserved sequence, plays essential roles in translation initiation. A deletion mutant whose genome lacks the nucleotide sequence at positions 568 to 726 showed a small-plaque phenotype and lower virulence in suckling mice than the wild-type virus. Thus, reducing the efficiency of translation initiation may allow the construction of enteroviruses with a lower-virulence phenotype. PMID:1651409

  9. SQUARE--determining reliable regions in sequence alignments.

    PubMed

    Tress, Michael L; Graña, Osvaldo; Valencia, Alfonso

    2004-04-12

    The Server for Quick Alignment Reliability Evaluation (SQUARE) is a Web-based version of the method we developed to predict regions of reliably aligned residues in sequence alignments. Given an alignment between a query sequence and a sequence of known structure, SQUARE is able to predict which residues are reliably aligned. The server accesses a database of profiles of sequences of known three-dimensional structures in order to calculate the scores for each residue in the alignment. SQUARE produces a graphical output of the residue profile-derived alignment scores along with an indication of the reliability of the alignment. In addition, the scores can be compared against template secondary structure, conserved residues and important sites.

  10. Blasting and Zipping: Sequence Alignment and Mutual Information

    NASA Astrophysics Data System (ADS)

    Penner, Orion; Grassberger, Peter; Paczuski, Maya

    2009-03-01

    Alignment of biological sequences such as DNA, RNA or proteins is one of the most widely used tools in computational bioscience. While the accomplishments of sequence alignment algorithms are undeniable, the fact remains that these algorithms are based on heuristic scoring schemes. Therefore, they do not provide model-independent and objective measures of how similar two (or more) sequences actually are. Although information theory provides such a similarity measure, the mutual information (MI), numerous previous attempts to connect sequence alignment and information have not produced realistic estimates of the MI from a given alignment. We report on a simple and flexible approach to obtaining robust estimates of MI from global alignments. The presented results may help establish MI as a reliable tool for evaluating the quality of global alignments, judging the relative merits of different alignment algorithms, and estimating the significance of specific alignments.
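
    The naive plug-in estimate of MI serves as a baseline for what the abstract improves on. It treats each aligned column as a sample of the symbol pair (x, y) and computes I = Σ p(x,y) log2[p(x,y) / (p(x) p(y))]. This sketch assumes gapped columns are simply skipped. It is not the authors' robust estimator, and plug-in estimates like this are biased upward for short alignments. The function name is illustrative.

```python
import math
from collections import Counter

def plugin_mutual_information(s1, s2):
    """Plug-in MI (bits) between two aligned sequences, skipping gapped columns."""
    pairs = [(a, b) for a, b in zip(s1, s2) if a != "-" and b != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)               # counts of (x, y) column pairs
    px = Counter(a for a, _ in pairs)    # marginal counts for sequence 1
    py = Counter(b for _, b in pairs)    # marginal counts for sequence 2
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi
```

    Identical sequences over four equally frequent bases give 2 bits. Pairs whose columns look statistically independent give 0 bits.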

  11. Nucleotide sequence of the coat protein gene of canine parvovirus.

    PubMed Central

    Rhode, S L

    1985-01-01

    The nucleotide sequence of the canine parvovirus (CPV2) from map units 33 to 95 has been determined. This includes the entire coat protein gene and noncoding sequences at the 3' end of the gene, exclusive of the terminal inverted repeat. The predicted capsid protein structures are discussed and compared with those of the rodent parvoviruses H-1 and MVM. PMID:3989914

  12. The Nucleotide Sequence of the lac Operator

    PubMed Central

    Gilbert, Walter; Maxam, Allan

    1973-01-01

    The lac repressor protects the lac operator against digestion with deoxyribonuclease. The protected fragment is double-stranded and about 27 base-pairs long. We determined the sequence of RNA transcription copies of this fragment and present a sequence for 24 base pairs. It is:

    5′-TGGAATTGTGAGCGGATAACAATT-3′
    3′-ACCTTAACACTCGCCTATTGTTAA-5′

    The sequence has 2-fold symmetry regions; the two longest are separated by one turn of the DNA double helix. PMID:4587255
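
    The 2-fold symmetry can be checked mechanically. This minimal sketch rebuilds the complementary strand from the 24-bp sequence and scans one strand for inverted repeats, i.e. pairs of non-overlapping k-mers where one is the reverse complement of the other. The helper names, the k-mer length and the non-overlap rule are illustrative choices, not taken from the paper.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Base-by-base complement, written in the same left-to-right order (3'->5')."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Complementary strand read 5'->3'."""
    return complement(seq)[::-1]


def inverted_repeats(seq: str, k: int):
    """Return 0-based (i, j) starts where seq[j:j+k] is the reverse complement
    of seq[i:i+k] and the two k-mers do not overlap."""
    hits = []
    for i in range(len(seq) - k + 1):
        target = reverse_complement(seq[i:i + k])
        for j in range(i + k, len(seq) - k + 1):
            if seq[j:j + k] == target:
                hits.append((i, j))
    return hits


top = "TGGAATTGTGAGCGGATAACAATT"
bottom = complement(top)  # "ACCTTAACACTCGCCTATTGTTAA", as in the abstract
```

    With k = 6 the scan includes the pair (3, 18): AATTGT at positions 3 to 8 and its reverse complement ACAATT at positions 18 to 23.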

  13. 77 FR 65537 - Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-29

    ... Amino Acid Sequence Disclosures ACTION: Proposed collection; comment request. SUMMARY: The United States....'' SUPPLEMENTARY INFORMATION: I. Abstract Patent applications that contain nucleotide and/or amino acid...

  14. Nucleotide sequence composition and method for detection of neisseria gonorrhoeae

    SciTech Connect

    Lo, A.; Yang, H.L.

    1990-02-13

    This patent describes a composition of matter that is specific for Neisseria gonorrhoeae. It comprises at least one nucleotide sequence for which the ratio of the amount of the sequence that hybridizes to chromosomal DNA of Neisseria gonorrhoeae to the amount that hybridizes to chromosomal DNA of Neisseria meningitidis is greater than about five. The ratio is obtained by a method described in the patent.

  15. Cloning and characterization of a highly repetitive fish nucleotide sequence.

    PubMed

    Datta, U; Dutta, P; Mandal, R K

    1988-01-01

    We have cloned and sequenced a highly repetitive HindIII fragment of DNA from the common carp Cyprinus carpio. It represents a tandemly repeated sequence with a monomeric unit of 245 bp and comprises 8% of the fish genome. Higher units of this monomer appear as a ladder in Southern blots. The monomeric unit has been sequenced; it is A + T-rich with some direct and some inverse-repeat nucleotide clusters.

  16. A novel partial sequence alignment tool for finding large deletions.

    PubMed

    Aruk, Taner; Ustek, Duran; Kursun, Olcay

    2012-01-01

    Finding large deletions in genome sequences has become increasingly useful in bioinformatics, for example in clinical research and diagnosis. Although a number of next-generation sequencing mapping and sequence alignment programs are publicly available, these packages do not correctly align fragments containing deletions larger than 1 kb. We present a fast alignment software package, BinaryPartialAlign, that wet-lab scientists can use to find long structural variations in their experiments. BinaryPartialAlign combines the Smith-Waterman (SW) algorithm with a binary-search-based approach for alignment with large gaps, which we call partial alignment. The BinaryPartialAlign implementation is compared with other straightforward applications of SW. Simulation results on mtDNA fragments demonstrate the effectiveness (in runtime and accuracy) of the proposed method.
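
    For orientation, plain Smith-Waterman local alignment with a linear gap penalty can be sketched as below. It is not BinaryPartialAlign (the binary-search partial alignment is not reproduced), and the scoring values are illustrative assumptions. It also hints at why large deletions are hard for plain SW: with a penalty of -2 per gap position, a 1 kb deletion costs -2000, so the best local alignment usually covers only one side of the deletion.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment with a linear gap penalty; returns (score, row_a, row_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell ends the local alignment.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

    For example, `smith_waterman("ACGT", "ACGT")` returns `(8, "ACGT", "ACGT")`.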

  17. Nucleotide correlations and electronic transport of DNA sequences

    NASA Astrophysics Data System (ADS)

    Albuquerque, E. L.; Vasconcelos, M. S.; Lyra, M. L.; de Moura, F. A. B. F.

    2005-02-01

    We use a tight-binding formulation to investigate the transmissivity and wave-packet dynamics of single-strand DNA sequences made up of the nucleotides guanine (G), adenine (A), cytosine (C), and thymine (T). To reveal the relevance of the underlying correlations in the nucleotide distribution, we compare the results for the genomic DNA sequence with those for two artificial sequences: (i) the Rudin-Shapiro sequence, which has long-range correlations; and (ii) a random sequence, a prototype of a short-range correlated system, constructed here with the same first-neighbor pair correlations as the human DNA sequence. We found that the long-range character of the correlations is important to the persistence of resonances of finite segments. On the other hand, the wave-packet dynamics seems to be influenced mostly by the short-range correlations.
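
    One common way to build a random sequence that keeps the first-neighbor pair correlations of a real sequence is a first-order Markov chain fitted to its dinucleotide counts. This is a minimal sketch under that assumption; the abstract does not say exactly how the authors constructed their random sequence, and the function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def fit_transitions(seq):
    """Estimate P(next | current) from observed neighbor-pair (dinucleotide) counts."""
    counts = defaultdict(Counter)
    for x, y in zip(seq, seq[1:]):
        counts[x][y] += 1
    return {x: {y: c / sum(row.values()) for y, c in row.items()}
            for x, row in counts.items()}


def markov_surrogate(seq, length, seed=0):
    """Random sequence with (approximately) the same pair statistics as `seq`."""
    rng = random.Random(seed)
    probs = fit_transitions(seq)
    out = [seq[0]]
    while len(out) < length:
        row = probs.get(out[-1])
        if not row:  # symbol never followed by anything (e.g. seen only at the end)
            out.append(rng.choice(seq))
            continue
        symbols, weights = zip(*row.items())
        out.append(rng.choices(symbols, weights=weights)[0])
    return "".join(out)
```

    Only nearest-neighbor statistics are preserved; any longer-range correlations in the source sequence are deliberately destroyed, which is what makes such a surrogate useful as a short-range-correlated control.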

  18. R3D-2-MSA: the RNA 3D structure-to-multiple sequence alignment server

    PubMed Central

    Cannone, Jamie J.; Sweeney, Blake A.; Petrov, Anton I.; Gutell, Robin R.; Zirbel, Craig L.; Leontis, Neocles

    2015-01-01

    The RNA 3D Structure-to-Multiple Sequence Alignment Server (R3D-2-MSA) is a new web service that seamlessly links RNA three-dimensional (3D) structures to high-quality RNA multiple sequence alignments (MSAs) from diverse biological sources. In this first release, R3D-2-MSA provides manual and programmatic access to curated, representative ribosomal RNA sequence alignments from bacterial, archaeal, eukaryal and organellar ribosomes, using nucleotide numbers from representative atomic-resolution 3D structures. A web-based front end is available for manual entry and an Application Program Interface for programmatic access. Users can specify up to five ranges of nucleotides and 50 nucleotide positions per range. The R3D-2-MSA server maps these ranges to the appropriate columns of the corresponding MSA and returns the contents of the columns, either for display in a web browser or in JSON format for subsequent programmatic use. The browser output page provides a 3D interactive display of the query, a full list of sequence variants with taxonomic information and a statistical summary of distinct sequence variants found. The output can be filtered and sorted in the browser. Previous user queries can be viewed at any time by resubmitting the output URL, which encodes the search and re-generates the results. The service is freely available with no login requirement at http://rna.bgsu.edu/r3d-2-msa. PMID:26048960
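
    The core mapping step described above (nucleotide numbers to MSA columns, then column contents and distinct variants) can be sketched as follows. This is a hypothetical illustration, not the server's code or API: it assumes the reference sequence's row in the alignment is known and that its nucleotides are numbered 1, 2, 3, ... in order, whereas real 3D structures use their own, possibly non-contiguous, numbering. The limits of five ranges and 50 positions per range are taken from the abstract.

```python
from collections import Counter

MAX_RANGES, MAX_POSITIONS = 5, 50  # limits stated in the abstract


def residue_to_column(reference_row, gap_chars="-."):
    """Map 1-based nucleotide numbers of the reference row to 0-based MSA columns.

    Assumes contiguous numbering starting at 1 (a simplification)."""
    mapping, number = {}, 0
    for col, ch in enumerate(reference_row):
        if ch not in gap_chars:
            number += 1
            mapping[number] = col
    return mapping


def column_variants(msa, reference_id, ranges):
    """Return the selected column indices and a Counter of distinct variants."""
    if len(ranges) > MAX_RANGES:
        raise ValueError("at most five ranges are allowed")
    mapping = residue_to_column(msa[reference_id])
    cols = []
    for start, end in ranges:
        if end - start + 1 > MAX_POSITIONS:
            raise ValueError("at most 50 positions per range")
        cols.extend(mapping[n] for n in range(start, end + 1))
    # A sequence's variant is the string of its characters in the selected columns.
    variants = Counter("".join(row[c] for c in cols) for row in msa.values())
    return cols, variants
```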

  19. R3D-2-MSA: the RNA 3D structure-to-multiple sequence alignment server.

    PubMed

    Cannone, Jamie J; Sweeney, Blake A; Petrov, Anton I; Gutell, Robin R; Zirbel, Craig L; Leontis, Neocles

    2015-07-01

    The RNA 3D Structure-to-Multiple Sequence Alignment Server (R3D-2-MSA) is a new web service that seamlessly links RNA three-dimensional (3D) structures to high-quality RNA multiple sequence alignments (MSAs) from diverse biological sources. In this first release, R3D-2-MSA provides manual and programmatic access to curated, representative ribosomal RNA sequence alignments from bacterial, archaeal, eukaryal and organellar ribosomes, using nucleotide numbers from representative atomic-resolution 3D structures. A web-based front end is available for manual entry and an Application Program Interface for programmatic access. Users can specify up to five ranges of nucleotides and 50 nucleotide positions per range. The R3D-2-MSA server maps these ranges to the appropriate columns of the corresponding MSA and returns the contents of the columns, either for display in a web browser or in JSON format for subsequent programmatic use. The browser output page provides a 3D interactive display of the query, a full list of sequence variants with taxonomic information and a statistical summary of distinct sequence variants found. The output can be filtered and sorted in the browser. Previous user queries can be viewed at any time by resubmitting the output URL, which encodes the search and re-generates the results. The service is freely available with no login requirement at http://rna.bgsu.edu/r3d-2-msa.

  20. Probabilistic sequence alignment of stratigraphic records

    NASA Astrophysics Data System (ADS)

    Lin, Luan; Khider, Deborah; Lisiecki, Lorraine E.; Lawrence, Charles E.

    2014-10-01

    The assessment of age uncertainty in stratigraphically aligned records is a pressing need in paleoceanographic research. The alignment of ocean sediment cores is used to develop mutually consistent age models for climate proxies and is often based on the δ18O of calcite from benthic foraminifera, which records a global ice volume and deep water temperature signal. To date, δ18O alignment has been performed by manual, qualitative comparison or by deterministic algorithms. Here we present a hidden Markov model (HMM) probabilistic algorithm to find 95% confidence bands for δ18O alignment. This model considers the probability of every possible alignment based on its fit to the δ18O data and transition probabilities for sedimentation rate changes obtained from radiocarbon-based estimates for 37 cores. Uncertainty is assessed using a stochastic back trace recursion to sample alignments in exact proportion to their probability. We applied the algorithm to align 35 late Pleistocene records to a global benthic δ18O stack and found that the mean width of 95% confidence intervals varies between 3 and 23 kyr depending on the resolution and noisiness of the record's δ18O signal. Confidence bands within individual cores also vary greatly, ranging from ~0 to >40 kyr. These alignment uncertainty estimates will allow researchers to examine the robustness of their conclusions, including the statistical evaluation of lead-lag relationships between events observed in different cores.
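
    The stochastic back trace mentioned above follows the same idea as forward filtering, backward sampling in any hidden Markov model. The sketch below shows it for a small discrete HMM rather than the authors' δ18O alignment model, whose states and emission densities are not given here: a forward pass stores alpha[t][s], and the traceback then samples each earlier state in proportion to alpha[t][s] times the transition into the already-sampled next state, so whole paths are drawn in exact proportion to their posterior probability. Probabilities are kept in linear space for clarity; long records would need scaling or log-space arithmetic.

```python
import random


def forward(init, trans, emit, obs):
    """alpha[t][s] = P(o_0..o_t, state at t = s) for a discrete HMM."""
    n = len(init)
    alpha = [[init[s] * emit[s][obs[0]] for s in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[r] * trans[r][s] for r in range(n)) * emit[s][o]
                      for s in range(n)])
    return alpha


def sample_path(init, trans, emit, obs, rng):
    """Draw one hidden-state path with probability P(path | obs)."""
    alpha = forward(init, trans, emit, obs)
    states = range(len(init))
    # Last state: proportional to the final forward probabilities.
    path = [rng.choices(states, weights=alpha[-1])[0]]
    # Back trace: P(s_t = r | s_{t+1}, obs) is proportional to alpha[t][r] * trans[r][s_{t+1}].
    for t in range(len(obs) - 2, -1, -1):
        nxt = path[-1]
        path.append(rng.choices(states, weights=[alpha[t][r] * trans[r][nxt] for r in states])[0])
    path.reverse()
    return path
```

    Repeating `sample_path` many times yields a sample of paths from which per-position confidence bands can be read off, which is the role the stochastic back trace plays in the abstract.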

  1. Quantifying the Displacement of Mismatches in Multiple Sequence Alignment Benchmarks

    PubMed Central

    Bawono, Punto; van der Velde, Arjan; Abeln, Sanne; Heringa, Jaap

    2015-01-01

    Multiple Sequence Alignment (MSA) methods are typically benchmarked on sets of reference alignments. The quality of an alignment can then be represented by the sum-of-pairs (SP) or column (CS) score, which measure the agreement between a reference alignment and the corresponding query alignment. Both the SP and CS scores treat all mismatches between a query and a reference alignment as equally bad: they do not take into account how far apart the query alignment places two amino acids that should have been matched according to the reference alignment. This matters because the magnitude of alignment shifts is often relevant in biological analyses, including homology modeling and MSA refinement or manual alignment editing. In this study we develop a new alignment benchmark scoring scheme, SPdist, that takes the degree of discordance of mismatches into account by measuring the sequence distance between mismatched residue pairs in the query alignment. Using this new score alongside the standard SP score, we investigate its discriminatory behavior by assessing how well six different MSA methods perform with respect to BAliBASE reference alignments. The SP and SPdist scores yield very similar outcomes when the reference and query alignments are close. However, for more divergent reference alignments, the SPdist score is able to distinguish between methods that keep alignments approximately close to the reference and those exhibiting larger shifts. We observed that using SPdist together with SP scoring better delineates the differences in alignment quality between alternative MSA methods. With a case study we illustrate why, from a biological perspective, it is important to consider the separation of mismatches. The SPdist scoring scheme has been implemented in the VerAlign web server (http://www.ibi.vu.nl/programs/veralignwww/). The code for calculating the SPdist score is also available upon request. PMID:25993129
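
    To make the contrast concrete, here is a minimal sketch of the standard SP score (the fraction of reference residue pairs reproduced by the query) next to a simple displacement measure. The displacement function is a hypothetical simplification in the spirit of SPdist, not its published definition: for each reference pair the query misses, it records how many residues away the query places the partner. Residues aligned to a gap in the query are simply skipped here.

```python
from itertools import combinations


def aligned_pairs(msa):
    """Set of (p, i, q, j): residue i of row p is aligned with residue j of row q."""
    counters = [0] * len(msa)
    pairs = set()
    for col in range(len(msa[0])):
        idx = {}
        for r, row in enumerate(msa):
            if row[col] != "-":
                idx[r] = counters[r]
                counters[r] += 1
        for p, q in combinations(sorted(idx), 2):
            pairs.add((p, idx[p], q, idx[q]))
    return pairs


def sp_score(reference, query):
    """Fraction of reference residue pairs that the query also aligns."""
    ref = aligned_pairs(reference)
    return len(ref & aligned_pairs(query)) / len(ref)


def mean_shift(reference, query):
    """Hypothetical displacement measure: mean |j_query - j_reference| over missed pairs."""
    qmap = {(p, i, q): j for p, i, q, j in aligned_pairs(query)}
    shifts = [abs(qmap[(p, i, q)] - j)
              for p, i, q, j in aligned_pairs(reference)
              if (p, i, q) in qmap and qmap[(p, i, q)] != j]
    return sum(shifts) / len(shifts) if shifts else 0.0
```

    For the reference `["AC-GT", "ACAGT"]` and the query `["ACG-T", "ACAGT"]`, three of four reference pairs are recovered (SP = 0.75) and the one missed pair is off by a single residue.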

  2. Novel hybrid genetic algorithm for progressive multiple sequence alignment.

    PubMed

    Afridi, Muhammad Ishaq

    2013-01-01

    The family of evolutionary or genetic algorithms is used in various fields of bioinformatics. Genetic algorithms (GAs) can be used for simultaneous comparison of a large pool of DNA or protein sequences. This article explains how a GA can be combined with other methods, such as the progressive multiple sequence alignment strategy, to obtain an optimal multiple sequence alignment (MSA). Optimal MSAs are important in bioinformatics and related disciplines. Evolutionary algorithms improve their candidate solutions over successive generations. In this optimisation, the initial pairwise alignment is obtained with a progressive method, and an objective function is then used to select and align further alignments and profiles. Child and subpopulation initialisation is based on changes in the probability of similarity or in the distance matrix of the alignment population. In this genetic algorithm, mutation, crossover and migration in the population of candidate solutions mirror events of natural evolution.

  3. Method for the detection of specific nucleic acid sequences by polymerase nucleotide incorporation

    DOEpatents

    Castro, Alonso

    2004-06-01

    A method for rapid and efficient detection of a target DNA or RNA sequence is provided. A primer having a 3'-hydroxyl group at one end and having a sequence of nucleotides sufficiently homologous with an identifying sequence of nucleotides in the target DNA is selected. The primer is hybridized to the identifying sequence of nucleotides on the DNA or RNA sequence and a reporter molecule is synthesized on the target sequence by progressively binding complementary nucleotides to the primer, where the complementary nucleotides include nucleotides labeled with a fluorophore. Fluorescence emitted by fluorophores on single reporter molecules is detected to identify the target DNA or RNA sequence.

  4. An enhanced algorithm for multiple sequence alignment of protein sequences using genetic algorithm

    PubMed Central

    Kumar, Manish

    2015-01-01

    One of the most fundamental operations in biological sequence analysis is multiple sequence alignment (MSA). The basic goal of the MSA problem is to determine the most biologically plausible alignment of a set of protein or DNA sequences. In this paper, an MSA method based on a genetic algorithm is proposed. Two genetic operators, crossover and mutation, were defined and implemented in the proposed method to study the evolution of the population and the quality of the aligned sequences. The proposed method is assessed on a protein benchmark dataset, BAliBASE, by comparing the results with those obtained by other alignment algorithms, e.g., SAGA, RBT-GA, PRRP, HMMT, SB-PIMA, CLUSTALX, CLUSTAL W, DIALIGN and PILEUP8. Experiments on a wide range of data have shown that the proposed algorithm is much better (in terms of score) than previously proposed algorithms in its ability to achieve high alignment quality. PMID:27065770
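
    Alignment GAs usually encode each candidate as a set of gapped rows, and any operator must keep each row's residues in their original order. The sketch below is one hypothetical mutation operator of that kind (move one gap to a new column in the same row); it is not the operator defined in the paper, which the abstract does not specify.

```python
import random


def shift_gap_mutation(row: str, rng: random.Random) -> str:
    """Move one randomly chosen gap to a random new position in the same row.

    The ungapped residues keep their order, so the row still encodes the same
    sequence; only its alignment to the other rows changes.
    """
    gaps = [i for i, ch in enumerate(row) if ch == "-"]
    if not gaps:
        return row  # nothing to move
    chars = list(row)
    del chars[rng.choice(gaps)]
    chars.insert(rng.randrange(len(chars) + 1), "-")
    return "".join(chars)
```

    Because the row length is unchanged, a mutated row can be dropped back into its parent alignment without touching the other rows.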

  5. Protein Sequence Alignment Taking the Structure of Peptide Bond

    NASA Astrophysics Data System (ADS)

    Hara, Toshihide; Sato, Keiko; Ohya, Masanori

    2013-01-01

    In a previous paper [1], we proposed a new method for performing pairwise alignment of protein sequences. The method, called MTRAP, achieves the highest performance compared with other alignment methods such as ClustalW2 [2,3] on two benchmarks for alignment accuracy. In this paper, we introduce a new measure between two amino acids based on the formation of peptide bonds. The measure is implemented in the MTRAP software to further improve alignment accuracy. Our alignment software is available at

  6. MANGO: a new approach to multiple sequence alignment.

    PubMed

    Zhang, Zefeng; Lin, Hao; Li, Ming

    2007-01-01

    Multiple sequence alignment is a classical and challenging task in biological sequence analysis. The problem is NP-hard, and full dynamic programming takes too much time. The progressive alignment heuristics adopted by most state-of-the-art multiple sequence alignment programs suffer from the 'once a gap, always a gap' phenomenon. Is there a radically new way to do multiple sequence alignment? This paper introduces a novel and orthogonal multiple sequence alignment method that uses multiple optimized spaced seeds and new algorithms to handle these seeds efficiently. The new algorithm processes the information of all sequences as a whole, avoiding problems caused by the popular progressive approaches. Because optimized spaced seeds are provably significantly more sensitive than consecutive k-mers, the new approach promises to be more accurate and reliable. To validate it, we implemented MANGO: Multiple Alignment with N Gapped Oligos. Experiments on large 16S RNA benchmarks show that MANGO compares favorably, in both accuracy and speed, with state-of-the-art multiple sequence alignment methods, including ClustalW 1.83, MUSCLE 3.6, MAFFT 5.861, ProbConsRNA 1.11, Dialign 2.2.1, DIALIGN-T 0.2.1, T-Coffee 4.85, POA 2.0 and Kalign 2.0.

  7. PhyPA: Phylogenetic method with pairwise sequence alignment outperforms likelihood methods in phylogenetics involving highly diverged sequences.

    PubMed

    Xia, Xuhua

    2016-09-01

    While pairwise sequence alignment (PSA) by dynamic programming is guaranteed to generate one of the optimal alignments, multiple sequence alignment (MSA) of highly divergent sequences often results in poorly aligned sequences, plaguing all subsequent phylogenetic analysis. One way to avoid this problem is to use only PSA to reconstruct phylogenetic trees, which can only be done with distance-based methods. I compared the accuracy of this new computational approach (named PhyPA, for phylogenetics by pairwise alignment) against the maximum likelihood method using MSA (the ML+MSA approach), based on nucleotide, amino acid and codon sequences simulated with different topologies and tree lengths. I present the surprising finding that the fast PhyPA method consistently outperforms the slow ML+MSA approach for highly diverged sequences, even when all optimization options are turned on for the ML+MSA approach. Only when sequences are not highly diverged (i.e., when a reliable MSA can be obtained) does the ML+MSA approach outperform PhyPA. The true topologies are always recovered by ML with the true alignment from the simulation. However, with an MSA derived from alignment programs such as MAFFT or MUSCLE, the recovered topology consistently has a higher likelihood than the true topology. Thus, the failure of ML+MSA to recover the true topology is not due to insufficient search of tree space but to the distortion of phylogenetic signal by MSA methods. I have implemented PhyPA in DAMBE, together with two approaches that use multi-gene data sets to derive phylogenetic support for subtrees, equivalent to resampling techniques such as bootstrapping and jackknifing.
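
    The key point of PhyPA is that each distance comes from its own pairwise alignment rather than from a shared MSA. A minimal sketch of that step: the Jukes-Cantor (JC69) distance, d = -(3/4) ln(1 - 4p/3), computed from one pairwise alignment with gapped columns ignored. The abstract does not say which distance model or tree-building method DAMBE uses with PhyPA, so JC69 is an assumption; the resulting pairwise distances would then feed a distance method such as neighbor joining (not shown).

```python
import math


def jc69_distance(aligned_a: str, aligned_b: str) -> float:
    """JC69 distance from one pairwise alignment (gapped columns ignored)."""
    compared = diffs = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    p = diffs / compared  # proportion of differing sites
    if p >= 0.75:
        return math.inf  # saturated: the JC69 correction is undefined
    return -0.75 * math.log(1 - 4 * p / 3)
```

    In a PhyPA-style workflow, each pair of sequences would be aligned separately (for example by Needleman-Wunsch) and passed to this function, filling the distance matrix one pair at a time.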

  8. Complete nucleotide sequence and genome organization of bovine parvovirus.

    PubMed Central

    Chen, K C; Shull, B C; Moses, E A; Lederman, M; Stout, E R; Bates, R C

    1986-01-01

    We determined the complete nucleotide sequence of bovine parvovirus (BPV), an autonomous parvovirus. The sequence is 5,491 nucleotides long. The terminal regions contain nonidentical imperfect palindromic sequences of 150 and 121 nucleotides. In the plus strand, there are three large open reading frames (left ORF, mid ORF, and right ORF) with coding capacities of 729, 255, and 685 amino acids, respectively. As with all parvoviruses studied to date, the left ORF of BPV codes for the nonstructural protein NS-1 and the right ORF codes for the major parts of the three capsid proteins. The mid ORF probably encodes the major part of the nonstructural protein NP-1. There are promoterlike sequences at map units 4.5, 12.8, and 38.7 and polyadenylation signals at map units 61.6, 64.6, and 98.5. BPV has little DNA homology with the defective parvovirus AAV, with the human autonomous parvovirus B19, or with the other autonomous parvoviruses sequenced (canine parvovirus, feline panleukopenia virus, H-1, and minute virus of mice). Even though the overall DNA homology of BPV with other parvoviruses is low, several small regions of high homology are observed when the amino acid sequences encoded by the left and right ORFs are compared. From these comparisons, it can be shown that the evolutionary relationship among the parvoviruses is B19 ↔ AAV ↔ BPV ↔ MVM. The highly conserved amino acid sequences observed among all parvoviruses may be useful in the identification and detection of parvoviruses and in the design of a general parvovirus vaccine. PMID:3783814
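
    Open reading frames like those described above can be located with a simple frame scan. This sketch scans the three frames of one strand (the plus strand, in the abstract's terms) for ATG-to-stop ORFs and reports their length in amino acids, stop codon excluded. Treating ATG as the start is an assumption; coding capacity is sometimes counted from stop to stop instead, which gives longer frames. The minimum length and the function name are arbitrary illustrative choices.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_aa=100):
    """Return (start, end, n_aa) for ATG-initiated ORFs in the three frames of `seq`.

    Coordinates are 0-based and half-open, `end` includes the stop codon, and
    n_aa counts codons from the ATG up to (not including) the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                for j in range(i, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        n_aa = (j - i) // 3
                        if n_aa >= min_aa:
                            orfs.append((i, j + 3, n_aa))
                        i = j  # resume after this stop (nested ATGs are skipped)
                        break
                else:
                    break  # no in-frame stop downstream: nothing more in this frame
            i += 3
    return sorted(orfs)
```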

  9. FASH: A web application for nucleotides sequence search

    PubMed Central

    Veksler-Lublinsky, Isana; Barash, Danny; Avisar, Chai; Troim, Einav; Chew, Paul; Kedem, Klara

    2008-01-01

    FASH (Fourier Alignment Sequence Heuristics) is a web application, based on the Fast Fourier Transform, for finding remote homologs within a long nucleic acid sequence. Given a query sequence and a long text sequence (e.g., the human genome), FASH detects subsequences within the text that are remotely similar to the query. FASH offers an alternative to BLAST/FASTA for querying long RNA/DNA sequences. Unlike these approaches, it does not depend on the existence of contiguous seed sequences in its initial detection phase. The FASH web server is user-friendly and easy to operate. FASH can be accessed at (secured website) PMID:18505581
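
    The Fourier-transform idea behind tools of this kind can be sketched as follows: encode each nucleotide as a 0/1 indicator track, cross-correlate the query and text tracks with FFTs, and sum over the four bases to get the number of matching positions at every offset in O(n log n) time rather than O(nm). This sketch (using NumPy) covers only that correlation step; FASH's own heuristics for turning offsets into remote-homology hits are not reproduced, and the function name is illustrative.

```python
import numpy as np


def match_counts(query: str, text: str) -> np.ndarray:
    """counts[k] = number of positions m with query[m] == text[k + m],
    for every offset k at which the query fits entirely inside the text."""
    n, m = len(text), len(query)
    if m > n:
        raise ValueError("query must not be longer than text")
    size = 1 << (n + m - 1).bit_length()  # power-of-two FFT length >= n + m - 1
    total = np.zeros(n - m + 1)
    for base in "ACGT":
        t = np.array([c == base for c in text], dtype=float)
        q = np.array([c == base for c in query], dtype=float)
        # Cross-correlation computed as a convolution with the reversed query.
        conv = np.fft.irfft(np.fft.rfft(t, size) * np.fft.rfft(q[::-1], size), size)
        total += conv[m - 1:n]
    return np.rint(total).astype(int)
```

    For example, `match_counts("ACG", "TACGT").tolist()` gives `[0, 3, 0]`: the query matches fully at offset 1 and nowhere else.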

  10. Nucleotide-Specific Contrast for DNA Sequencing by Electron Spectroscopy

    PubMed Central

    Schmid, Andreas K.; Davis, Ronald W.

    2016-01-01

    DNA sequencing by imaging in an electron microscope is an approach that holds promise to deliver long reads with low error rates and without the need for amplification. Earlier work using transmission electron microscopes, which use high electron energies on the order of 100 keV, has shown that low contrast and radiation damage necessitate heavy-atom labeling of individual nucleotides, which increases read error rates. Other prior work has shown that scattering electrons with much lower energy suppresses beam damage to DNA. Here we explore two methods to increase contrast: X-ray photoelectron spectroscopy and Auger electron spectroscopy. Using bulk DNA samples with monomers of each base, both methods are shown to provide contrast mechanisms that can distinguish individual nucleotides without labels. Both spectroscopic techniques can be readily implemented in a low energy electron microscope, which may enable label-free DNA sequencing by direct imaging. PMID:27149617

  11. Sequence alignments and pair hidden Markov models using evolutionary history.

    PubMed

    Knudsen, Bjarne; Miyamoto, Michael M

    2003-10-17

    This work presents a novel pairwise statistical alignment method based on an explicit evolutionary model of insertions and deletions (indels). Indel events of any length are possible according to a geometric distribution. The geometric distribution parameter, the indel rate, and the evolutionary time are all maximum likelihood estimated from the sequences being aligned. Probability calculations are done using a pair hidden Markov model (HMM) with transition probabilities calculated from the indel parameters. Equations for the transition probabilities make the pair HMM closely approximate the specified indel model. The method provides an optimal alignment, its likelihood, the likelihood of all possible alignments, and the reliability of individual alignment regions. Human alpha and beta-hemoglobin sequences are aligned, as an illustration of the potential utility of this pair HMM approach.
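
    To make the pair-HMM idea concrete: a three-state pair HMM (match M, insert X, insert Y) in which a gap-extension probability epsilon makes indel lengths geometric, and the forward algorithm sums over all alignments of two sequences. The parameter values, the simple match/mismatch emission model and the omission of an explicit end state are assumptions for illustration; in the paper the transition probabilities are derived from the indel rate and evolutionary time, which this sketch does not do. Values are kept in linear space, so long sequences would need log-space arithmetic.

```python
def pair_hmm_forward(x, y, delta=0.1, epsilon=0.4, identity=0.8):
    """Total probability of x and y summed over all alignments (no end state)."""
    q = 0.25  # emission probability of any nucleotide in an insert state
    p_same, p_diff = identity / 4, (1 - identity) / 12  # 4*p_same + 12*p_diff == 1
    n, m = len(x), len(y)
    M = [[0.0] * (m + 1) for _ in range(n + 1)]
    X = [[0.0] * (m + 1) for _ in range(n + 1)]
    Y = [[0.0] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 1.0  # start behaves like the match state
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                emit = p_same if x[i - 1] == y[j - 1] else p_diff
                M[i][j] = emit * ((1 - 2 * delta) * M[i - 1][j - 1]
                                  + (1 - epsilon) * (X[i - 1][j - 1] + Y[i - 1][j - 1]))
            if i > 0:  # gap in y: x[i-1] emitted alone; epsilon gives geometric lengths
                X[i][j] = q * (delta * M[i - 1][j] + epsilon * X[i - 1][j])
            if j > 0:  # gap in x: y[j-1] emitted alone
                Y[i][j] = q * (delta * M[i][j - 1] + epsilon * Y[i][j - 1])
    return M[n][m] + X[n][m] + Y[n][m]
```

    Comparing `pair_hmm_forward(x, y)` across candidate pairs, or across parameter values, is the kind of likelihood computation on which the maximum likelihood estimation described above builds.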

  12. The nucleotide sequence of the human beta-globin gene.

    PubMed

    Lawn, R M; Efstratiadis, A; O'Connell, C; Maniatis, T

    1980-10-01

    We report the complete nucleotide sequence of the human beta-globin gene. The purpose of this study is to obtain information necessary to study the evolutionary relationships between members of the human beta-like globin gene family and to provide the basis for comparing normal beta-globin genes with those obtained from the DNA of individuals with genetic defects in hemoglobin expression.

  13. Spatio-temporal alignment of pedobarographic image sequences.

    PubMed

    Oliveira, Francisco P M; Sousa, Andreia; Santos, Rubim; Tavares, João Manuel R S

    2011-07-01

    This article presents a methodology to align plantar pressure image sequences simultaneously in time and space. The spatial position and orientation of a foot in one sequence are changed to match the foot represented in a second sequence. Simultaneously with the spatial alignment, the temporal scale of the first sequence is transformed to synchronize the two input footsteps. Consequently, both the spatial correspondence of the foot regions along the sequences and the temporal synchronization are attained automatically, making the study easier and more straightforward. For spatial alignment, the methodology can use one of four geometric transformation models: rigid, similarity, affine, or projective. For temporal alignment, a polynomial transformation of up to 4th degree can be adopted to model linear and curved time behaviors. Suitable geometric and temporal transformations are found by minimizing the mean squared error (MSE) between the input sequences. The methodology was tested on a set of real image sequences acquired with a common pedobarographic device. In experimental cases generated by applying geometric and temporal control transformations, the methodology showed high accuracy. In addition, intra-subject alignment tests on real plantar pressure image sequences showed that the curved temporal models produced better MSE results (P < 0.001) than the linear temporal model. This article represents an important step forward in the alignment of pedobarographic image data, since previous methods can only be applied to static images.

  14. Large-scale detection and application of expressed sequence tag single nucleotide polymorphisms in Nicotiana.

    PubMed

    Wang, Y; Zhou, D; Wang, S; Yang, L

    2015-07-14

    Single nucleotide polymorphisms (SNPs) are widespread in the Nicotiana genome. Using an alignment and variation detection method, we identified 20,607,973 SNPs based on the expressed sequence tag sequences of 10 Nicotiana species. The transition rate was much higher than the transversion rate among these SNPs, and SNPs are widespread across the genus Nicotiana. In vitro verification indicated that all of the SNPs were of high quality and accurate. Evolutionary relationships among 15 varieties were investigated by polymerase chain reaction with a specific primer; the specific 302 locus of these sequence results clearly indicated the origin of Zhongyan 100. A database of Nicotiana SNPs (NSNP) was developed to store and search for SNPs in Nicotiana. NSNP is a tool for researchers to develop SNP markers from sequence data.
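
    SNPs are conventionally split into transitions (purine to purine, A/G, or pyrimidine to pyrimidine, C/T) and transversions (purine to pyrimidine or the reverse), and their ratio is a standard sanity check on a SNP set. A minimal sketch of that classification, assuming SNPs are given as (reference base, alternative base) pairs; the function names are illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref, alt):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ts_tv_ratio(snps):
    """Transition/transversion ratio over (ref, alt) pairs."""
    ts = sum(classify_snp(ref, alt) == "transition" for ref, alt in snps)
    tv = len(snps) - ts
    return ts / tv if tv else float("inf")
```

    Since there are twice as many possible transversions as transitions, a ratio well above 0.5 indicates a transition bias.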

  15. The complete nucleotide sequence of pelargonium leaf curl virus.

    PubMed

    McGavin, Wendy J; MacFarlane, Stuart A

    2016-05-01

    Investigation of a tombusvirus isolated from tulip plants in Scotland revealed that it was pelargonium leaf curl virus (PLCV) rather than the originally suggested tomato bushy stunt virus. The complete sequence of the PLCV genome was determined for the first time, revealing it to be 4789 nucleotides in size and to have an organization similar to that of the other, previously described tombusviruses. Primers derived from the sequence were used to construct a full-length infectious clone of PLCV that recapitulates the disease symptoms of leaf curling in systemically infected pelargonium plants.

  16. Empirical Transition Probability Indexing Sparse-Coding Belief Propagation (ETPI-SCoBeP) Genome Sequence Alignment

    PubMed Central

    Roozgard, Aminmohammad; Barzigar, Nafise; Wang, Shuang; Jiang, Xiaoqian; Cheng, Samuel

    2014-01-01

    Advances in human genome sequencing technology have significantly reduced the cost of data generation, and the resulting data volume overwhelms the computing capability available for sequence analysis. Efficiency, efficacy, and scalability remain challenging in sequence alignment, which is an important and foundational operation in genome data analysis. In this paper, we propose a two-stage approach to tackle this problem. In the preprocessing step, we match blocks of the reference and target sequences based on the similarities between their empirical transition probability distributions, using belief propagation. We then conduct a refined match using our recently published sparse-coding belief propagation (SCoBeP) technique. Our experimental results demonstrate robustness in nucleotide sequence alignment and are competitive with those of the SOAP aligner and the BWA algorithm. Moreover, compared with SCoBeP alignment, the proposed technique can handle much longer sequences. PMID:25983537
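
    The preprocessing idea, comparing blocks by their empirical transition probability distributions, can be sketched as follows: each block is summarized by a 4x4 matrix of P(next base | current base), and a target block is matched to the reference block whose matrix is closest. This is a simplified, hypothetical stand-in: it uses plain nearest-neighbor matching with a symmetric KL divergence instead of belief propagation, and the pseudocount and distance are arbitrary choices.

```python
import math
from collections import Counter

BASES = "ACGT"


def transition_matrix(block, pseudocount=1.0):
    """Row-normalized 4x4 matrix of P(next base | current base) for one block."""
    counts = Counter(zip(block, block[1:]))
    matrix = []
    for a in BASES:
        row = [counts[(a, b)] + pseudocount for b in BASES]  # pseudocount avoids zeros
        total = sum(row)
        matrix.append([c / total for c in row])
    return matrix


def matrix_distance(p, q):
    """Sum over rows of the symmetric KL divergence (one of many possible choices)."""
    return sum((a - b) * math.log(a / b)
               for row_p, row_q in zip(p, q) for a, b in zip(row_p, row_q))


def best_reference_block(target_block, reference, block_size):
    """Index of the non-overlapping reference block closest to the target block."""
    blocks = [reference[i:i + block_size]
              for i in range(0, len(reference) - block_size + 1, block_size)]
    tm = transition_matrix(target_block)
    scores = [matrix_distance(tm, transition_matrix(b)) for b in blocks]
    return min(range(len(blocks)), key=scores.__getitem__)
```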

  17. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2010-07-01 2010-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide...

  18. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2012-07-01 2012-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide...

  19. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2014-07-01 2014-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide...

  20. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2011-07-01 2011-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide...

  1. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2013-07-01 2013-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide...

  2. Multiple sequence alignment with user-defined anchor points

    PubMed Central

    Morgenstern, Burkhard; Prohaska, Sonja J; Pöhler, Dirk; Stadler, Peter F

    2006-01-01

    Background: Automated software tools for multiple alignment often fail to produce biologically meaningful results. In such situations, expert knowledge can help to improve the quality of alignments. Results: Herein, we describe a semi-automatic version of the alignment program DIALIGN that can take predefined constraints into account. The user can specify parts of the sequences that are assumed to be homologous and should therefore be aligned to each other. Our program uses these sites as anchor points and creates a multiple alignment that respects these constraints. In this way, our method can produce alignments that are biologically more meaningful than those produced by fully automated procedures. To demonstrate how the method works, we apply it to genomic sequences around the Hox gene cluster and to a set of DNA-binding proteins. As a by-product, we obtain insights into the performance of the greedy algorithm that our program uses for multiple alignment and into the underlying objective function. This information will be useful for the further development of DIALIGN. The described alignment approach has been integrated into the TRACKER software system. PMID:16722533
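The anchor idea can be illustrated with a minimal pairwise sketch. This is not DIALIGN: DIALIGN assembles multiple alignments from gap-free fragments with a greedy procedure, whereas this sketch swaps in plain Needleman-Wunsch global alignment between consecutive anchors. Anchors are assumed to be 0-based (i, j) position pairs that the user requires to be aligned, and the match/mismatch/gap values are illustrative assumptions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b with linear gap costs; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def anchored_align(a, b, anchors):
    """Align a and b so that every anchor (i, j) puts a[i] and b[j] in the same column."""
    pieces_a, pieces_b, pi, pj = [], [], 0, 0
    for i, j in sorted(anchors):
        if i < pi or j < pj:
            raise ValueError("anchors must be strictly increasing in both sequences")
        seg_a, seg_b = needleman_wunsch(a[pi:i], b[pj:j])  # free alignment before the anchor
        pieces_a += [seg_a, a[i]]
        pieces_b += [seg_b, b[j]]
        pi, pj = i + 1, j + 1
    seg_a, seg_b = needleman_wunsch(a[pi:], b[pj:])  # free alignment after the last anchor
    return "".join(pieces_a + [seg_a]), "".join(pieces_b + [seg_b])
```

For example, `anchored_align("AAT", "AT", [(0, 0)])` forces the first A of each sequence into the same column and aligns the remainder freely. Splitting at anchors also shrinks each dynamic-programming subproblem, which is one practical side benefit of user constraints.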

  3. Protein multiple sequence alignment by hybrid bio-inspired algorithms.

    PubMed

    Cutello, Vincenzo; Nicosia, Giuseppe; Pavone, Mario; Prizzi, Igor

    2011-03-01

    This article presents an immune-inspired algorithm to tackle the Multiple Sequence Alignment (MSA) problem. MSA is one of the most important tasks in biological sequence analysis. Although this paper focuses on protein alignments, most of the discussion and methodology may also be applied to DNA alignments. The problem of finding an optimal multiple alignment was studied by Bonizzoni and Vedova and by Wang and Jiang, and was proved to be NP-hard (non-deterministic polynomial-time hard). The presented algorithm, called Immunological Multiple Sequence Alignment Algorithm (IMSA), incorporates two new strategies to create the initial population and specific ad hoc mutation operators. It uses the 'weighted sum of pairs' as the objective function for evaluating a candidate alignment. IMSA was tested on the classical BAliBASE benchmarks (versions 1.0, 2.0 and 3.0), and experimental results indicate that it is comparable with state-of-the-art multiple alignment algorithms in terms of alignment quality, weighted sum-of-pairs (SP) and column score (CS) values. The main novelty of IMSA is its ability to generate more than a single suboptimal alignment for every MSA instance; this behaviour is due to the stochastic nature of the algorithm and of the populations evolved during the convergence process. This feature will help the decision maker to assess and select a biologically relevant multiple sequence alignment. Finally, the designed algorithm can be used as a local search procedure to properly explore promising alignments of the search space.
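A weighted sum-of-pairs objective of the kind IMSA optimizes can be sketched as follows. The scoring values, the linear gap cost, and the pair-weight matrix are illustrative assumptions, not IMSA's actual parameters; protein benchmarks such as BAliBASE would normally use a substitution matrix instead of a flat match/mismatch score.

```python
from itertools import combinations


def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    """Score one aligned residue pair; gap-gap pairs contribute 0 (a common convention)."""
    if x == "-" and y == "-":
        return 0
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def weighted_sum_of_pairs(alignment, weights=None, match=1, mismatch=-1, gap=-2):
    """Sum over sequence pairs (p, q) of weights[p][q] times their column-wise pair scores."""
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all rows of an alignment must have the same length")
    total = 0.0
    for p, q in combinations(range(len(alignment)), 2):
        w = 1.0 if weights is None else weights[p][q]
        total += w * sum(pair_score(a, b, match, mismatch, gap)
                         for a, b in zip(alignment[p], alignment[q]))
    return total
```

With `weights=None` this reduces to the plain SP score; weights let closely related (redundant) sequences count less toward the objective.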

  4. Efficient pairwise RNA structure prediction and alignment using sequence alignment constraints

    PubMed Central

    Dowell, Robin D; Eddy, Sean R

    2006-01-01

    Background: We are interested in the problem of predicting secondary structure for small sets of homologous RNAs by incorporating limited comparative sequence information into an RNA folding model. The Sankoff algorithm for simultaneous RNA folding and alignment is a basis for approaches to this problem. Applying a Sankoff algorithm raises two open problems: developing a good unified scoring system for alignment and folding, and developing practical heuristics for dealing with the computational complexity of the algorithm. Results: We use probabilistic models (pair stochastic context-free grammars, pairSCFGs) as a unifying framework for scoring pairwise alignment and folding. We developed a constrained version of the pairSCFG structural alignment algorithm that assumes knowledge of a few confidently aligned positions (pins). These pins are selected based on the posterior probabilities of a probabilistic pairwise sequence alignment. Conclusion: Pairwise RNA structural alignment improves structure prediction accuracy relative to single-sequence folding. Constraining the alignment is a straightforward way of reducing the runtime and memory requirements of the algorithm. Five practical implementations of the pairwise Sankoff algorithm – this work (Consan), David Mathews' Dynalign, Ian Holmes' Stemloc, Ivo Hofacker's PMcomp, and Jan Gorodkin's FOLDALIGN – have comparable overall performance with different strengths and weaknesses. PMID:16952317

  5. Bioinformatics comparison of sulfate-reducing metabolism nucleotide sequences

    NASA Astrophysics Data System (ADS)

    Tremberger, G.; Dehipawala, Sunil; Nguyen, A.; Cheung, E.; Sullivan, R.; Holden, T.; Lieberman, D.; Cheung, T.

    2015-09-01

    Sulfate-reducing bacteria can be traced back 3.5 billion years. The thermodynamic details of the sulfur cycle have been well documented. GenBank nucleotide data from a recent sulfate-reducing bacteria report (Robador, Jungbluth, et al., 2015 Jan, Front. Microbiol.) have been analyzed in terms of the sulfite reductase (dsrAB) via fractal dimension and entropy values. A comparison to oil field sulfate-reducing sequences was included. The AUCG translational mass fractal dimension versus the ATCG transcriptional mass fractal dimension for the low-temperature dsrB and dsrA sequences reported in Reference Thirteen shows a correlation of R-sq ~ 0.79, with a probability of about 3% in simulation. A recent report on using a cystathionine gamma-lyase sequence to produce CdS quantum dots biologically, where sulfur is reduced just as in the H2S production process, was included for comparison. The AUCG mass fractal dimension versus ATCG mass fractal dimension for the cystathionine gamma-lyase sequences was found to have an R-sq of 0.72, similar to the low-temperature dissimilatory sulfite reductase (dsr) group with 3% probability, in contrast to the oil field group, which had R-sq ~ 0.94, a highly probable outcome in the simulation. The other two simulation histograms, namely fractal dimension versus entropy R-sq outcome values and di-nucleotide entropy versus mono-nucleotide entropy R-sq outcome values, are also discussed in the data analysis, with a focus on low-probability outcomes.
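The mono- versus di-nucleotide entropy comparison can be sketched with Shannon entropy over overlapping k-mers and a squared Pearson correlation across a set of sequences. The mass fractal dimension used in the study is not reproduced here, and the exact entropy definition (overlapping k-mers, log base 2) is an assumption.

```python
import math
from collections import Counter


def kmer_entropy(seq, k=1):
    """Shannon entropy (bits) of the overlapping k-mer distribution of seq."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        raise ValueError("sequence shorter than k")
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists of values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)
```

For a hypothetical list `seqs` of dsr gene sequences, `r_squared([kmer_entropy(s, 1) for s in seqs], [kmer_entropy(s, 2) for s in seqs])` gives the kind of R-sq value the abstract compares against simulated histograms.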

  6. Mining for single nucleotide polymorphisms and insertions/deletions in maize expressed sequence tag data.

    PubMed

    Batley, Jacqueline; Barker, Gary; O'Sullivan, Helen; Edwards, Keith J; Edwards, David

    2003-05-01

    We have developed a computer based method to identify candidate single nucleotide polymorphisms (SNPs) and small insertions/deletions from expressed sequence tag data. Using a redundancy-based approach, valid SNPs are distinguished from erroneous sequence by their representation multiple times in an alignment of sequence reads. A second measure of validity was also calculated based on the cosegregation of the SNP pattern between multiple SNP loci in an alignment. The utility of this method was demonstrated by applying it to 102,551 maize (Zea mays) expressed sequence tag sequences. A total of 14,832 candidate polymorphisms were identified with an SNP redundancy score of two or greater. Segregation of these SNPs with haplotype indicates that candidate SNPs with high redundancy and cosegregation confidence scores are likely to represent true SNPs. This was confirmed by validation of 264 candidate SNPs from 27 loci, with a range of redundancy and cosegregation scores, in four inbred maize lines. The SNP transition/transversion ratio and insertion/deletion size frequencies correspond to those observed by direct sequencing methods of SNP discovery and suggest that the majority of predicted SNPs and insertion/deletions identified using this approach represent true genetic variation in maize.
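The redundancy idea (a real SNP allele should be seen in several reads, a sequencing error usually in one) can be sketched on a toy column-wise alignment. Treating '-' as an allele so that small indels are also flagged, using space or 'N' for "no data", and defining the redundancy score as the read count of the least common supported allele are all assumptions; the cosegregation score is not shown.

```python
from collections import Counter

NO_DATA = {" ", "N"}  # assumed symbols for "read does not cover this column / unknown base"


def candidate_snps(aligned_reads, min_redundancy=2):
    """Return (column, redundancy) for columns where at least two alleles are each
    observed in >= min_redundancy reads; '-' counts as an allele (indel)."""
    width = len(aligned_reads[0])
    if any(len(r) != width for r in aligned_reads):
        raise ValueError("reads must be padded to the alignment width")
    hits = []
    for col in range(width):
        alleles = Counter(r[col] for r in aligned_reads if r[col] not in NO_DATA)
        supported = [n for n in alleles.values() if n >= min_redundancy]
        if len(supported) >= 2:
            hits.append((col, min(supported)))
    return hits
```

In the toy reads `["ACGT", "ACGT", "ATGT", "ATGT", "ACGA"]`, column 1 (C/T, seen 3 and 2 times) is flagged, while the single A in column 3 is treated as a likely error.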

  7. Nucleotide sequence and genome organization of canine parvovirus.

    PubMed Central

    Reed, A P; Jones, E V; Miller, T J

    1988-01-01

    The genome of a canine parvovirus isolate strain (CPV-N) was cloned, and the DNA sequence was determined. The entire genome, including ends, was 5,323 nucleotides in length. The terminal repeat at the 3' end of the genome shared similar structural characteristics but limited homology with the rodent parvoviruses. The 5' terminal repeat was not detected in any of the clones. Instead, a region of DNA starting near the capsid gene stop codon and extending 248 base pairs into the coding region had been duplicated and inserted 75 base pairs downstream from the poly(A) addition site. Consensus sequences for the 5' donor and 3' acceptor sites as well as promoters and poly(A) addition sites were identified and compared with the available information on related parvoviruses. The genomic organization of CPV-N is similar to that of feline parvovirus (FPV) in that there are two major open reading frames (668 and 722 amino acids) in the plus strand (mRNA polarity). Both coding domains are in the same frame, and no significant open reading frames were apparent in any of the other frames of either the minus or the plus DNA strand. The nucleotide and amino acid homologies of the capsid genes between CPV-N and FPV were 98 and 99%, respectively. In contrast, the nucleotide and amino acid homologies of the capsid genes for CPV-N and CPV-b (S. Rhode III, J. Virol. 54:630-633, 1985) were 95 and 98%, respectively. These results indicate that very few nucleotide or amino acid changes differentiate the antigenic and host range specificity of FPV and CPV. PMID:2824850
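Why amino acid identity can exceed nucleotide identity (98 versus 99% for CPV-N and FPV) is easy to see in a small sketch: synonymous codon changes lower nucleotide identity but leave the protein unchanged. Here "homology" is computed as percent identity over aligned, gap-free columns, which is an assumed definition; translation uses the standard genetic code.

```python
import itertools

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order, '*' marks stop codons.
CODON_TABLE = {a + b + c: AMINO[i]
               for i, (a, b, c) in enumerate(itertools.product(BASES, repeat=3))}


def translate(cds):
    """Translate an in-frame coding sequence; unknown codons become 'X'."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def percent_identity(x, y):
    """Identical positions / compared positions for two aligned, equal-length sequences;
    columns with a gap ('-') in either sequence are skipped."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(a, b) for a, b in zip(x, y) if a != "-" and b != "-"]
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)
```

For instance, `ATGGCC` and `ATGGCT` are about 83% identical at the nucleotide level but both translate to `MA`, giving 100% amino acid identity.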

  8. Deblur Rapidly Resolves Single-Nucleotide Community Sequence Patterns

    PubMed Central

    Amir, Amnon; McDonald, Daniel; Navas-Molina, Jose A.; Kopylova, Evguenia; Morton, James T.; Zech Xu, Zhenjiang; Kightley, Eric P.; Thompson, Luke R.; Hyde, Embriette R.; Gonzalez, Antonio

    2017-01-01

    ABSTRACT High-throughput sequencing of 16S ribosomal RNA gene amplicons has facilitated understanding of complex microbial communities, but the inherent noise in PCR and DNA sequencing limits differentiation of closely related bacteria. Although many scientific questions can be addressed with broad taxonomic profiles, clinical, food safety, and some ecological applications require higher specificity. Here we introduce a novel sub-operational-taxonomic-unit (sOTU) approach, Deblur, that uses error profiles to obtain putative error-free sequences from Illumina MiSeq and HiSeq sequencing platforms. Deblur substantially reduces computational demands relative to similar sOTU methods and does so with similar or better sensitivity and specificity. Using simulations, mock mixtures, and real data sets, we detected closely related bacterial sequences with single nucleotide differences while removing false positives and maintaining stability in detection, suggesting that Deblur is limited only by read length and diversity within the amplicon sequences. Because Deblur operates on a per-sample level, it scales to modern data sets and meta-analyses. To highlight Deblur’s ability to integrate data sets, we include an interactive exploration of its application to multiple distinct sequencing rounds of the American Gut Project. Deblur is open source under the Berkeley Software Distribution (BSD) license, easily installable, and downloadable from https://github.com/biocore/deblur. IMPORTANCE Deblur provides a rapid and sensitive means to assess ecological patterns driven by differentiation of closely related taxa. This algorithm provides a solution to the problem of identifying real ecological differences between taxa whose amplicons differ by a single base pair, is applicable in an automated fashion to large-scale sequencing data sets, and can integrate sequencing runs collected over time. PMID:28289731
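One way an error profile can separate true sequences from their error copies is greedy subtraction: walk through reads from most to least abundant and subtract, from each less abundant neighbour at Hamming distance d, the reads that the more abundant sequence is expected to produce as errors at that distance. This toy is only an illustration in that spirit, not Deblur's implementation (available at https://github.com/biocore/deblur); the error-profile values and the `min_reads` cutoff are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length reads."""
    return sum(x != y for x, y in zip(a, b))


def toy_deblur(read_counts, error_profile, min_reads=1.0):
    """Greedy error subtraction on equal-length (trimmed) reads.

    read_counts: dict mapping sequence -> number of reads.
    error_profile[d]: assumed fraction of a true sequence's reads that appear
    as error copies at Hamming distance d (index 0 is unused).
    """
    order = sorted(read_counts, key=read_counts.get, reverse=True)
    remaining = {s: float(read_counts[s]) for s in order}
    for i, parent in enumerate(order):
        if remaining[parent] <= 0:
            continue  # already explained as errors of a more abundant sequence
        for child in order[i + 1:]:
            d = hamming(parent, child)
            if 0 < d < len(error_profile):
                remaining[child] -= remaining[parent] * error_profile[d]
    return {s: n for s, n in remaining.items() if n >= min_reads}
```

With counts `{"AAAA": 1000, "AAAT": 30, "CCCC": 50}` and profile `[1.0, 0.06, 0.02]`, the one-mismatch variant `AAAT` is fully explained as error (about 60 expected error reads), while the distant `CCCC` survives. Because each sample is processed independently, this style of denoising scales per sample, as the abstract notes.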

  9. The nucleotide sequence of a nematode vitellogenin gene.

    PubMed Central

    Spieth, J; Denison, K; Zucker, E; Blumenthal, T

    1985-01-01

    The nematode Caenorhabditis elegans contains a family of six genes that code for vitellogenins. Here we report the complete nucleotide sequence of one of these genes, vit-5. The gene specifies an mRNA of 4869 nucleotides, including untranslated regions of 9 bases at the 5' end and 51 bases at the 3' end. Vit-5 contains four short introns totalling 218 bp. The predicted vitellogenin, yp170A, has a molecular weight of 186,430. At its N terminus it is clearly related to the vitellogenins of vertebrates. However, the vit-5-encoded protein does not contain a serine-rich sequence related to the vertebrate vitellin, phosvitin. In fact, the amino acid composition of the nematode protein is very similar to that of the vertebrate protein without phosvitin. Vit-5 has a highly asymmetric codon usage. The favored codons are different from those favored in other organisms, but are characteristic of highly expressed C. elegans genes. Selection against rare codons is weaker near the 5' end of the gene: rare codons are 15 times more frequent within the first 54 bp than in the next 4.8 kb. PMID:3855245
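The 5'-end comparison (rare-codon frequency in the first 54 bp versus the rest of the coding sequence) can be sketched as below. The abstract does not list which C. elegans codons count as rare, so the `rare` set is a hypothetical input; 54 bp is a multiple of 3, so both parts stay in frame.

```python
def codons(cds):
    """Split an in-frame coding sequence into complete codons."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3].upper() for i in range(0, usable, 3)]


def rare_codon_fraction(cds, rare):
    """Fraction of codons in cds that belong to the (assumed) rare-codon set."""
    cs = codons(cds)
    return sum(c in rare for c in cs) / len(cs) if cs else 0.0


def rare_codon_enrichment(cds, rare, head_bp=54):
    """Rare-codon frequency in the first head_bp bases relative to the remainder.
    head_bp should be a multiple of 3 so both parts stay in frame."""
    head = rare_codon_fraction(cds[:head_bp], rare)
    tail = rare_codon_fraction(cds[head_bp:], rare)
    return head / tail if tail else float("inf")
```

On a toy sequence whose first 18 codons are all rare and whose remaining 100 codons include 10 rare ones, the enrichment is 10.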

  10. MSAViewer: interactive JavaScript visualization of multiple sequence alignments.

    PubMed

    Yachdav, Guy; Wilzbach, Sebastian; Rauscher, Benedikt; Sheridan, Robert; Sillitoe, Ian; Procter, James; Lewis, Suzanna E; Rost, Burkhard; Goldberg, Tatyana

    2016-11-15

    The MSAViewer is a quick and easy visualization and analysis JavaScript component for Multiple Sequence Alignment data of any size. Core features include interactive navigation through the alignment, application of popular color schemes, sorting, selecting and filtering. The MSAViewer is 'web ready': written entirely in JavaScript, compatible with modern web browsers and does not require any specialized software. The MSAViewer is part of the BioJS collection of components.

  11. ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches.

    PubMed

    Rognes, T

    2001-04-01

    There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm first exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith-Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign were found to be as good as those of Smith-Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/
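The first ParAlign stage, the exact best ungapped score on every diagonal, can be sketched sequentially with a running maximum-segment scan. This sketch does not use SIMD, uses a flat match/mismatch score instead of a substitution matrix, and does not reproduce ParAlign's heuristic for combining diagonals; the `prefilter` helper simply ranks database entries by their best single diagonal, which is an assumed simplification.

```python
def diagonal_scores(query, target, match=2, mismatch=-1):
    """Exact best ungapped local score on every diagonal d = j - i of the
    query-by-target matrix (running maximum-segment scan)."""
    n, m = len(query), len(target)
    scores = {}
    for d in range(-(n - 1), m):
        i = max(0, -d)
        j = i + d
        run = best = 0
        while i < n and j < m:
            run = max(0, run + (match if query[i] == target[j] else mismatch))
            best = max(best, run)
            i += 1
            j += 1
        scores[d] = best
    return scores


def prefilter(query, database, keep=10):
    """Rank database entries by their best diagonal score; survivors would then
    be passed to a full Smith-Waterman alignment."""
    def best(name):
        return max(diagonal_scores(query, database[name]).values())
    return sorted(database, key=best, reverse=True)[:keep]
```

Each diagonal is independent of the others, which is what makes this stage a natural fit for parallel (SIMD) evaluation.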

  12. Recursive dynamic programming for adaptive sequence and structure alignment

    SciTech Connect

    Thiele, R.; Zimmer, R.; Lengauer, T.

    1995-12-31

    We propose a new alignment procedure that is capable of aligning protein sequences and structures in a unified manner. Recursive dynamic programming (RDP) is a hierarchical method which, on each level of the hierarchy, identifies locally optimal solutions and assembles them into partial alignments of sequences and/or structures. In contrast to classical dynamic programming, RDP can also handle alignment problems that use objective functions not obeying the principle of prefix optimality, e.g. scoring schemes derived from energy potentials of mean force. For such alignment problems, RDP aims at computing solutions that are near-optimal with respect to the involved cost function and biologically meaningful at the same time. Towards this goal, RDP maintains a dynamic balance between different factors governing alignment fitness such as evolutionary relationships and structural preferences. Because gaps are not scored explicitly in the RDP method, the problematic assignment of gap cost parameters is avoided. To evaluate the RDP approach, we analyse whether known and accepted multiple alignments based on structural information can be reproduced with the RDP method.

  13. Nucleotide sequences specific to Yersinia pestis and methods for the detection of Yersinia pestis

    DOEpatents

    McCready, Paula M.; Radnedge, Lyndsay; Andersen, Gary L.; Ott, Linda L.; Slezak, Thomas R.; Kuczmarski, Thomas A.; Motin, Vladimir L.

    2009-02-24

    Nucleotide sequences specific to Yersinia pestis that serve as markers or signatures for identification of this bacterium were identified. In addition, forward and reverse primers and hybridization probes derived from these nucleotide sequences that are used in nucleotide detection methods to detect the presence of the bacterium are disclosed.

  14. Nucleotide sequences specific to Francisella tularensis and methods for the detection of Francisella tularensis

    DOEpatents

    McCready, Paula M.; Radnedge, Lyndsay; Andersen, Gary L.; Ott, Linda L.; Slezak, Thomas R.; Kuczmarski, Thomas A.; Vitalis, Elizabeth A

    2009-02-24

    Described herein is the identification of nucleotide sequences specific to Francisella tularensis that serve as markers or signatures for identification of this bacterium. In addition, forward and reverse primers and hybridization probes derived from these nucleotide sequences that are used in nucleotide detection methods to detect the presence of the bacterium are disclosed.

  15. Nucleotide sequences specific to Francisella tularensis and methods for the detection of Francisella tularensis

    DOEpatents

    McCready, Paula M.; Radnedge, Lyndsay; Andersen, Gary L.; Ott, Linda L.; Slezak, Thomas R.; Kuczmarski, Thomas A.; Vitalis, Elizabeth A

    2007-02-06

    Described herein is the identification of nucleotide sequences specific to Francisella tularensis that serve as markers or signatures for identification of this bacterium. In addition, forward and reverse primers and hybridization probes derived from these nucleotide sequences that are used in nucleotide detection methods to detect the presence of the bacterium are disclosed.

  16. Nucleotide sequences specific to Brucella and methods for the detection of Brucella

    DOEpatents

    McCready, Paula M.; Radnedge, Lyndsay; Andersen, Gary L.; Ott, Linda L.; Slezak, Thomas R.; Kuczmarski, Thomas A.

    2009-02-24

    Nucleotide sequences specific to Brucella that serve as markers or signatures for identification of this bacterium were identified. In addition, forward and reverse primers and hybridization probes derived from these nucleotide sequences that are used in nucleotide detection methods to detect the presence of the bacterium are disclosed.
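How a forward/reverse primer pair picks out a signature region can be sketched as exact-match in silico PCR: find the forward primer, then the reverse complement of the reverse primer downstream of it. The patent records above do not give the actual primer sequences, so the template and primers in the usage example are hypothetical; the sketch ignores mismatches, searches one strand only, and uses an assumed `max_len` limit.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def predicted_amplicons(template, forward, reverse, max_len=2000):
    """Exact-match in silico PCR on one strand: each product runs from the forward
    primer to the first downstream reverse-primer binding site."""
    template = template.upper()
    fwd, rev_rc = forward.upper(), reverse_complement(reverse.upper())
    products = []
    start = template.find(fwd)
    while start != -1:
        end = template.find(rev_rc, start + len(fwd))
        if end != -1 and end + len(rev_rc) - start <= max_len:
            products.append(template[start:end + len(rev_rc)])
        start = template.find(fwd, start + 1)
    return products
```

A template that yields a product only for the target organism's signature region is what makes such a primer pair diagnostic; real assays must also tolerate primer mismatches and check both strands.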

  17. The complete nucleotide sequence of chrysanthemum stem necrosis virus.

    PubMed

    Dullemans, A M; Verhoeven, J Th J; Kormelink, R; van der Vlugt, R A A

    2015-02-01

    The complete genome sequence of chrysanthemum stem necrosis virus (CSNV) was determined using Roche 454 next-generation sequencing. CSNV is a tentative member of the genus Tospovirus within the family Bunyaviridae, whose members are arthropod-borne. This is the first report of the entire RNA genome sequence of a CSNV isolate. The large RNA of CSNV is 8955 nucleotides (nt) in size and contains a single open reading frame of 8625 nt in the antisense arrangement, coding for the putative RNA-dependent RNA polymerase (L protein) of 2874 aa with a predicted Mr of 331 kDa. Two untranslated regions of 397 and 33 nt are present at the 5' and 3' termini, respectively. The medium (M) and small (S) RNAs are 4830 and 2947 nt in size, respectively, and show 99 % identity to the corresponding genomic segments of previously partially characterized CSNV genomes. Protein sequences for the precursor of the Gn/Gc proteins, N and NSs, are identical in length in all of the analysed CSNV isolates.
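The ORF and protein lengths reported for the CSNV large RNA can be cross-checked with simple codon arithmetic: 8625 nt is 2875 codons, and if the ORF length includes the stop codon (an assumption consistent with the reported figure), it encodes 2874 amino acids.

```python
def protein_length_from_orf(orf_nt, includes_stop=True):
    """Number of amino acids encoded by an open reading frame of orf_nt nucleotides."""
    if orf_nt % 3:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_nt // 3
    return codons - 1 if includes_stop else codons
```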

  18. Generalized Levy-walk model for DNA nucleotide sequences

    NASA Technical Reports Server (NTRS)

    Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Simons, M.; Stanley, H. E.

    1993-01-01

    We propose a generalized Levy walk to model fractal landscapes observed in noncoding DNA sequences. We find that this model provides a very close approximation to the empirical data and explains a number of statistical properties of genomic DNA sequences, such as the distribution of strand-biased regions (those with an excess of one type of nucleotide) as well as local changes in the slope of the correlation exponent $\alpha$. The generalized Levy-walk model simultaneously accounts for the long-range correlations in noncoding DNA sequences and for the apparently paradoxical finding of long subregions of biased random walks (of length $l_j$) within these correlated sequences. In the generalized Levy-walk model, the $l_j$ are drawn from a power-law distribution $P(l_j) \propto l_j^{-\mu}$. The correlation exponent $\alpha$ is related to $\mu$ through $\alpha = 2 - \mu/2$ for $2 < \mu < 3$. The model is consistent with the finding of "repetitive elements" of variable length interspersed within noncoding DNA.
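The quantities in this model can be sketched with a DNA walk (assumed here to step +1 for purines A, G and −1 for pyrimidines C, T), the fluctuation function F(l) of the walk displacement over windows of length l, and α estimated as the log-log slope of F(l); an uncorrelated sequence gives α ≈ 0.5. The generator draws subregion lengths with P(l) ∝ l^(−μ) and biases each subregion toward purines or pyrimidines; the `bias`, `l_min` and fitting scales are hypothetical parameters, not values from the paper.

```python
import math
import random

PURINES = frozenset("AG")  # assumed mapping: purines step +1, pyrimidines (C, T) step -1


def dna_walk(seq):
    """Cumulative displacement y(l) of the purine/pyrimidine DNA walk."""
    y, total = [], 0
    for base in seq.upper():
        total += 1 if base in PURINES else -1
        y.append(total)
    return y


def fluctuation(y, l):
    """F(l): standard deviation of y(l0 + l) - y(l0) over all start positions l0."""
    diffs = [y[i + l] - y[i] for i in range(len(y) - l)]
    mean = sum(diffs) / len(diffs)
    return math.sqrt(sum((d - mean) ** 2 for d in diffs) / len(diffs))


def estimate_alpha(seq, scales=(4, 8, 16, 32, 64, 128, 256)):
    """Least-squares slope of log F(l) versus log l, i.e. alpha in F(l) ~ l**alpha."""
    y = dna_walk(seq)
    xs = [math.log(l) for l in scales]
    ys = [math.log(fluctuation(y, l)) for l in scales]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return num / sum((a - mx) ** 2 for a in xs)


def power_law_lengths(mu, l_min, count, rng):
    """Draw lengths with P(l) ~ l**(-mu), l >= l_min (continuous inverse transform, floored)."""
    return [int(l_min * (1.0 - rng.random()) ** (-1.0 / (mu - 1.0))) for _ in range(count)]


def levy_walk_sequence(n, mu=2.5, l_min=10, bias=0.2, seed=0):
    """Concatenate subregions of power-law length, each biased toward purines or pyrimidines."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        (length,) = power_law_lengths(mu, l_min, 1, rng)
        p_purine = 0.5 + bias if rng.random() < 0.5 else 0.5 - bias
        for _ in range(min(length, n - len(out))):
            out.append(rng.choice("AG") if rng.random() < p_purine else rng.choice("CT"))
    return "".join(out)
```

Comparing `estimate_alpha(levy_walk_sequence(50000))` with the estimate for a shuffled or uniformly random sequence should show the Levy-walk sequence trending above 0.5; over finite windows the estimate need not match $2 - \mu/2$ closely, since short-scale noise within each biased subregion also contributes to F(l).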

  19. Multiple sequence alignment using multi-objective based bacterial foraging optimization algorithm.

    PubMed

    Rani, R Ranjani; Ramyachitra, D

    2016-12-01

    Multiple sequence alignment (MSA) is a widespread approach in computational biology and bioinformatics. MSA arranges nucleotide or amino acid sequences so that corresponding positions line up with a minimum number of gaps, which reveals functional, evolutionary and structural relationships among the sequences. Computing an MSA that is both accurate and statistically significant nonetheless remains a challenging task. In this work, the Bacterial Foraging Optimization Algorithm was employed to align biological sequences, yielding a non-dominated optimal solution. It optimizes multiple objectives: maximization of similarity, non-gap percentage and conserved blocks, and minimization of gap penalty. The BAliBASE 3.0 benchmark database was used to compare the proposed algorithm with other methods. In this paper, two algorithms have been proposed: a Hybrid Genetic Algorithm with Artificial Bee Colony (GA-ABC) and the Bacterial Foraging Optimization Algorithm. GA-ABC was found to perform better than the existing optimization algorithms, but it did not recover the conserved blocks. BFO was then used for the alignment, and the conserved blocks were obtained. The proposed Multi-Objective Bacterial Foraging Optimization Algorithm (MO-BFO) was compared with the widely used MSA methods Clustal Omega, Kalign, MUSCLE and MAFFT, and with Genetic Algorithm (GA), Ant Colony Optimization (ACO), Artificial Bee Colony (ABC), Particle Swarm Optimization (PSO) and GA-ABC. The final results show that the proposed MO-BFO algorithm yields better alignments than most widely used methods.
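The "non-dominated" selection over several objectives can be sketched as a Pareto filter over candidate alignments. The foraging search itself is not shown, and the objective values in the example are hypothetical; the objective order (similarity, non-gap percentage, conserved blocks, gap penalty) follows the abstract, with the first three maximized and the gap penalty minimized.

```python
def dominates(a, b, maximize):
    """True if objective vector a is at least as good as b in every objective
    and strictly better in at least one."""
    no_worse = all((x >= y) if up else (x <= y) for x, y, up in zip(a, b, maximize))
    better = any((x > y) if up else (x < y) for x, y, up in zip(a, b, maximize))
    return no_worse and better


def non_dominated(candidates, maximize):
    """Names of candidates whose objective vectors no other candidate dominates."""
    return [name for name, vec in candidates.items()
            if not any(dominates(other, vec, maximize)
                       for other_name, other in candidates.items() if other_name != name)]
```

For example, with objectives `(similarity, non_gap_pct, conserved_blocks, gap_penalty)` and `maximize=(True, True, True, False)`, an alignment that is worse on every objective than another is discarded, while trade-off solutions are all kept on the front.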

  20. Image-based temporal alignment of echocardiographic sequences

    NASA Astrophysics Data System (ADS)

    Danudibroto, Adriyana; Bersvendsen, Jørn; Mirea, Oana; Gerard, Olivier; D'hooge, Jan; Samset, Eigil

    2016-04-01

    Temporal alignment of echocardiographic sequences enables fair comparisons of multiple cardiac sequences by showing corresponding frames at given time points in the cardiac cycle. It is also essential for spatial registration of echo volumes, where several acquisitions are combined to enhance image quality or to form a larger field of view. In this study, three image-based temporal alignment methods were investigated: first, a method based on dynamic time warping (DTW); second, a spline-based method that optimized the similarity between temporal characteristic curves of the cardiac cycle using 1D cubic B-spline interpolation; and third, a piecewise modification of the spline-based method. These methods were tested on in vivo data sets of 19 echo sequences. For each sequence, the mitral valve opening (MVO) time was manually annotated. The results showed that the average MVO timing error for all methods was well below the temporal resolution of the sequences.
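The DTW variant can be sketched on two 1D per-frame curves, for instance a hypothetical image-derived characteristic value per frame. The sketch uses absolute difference as the local cost, which is an assumption; it returns the warping cost and the frame-to-frame path, from which the frame matching an annotated event (such as MVO) in the other sequence can be read off. The spline-based methods are not shown.

```python
def dtw(x, y):
    """Dynamic time warping of two 1D curves; returns (total cost, warping path)."""
    n, m = len(x), len(y)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Trace back the cheapest predecessor from the last frame pair.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1), (D[i - 1][j], i - 1, j), (D[i][j - 1], i, j - 1))
    path.reverse()
    return D[n][m], path


def matched_frames(path, frame_in_x):
    """Frames of the second sequence aligned with a given frame of the first."""
    return [j for i, j in path if i == frame_in_x]
```

For curves `[0, 1, 2, 1, 0]` and `[0, 0, 1, 2, 1, 0]`, the peak at frame 2 of the first sequence maps to frame 3 of the second at zero warping cost.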

  1. A sequence alignment-independent method for protein classification.

    PubMed

    Vries, John K; Munshi, Rajan; Tobi, Dror; Klein-Seetharaman, Judith; Benos, Panayiotis V; Bahar, Ivet

    2004-01-01

    Annotation of the rapidly accumulating body of sequence data relies heavily on the detection of remote homologues and functional motifs in protein families. The most popular methods rely on sequence alignment. These include programs that use a scoring matrix to compare the probability of a potential alignment with random chance and programs that use curated multiple alignments to train profile hidden Markov models (HMMs). Related approaches depend on bootstrapping multiple alignments from a single sequence. However, alignment-based programs have limitations. They make the assumption that contiguity is conserved between homologous segments, which may not be true in genetic recombination or horizontal transfer. Alignments also become ambiguous when sequence similarity drops below 40%. This has kindled interest in classification methods that do not rely on alignment. An approach to classification without alignment based on the distribution of contiguous sequences of four amino acids (4-grams) was developed. Interest in 4-grams stemmed from the observation that almost all theoretically possible 4-grams (20^4 = 160,000) occur in natural sequences and the majority of 4-grams are uniformly distributed. This implies that the probability of finding identical 4-grams by random chance in unrelated sequences is low. A Bayesian probabilistic model was developed to test this hypothesis. For each protein family in Pfam-A and PIR-PSD, a feature vector called a probe was constructed from the set of 4-grams that best characterised the family. In rigorous jackknife tests, unknown sequences from Pfam-A and PIR-PSD were compared with the probes for each family. A classification result was deemed a true positive if the probe match with the highest probability was in first place in a rank-ordered list. This was achieved in 70% of cases. Analysis of false positives suggested that the precision might approach 85% if selected families were clustered into subsets. Case studies indicated that the 4
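Alignment-free classification with 4-gram probes can be sketched as follows. The study used a Bayesian probabilistic model; this sketch swaps in a much simpler score (the number of the query's 4-grams found in each family probe), and the probe construction rule (the `size` 4-grams present in the most family members) and the family names are hypothetical.

```python
from collections import Counter


def ngrams(seq, n=4):
    """Counts of overlapping n-grams (contiguous residues) in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def build_probe(family_seqs, size=50):
    """Hypothetical probe: the `size` 4-grams present in the most family members."""
    presence = Counter()
    for s in family_seqs:
        presence.update(set(ngrams(s)))
    return {gram for gram, _ in presence.most_common(size)}


def rank_families(query, probes):
    """Family names ordered by how many of the query's 4-grams occur in each probe."""
    q = set(ngrams(query))
    return sorted(probes, key=lambda fam: len(q & probes[fam]), reverse=True)
```

Because no alignment is computed, a query whose conserved 4-grams appear in a different order (as after recombination) can still match its family probe.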

  2. Heuristic reusable dynamic programming: efficient updates of local sequence alignment.

    PubMed

    Hong, Changjin; Tewfik, Ahmed H

    2009-01-01

    Recomputing previously evaluated similarity results between biological sequences becomes unavoidable when researchers discover errors in their sequenced data or have to compare highly similar sequences, e.g., within a protein family. We present an efficient scheme for updating local sequence alignments under an affine gap model. Using the previous matching result between two amino acid sequences, we perform a forward-backward alignment to generate heuristic search bands bounded by a set of suboptimal paths. Given a correctly updated sequence, we first predict a new alignment-path score for each contour and select the best candidates among them. We then run the Smith-Waterman algorithm in this confined space. Our heuristic alignment of an updated sequence can be further accelerated by using reusable dynamic programming (rDP), from our prior work. In this study, we validate the "relative node tolerance bound" (RNTB) in the pruned search space. We further improve computational performance by quantifying the probability that the RNTB tolerance holds and switching to rDP only on perturbation-resilient columns. In a search space derived from a threshold of 90 percent of the optimal alignment score, we find that 98.3 percent of contours contain correctly updated paths. We also find that our method needs only 25.36 percent of the runtime of the sparse dynamic programming (sDP) method, and only 2.55 percent of that of standard dynamic programming with the Smith-Waterman algorithm.
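The baseline being accelerated, Smith-Waterman local alignment with affine gaps, can be sketched with the standard Gotoh recurrences. This computes the full matrix and does not implement the banding, contour prediction, or rDP reuse described above. The scoring values are illustrative assumptions, and a gap of length k is assumed to cost `gap_open + (k - 1) * gap_extend`.

```python
def smith_waterman_affine(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Best local alignment score of a and b with affine gap costs (Gotoh recurrences)."""
    n, m = len(a), len(b)
    neg = float("-inf")
    H = [[0] * (m + 1) for _ in range(n + 1)]    # best score ending at (i, j)
    E = [[neg] * (m + 1) for _ in range(n + 1)]  # ... ending with a gap in a (horizontal move)
    F = [[neg] * (m + 1) for _ in range(n + 1)]  # ... ending with a gap in b (vertical move)
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best
```

Since every cell is filled, the cost is proportional to len(a) × len(b); restricting the computation to a band around previously found paths is what reduces that cost when one sequence changes only slightly.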

  3. Complete nucleotide sequence of a native plasmid from Brevibacterium linens.

    PubMed

    Moore, Mathew; Svenson, Charles; Bowling, David; Glenn, Dianne

    2003-03-01

    Brevibacterium linens has commercial significance in the dairy industry and potential application in the production of bacteriocins and carotenoids. Strain development of these industrially significant organisms would be facilitated by the use of vectors, yet few are available. In this study we report the isolation of four novel plasmids from the Gram-positive coryneform B. linens, and determine the first complete nucleotide sequence of a native plasmid of B. linens. The cryptic plasmid pLIM is 7610 bp in length, and belongs to a subfamily of theta replicating ColE2-related plasmids. Initial investigation suggests that replication in pLIM requires two replicases, a primase (RepA) and a DNA binding protein (RepB), encoded by a single operon repAB. The origin of replication is located upstream of repAB transcription.

  4. Nucleotide sequence of the hemolysin I gene from Actinobacillus pleuropneumoniae.

    PubMed Central

    Frey, J; Meier, R; Gygi, D; Nicolet, J

    1991-01-01

    The DNA sequence of the gene encoding the structural protein of hemolysin I (HlyI) of Actinobacillus pleuropneumoniae serotype 1 strain 4074 was analyzed. The nucleotide sequence shows a 3,072-bp reading frame encoding a protein of 1,023 amino acids with a calculated molecular size of 110.1 kDa. This corresponds to the HlyI protein, which has an apparent molecular size on sodium dodecyl sulfate gels of 105 kDa. The structure of the protein derived from the DNA sequence shows three hydrophobic regions in the N-terminal part of the protein, 13 glycine-rich domains in the second half of the protein, and a hydrophilic C-terminal area, all of which are typical of the cytotoxins of the RTX (repeats in the structural toxin) toxin family. The derived amino acid sequence of HlyI shows 42% homology with the hemolysin of A. pleuropneumoniae serotype 5, 41% homology with the leukotoxin of Pasteurella haemolytica, and 56% homology with the Escherichia coli alpha-hemolysin. The 13 glycine-rich repeats and three hydrophobic areas of the HlyI sequence show more similarity to the E. coli alpha-hemolysin than to either the A. pleuropneumoniae serotype 5 hemolysin or the leukotoxin (while the last two are more similar to each other). Two types of RTX hemolysins therefore seem to be present in A. pleuropneumoniae, one (HlyI) resembling the alpha-hemolysin and a second more closely related to the leukotoxin. Ca(2+)-binding experiments using HlyI and recombinant A. pleuropneumoniae prohemolysin (HlyIA) that was produced in E. coli show that HlyI binds 45Ca2+, probably because of the 13 glycine-rich repeated domains. Activation of the prohemolysin is not required for Ca2+ binding. PMID:1879928

  5. Nucleotide sequence and phylogeny of a chloramphenicol acetyltransferase encoded by the plasmid pSCS7 from Staphylococcus aureus.

    PubMed

    Schwarz, S; Cardoso, M

    1991-08-01

    The nucleotide sequence of the chloramphenicol acetyltransferase gene (cat) and its regulatory region, encoded by the plasmid pSCS7 from Staphylococcus aureus, was determined. The structural cat gene encoded a protein of 209 amino acids, which represented one monomer of the enzyme chloramphenicol acetyltransferase (CAT). The amino acid sequence of the pSCS7-encoded CAT from S. aureus was compared with those of the previously sequenced CAT variants from S. aureus, Staphylococcus intermedius, Staphylococcus haemolyticus, Bacillus pumilus, Clostridium difficile, Clostridium perfringens, Escherichia coli, Shigella flexneri, and Proteus mirabilis. An alignment of CAT amino acid sequences demonstrated the presence of 34 amino acids conserved among all CAT variants. These conserved residues were considered for their possible roles in the structure and function of CAT. On the basis of the alignment, a phylogenetic tree was constructed. It demonstrated relatively large evolutionary distances between the CAT variants of enteric bacteria, Clostridium, Bacillus, and Staphylococcus species.
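    The abstract counts residues that are conserved across every CAT variant in the alignment. A minimal sketch of that count, assuming the alignment has already been computed and gaps are written as `-`; the toy sequences are invented for illustration and are not CAT data:

```python
def conserved_columns(alignment: list[str]) -> list[int]:
    """Return 0-based indices of columns holding one identical, non-gap residue in every row."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for col in range(length):
        residues = {seq[col] for seq in alignment}
        # A column counts as conserved only if it contains a single residue type and no gap.
        if len(residues) == 1 and "-" not in residues:
            conserved.append(col)
    return conserved


toy = ["MEKKIT-GY", "MEKKLTGGY", "MDKKIT-GY"]
print(conserved_columns(toy))  # [0, 2, 3, 5, 7, 8]
```

    Counting gap-free identity is the strictest definition of conservation; similarity-based definitions (e.g. grouping chemically similar residues) would report more positions.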

  6. A novel approach to multiple sequence alignment using Hadoop data grids.

    PubMed

    Sudha Sadasivam, G; Baktavatchalam, G

    2010-01-01

    Multiple alignment of protein sequences helps to determine evolutionary linkage and to predict molecular structures. The factors to be considered when aligning multiple sequences are the speed and accuracy of the alignment. Although dynamic programming algorithms produce accurate alignments, they are computationally intensive. In this paper, we propose a time-efficient approach to sequence alignment that also produces high-quality alignments. The dynamic nature of the algorithm, coupled with the data and computational parallelism of Hadoop data grids, improves the accuracy and speed of sequence alignment. The block-splitting principle of Hadoop, together with its scalability, facilitates the alignment of very large sequences.

  7. [Nucleotide sequence of genes for alpha- and beta-subunits of luciferase from Photobacterium leiognathi].

    PubMed

    Illarionov, B A; Protopopova, M V; Karginov, V A; Mertvetsov, N P; Gitel'zon, I I

    1988-03-01

    The nucleotide sequence of the Photobacterium leiognathi DNA containing the genes for the alpha and beta subunits of luciferase has been determined. We also deduced the amino acid sequence and molecular mass of luciferase and localized the luciferase genes in the sequenced DNA fragment.

  8. AlignMiner: a Web-based tool for detection of divergent regions in multiple sequence alignments of conserved sequences

    PubMed Central

    2010-01-01

    Background Multiple sequence alignments are used to study gene or protein function, phylogenetic relations, genome evolution hypotheses and even gene polymorphisms. Virtually without exception, all available tools focus on conserved segments or residues. Small divergent regions, however, are biologically important for specific quantitative polymerase chain reaction, genotyping, molecular markers and preparation of specific antibodies, yet they have received little attention. As a consequence, they must be selected empirically by the researcher. AlignMiner has been developed to fill this gap in bioinformatic analyses. Results AlignMiner is a Web-based application for detecting conserved and divergent regions in alignments of conserved sequences, with a particular focus on divergence. It accepts alignments (protein or nucleic acid) obtained using any of a variety of algorithms; the choice of algorithm does not appear to have a significant impact on the final results. AlignMiner uses different scoring methods for assessing conserved/divergent regions, Entropy being the method that provides the highest number of regions with the greatest length, and Weighted being the most restrictive. Conserved/divergent regions can be generated either with respect to the consensus sequence or to one master sequence. The resulting data are presented in a graphical interface developed in AJAX, which provides remarkable user interaction capabilities. Users do not need to wait until execution is complete and can even inspect their results on a different computer. Data can be downloaded to the user's disk in standard formats. In silico and experimental proof-of-concept cases have shown that AlignMiner can be successfully used to design specific polymerase chain reaction primers as well as potential epitopes for antibodies. Primer design is assisted by a module that displays several oligonucleotide parameters for designing primers "on the fly". Conclusions AlignMiner can be used to reliably detect
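    The abstract does not define AlignMiner's Entropy method. As a hedged illustration of the general idea only, per-column Shannon entropy is one common way to flag divergent positions: 0 bits for a fully conserved column, rising as residues diversify. The function names and the choice to ignore gaps are assumptions of this sketch, not AlignMiner's implementation:

```python
import math
from collections import Counter


def column_entropy(column: str, ignore_gaps: bool = True) -> float:
    """Shannon entropy (in bits) of the residues in one alignment column."""
    residues = [c for c in column if not (ignore_gaps and c == "-")]
    if not residues:
        return 0.0
    total = len(residues)
    # sum of p * log2(1/p); written this way so a conserved column gives +0.0
    return sum((n / total) * math.log2(total / n) for n in Counter(residues).values())


def entropy_profile(alignment: list[str]) -> list[float]:
    """Entropy of every column; higher values mark more divergent positions."""
    return [column_entropy("".join(col)) for col in zip(*alignment)]


aln = ["ACGT", "ACGA", "ACTC", "ACTG"]
print(entropy_profile(aln))  # [0.0, 0.0, 1.0, 2.0]
```

    Runs of adjacent high-entropy columns would then be candidate divergent regions, for example for primer design.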

  9. Nucleotide sequence of the human N-myc gene

    SciTech Connect

    Stanton, L.W.; Schwab, M.; Bishop, J.M.

    1986-03-01

    Human neuroblastomas frequently display amplification and augmented expression of a gene known as N-myc because of its similarity to the protooncogene c-myc. It has therefore been proposed that N-myc is itself a protooncogene, and subsequent tests have shown that N-myc and c-myc have similar biological activities in cell culture. The authors have now detailed the kinship between N-myc and c-myc by determining the nucleotide sequence of human N-myc and deducing the amino acid sequence of the protein encoded by the gene. The topography of N-myc is strikingly similar to that of c-myc: both genes contain three exons of similar lengths; the coding elements of both genes are located in the second and third exons; and both genes have unusually long 5' untranslated regions in their mRNAs, with features that raise the possibility that expression of the genes may be subject to similar controls of translation. The resemblance between the proteins encoded by N-myc and c-myc sustains previous suspicions that the genes encode related functions.

  10. SeqLib: a C++ API for rapid BAM manipulation, sequence alignment and sequence assembly.

    PubMed

    Wala, Jeremiah; Beroukhim, Rameen

    2017-03-01

    We present SeqLib, a C++ API and command line tool that provides a rapid and user-friendly interface to BAM/SAM/CRAM files, global sequence alignment operations and sequence assembly. Four C libraries perform core operations in SeqLib: HTSlib for BAM access, BWA-MEM and BLAT for sequence alignment and Fermi for error correction and sequence assembly. Benchmarking indicates that SeqLib has lower CPU and memory requirements than leading C++ sequence analysis APIs. We demonstrate an example of how minimal SeqLib code can extract, error-correct and assemble reads from a CRAM file and then align with BWA-MEM. SeqLib also provides additional capabilities, including chromosome-aware interval queries and read plotting. Command line tools are available for performing integrated error correction, micro-assemblies and alignment.

  11. Nucleotide sequence from the coding region of rabbit β-globin messenger RNA

    PubMed Central

    Proudfoot, N.J.

    1976-01-01

    A sequence of 89 nucleotides from rabbit β-globin mRNA has been determined and is shown to code for residues 107 to 137 of the β-globin protein. In addition, a sequence heterogeneity has been identified within this 89 nucleotide long sequence which corresponds to a known polymorphic variant of rabbit β-globin. PMID:61580

  12. Multiple sequence alignment in HTML: colored, possibly hyperlinked, compact representations.

    PubMed

    Campagne, F; Maigret, B

    1998-02-01

    Protein sequence alignments are widely used in protein structure prediction, protein engineering, protein modeling, etc. This type of representation is useful at different stages of scientific activity: looking at previous results, working on a research project, and presenting the results. There is a need to make alignments available through a network (intranet or WWW) in a way that allows biologists, chemists, and non-computer specialists to examine the data and carry on research, possibly collaboratively. Previous methods (text-based, Java-based) are described and their advantages are discussed. We have developed two novel approaches to represent alignments as colored, hyperlinked HTML pages. The first method creates an HTML page that makes efficient use of the WWW browser's image cache, so the user waits for images to load over the network only for the first alignment viewed and can then browse other alignments without delay. The generated pages can be viewed with any HTML 2.0-compliant browser. The second method uses W3C CSS1 style sheets to render alignments; pages generated this way require recent browsers. We implemented these methods in the Viseur program and made available a WWW service that converts an MSF alignment file into HTML for WWW publishing. The service is available at http://www.lctn.u-nancy.fr/viseur/services.html.
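    A minimal sketch of the style-sheet idea: emit each residue as a `<span>` whose class a CSS rule can color (e.g. a rule such as `.res-A { color: red; }`). The `res-` class names and the 10-character name column are assumptions of this sketch, not Viseur's actual output format:

```python
import html


def alignment_to_html(names: list[str], rows: list[str]) -> str:
    """Render aligned rows as a <pre> block with one CSS class per residue letter."""
    lines = []
    for name, row in zip(names, rows):
        cells = "".join(
            # letters get class res-<letter>; gaps and other symbols share class res-gap
            f'<span class="res-{c if c.isalpha() else "gap"}">{html.escape(c)}</span>'
            for c in row
        )
        lines.append(f"{html.escape(name):<10} {cells}")
    return "<pre>\n" + "\n".join(lines) + "\n</pre>"


print(alignment_to_html(["seq1", "seq2"], ["MK-L", "MKAL"]))
```

    Keeping colors in a style sheet rather than in per-residue images means the page carries only text, which is the trade-off the abstract describes against the image-cache method.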

  13. New developments of alignment-free sequence comparison: measures, statistics and next-generation sequencing

    PubMed Central

    Song, Kai; Ren, Jie; Reinert, Gesine; Deng, Minghua

    2014-01-01

    With the development of next-generation sequencing (NGS) technologies, a large amount of short read data has been generated. Assembly of these short reads can be challenging for genomes and metagenomes without template sequences, making alignment-based genome sequence comparison difficult. In addition, sequence reads from NGS can come from different regions of various genomes and they may not be alignable. Sequence signature-based methods for genome comparison based on the frequencies of word patterns in genomes and metagenomes can potentially be useful for the analysis of short reads data from NGS. Here we review the recent development of alignment-free genome and metagenome comparison based on the frequencies of word patterns with emphasis on the dissimilarity measures between sequences, the statistical power of these measures when two sequences are related and the applications of these measures to NGS data. PMID:24064230
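    One classic word-frequency measure in this family is the D2 statistic, the number of matching k-word pairs between two sequences, D2 = Σ_w X_w Y_w, where X_w and Y_w count word w in each sequence. A minimal sketch (the helper names are hypothetical, and the standardized variants the review discusses are not shown):

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a: str, seq_b: str, k: int) -> int:
    """D2 statistic: sum over words w of X_w * Y_w (no alignment required)."""
    xa, yb = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    # Counter returns 0 for absent words, so unshared words contribute nothing.
    return sum(n * yb[w] for w, n in xa.items())


print(d2("ACGT", "ACGA", 2))  # 2 (shared words AC and CG)
```

    Because only word counts are needed, reads from different genomic regions can be compared without first assembling or aligning them.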

  14. SDT: a virus classification tool based on pairwise sequence alignment and identity calculation.

    PubMed

    Muhire, Brejnev Muhizi; Varsani, Arvind; Martin, Darren Patrick

    2014-01-01

    The perpetually increasing rate at which viral full-genome sequences are being determined is creating a pressing demand for computational tools that will aid the objective classification of these genome sequences. Taxonomic classification approaches that are based on pairwise genetic identity measures are potentially highly automatable and are progressively gaining favour with the International Committee on Taxonomy of Viruses (ICTV). There are, however, various issues with the calculation of such measures that could potentially undermine the accuracy and consistency with which they can be applied to virus classification. Firstly, pairwise sequence identities computed based on multiple sequence alignments rather than on multiple independent pairwise alignments can lead to the deflation of identity scores with increasing dataset sizes. Also, when gap-characters need to be introduced during sequence alignments to account for insertions and deletions, methodological variations in the way that these characters are introduced and handled during pairwise genetic identity calculations can cause high degrees of inconsistency in the way that different methods classify the same sets of sequences. Here we present Sequence Demarcation Tool (SDT), a free user-friendly computer program that aims to provide a robust and highly reproducible means of objectively using pairwise genetic identity calculations to classify any set of nucleotide or amino acid sequences. SDT can produce publication quality pairwise identity plots and colour-coded distance matrices to further aid the classification of sequences according to ICTV approved taxonomic demarcation criteria. Besides a graphical interface version of the program for Windows computers, command-line versions of the program are available for a variety of different operating systems (including a parallel version for cluster computing platforms).
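    The abstract stresses that the way gap characters are handled changes identity scores. The sketch below makes one such convention explicit: columns gapped in both sequences are dropped, and a gap opposite a residue counts as a mismatch. This is an assumed convention for illustration, not necessarily the one SDT uses:

```python
def pairwise_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two already-aligned sequences.

    Assumed convention: gap/gap columns are ignored; gap/residue columns
    count as mismatches.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == gap and y == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


print(round(pairwise_identity("ACGT-A", "ACGTTA"), 2))  # 83.33
```

    Computing this on each pair's own pairwise alignment, rather than reading pairs out of one multiple alignment, avoids the score deflation the abstract mentions for large datasets.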

  15. Distributed sequence alignment applications for the public computing architecture.

    PubMed

    Pellicer, S; Chen, G; Chan, K C C; Pan, Y

    2008-03-01

    The public computer architecture shows promise as a platform for solving fundamental problems in bioinformatics such as global gene sequence alignment and data mining with tools such as the basic local alignment search tool (BLAST). Our implementation of these two problems on the Berkeley open infrastructure for network computing (BOINC) platform demonstrates a runtime reduction factor of 1.15 for sequence alignment and 16.76 for BLAST. While the runtime reduction factor of the global gene sequence alignment application is modest, this value is based on a theoretical sequential runtime extrapolated from the calculation of a smaller problem. Because this runtime is extrapolated from running the calculation in memory, the theoretical sequential runtime would require 37.3 GB of memory on a single system. With this in mind, the BOINC implementation not only offers the reduced runtime, but also the aggregation of the available memory of all participant nodes. If an actual sequential run of the problem were compared, a more drastic reduction in the runtime would be seen due to an additional secondary storage I/O overhead for a practical system. Despite the limitations of the public computer architecture, most notably in communication overhead, it represents a practical platform for grid- and cluster-scale bioinformatics computations today and shows great potential for future implementations.

  16. The impact of single substitutions on multiple sequence alignments.

    PubMed

    Klaere, Steffen; Gesell, Tanja; von Haeseler, Arndt

    2008-12-27

    We introduce another view of sequence evolution. Contrary to other approaches, we model the substitution process in two steps. First, we assume (arbitrary) scaled branch lengths on a given phylogenetic tree. Second, we allocate a Poisson-distributed number of substitutions on the branches. The probability of placing a mutation on a branch is proportional to its relative branch length. More importantly, the action of a single mutation on an alignment column is described by a doubly stochastic matrix, the so-called one-step mutation matrix. This matrix leads to analytical formulae for the posterior probability distribution of the number of substitutions for an alignment column.

  17. A guide to parallel execution of sequence alignment

    NASA Astrophysics Data System (ADS)

    Lauredo, Alexandre M.; Sena, Alexandre C.; de Castro, Maria Clicia S.; Marzulo, Leandro A. J.

    2016-12-01

    Finding the longest common subsequence (LCS) is an important part of DNA sequence alignment. Dynamic programming finds the exact LCS with O(m × n) time and space complexity, where m and n are the sequence lengths. Parallel algorithms are essential because large sequences require too much time and memory to be processed sequentially. The aim of this work is therefore to implement and evaluate different parallel solutions for distributed-memory machines, so that memory is divided equally among the processing nodes.
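    The O(m × n) recurrence can be sketched in a few lines. This version keeps only two rows, which is enough to report the LCS length in O(n) memory; recovering the subsequence itself needs the full table (or a divide-and-conquer scheme), which is where the memory pressure described above comes from:

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence by dynamic programming.

    Time is O(m * n); only two rows of the table are kept, so memory is O(n).
    """
    n = len(b)
    prev = [0] * (n + 1)  # row i-1 of the DP table
    for i in range(1, len(a) + 1):
        curr = [0] * (n + 1)  # row i
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1  # extend the LCS of both prefixes
            else:
                curr[j] = max(prev[j], curr[j - 1])  # drop a character from one side
        prev = curr
    return prev[n]


print(lcs_length("AGGTAB", "GXTXAYB"))  # 4 ("GTAB")
```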

  18. Review of alignment and SNP calling algorithms for next-generation sequencing data.

    PubMed

    Mielczarek, M; Szyda, J

    2016-02-01

    Application of massively parallel sequencing technology has become one of the most important issues in the life sciences. It was therefore crucial to develop bioinformatics tools for processing next-generation sequencing (NGS) data. Currently, two of the most significant tasks are alignment to a reference genome and detection of single nucleotide polymorphisms (SNPs). In many types of genomic analyses, great numbers of reads need to be mapped to the reference genome; therefore, selection of the aligner is an essential step in NGS pipelines. Two main algorithmic approaches, suffix tries and hash tables, have been introduced for this purpose. Suffix array-based aligners are memory-efficient and work faster than hash-based aligners, but they are less accurate. In contrast, hash table algorithms tend to be slower but more sensitive. SNP and genotype callers may also be divided into two main approaches: heuristic and probabilistic methods. A variety of software has been developed over the past several years. In this paper, we briefly review the current development of NGS data processing algorithms and present the available software.
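    A toy illustration of the hash-table approach, assuming exact k-mer seeds: index every k-mer of the reference, then look up each k-mer of a read to propose candidate start positions. Production aligners add seed filtering and a dynamic-programming extension step; the function names here are hypothetical:

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Hash every k-mer of the reference to the positions where it occurs."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def seed_hits(read: str, index: dict[str, list[int]], k: int) -> list[int]:
    """Candidate read start positions on the reference implied by exact k-mer matches."""
    candidates = set()
    for offset in range(len(read) - k + 1):
        for ref_pos in index.get(read[offset:offset + k], []):
            candidates.add(ref_pos - offset)  # shift back to where the read would begin
    return sorted(c for c in candidates if c >= 0)


index = build_kmer_index("ACGTACGTTAGC", 3)
print(seed_hits("GTTAG", index, 3))  # [6]
```

    The index trades memory for lookup speed; suffix-array aligners instead search a compressed index of the reference, which is the memory/speed contrast the abstract draws.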

  19. QGRS-H Predictor: a web server for predicting homologous quadruplex forming G-rich sequence motifs in nucleotide sequences

    PubMed Central

    Menendez, Camille; Frees, Scott; Bagga, Paramjeet S.

    2012-01-01

    Naturally occurring G-quadruplex structural motifs, formed by guanine-rich nucleic acids, have been reported in telomeric, promoter and transcribed regions of mammalian genomes. G-quadruplex structures have received significant attention because of growing evidence for their role in important biological processes, human disease and as therapeutic targets. Lately, there has been much interest in the potential roles of RNA G-quadruplexes as cis-regulatory elements of post-transcriptional gene expression. Large-scale computational genomics studies on G-quadruplexes have difficulty validating their predictions without laborious testing in ‘wet’ labs. We have developed a bioinformatics tool, QGRS-H Predictor that can map and analyze conserved putative Quadruplex forming 'G'-Rich Sequences (QGRS) in mRNAs, ncRNAs and other nucleotide sequences, e.g. promoter, telomeric and gene flanking regions. Identifying conserved regulatory motifs helps validate computations and enhances accuracy of predictions. The QGRS-H Predictor is particularly useful for mapping homologous G-quadruplex forming sequences as cis-regulatory elements in the context of 5′- and 3′-untranslated regions, and CDS sections of aligned mRNA sequences. QGRS-H Predictor features highly interactive graphic representation of the data. It is a unique and user-friendly application that provides many options for defining and studying G-quadruplexes. The QGRS-H Predictor can be freely accessed at: http://quadruplex.ramapo.edu/qgrs/app/start. PMID:22576365
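    QGRS-H Predictor's own scoring is not described in the abstract. As an assumption-laden sketch, a common simplified G-quadruplex definition is four runs of at least three guanines separated by loops of 1 to 7 nucleotides, which a regular expression can scan for:

```python
import re

# Assumed simplified motif (not QGRS-H's definition): four G-runs of length >= 3
# separated by loops of 1-7 nucleotides. Loops may themselves contain G.
G4_PATTERN = re.compile(r"(?:G{3,}[ACGTU]{1,7}){3}G{3,}")


def find_g4_motifs(seq: str) -> list[tuple[int, str]]:
    """Return (start, motif) for non-overlapping matches of the simplified pattern."""
    return [(m.start(), m.group()) for m in G4_PATTERN.finditer(seq.upper())]


print(find_g4_motifs("AAAGGGTTAGGGTTAGGGTTAGGGAAA"))  # [(3, 'GGGTTAGGGTTAGGGTTAGGG')]
```

    A regex only finds candidates; it does not score stability or check conservation across aligned homologs, which is the part the tool described above adds.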

  20. Nucleotide sequence of a cloned woodchuck hepatitis virus genome: comparison with the hepatitis B virus sequence.

    PubMed Central

    Galibert, F; Chen, T N; Mandart, E

    1982-01-01

    The complete nucleotide sequence of a woodchuck hepatitis virus genome cloned in Escherichia coli was determined by the method of Maxam and Gilbert. This sequence was found to be 3,308 nucleotides long. Potential ATG initiator triplets and nonsense codons were identified and used to locate regions with a substantial coding capacity. A striking similarity was observed between the organization of human hepatitis B virus and woodchuck hepatitis virus. Nucleotide sequences of these open regions in the woodchuck virus were compared with corresponding regions present in hepatitis B virus. This allowed the location of four viral genes on the L strand and indicated the absence of protein coded by the S strand. Evolution rates of the various parts of the genome as well as of the four different proteins coded by hepatitis B virus and woodchuck hepatitis virus were compared. These results indicated that: (i) the core protein has evolved slightly less rapidly than the other proteins; and (ii) when a region of DNA codes for two different proteins, there is less freedom for the DNA to evolve and, moreover, one of the proteins can evolve more rapidly than the other. A hairpin structure, very well conserved in the two genomes, was located in the only region devoid of coding function, suggesting the location of the origin of replication of the viral DNA. PMID:7086958
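    The scan described above, locating ATG initiator triplets and the next in-frame nonsense codon, can be sketched as follows. This version checks only the three forward-strand frames of a linear sequence; a circular hepadnavirus genome and the opposite strand would need extra handling, and the `min_codons` threshold is an illustrative assumption:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) of ATG...stop open reading frames on the forward strand.

    end is exclusive and includes the stop codon; min_codons counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first initiator triplet in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG after this stop
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC"))  # [(2, 14)]
```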

  1. Complete nucleotide sequence of a monopartite Begomovirus and associated satellites infecting Carica papaya in Nepal.

    PubMed

    Shahid, M S; Yoshida, S; Khatri-Chhetri, G B; Briddon, R W; Natsuaki, K T

    2013-06-01

    Carica papaya (papaya) is a fruit crop that is cultivated mostly in kitchen gardens throughout Nepal. Leaf samples of C. papaya plants with leaf curling, vein darkening, vein thickening, and a reduction in leaf size were collected from a garden in Darai village, Rampur, Nepal in 2010. Full-length clones of a monopartite Begomovirus, a betasatellite and an alphasatellite were isolated. The complete nucleotide sequence of the Begomovirus showed the arrangement of genes typical of Old World begomoviruses, with the highest nucleotide sequence identity (>99 %) to an isolate of Ageratum yellow vein virus (AYVV), confirming it as an isolate of AYVV. The complete nucleotide sequence of the betasatellite showed greater than 89 % nucleotide sequence identity to an isolate of Tomato leaf curl Java betasatellite originating from Indonesia. The sequence of the alphasatellite displayed 92 % nucleotide sequence identity to Sida yellow vein China alphasatellite. This is the first identification of these components in Nepal and the first time they have been identified in papaya.

  2. Reconfigurable systems for sequence alignment and for general dynamic programming.

    PubMed

    Jacobi, Ricardo P; Ayala-Rincón, Mauricio; Carvalho, Luis G A; Llanos, Carlos H; Hartenstein, Reiner W

    2005-09-30

    Reconfigurable systolic arrays can be adapted to efficiently solve a wide spectrum of computational problems: parallelism is naturally exploited in systolic arrays, and reconfigurability allows the interconnections and operations to be redefined even at run time (dynamically). We present a reconfigurable systolic architecture that can be applied to the efficient treatment of several dynamic programming methods for solving well-known problems, such as global and local sequence alignment, approximate string matching and longest common subsequence. Dynamic reconfigurability was found to be useful in practical applications for constructing sequence alignments. A VHDL (VHSIC hardware description language) version of this new architecture was implemented on an APEX FPGA (field-programmable gate array). It would be several orders of magnitude faster than the software alternatives.
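    Global alignment is one of the dynamic programming problems such arrays accelerate. Below is a software sketch of the Needleman-Wunsch score recurrence with a linear gap penalty; the scoring values are illustrative assumptions. Each cell depends only on its left, upper and diagonal neighbours, so all cells on one anti-diagonal are independent, which is what lets a systolic array compute them in parallel:

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    m, n = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, n + 1):
        score[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[m][n]


print(global_alignment_score("ACGT", "AGT"))  # 1 (A C G T vs A - G T)
```

    Local alignment (Smith-Waterman) differs only in clamping each cell at zero and taking the table maximum, which is why one reconfigurable array can serve both.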

  3. Optimizing Data Intensive GPGPU Computations for DNA Sequence Alignment

    PubMed Central

    Trapnell, Cole; Schatz, Michael C.

    2009-01-01

    MUMmerGPU uses highly-parallel commodity graphics processing units (GPU) to accelerate the data-intensive computation of aligning next generation DNA sequence data to a reference sequence for use in diverse applications such as disease genotyping and personal genomics. MUMmerGPU 2.0 features a new stackless depth-first-search print kernel and is 13× faster than the serial CPU version of the alignment code and nearly 4× faster in total computation time than MUMmerGPU 1.0. We exhaustively examined 128 GPU data layout configurations to improve register footprint and running time and conclude higher occupancy has greater impact than reduced latency. MUMmerGPU is available open-source at http://mummergpu.sourceforge.net. PMID:20161021

  4. Incremental Window-based Protein Sequence Alignment Algorithms

    DTIC Science & Technology

    2006-03-23

    Huzefa Rangwala and George Karypis, March 23, 2006. … Then it performs a series of iterations in which it performs the following three steps: First, it extracts from … the residue-pair with the highest

  5. On the Impact of Widening Vector Registers on Sequence Alignment

    SciTech Connect

    Daily, Jeffrey A.; Kalyanaraman, Anantharaman; Krishnamoorthy, Sriram; Ren, Bin

    2016-09-22

    Vector extensions, such as SSE, have been part of the x86 architecture since the 1990s, with applications in graphics, signal processing, and scientific computing. Although many algorithms and applications can naturally benefit from automatic vectorization techniques, there are still many that are difficult to vectorize due to their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. In this paper, we demonstrate that the trend of widening vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. We present a practically efficient SIMD implementation of a parallel-scan-based sequence alignment algorithm that can better exploit wider SIMD units. We conduct comprehensive workload and use case analyses to characterize the relative behavior of the striped and scan approaches and identify the best choice of algorithm based on input length and SIMD width.

  6. Sampling rare events: statistics of local sequence alignments.

    PubMed

    Hartmann, Alexander K

    2002-05-01

    A method to calculate probability distributions in regions where events are very unlikely (e.g., p ≈ 10^-40) is presented. The basic idea is to map the underlying model onto a physical system. The system is simulated at a low temperature, so that configurations with originally low probabilities are generated preferentially. Since the distribution of such a physical system is known, the original unbiased distribution can be recovered. As an application, local alignment of protein sequences is studied. The deviation of the distribution p(S) of optimum scores from the extreme-value distribution is quantified. This deviation decreases with growing sequence length.
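    For reference, the extreme-value baseline against which the measured p(S) is compared is commonly written in Karlin-Altschul form, P(S ≥ x) ≈ 1 − exp(−K m n e^(−λx)). A sketch, assuming the parameters λ and K are already known for the scoring system; the values below are placeholders, not fitted parameters:

```python
import math


def gumbel_pvalue(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Extreme-value (Gumbel) tail P(S >= score) for local alignment scores.

    m and n are the sequence lengths; lam and k are the Karlin-Altschul
    parameters of the scoring system, assumed to be known in advance.
    """
    e_value = k * m * n * math.exp(-lam * score)  # expected number of hits scoring >= score
    return -math.expm1(-e_value)  # 1 - exp(-E), computed without cancellation for tiny E


# Placeholder parameters, not fitted to any real scoring matrix.
for s in (20, 40, 60):
    print(s, gumbel_pvalue(s, m=300, n=300, lam=0.3, k=0.1))
```

    Using `expm1` keeps tail probabilities far below machine epsilon representable, which matters in the p ≈ 10^-40 regime the abstract targets.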

  7. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms

    PubMed Central

    Bertolini, Francesca; Scimone, Concetta; Geraci, Claudia; Schiavo, Giuseppina; Utzeri, Valerio Joe; Chiofalo, Vincenzo; Fontanesi, Luca

    2015-01-01

    Few studies have investigated the donkey (Equus asinus) at the whole-genome level so far. Here, we sequenced the genomes of two male donkeys using a next-generation semiconductor-based sequencing platform (the Ion Proton sequencer) and compared the resulting sequence information with the available donkey draft genome (and the Illumina reads from which it originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next-generation sequencing reads aligned to the EquCab2.0 horse genome was larger than the number aligned to the draft donkey genome. This was due to the larger N50 of the contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated at ~0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and across the whole X chromosome. These regions might be evolutionarily important in equids. By comparing Y-chromosome regions, we identified variants that could be useful for tracking donkey paternal lineages. Moreover, about 4.8 million single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated by combining sequencing data from Ion Proton (whole-genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies have reported a high frequency of copy number variants. The SNPs we identified constitute a first resource for describing variability at the population-genomic level in E. asinus and for establishing monitoring systems for the conservation of donkey genetic resources. PMID:26151450

  8. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms.

    PubMed

    Bertolini, Francesca; Scimone, Concetta; Geraci, Claudia; Schiavo, Giuseppina; Utzeri, Valerio Joe; Chiofalo, Vincenzo; Fontanesi, Luca

    2015-01-01

    Few studies have investigated the donkey (Equus asinus) at the whole-genome level so far. Here, we sequenced the genomes of two male donkeys using a next-generation semiconductor-based sequencing platform (the Ion Proton sequencer) and compared the resulting sequence information with the available donkey draft genome (and the Illumina reads from which it originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next-generation sequencing reads aligned to the EquCab2.0 horse genome was larger than the number aligned to the draft donkey genome. This was due to the larger N50 of the contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated at ~0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and across the whole X chromosome. These regions might be evolutionarily important in equids. By comparing Y-chromosome regions, we identified variants that could be useful for tracking donkey paternal lineages. Moreover, about 4.8 million single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated by combining sequencing data from Ion Proton (whole-genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies have reported a high frequency of copy number variants. The SNPs we identified constitute a first resource for describing variability at the population-genomic level in E. asinus and for establishing monitoring systems for the conservation of donkey genetic resources.

  9. Exploring Dance Movement Data Using Sequence Alignment Methods

    PubMed Central

    Chavoshi, Seyed Hossein; De Baets, Bernard; Neutens, Tijs; De Tré, Guy; Van de Weghe, Nico

    2015-01-01

    Despite the abundance of research on knowledge discovery from moving object databases, only a limited number of studies have examined the interaction between moving point objects in space over time. This paper describes a novel approach for measuring similarity in the interaction between moving objects. The proposed approach consists of three steps. First, we transform movement data into sequences of successive qualitative relations based on the Qualitative Trajectory Calculus (QTC). Second, sequence alignment methods are applied to measure the similarity between movement sequences. Finally, movement sequences are grouped based on similarity by means of an agglomerative hierarchical clustering method. The applicability of this approach is tested using movement data from samba and tango dancers. PMID:26181435

  10. Nucleotide sequences of the cylindrical inclusion protein genes of two Japanese zucchini yellow mosaic virus isolates.

    PubMed

    Kundu, A K; Ohshima, K; Sako, N; Yaegashi, H

    1999-02-01

    The nucleotide sequences of the cylindrical inclusion protein (CIP) genes of two Japanese zucchini yellow mosaic virus (ZYMV) isolates (ZYMV-169 and ZYMV-M) were determined. The CIP genes of both isolates comprised 1902 nucleotides and encoded 634 amino acids, including a consensus nucleotide-binding motif. The sequence similarities between the two isolates at the nucleotide and amino acid levels were 91% and 98%, respectively. When the CIP gene sequences of the Japanese ZYMV isolates were compared with those of previously reported ZYMV isolates, the nucleotide and amino acid sequence similarities ranged between 81% and 97%, and between 95% and 97%, respectively. Phylogenetic analysis of the deduced amino acid sequences of the CIP genes indicated that the Japanese ZYMV isolates were closely related to other ZYMV isolates.
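    The gap between nucleotide (91%) and amino acid (98%) similarity arises because many nucleotide substitutions are synonymous. A small sketch comparing two in-frame, ungapped coding sequences at both levels, using the standard genetic code; the example sequences are invented, and real CIP genes would first need aligning:

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG; '*' marks stops.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence codon by codon (trailing partial codon ignored)."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def percent_identity(a: str, b: str) -> float:
    """Share of identical positions in two equal-length, ungapped sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


nt_a, nt_b = "ATGAAACTG", "ATGAAGCTA"
print(round(percent_identity(nt_a, nt_b), 1))  # 77.8 at the nucleotide level
print(percent_identity(translate(nt_a), translate(nt_b)))  # 100.0 at the amino acid level
```

    Both differences here fall in third codon positions (AAA/AAG for K, CTG/CTA for L), so the protein is unchanged.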

  11. MACSIMS : multiple alignment of complete sequences information management system

    PubMed Central

    Thompson, Julie D; Muller, Arnaud; Waterhouse, Andrew; Procter, Jim; Barton, Geoffrey J; Plewniak, Frédéric; Poch, Olivier

    2006-01-01

    Background In the post-genomic era, systems-level studies are being performed that seek to explain complex biological systems by integrating diverse resources from fields such as genomics, proteomics or transcriptomics. New information management systems are now needed for the collection, validation and analysis of the vast amount of heterogeneous data available. Multiple alignments of complete sequences provide an ideal environment for the integration of this information in the context of the protein family. Results MACSIMS is a multiple alignment-based information management program that combines the advantages of both knowledge-based and ab initio sequence analysis methods. Structural and functional information is retrieved automatically from the public databases. In the multiple alignment, homologous regions are identified and the retrieved data is evaluated and propagated from known to unknown sequences with these reliable regions. In a large-scale evaluation, the specificity of the propagated sequence features is estimated to be >99%, i.e. very few false positive predictions are made. MACSIMS is then used to characterise mutations in a test set of 100 proteins that are known to be involved in human genetic diseases. The number of sequence features associated with these proteins was increased by 60%, compared to the features available in the public databases. An XML format output file allows automatic parsing of the MACSIM results, while a graphical display using the JalView program allows manual analysis. Conclusion MACSIMS is a new information management system that incorporates detailed analyses of protein families at the structural, functional and evolutionary levels. MACSIMS thus provides a unique environment that facilitates knowledge extraction and the presentation of the most pertinent information to the biologist. A web server and the source code are available at . PMID:16792820

  12. Alignments of DNA and protein sequences containing frameshift errors.

    PubMed

    Guan, X; Uberbacher, E C

    1996-02-01

    Molecular sequences, like all experimental data, are subject to error. Many current DNA sequencing protocols have very significant error rates and often generate artefactual insertions and deletions of bases (indels), which corrupt the translation of sequences and compromise the detection of protein homologies. The impact of these errors on the utility of molecular sequence data depends on the analytic technique used to interpret the data. In the presence of frameshift errors, standard algorithms using six-frame translation can miss important homologies because only subfragments of the correct translation are available in any given frame. We present a new algorithm that can detect and correct frameshift errors in DNA sequences during comparison of translated sequences with protein sequences in the databases. This algorithm can recognize homologous proteins sharing 30% identity even in the presence of a 7% frameshift error rate. Our algorithm uses dynamic programming, producing a guaranteed optimal alignment in the presence of frameshifts, and has a sensitivity equivalent to Smith-Waterman. The time complexity of the algorithm is O(nm), where n and m are the lengths of the two sequences being compared. The algorithm does not rely on prior knowledge or heuristic rules and performs significantly better than any previously reported method.
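    The O(nm) bound is the usual cost of filling one dynamic-programming cell for each pair of positions. As a point of reference, here is a minimal Smith-Waterman local-alignment scorer with a linear gap penalty. It does not model frameshifts or translation; the paper's frameshift-aware recurrences are not reproduced. The scoring parameters are arbitrary assumptions.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b in O(len(a) * len(b)) time.

    Plain Smith-Waterman with a linear gap penalty (no frameshift handling).
    Only two rows of the DP matrix are kept, so memory is O(len(b)).
    """
    n, m = len(a), len(b)
    prev = [0] * (m + 1)
    best = 0
    for i in range(1, n + 1):
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


print(smith_waterman("TTACGTTT", "GGACGTGG"))  # 8: the shared "ACGT"
```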

  13. The complete nucleotide sequence and genomic characterization of grapevine asteroid mosaic associated virus.

    PubMed

    Vargas-Asencio, José; Wojciechowska, Klaudia; Baskerville, Maia; Gomez, Annika L; Perry, Keith L; Thompson, Jeremy R

    2017-01-02

    In analyzing grapevine clones infected with grapevine red blotch associated virus, we identified a small number of isometric particles of approximately 30 nm in diameter from an enriched fraction of leaf extract. A dominant protein of 25 kDa was isolated from this fraction using SDS-PAGE and was identified by mass spectrometry as belonging to grapevine asteroid mosaic associated virus (GAMaV). Using a combination of three methods (RNA-Seq, sRNA-Seq, and Sanger sequencing of RT- and RACE-PCR products), we obtained a full-length genome sequence consisting of 6719 nucleotides, excluding the poly(A) tail. The virus possesses all of the typical conserved functional domains concordant with the genus Marafivirus and lies evolutionarily between citrus sudden death associated virus and oat blue dwarf virus. A large shift in RNA-Seq coverage coincided with the predicted location of the subgenomic RNA involved in coat protein (CP) expression. Genus-wide sequence alignments confirmed the cleavage motif LxG(G/A) to be dominant between the helicase and RNA-dependent RNA polymerase (RdRp), and between the RdRp and CP domains. A putative overlapping protein (OP) ORF lacking a canonical translational start codon was identified, with a reading frame context more consistent with the putative OPs of tymoviruses and fig fleck associated virus than with those of marafiviruses. BLAST analysis of the predicted GAMaV OP showed a unique relatedness to the OPs of members of the genus Tymovirus.

  14. Extracting protein alignment models from the sequence database.

    PubMed Central

    Neuwald, A F; Liu, J S; Lipman, D J; Lawrence, C E

    1997-01-01

    Biologists often gain structural and functional insights into a protein sequence by constructing a multiple alignment model of the family. Here a program called Probe fully automates this process of model construction starting from a single sequence. Central to this program is a powerful new method to locate and align only those, often subtly, conserved patterns essential to the family as a whole. When applied to randomly chosen proteins, Probe found on average about four times as many relationships as a pairwise search and yielded many new discoveries. These include: an obscure subfamily of globins in the roundworm Caenorhabditis elegans; two new superfamilies of metallohydrolases; a lipoyl/biotin swinging arm domain in bacterial membrane fusion proteins; and a DH domain in the yeast Bud3 and Fus2 proteins. By identifying distant relationships and merging families into superfamilies in this way, this analysis further confirms the notion that proteins evolved from relatively few ancient sequences. Moreover, this method automatically generates models of these ancient conserved regions for rapid and sensitive screening of sequences. PMID:9108146

  15. Genetic algorithms with permutation coding for multiple sequence alignment.

    PubMed

    Ben Othman, Mohamed Tahar; Abdel-Azim, Gamil

    2013-08-01

    Multiple sequence alignment (MSA) is one of the most intensively researched topics in bioinformatics. It is known to be an NP-complete problem and is considered one of the most important and daunting tasks in computational biology. Accordingly, a wide range of heuristic algorithms has been proposed to find optimal alignments, among them genetic algorithms (GAs). The GA has two major weaknesses: it is time-consuming, and it can become trapped in local minima. One of the significant aspects of the GA process in MSA is maximizing the similarity between sequences by adding and shuffling the gaps of the Solution Coding (SC). Several approaches to SC have been introduced, one of which is Permutation Coding (PC). We propose a hybrid algorithm based on GAs with PC and the 2-opt algorithm. PC is used to encode the MSA solution, which maximizes the gain of resources, reliability, and diversity of the GA. Using PC makes it possible to apply all functions over permutations to MSA. We therefore propose an algorithm that calculates the scoring function for multiple alignments based on PC, which is used as the fitness function; this algorithm reduces the time complexity of the GA. Our GA is implemented with different selection strategies and different crossovers. The probabilities of crossover and mutation are set as one strategy. Relevant patents on the topic have also been surveyed.
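    GA-based aligners need a fitness function that scores a candidate multiple alignment. The abstract computes its score directly from the permutation coding, and that procedure is not shown here. As a stand-in, this sketch uses the familiar sum-of-pairs score over the decoded alignment. The match, mismatch, and gap values are assumptions, and gap-gap pairs score zero by convention.

```python
from itertools import combinations
from typing import List


def sum_of_pairs(alignment: List[str], match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Sum-of-pairs score of a multiple alignment (rows of equal length).

    A plain SP score used as a hypothetical GA fitness function; it is not
    the permutation-coding score described in the abstract.
    """
    if len(set(map(len, alignment))) > 1:
        raise ValueError("all rows of an alignment must have the same length")
    score = 0
    for column in zip(*alignment):
        # Score every unordered pair of residues in this column.
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue  # gap-gap pairs conventionally contribute nothing
            if x == "-" or y == "-":
                score += gap
            elif x == y:
                score += match
            else:
                score += mismatch
    return score


print(sum_of_pairs(["AC-T", "ACGT", "A-GT"]))  # 0
```

    In a GA, each individual would be decoded into rows like these and ranked by this score, so a cheaper scoring routine directly reduces the cost of each generation.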

  16. Identification and nucleotide sequence of the glycoprotein gB gene of equine herpesvirus 4.

    PubMed Central

    Riggio, M P; Cullinane, A A; Onions, D E

    1989-01-01

    The nucleotide sequence of the glycoprotein gB gene of equine herpesvirus 4 (EHV-4) was determined. The gene was located within a BamHI genomic library by a combination of Southern and dot-blot hybridization with probes derived from the herpes simplex virus type 1 (HSV-1) gB DNA sequence. The predominant portion of the coding sequences was mapped to a 2.95-kilobase BamHI-EcoRI subfragment at the left-hand end of BamHI-C. Potential TATA box, CAT box, and mRNA start site sequences and the translational initiation codon were located in the BamHI M fragment of the virus, which is located immediately to the left of BamHI-C. A polyadenylation signal, AATAAA, occurs nine nucleotides past the chain termination codon. Translation of these sequences would give a 110-kilodalton protein possessing a 5' hydrophobic signal sequence, a hydrophilic surface domain containing 11 potential N-linked glycosylation sites, a hydrophobic transmembrane domain, and a 3' highly charged cytoplasmic domain. A potential internal proteolytic cleavage site, Arg-Arg/Ser, was identified at residues 459 to 461. Analysis of this protein revealed amino acid sequence homologies of 47% with HSV-1 gB, 54% with pseudorabies virus gpII, 51% with varicella-zoster virus gpII, 29% with human cytomegalovirus gB, and 30% with Epstein-Barr virus gB. Alignment of EHV-4 gB with HSV-1 (KOS) gB further revealed that four potential N-linked glycosylation sites and all 10 cysteine residues on the external surface of the molecules are perfectly conserved, suggesting that the proteins possess similar secondary and tertiary structures. Thus, we showed that EHV-4 gB is highly conserved with the gB and gpII glycoproteins of other herpesviruses, suggesting that this glycoprotein has a similar overall function in each virus. PMID:2915378

  17. Genome-wide synteny through highly sensitive sequence alignment: Satsuma

    PubMed Central

    Grabherr, Manfred G.; Russell, Pamela; Meyer, Miriah; Mauceli, Evan; Alföldi, Jessica; Di Palma, Federica; Lindblad-Toh, Kerstin

    2010-01-01

    Motivation: Comparative genomics heavily relies on alignments of large and often complex DNA sequences. From an engineering perspective, the problem here is to provide maximum sensitivity (to find all there is to find), specificity (to only find real homology) and speed (to accommodate the billions of base pairs of vertebrate genomes). Results: Satsuma addresses all three issues through novel strategies: (i) cross-correlation, implemented via fast Fourier transform; (ii) a match scoring scheme that eliminates almost all false hits; and (iii) an asynchronous ‘battleship’-like search that allows for aligning two entire fish genomes (470 and 217 Mb) in 120 CPU hours using 15 processors on a single machine. Availability: Satsuma is part of the Spines software package, implemented in C++ on Linux. The latest version of Spines can be freely downloaded under the LGPL license from http://www.broadinstitute.org/science/programs/genome-biology/spines/ Contact: grabherr@broadinstitute.org PMID:20208069
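    The cross-correlation idea can be illustrated on a small scale. Encode each sequence as four 0/1 indicator vectors (one each for A, C, G, T). Cross-correlating the matching indicator vectors and summing the results gives, for every relative offset, the number of positions where the bases agree. An FFT computes all offsets in O(L log L) instead of O(L²). This is a minimal pure-Python sketch of that principle only. Satsuma itself is a C++ program whose windowing, scoring, and search strategy are not reproduced, and the function names here are hypothetical.

```python
import cmath
from typing import Dict, List


def fft(x: List[complex], invert: bool = False) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], invert)
    odd = fft(x[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def match_counts(a: str, b: str) -> Dict[int, int]:
    """Number of matching bases for every offset (a index minus b index)."""
    size = 1
    while size < len(a) + len(b) - 1:
        size *= 2  # zero-pad so circular convolution equals linear convolution
    totals = [0.0] * size
    for base in "ACGT":
        fa = fft([1.0 if c == base else 0.0 for c in a] + [0.0] * (size - len(a)))
        # Reversing b turns convolution into cross-correlation.
        fb = fft([1.0 if c == base else 0.0 for c in reversed(b)] + [0.0] * (size - len(b)))
        conv = fft([x * y for x, y in zip(fa, fb)], invert=True)
        for k in range(size):
            totals[k] += conv[k].real / size
    return {k - (len(b) - 1): round(totals[k]) for k in range(len(a) + len(b) - 1)}


counts = match_counts("ACGTACGT", "ACGT")
print(counts[0], counts[4])  # 4 4
```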

  18. Accelerating Computation of DNA Sequence Alignment in Distributed Environment

    NASA Astrophysics Data System (ADS)

    Guo, Tao; Li, Guiyang; Deaton, Russel

    Sequence similarity and alignment are among the most important operations in computational biology. However, analyzing large sets of DNA sequences can be impractical on a regular PC. Using multiple threads with the JavaParty mechanism, this project successfully extended the capabilities of regular Java to a distributed environment for simulating DNA computation. With the aid of JavaParty and a multithreaded design, the results of this study demonstrate that the modified Java program can perform parallel computing without using RMI or socket communication. In this paper, an efficient method for modeling and comparing DNA sequences with dynamic programming and JavaParty is proposed, and its results in a distributed environment are discussed.
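    Alignment DP can be distributed because cell (i, j) depends only on (i-1, j-1), (i-1, j), and (i, j-1). All cells on one anti-diagonal (constant i + j) are therefore independent and could be computed by separate threads or nodes. The abstract does not describe how its JavaParty version partitions the work. The sketch below shows only this wavefront ordering for a Needleman-Wunsch global score, run sequentially, with assumed scoring parameters.

```python
def nw_score_wavefront(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score, filled in anti-diagonal order.

    Each inner loop touches cells that do not depend on one another, so it is
    the unit of work a parallel version could split across workers.
    """
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i * gap
    for j in range(m + 1):
        dp[0][j] = j * gap
    for d in range(2, n + m + 1):  # d = i + j
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + s, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[n][m]


print(nw_score_wavefront("ACGT", "AGT"))  # 2
```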

  19. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data...

  20. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data...

  1. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data...

  2. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data...

  3. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data...

  4. Training alignment parameters for arbitrary sequencers with LAST-TRAIN

    PubMed Central

    Ono, Yukiteru; Asai, Kiyoshi

    2017-01-01

    Summary: LAST-TRAIN improves sequence alignment accuracy by inferring substitution and gap scores that fit the frequencies of substitutions, insertions, and deletions in a given dataset. We have applied it to mapping DNA reads from IonTorrent and PacBio RS, and we show that it reduces reference bias for Oxford Nanopore reads. Availability and Implementation: the source code is freely available at http://last.cbrc.jp/ Contact: mhamada@waseda.jp or mcfrith@edu.k.u-tokyo.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28039163

  5. Implied alignment: a synapomorphy-based multiple-sequence alignment method and its use in cladogram search

    NASA Technical Reports Server (NTRS)

    Wheeler, Ward C.

    2003-01-01

    A method to align sequence data based on parsimonious synapomorphy schemes generated by direct optimization (DO; earlier termed optimization alignment) is proposed. DO directly diagnoses sequence data on cladograms without an intervening multiple-alignment step, thereby creating topology-specific, dynamic homology statements. Hence, no multiple alignment is required to generate cladograms. Unlike general and globally optimal multiple-alignment procedures, the method described here, implied alignment (IA), takes these dynamic homologies and traces them back through a single cladogram, linking the unaligned sequence positions in the terminal taxa via DO transformation series. These "lines of correspondence" link ancestor-descendant states and, when displayed as linearly arrayed columns without hypothetical ancestors, are largely indistinguishable from a standard multiple alignment. Since this method is based on synapomorphy, the treatment of certain classes of insertion-deletion (indel) events may differ from that of other alignment procedures. As with all alignment methods, results depend on parameter assumptions such as indel cost and transversion:transition ratios. Such an IA could be used as a basis for phylogenetic search, but this would be questionable, since the homologies derived from the implied alignment depend on its natal cladogram, and any variance between DO and IA + Search is due to the heuristic approach. The utility of this procedure in heuristic cladogram searches using DO and the improvement of heuristic cladogram cost calculations are discussed. © 2003 The Willi Hennig Society. Published by Elsevier Science (USA). All rights reserved.

  6. Implied alignment: a synapomorphy-based multiple-sequence alignment method and its use in cladogram search.

    PubMed

    Wheeler, Ward C

    2003-06-01

    A method to align sequence data based on parsimonious synapomorphy schemes generated by direct optimization (DO; earlier termed optimization alignment) is proposed. DO directly diagnoses sequence data on cladograms without an intervening multiple-alignment step, thereby creating topology-specific, dynamic homology statements. Hence, no multiple alignment is required to generate cladograms. Unlike general and globally optimal multiple-alignment procedures, the method described here, implied alignment (IA), takes these dynamic homologies and traces them back through a single cladogram, linking the unaligned sequence positions in the terminal taxa via DO transformation series. These "lines of correspondence" link ancestor-descendant states and, when displayed as linearly arrayed columns without hypothetical ancestors, are largely indistinguishable from a standard multiple alignment. Since this method is based on synapomorphy, the treatment of certain classes of insertion-deletion (indel) events may differ from that of other alignment procedures. As with all alignment methods, results depend on parameter assumptions such as indel cost and transversion:transition ratios. Such an IA could be used as a basis for phylogenetic search, but this would be questionable, since the homologies derived from the implied alignment depend on its natal cladogram, and any variance between DO and IA + Search is due to the heuristic approach. The utility of this procedure in heuristic cladogram searches using DO and the improvement of heuristic cladogram cost calculations are discussed.

  7. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega

    PubMed Central

    Sievers, Fabian; Wilm, Andreas; Dineen, David; Gibson, Toby J; Karplus, Kevin; Li, Weizhong; Lopez, Rodrigo; McWilliam, Hamish; Remmert, Michael; Söding, Johannes; Thompson, Julie D; Higgins, Desmond G

    2011-01-01

    Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high-quality alignments but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and deliver accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high-quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to and exploiting information in existing alignments, making use of the vast amount of precomputed information in public databases like Pfam. PMID:21988835

  8. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega.

    PubMed

    Sievers, Fabian; Wilm, Andreas; Dineen, David; Gibson, Toby J; Karplus, Kevin; Li, Weizhong; Lopez, Rodrigo; McWilliam, Hamish; Remmert, Michael; Söding, Johannes; Thompson, Julie D; Higgins, Desmond G

    2011-10-11

    Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high-quality alignments but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and deliver accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high-quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to and exploiting information in existing alignments, making use of the vast amount of precomputed information in public databases like Pfam.

  9. FASMA: a service to format and analyze sequences in multiple alignments.

    PubMed

    Costantini, Susan; Colonna, Giovanni; Facchiano, Angelo M

    2007-12-01

    Multiple sequence alignments are successfully applied in many studies to understand the structural and functional relations among individual nucleic acid and protein sequences as well as whole families. Because of the rapid growth of sequence databases, multiple sequence alignments can often be very large and difficult to visualize and analyze. We offer a new service aimed at visualizing and analyzing multiple alignments obtained with different external algorithms, with new features useful for comparing the aligned sequences and for creating a final image of the alignment. The service is named FASMA and is available at http://bioinformatica.isa.cnr.it/FASMA/.

  10. Choice of Reference Sequence and Assembler for Alignment of Listeria monocytogenes Short-Read Sequence Data Greatly Influences Rates of Error in SNP Analyses

    PubMed Central

    Pightling, Arthur W.; Petronella, Nicholas; Pagotto, Franco

    2014-01-01

    The wide availability of whole-genome sequencing (WGS) and an abundance of open-source software have made detection of single-nucleotide polymorphisms (SNPs) in bacterial genomes an increasingly accessible and effective tool for comparative analyses. Thus, ensuring that real nucleotide differences between genomes (i.e., true SNPs) are detected at high rates and that the influences of errors (such as false positive SNPs, ambiguously called sites, and gaps) are mitigated is of utmost importance. The choices researchers make regarding the generation and analysis of WGS data can greatly influence the accuracy of short-read sequence alignments and, therefore, the efficacy of such experiments. We studied the effects of some of these choices, including: i) depth of sequencing coverage, ii) choice of reference-guided short-read sequence assembler, iii) choice of reference genome, and iv) whether to perform read-quality filtering and trimming, on our ability to detect true SNPs and on the frequencies of errors. We performed benchmarking experiments, during which we assembled simulated and real Listeria monocytogenes strain 08-5578 short-read sequence datasets of varying quality with four commonly used assemblers (BWA, MOSAIK, Novoalign, and SMALT), using reference genomes of varying genetic distances, and with or without read pre-processing (i.e., quality filtering and trimming). We found that assemblies of at least 50-fold coverage provided the most accurate results. In addition, MOSAIK yielded the fewest errors when reads were aligned to a nearly identical reference genome, while using SMALT to align reads against a reference sequence that is ∼0.82% distant from 08-5578 at the nucleotide level resulted in the detection of the greatest numbers of true SNPs and the fewest errors. Finally, we show that whether read pre-processing improves SNP detection depends upon the choice of reference sequence and assembler. In total, this study demonstrates that researchers should

  11. Complete nucleotide sequence of a potyvirus causing maize dwarf mosaic disease in central China.

    PubMed

    Liu, X; Wang, X; Zhao, Y; Zheng, C; Zhou, G

    2003-01-01

    The full-length nucleotide sequence of a potyvirus causing maize dwarf mosaic (MDM) disease in Henan province, central China, was obtained by reverse transcription-polymerase chain reaction (RT-PCR) and rapid amplification of cDNA 5' ends (5'-RACE). The viral genome comprised 9596 nucleotides, excluding the poly(A) tail, and encoded a putative polyprotein of 3603 amino acids. The entire genomic sequence of this isolate shared identities of 94.2% and 98.3% with the Sugarcane mosaic virus (SCMV) HZ isolate at the nucleotide and deduced amino acid levels, respectively, but only 69.1% identity with the MDM virus (MDMV) Bulgarian isolate (MDMV-Bg) at the nucleotide level. Phylogenetic analysis of the complete nucleotide sequences indicated that the Henan isolate of the potyvirus causing MDM disease is in fact a Henan strain of SCMV (SCMV-HN).

  12. Performance evaluation of Warshall algorithm and dynamic programming for Markov chain in local sequence alignment.

    PubMed

    Khan, Mohammad Ibrahim; Kamal, Md Sarwar

    2015-03-01

    Markov chains are very effective for prediction, particularly in long data sets. In DNA sequencing, it is always important to determine the presence of certain nucleotides based on the previous history of the data set. We applied the Chapman-Kolmogorov equation to carry out the Markov chain computation. The Chapman-Kolmogorov equation helps to locate the proper positions in the DNA chain, and it is a very powerful tool in mathematics as well as in other prediction-based research. It incorporates the scores of DNA sequences calculated by various techniques. Our research uses the fundamentals of the Warshall Algorithm (WA) and Dynamic Programming (DP) to measure the scores of DNA segments. The experiments show that the Warshall Algorithm is well suited to short DNA sequences, whereas Dynamic Programming is well suited to long DNA sequences. Beyond these findings, it is very important to measure the risk factors of local sequencing when matching local sequence alignments, whatever their length.
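    The Chapman-Kolmogorov equation states that for a time-homogeneous Markov chain with one-step transition matrix P, the (m + n)-step transition probabilities satisfy P^(m+n) = P^(m) · P^(n). For a first-order nucleotide chain, the entry (x, y) of P^(n) is the probability of seeing base y n positions after base x. The 4 × 4 matrix below is made up for illustration, and the Warshall-based scoring from the abstract is not shown.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain matrix product x @ y."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def n_step(p: Matrix, n: int) -> Matrix:
    """n-step transition matrix P^(n), built by repeated one-step products."""
    result = [[1.0 if i == j else 0.0 for j in range(len(p))] for i in range(len(p))]
    for _ in range(n):
        result = matmul(result, p)
    return result


# Hypothetical one-step transition probabilities; rows/columns ordered A, C, G, T.
P = [
    [0.40, 0.20, 0.20, 0.20],
    [0.25, 0.25, 0.25, 0.25],
    [0.10, 0.30, 0.30, 0.30],
    [0.30, 0.20, 0.20, 0.30],
]

# Chapman-Kolmogorov: P^(5) equals P^(2) times P^(3), up to rounding error.
lhs = n_step(P, 5)
rhs = matmul(n_step(P, 2), n_step(P, 3))
print(max(abs(lhs[i][j] - rhs[i][j]) for i in range(4) for j in range(4)) < 1e-12)  # True
```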

  13. Nucleotide sequence of the Lactococcus lactis NCDO 763 (ML3) rpoD gene.

    PubMed

    Gansel, X; Hartke, A; Boutibonnes, P; Auffray, Y

    1993-10-19

    The complete nucleotide sequence of the rpoD gene from Lactococcus lactis has been determined. The nucleotide data indicate the presence of an open reading frame of 1020 base pairs encoding a polypeptide that shares the framework structure of principal sigma factors of eubacteria.

  14. Nucleotide sequence of a lysine transfer ribonucleic Acid from bakers' yeast.

    PubMed

    Madison, J T; Boguslawski, S J; Teetor, G H

    1972-05-12

    The nucleotide sequence of one of the two major lysine transfer RNAs from bakers' yeast has been determined. Its structure is compared to that of a lysine tRNA from a haploid yeast. A total of 21 nucleotides differ between the two molecules. Only the T-psi-C-G (thymidine-pseudouridine-cytidine-guanosine) loop and its supporting stem are identical.

  15. Single nucleotide polymorphism mapping and alignment of recombinant chromosome substitution lines in barley.

    PubMed

    Sato, Kazuhiro; Close, Timothy J; Bhat, Prasanna; Muñoz-Amatriaín, María; Muehlbauer, Gary J

    2011-05-01

    Single nucleotide polymorphism (SNP) genotyping is useful for assessing genetic variation in germplasm collections, genetic map development and detection of alien chromosome substitutions. In this study, a diversity analysis using 1,301 SNPs on a set of 37 barley accessions was conducted. This analysis showed a high polymorphism rate between the malting barley cultivar 'Haruna Nijo' and the food barley cultivar 'Akashinriki'. Haruna Nijo and Akashinriki are donors of the barley expressed sequence tag (EST) collections. A doubled haploid (DH) population derived from the cross between Haruna Nijo and Akashinriki was genotyped with 1,448 SNPs. Of these 1,448 SNPs, 734 were polymorphic and distributed on barley linkage groups (chromosomes) as follows: 1H (86), 2H (125), 3H (120), 4H (100), 5H (127), 6H (88) and 7H (88). By using cMAP, we integrated the SNP markers across high-density maps. The SNPs were also used to genotype 98 BC(3)F(4) recombinant chromosome substitution lines (RCSLs) developed from the same cross (Haruna Nijo/Akashinriki). These data were used to create graphical genotypes for each line and thus estimate the location, extent and total number of introgressions from Akashinriki in the Haruna Nijo background. The 35 selected RCSLs sample most of the Akashinriki food barley genome, with only a few missing segments. These resources bring new alleles into the malting barley gene pool from food barley.

  16. Does protein relatedness require sequence matching? Alignment via networks in sequence space.

    PubMed

    Frenkel, Zakharia M

    2008-10-01

    To establish the possible function of a newly discovered protein, its sequence must be aligned with other known sequences. When the similarity is marginal, the function remains uncertain. A fundamentally new approach is suggested: using networks in the protein sequence space. The functionality of the protein is firmly established via networks forming chains of consecutive pairwise-matching fragments. Distant relatives are thus considered relatives, even though in some cases there is no sequence match between the ends of the chain, while the entire chain belongs to the same functional and structural network.
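    Linking two proteins through a chain of consecutive pairwise matches is a path-finding problem on a graph. Sequences or fragments are nodes, and significant pairwise matches are edges. This sketch assumes the matches have already been computed and uses breadth-first search to return the shortest such chain. The protein identifiers are hypothetical, and the paper's criteria for what counts as a match are not modeled.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple


def find_chain(matches: Iterable[Tuple[str, str]], start: str, goal: str) -> Optional[List[str]]:
    """Shortest chain of pairwise matches linking start to goal, or None."""
    graph: Dict[str, Set[str]] = {}
    for u, v in matches:  # matches are treated as undirected edges
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    parent: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk parent links back to start to recover the chain.
            path = []
            cur: Optional[str] = node
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for nxt in sorted(graph.get(node, ())):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical matches: P1 and P4 share no direct match, but a chain links them.
matches = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
print(find_chain(matches, "P1", "P4"))  # ['P1', 'P2', 'P3', 'P4']
```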

  17. Variation in the nucleotide sequence of a prolamin gene family in wild rice.

    PubMed

    Barbier, P; Ishihama, A

    1990-07-01

    Variation in the DNA sequence of the 10 kDa prolamin gene family within the wild rice species Oryza rufipogon was probed using the direct sequencing of PCR-amplified genes. A comparison of the nucleotide and deduced amino-acid sequences of eight Asian strains of O. rufipogon and one strain of the related African species O. longistaminata is presented.

  18. MSA-PAD: DNA multiple sequence alignment framework based on PFAM accessed domain information.

    PubMed

    Balech, Bachir; Vicario, Saverio; Donvito, Giacinto; Monaco, Alfonso; Notarangelo, Pasquale; Pesole, Graziano

    2015-08-01

    Here we present the MSA-PAD application, a DNA multiple sequence alignment framework that uses PFAM protein domain information to align DNA sequences encoding either single or multiple protein domains. MSA-PAD has two alignment options: gene and genome mode.

  19. Finding the right coverage: the impact of coverage and sequence quality on single nucleotide polymorphism genotyping error rates.

    PubMed

    Fountain, Emily D; Pauli, Jonathan N; Reid, Brendan N; Palsbøll, Per J; Peery, M Zachariah

    2016-07-01

    Restriction-enzyme-based sequencing methods enable the genotyping of thousands of single nucleotide polymorphism (SNP) loci in nonmodel organisms. However, in contrast to traditional genetic markers, genotyping error rates in SNPs derived from restriction-enzyme-based methods remain largely unknown. Here, we estimated genotyping error rates in SNPs genotyped with double digest RAD sequencing from Mendelian incompatibilities in known mother-offspring dyads of Hoffman's two-toed sloth (Choloepus hoffmanni) across a range of coverage and sequence quality criteria, for both reference-aligned and de novo-assembled data sets. Genotyping error rates were more sensitive to coverage than sequence quality and low coverage yielded high error rates, particularly in de novo-assembled data sets. For example, coverage ≥5 yielded median genotyping error rates of ≥0.03 and ≥0.11 in reference-aligned and de novo-assembled data sets, respectively. Genotyping error rates declined to ≤0.01 in reference-aligned data sets with a coverage ≥30, but remained ≥0.04 in the de novo-assembled data sets. We observed approximately 10-fold and 13-fold declines in the number of loci sampled in the reference-aligned and de novo-assembled data sets, respectively, when coverage was increased from ≥5 to ≥30 at quality score ≥30. Finally, we assessed the effects of genotyping coverage on a common population genetic application, parentage assignments, and showed that the proportion of incorrectly assigned maternities was relatively high at low coverage. Overall, our results suggest that the trade-off between sample size and genotyping error rates should be considered prior to building sequencing libraries, that reporting genotyping error rates should become standard practice, and that the effects of genotyping errors on inference should be evaluated in restriction-enzyme-based SNP studies.
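
    The basic signal behind Mendelian-incompatibility error estimates is that a true mother and offspring must share at least one allele at every locus. The sketch below counts the fraction of incompatible loci that pass a coverage filter. The input layout (a dict of locus to genotype and depth) is an assumption, and the paper's estimator may include corrections not shown here; note that errors which leave a shared allele go undetected, so this raw fraction is a lower bound on the error rate.

```python
def incompatibility_rate(mother, offspring, min_depth=5):
    """Fraction of compared loci at which a mother-offspring dyad shares no allele.

    mother, offspring: dicts mapping locus -> (genotype, depth), where genotype
    is a pair of allele labels such as ("A", "G").
    """
    compared = incompatible = 0
    for locus, (mom_gt, mom_depth) in mother.items():
        if locus not in offspring:
            continue
        kid_gt, kid_depth = offspring[locus]
        # Apply the same coverage filter to both individuals.
        if mom_depth < min_depth or kid_depth < min_depth:
            continue
        compared += 1
        # A true mother-offspring pair must share at least one allele.
        if not set(mom_gt) & set(kid_gt):
            incompatible += 1
    return incompatible / compared if compared else float("nan")


# Hypothetical dyad: (genotype, read depth) per locus.
mother = {"l1": (("A", "A"), 10), "l2": (("C", "T"), 12), "l3": (("G", "G"), 3), "l4": (("A", "G"), 30)}
offspring = {"l1": (("G", "G"), 10), "l2": (("T", "T"), 8), "l3": (("A", "A"), 40), "l4": (("A", "A"), 6)}
```

    With `min_depth=5`, three loci are compared and one (`l1`) is incompatible; raising the threshold to 9 leaves only `l1`, illustrating how coverage filtering trades the number of loci against error detection.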

  20. Complete nucleotide sequence of the 23S rRNA gene of the Cyanobacterium, Anacystis nidulans.

    PubMed Central

    Douglas, S E; Doolittle, W F

    1984-01-01

    The nucleotide sequence of the Anacystis nidulans 23S rRNA gene, including the 5'- and 3'-flanking regions has been determined. The gene is 2876 nucleotides long and shows higher primary sequence homology to the 23S rRNAs of plastids (84.5%) than to that of E. coli (79%). The predicted rRNA transcript also shares many secondary structural features with those of plastids, reinforcing the endosymbiont hypothesis for the origin of these organelles. PMID:6326060

  1. Statistical analysis of nucleotide sequences of the hemagglutinin gene of human influenza A viruses.

    PubMed Central

    Ina, Y; Gojobori, T

    1994-01-01

    To examine whether positive selection operates on the hemagglutinin 1 (HA1) gene of human influenza A viruses (H1 subtype), 21 nucleotide sequences of the HA1 gene were statistically analyzed. The nucleotide sequences were divided into antigenic and nonantigenic sites. The nucleotide diversities for antigenic and nonantigenic sites of the HA1 gene were computed at synonymous and nonsynonymous sites separately. For nonantigenic sites, the nucleotide diversities were larger at synonymous sites than at nonsynonymous sites. This is consistent with the neutral theory of molecular evolution. For antigenic sites, however, the nucleotide diversities at nonsynonymous sites were larger than those at synonymous sites. These results suggest that positive selection operates on antigenic sites of the HA1 gene of human influenza A viruses (H1 subtype). PMID:8078892
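
    Nucleotide diversity is the average number of pair-wise differences per site among aligned sequences. The sketch below shows only this basic estimator on toy aligned sequences; splitting it into synonymous and nonsynonymous diversity, as in the study, additionally requires codon-level counting of synonymous and nonsynonymous sites (for example Nei-Gojobori counting), which is not implemented here.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average number of pair-wise differences per site (pi) for aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching columns for every pair of sequences.
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


pi = nucleotide_diversity(["ACGT", "ACGA", "TCGA"])
```

    The three pairs differ at 1, 2 and 1 sites, so `pi` is 4 / (3 × 4) = 1/3.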

  2. RNA Secondary Structures Having a Compatible Sequence of Certain Nucleotide Ratios.

    PubMed

    Barrett, Christopher L; Li, Thomas J X; Reidys, Christian M

    2016-11-01

    Given a random RNA secondary structure, S, we study RNA sequences having fixed ratios of nucleotides that are compatible with S. We perform this analysis for RNA secondary structures subject to various base-pairing rules and minimum arc- and stack-length restrictions. Our main result reads as follows: in the simplex of nucleotide ratios, there exists a convex region inside which, in the limit of long sequences, a random structure asymptotically almost surely (a.a.s.) has a compatible sequence with these ratios, and outside of which a.a.s. a random structure has no such compatible sequence. We localize this region for RNA secondary structures subject to various base-pairing rules and minimum arc- and stack-length restrictions. In particular, for GC-sequences (GC denoting the nucleotides guanine and cytosine, respectively) having a ratio of G nucleotides smaller than 1/3, a random RNA secondary structure without any minimum arc- and stack-length restrictions a.a.s. has no such compatible sequence. For sequences having a ratio of G nucleotides larger than 1/3, a random RNA secondary structure a.a.s. has such compatible sequences. We discuss our results in the context of various families of RNA structures.
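
    Compatibility means that every base pair of the structure can be filled by an allowed nucleotide pair. The sketch below checks this for a structure written in dot-bracket notation (an assumption; the paper works with structures abstractly). The default rule set is Watson-Crick plus G-U wobble pairs; passing a G-C-only set mimics the GC-sequence case discussed above.

```python
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def base_pairs(structure):
    """Return (i, j) index pairs from a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def is_compatible(sequence, structure, allowed=ALLOWED_PAIRS):
    """True if every base pair in the structure is an allowed nucleotide pair."""
    if len(sequence) != len(structure):
        return False
    return all((sequence[i], sequence[j]) in allowed for i, j in base_pairs(structure))


GC_ONLY = {("G", "C"), ("C", "G")}
```

    For example, `is_compatible("GGGAAACCC", "(((...)))")` is `True`, while `is_compatible("GAU", "(.)", allowed=GC_ONLY)` is `False` because the G-U pair is not permitted under the G-C-only rule.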

  3. Nucleotide sequence of Neurospora crassa cytoplasmic initiator tRNA.

    PubMed Central

    Gillum, A M; Hecker, L I; Silberklang, M; Schwartzbach, S D; RajBhandary, U L; Barnett, W E

    1977-01-01

    Initiator methionine tRNA from the cytoplasm of Neurospora crassa has been purified and sequenced. The sequence is: pAGCUGCAUm1GGCGCAGCGGAAGCGCM22GCY*GGGCUCAUt6AACCCGGAGm7GU (or D) - CACUCGAUCGm1AAACGAG*UUGCAGCUACCAOH. Similar to initiator tRNAs from the cytoplasm of other eukaryotes, this tRNA also contains the sequence -AUCG- instead of the usual -TψCG (or A)- found in loop IV of other tRNAs. The sequence of the N. crassa cytoplasmic initiator tRNA is quite different from that of the corresponding mitochondrial initiator tRNA. Comparison of the sequence of N. crassa cytoplasmic initiator tRNA to those of yeast, wheat germ and vertebrate cytoplasmic initiator tRNA indicates that the sequences of the two fungal tRNAs are no more similar to each other than they are to those of other initiator tRNAs. PMID:146192

  4. Cloning and nucleotide sequence of the aroA gene of Bordetella pertussis.

    PubMed Central

    Maskell, D J; Morrissey, P; Dougan, G

    1988-01-01

    The aroA locus of Bordetella pertussis, encoding 5-enolpyruvylshikimate 3-phosphate synthase, has been cloned into Escherichia coli by using a cosmid vector. The gene is expressed in E. coli and complements an E. coli aroA mutant. The nucleotide sequence of the B. pertussis aroA gene was determined and contains an open reading frame encoding 442 amino acids, with a calculated molecular weight for 5-enolpyruvylshikimate 3-phosphate synthase of 46,688. The amino acid sequence derived from the nucleotide sequence shows homology with the published amino acid sequences of aroA gene products of other microorganisms. PMID:2897356

  5. SeqAPASS: Sequence alignment to predict across-species ...

    EPA Pesticide Factsheets

    Efforts to shift the toxicity testing paradigm from whole-organism studies to those focused on the initiation of toxicity and relevant pathways have led to increased utilization of in vitro and in silico methods. Hence the emergence of high-throughput screening (HTS) programs, such as U.S. EPA ToxCast, and the application of the adverse outcome pathway (AOP) framework for identifying and defining biological key events triggered upon perturbation of molecular initiating events and leading to adverse outcomes occurring at a level of organization relevant for risk assessment [1]. With these recent initiatives to harness the power of “the pathway” in describing and evaluating toxicity comes the need to extrapolate data beyond the model species. Sequence alignment to predict across-species susceptibility (SeqAPASS) is a web-based tool that allows the user to begin to understand how broadly HTS data or AOP constructs may plausibly be extrapolated across species, while describing the relative intrinsic susceptibility of different taxa to chemicals with known modes of action (e.g., pharmaceuticals and pesticides). The tool rapidly and strategically assesses available molecular target information to describe protein sequence similarity at the primary amino acid sequence, conserved domain, and individual amino acid residue levels. This in silico approach to species extrapolation was designed to automate and streamline the relatively complex and time-consuming process of co

  6. Kraken: ultrafast metagenomic sequence classification using exact alignments

    PubMed Central

    2014-01-01

    Kraken is an ultrafast and highly accurate program for assigning taxonomic labels to metagenomic DNA sequences. Previous programs designed for this task have been relatively slow and computationally expensive, forcing researchers to use faster abundance estimation programs, which only classify small subsets of metagenomic data. Using exact alignment of k-mers, Kraken achieves classification accuracy comparable to the fastest BLAST program. In its fastest mode, Kraken classifies 100 base pair reads at a rate of over 4.1 million reads per minute, 909 times faster than Megablast and 11 times faster than the abundance estimation program MetaPhlAn. Kraken is available at http://ccb.jhu.edu/software/kraken/. PMID:24580807
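
    The core idea of exact k-mer classification can be sketched in a few lines: each k-mer in the reference database is mapped to the lowest common ancestor (LCA) of all taxa whose genomes contain it, and a read is assigned by scoring root-to-taxon paths with its k-mer hits. This is a simplified Python illustration on a hypothetical toy taxonomy, not Kraken's implementation: it uses a plain dictionary instead of Kraken's compact database, and breaks ties alphabetically instead of resolving them as Kraken does.

```python
from collections import Counter


def lineage(taxon, parent):
    """Path from taxon up to the root (inclusive)."""
    path = [taxon]
    while parent.get(taxon) is not None:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def lca(a, b, parent):
    """Lowest common ancestor of two taxa."""
    ancestors = set(lineage(a, parent))
    for taxon in lineage(b, parent):
        if taxon in ancestors:
            return taxon
    raise ValueError("taxa share no root")


def kmers(seq, k):
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_database(genomes, parent, k):
    """Map each k-mer to the LCA of all genomes (taxa) that contain it."""
    db = {}
    for taxon, seq in genomes.items():
        for kmer in kmers(seq, k):
            db[kmer] = lca(db[kmer], taxon, parent) if kmer in db else taxon
    return db


def classify(read, db, parent, k):
    """Score each hit taxon by the hits on its root-to-taxon path; return the best."""
    hits = Counter(db[kmer] for kmer in kmers(read, k) if kmer in db)
    if not hits:
        return None  # unclassified read

    def path_score(taxon):
        return sum(hits[t] for t in lineage(taxon, parent))

    return max(sorted(hits), key=path_score)


# Hypothetical toy taxonomy and genomes.
parent = {"root": None, "genusA": "root", "sp1": "genusA", "sp2": "genusA"}
genomes = {"sp1": "ACGTACGGT", "sp2": "ACGTTTGCA"}
db = build_database(genomes, parent, k=4)
```

    The k-mer `ACGT` occurs in both genomes, so it maps to `genusA`; the read `ACGTTT` then scores 3 on the path to `sp2` (one `genusA` hit plus two `sp2` hits) and is assigned to `sp2`.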

  7. Isolation and complete nucleotide sequence of the measles virus IMB-1 strain in China.

    PubMed

    Ma, Shao-hui; Wang, Li-chun; Liu, Jian-sheng; Shi, Hai-jing; Liu, Long-ding; Li, Qi-han

    2010-12-01

    The complete nucleotide sequence of the measles virus strain IMB-1, which was isolated in China, was determined. As in other measles viruses, its genome is 15,894 nucleotides in length and encodes six proteins. The full-length nucleotide sequence of the IMB-1 isolate differed from vaccine strains (including wild-type Edmonston strain) by 4%-5% at the nucleotide sequence level. This isolate has amino acid variations over the full genome, including in the hemagglutinin and fusion genes. This report is the first to describe the full-length genome of a genotype H1 strain and provide an overview of the diversity of genetic characteristics of a circulating measles virus.

  8. Analysing the performance of personal computers based on Intel microprocessors for sequence aligning bioinformatics applications.

    PubMed

    Nair, Pradeep S; John, Eugene B

    2007-01-01

    Aligning specific sequences against a very large number of other sequences is a central aspect of bioinformatics. With the widespread availability of personal computers in biology laboratories, sequence alignment is now often performed locally. This makes it necessary to analyse the performance of personal computers for sequence aligning bioinformatics benchmarks. In this paper, we analyse the performance of a personal computer for the popular BLAST and FASTA sequence alignment suites. Results indicate that these benchmarks have a large number of recurring operations and use memory operations extensively. The results suggest that performance could be improved with a larger L1 cache.

  9. Nucleotide sequence and genetic organization of Hungarian grapevine chrome mosaic nepovirus RNA2.

    PubMed Central

    Brault, V; Hibrand, L; Candresse, T; Le Gall, O; Dunez, J

    1989-01-01

    The complete nucleotide sequence of Hungarian grapevine chrome mosaic nepovirus (GCMV) RNA2 has been determined. The RNA sequence is 4441 nucleotides in length, excluding the poly(A) tail. A polyprotein of 1324 amino acids with a calculated molecular weight of 146 kDa is encoded in a single long open reading frame extending from nucleotides 218 to 4190. This polyprotein is homologous with the protein encoded by the S strain of tomato black ring virus (TBRV) RNA2, the only other nepovirus sequenced so far. Direct sequencing of the viral coat protein and in vitro translation of transcripts derived from cDNA sequences demonstrate that, as for comoviruses, the coat protein is located at the carboxy terminus of the polyprotein. A model for the expression of GCMV RNA2 is presented. PMID:2798129
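
    Locating a single long open reading frame, as described above, amounts to scanning each reading frame for an ATG followed by the first in-frame stop codon. The sketch below returns the longest such ORF on the forward strand of a toy sequence; it is an illustrative assumption, not the procedure used in the study, and it ignores the reverse strand and non-ATG start codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG...stop ORF on the forward strand.

    Coordinates are 0-based and end-exclusive, and include the stop codon.
    RNA input (U) is converted to DNA (T) first.
    """
    seq = seq.upper().replace("U", "T")
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the ORF; later in-frame ATGs are ignored
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


orf = longest_orf("CCATGAAATTTTAGCC")
```

    Here `orf` is `(2, 14)`; the encoded protein length is `(end - start) // 3 - 1`, i.e. 3 amino acids once the stop codon is excluded.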

  10. Insertion sites and the terminal nucleotide sequences of the Tn4 transposon.

    PubMed

    Hyde, D R; Tu, C P

    1982-07-10

    The nucleotide sequences at the ends of the Tn4 transposon (mercury, spectinomycin and sulfonamide resistance) have been determined. They are inverted repeated sequences of 38 nucleotides with three mismatched base pairs. These sequences are strongly homologous with the terminal sequences of Tn501 (mercury resistance) but less so with those of Tn3 (ampicillin resistance). The Tn4 transposon generates pentanucleotide members (Tn3, Tn1000, Tn501, Tn551, IS2) with the exception of Tn1721 and bacteriophage Mu. Among the three Tn4 insertion sites examined here, two of them occurred near a nonanucleotide sequence in perfect homology with part of the terminal inverted-repeat sequence of Tn4 and the third insertion occurred near a sequence of partial homology to one end of Tn4. All three insertions were in the same orientation such that IRb is proximal to its homologous sequence on the recipient DNA.
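
    A terminal inverted repeat means that the right end, read on the opposite strand, matches the left end. Mismatched base pairs can therefore be counted by comparing the left end with the reverse complement of the right end. The sketch below uses short hypothetical ends rather than the actual 38-nucleotide Tn4 sequences.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def inverted_repeat_mismatches(left_end, right_end):
    """Count mismatches between the left end and the reverse complement of the right end."""
    if len(left_end) != len(right_end):
        raise ValueError("ends must have equal length")
    return sum(a != b for a, b in zip(left_end.upper(), reverse_complement(right_end)))


# Hypothetical 8-nt ends: a perfect inverted repeat, then one with a single mismatch.
perfect = inverted_repeat_mismatches("GATTACAG", "CTGTAATC")
one_off = inverted_repeat_mismatches("GATTACAG", "CTGTAATG")
```

    `perfect` is 0 and `one_off` is 1; applied to the two 38-nucleotide Tn4 ends, the same count would report the three mismatched base pairs described above.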

  11. Alignment-free sequence comparison based on next-generation sequencing reads.

    PubMed

    Song, Kai; Ren, Jie; Zhai, Zhiyuan; Liu, Xuemei; Deng, Minghua; Sun, Fengzhu

    2013-02-01

    Next-generation sequencing (NGS) technologies have generated enormous amounts of shotgun read data, and assembly of the reads can be challenging, especially for organisms without template sequences. We study the power of genome comparison based on shotgun read data without assembly using three alignment-free sequence comparison statistics, $D_2$, $D_2^*$ and $D_2^S$, both theoretically and by simulations. Theoretical formulas for the power of detecting the relationship between two sequences related through a common motif model are derived. It is shown that both $D_2^*$ and $D_2^S$ outperform $D_2$ for detecting the relationship between two sequences based on NGS data. We then study the effects of tuple length, read length, coverage, and sequencing error on the power of $D_2^*$ and $D_2^S$. Finally, variations of these statistics, $d_2$, $d_2^*$ and $d_2^S$, respectively, are used first to cluster five mammalian species with known phylogenetic relationships, and then to cluster 13 tree species whose complete genome sequences are not available, using NGS shotgun reads. The clustering results using $d_2^S$ are consistent with biological knowledge for the 5 mammalian and 13 tree species, respectively. Thus, the statistic $d_2^S$ provides a powerful alignment-free comparison tool for studying the relationships among different organisms based on NGS read data without assembly.
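
    $D_2$ is the inner product of the k-mer count vectors of two sequences, $\sum_w X_w Y_w$. $D_2^S$ instead uses centred counts $\tilde X_w = X_w - n p_w$, where $n$ is the number of k-mer positions and $p_w$ the background probability of word $w$. The sketch below assumes an i.i.d. (order-0) background with $p_w$ taken as the product of letter frequencies, and uses the normalised dissimilarity form of $d_2^S$ commonly given in the literature. The paper also considers other background models and applies the statistics to pooled read data rather than a single sequence.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_counts(seq, k):
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def centred_counts(seq, k, alphabet="ACGT"):
    """Observed minus expected k-mer counts under an i.i.d. (order-0) background."""
    counts = kmer_counts(seq, k)
    n = len(seq) - k + 1
    freq = {a: seq.count(a) / len(seq) for a in alphabet}
    centred = {}
    for letters in product(alphabet, repeat=k):
        word = "".join(letters)
        p = 1.0
        for a in letters:
            p *= freq[a]
        centred[word] = counts[word] - n * p
    return centred


def d2(x, y, k):
    """D2 statistic: inner product of raw k-mer count vectors."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(cx[w] * cy[w] for w in cx)


def d2s_distance(x, y, k):
    """d2S dissimilarity in [0, 1], built from centred counts (order-0 background)."""
    cx, cy = centred_counts(x, k), centred_counts(y, k)
    num = norm_x = norm_y = 0.0
    for w in cx:
        scale = sqrt(cx[w] ** 2 + cy[w] ** 2)
        if scale == 0:
            continue  # word contributes nothing to either sum
        num += cx[w] * cy[w] / scale
        norm_x += cx[w] ** 2 / scale
        norm_y += cy[w] ** 2 / scale
    if norm_x == 0 or norm_y == 0:
        return float("nan")
    return 0.5 * (1 - num / sqrt(norm_x * norm_y))
```

    By the Cauchy-Schwarz inequality the ratio in `d2s_distance` lies in [-1, 1], so the dissimilarity lies in [0, 1] and equals 0 when a sequence is compared with itself; for instance, `d2("ACGT", "ACGT", 2)` is 3.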

  12. Fast single-pass alignment and variant calling using sequencing data

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Sequencing research requires efficient computation. Few programs use already known information about DNA variants when aligning sequence data to the reference map. New program findmap.f90 reads the previous variant list before aligning sequence, calling variant alleles, and summing the allele counts...

  13. Nucleotide sequence of the luxC gene encoding fatty acid reductase of the lux operon from Photobacterium leiognathi.

    PubMed

    Lin, J W; Chao, Y F; Weng, S F

    1993-02-26

    The nucleotide sequence of the luxC gene (EMBL Accession No. 65156) encoding fatty acid reductase (FAR) of the lux operon from Photobacterium leiognathi PL741 was determined and the encoded amino acid sequence deduced. The fatty acid reductase is a component of the fatty acid reductase complex. The complex is responsible for converting fatty acid to aldehyde, which serves as the substrate in the luciferase-catalyzed bioluminescent reaction. The protein comprises 478 amino acid residues and has a calculated M(r) of 53,858. Alignment and comparison of the fatty acid reductase of P. leiognathi with those of Vibrio harveyi B392 and Vibrio fischeri ATCC 7744 show 70% and 59% amino acid identity, respectively.
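
    Identity figures like these are computed from a pair-wise alignment. Conventions differ (the denominator may be the alignment length, the shorter sequence, or only gap-free columns), and the abstract does not state which was used. The sketch below assumes the gap-free-column convention on a toy alignment.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


identity = percent_identity("MKV-LA", "MKIGLA")
```

    The gapped column is skipped, leaving five columns with four identities, so `identity` is 80.0.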

  14. Complete nucleotide sequences of a distinct bipartite begomovirus, bitter gourd yellow vein virus, infecting Momordica charantia.

    PubMed

    Tahir, Muhammad; Haider, Muhammad Saleem; Briddon, Rob W

    2010-11-01

    Momordica charantia (Cucurbitaceae), a vegetable crop commonly cultivated throughout Pakistan, and begomoviruses, a serious threat to crop plants, are natives of tropical and subtropical regions of the world. Leaf samples of M. charantia with yellow vein symptoms typical of begomovirus infections and samples from apparently healthy plants were collected from areas around Lahore in 2004. Full-length clones of a bipartite begomovirus were isolated from symptomatic samples. The complete nucleotide sequences of the components of one isolate were determined, and these showed the arrangement of genes typical of Old World begomoviruses. The complete nucleotide sequence of DNA A showed the highest nucleotide sequence identity (86.9%) to an isolate of Tomato leaf curl New Delhi virus (ToLCNDV), confirming that it belongs to a distinct species of begomovirus, for which the name Bitter gourd yellow vein virus (BGYVV) is proposed. Sequence comparisons showed that BGYVV likely emerged as a result of inter-specific recombination between ToLCNDV and tomato leaf curl Bangladesh virus (ToLCBDV). The complete nucleotide sequence of DNA B showed 97.2% nucleotide sequence identity to that of an Indian strain of Squash leaf curl China virus.

  15. Multi-Harmony: detecting functional specificity from sequence alignment.

    PubMed

    Brandt, Bernd W; Feenstra, K Anton; Heringa, Jaap

    2010-07-01

    Many protein families contain sub-families with functional specialization, such as binding different ligands or being involved in different protein-protein interactions. A small number of amino acids generally determine functional specificity. The identification of these residues can aid the understanding of protein function and help in finding targets for experimental analysis. Here, we present multi-Harmony, an interactive web server for detecting sub-type-specific sites in proteins starting from a multiple sequence alignment. Combining our Sequence Harmony (SH) and multi-Relief (mR) methods in one web server allows simultaneous analysis and comparison of specificity residues; furthermore, both methods have been significantly improved and extended. SH has been extended to cope with more than two sub-groups. mR has been changed from a sampling implementation to a deterministic one, making it more consistent and user friendly. For both methods, Z-scores are reported. The multi-Harmony web server produces a dynamic output page, which includes interactive connections to the Jalview and Jmol applets, thereby allowing interactive analysis of the results. Multi-Harmony is available at http://www.ibi.vu.nl/programs/shmrwww.
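
    The general idea of scoring alignment columns for sub-family specificity can be illustrated by comparing the residue distributions of two sub-families column by column. The sketch below uses the Jensen-Shannon divergence as a generic, swapped-in measure; it is not the Sequence Harmony or multi-Relief scoring used by multi-Harmony, and the two toy sub-family alignments are hypothetical.

```python
from collections import Counter
from math import log2


def distribution(column):
    """Relative frequency of each residue in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (bits) between two residue distributions."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0) + q.get(k, 0)) for k in keys}

    def kl(a):
        return sum(v * log2(v / m[k]) for k, v in a.items() if v > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def specificity_scores(group_a, group_b):
    """Score each alignment column by how differently the two sub-families use it."""
    length = len(group_a[0])
    return [jensen_shannon(distribution([s[i] for s in group_a]),
                           distribution([s[i] for s in group_b]))
            for i in range(length)]


scores = specificity_scores(["ACD", "ACE"], ["ACK", "GCK"])
```

    The conserved middle column scores 0, while the last column, where the two sub-families use disjoint residues, reaches the maximum of 1 bit, flagging it as a candidate specificity site.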

  16. Diverse nucleotide compositions and sequence fluctuation in Rubisco protein genes

    NASA Astrophysics Data System (ADS)

    Holden, Todd; Dehipawala, S.; Cheung, E.; Bienaime, R.; Ye, J.; Tremberger, G., Jr.; Schneider, P.; Lieberman, D.; Cheung, T.

    2011-10-01

    The Rubisco protein-enzyme is arguably the most abundant protein on Earth. The central dogma of transcription and translation necessitates the study of the Rubisco genes and Rubisco-like genes in various species. A stronger correlation between the fractal dimension of the atomic number fluctuation along a DNA sequence and Shannon entropy has been observed in the studied Rubisco-like gene sequences, suggesting more diverse evolutionary pressures and constraints in the Rubisco sequences. The strategy of using metal for structural stabilization appears to be an ancient mechanism, with data from the porphobilinogen deaminase gene in Capsaspora owczarzaki and Monosiga brevicollis. Using the chi-square distance probability, our analysis supports the conjecture that the more ancient Rubisco-like sequence in Microcystis aeruginosa would have experienced very different evolutionary pressures and biochemical constraints compared with Bordetella bronchiseptica, the two microbes occupying opposite ends of the correlation graph. Our exploratory study indicates that a high-fractal-dimension Rubisco sequence would support a high carbon dioxide rate via the Michaelis-Menten coefficient, with implications for the control of the whooping cough pathogen Bordetella bronchiseptica, a microbe containing a high-fractal-dimension Rubisco-like sequence (2.07). Using an internal comparison of the chi-square distance probability for 16S rRNA (~E-22) versus the radiation-repair RecA gene (~E-05) in the high-GC-content Deinococcus radiodurans, our analysis supports the conjecture that high-GC-content microbes containing Rubisco-like sequences are likely to include an extra-terrestrial origin, relative to Deinococcus radiodurans. From the perspective of the central dogma, a similar photosynthesis process that could utilize host star radiation would not compete with radiation-resistance processes in environments such as Mars and exoplanets.
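
    The Shannon entropy of a DNA sequence's composition is the quantity correlated with fractal dimension above. The sketch below computes it over k-mers (k = 1 gives the nucleotide composition). The word length and any sliding-window scheme used in the study are not stated, so both are assumptions here, and the fractal-dimension side of the analysis is not shown.

```python
from collections import Counter
from math import log2


def shannon_entropy(seq, k=1):
    """Shannon entropy (bits) of the k-mer composition of a sequence."""
    words = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(words)
    total = len(words)
    return -sum((n / total) * log2(n / total) for n in counts.values())
```

    A sequence using all four nucleotides equally, such as `"ACGT"`, reaches the maximum of 2 bits, while a homopolymer such as `"AAAA"` has entropy 0.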

  17. Nucleotide sequence of the Agrobacterium tumefaciens octopine Ti plasmid-encoded tmr gene.

    PubMed Central

    Heidekamp, F; Dirkse, W G; Hille, J; van Ormondt, H

    1983-01-01

    The nucleotide sequence of the tmr gene, encoded by the octopine Ti plasmid from Agrobacterium tumefaciens (pTiAch5), was determined. The T-DNA, which encompasses this gene, is involved in tumor formation and maintenance, and probably mediates the cytokinin-independent growth of transformed plant cells. The nucleotide sequence of the tmr gene displays a continuous open reading frame specifying a polypeptide chain of 240 amino acids. The 5' terminus of the polyadenylated tmr mRNA isolated from octopine tobacco tumor cell lines was determined by nuclease S1 mapping. The nucleotide sequence 5'-TATAAAA-3', which is identical to the canonical "TATA" box, was found 29 nucleotides upstream from the major initiation site for RNA synthesis. Two potential polyadenylation signals 5'-AATAAA-3' were found at 207 and 275 nucleotides downstream from the TAG stop codon of the tmr gene. A comparison was made of nucleotide stretches involved in transcription control of T-DNA genes. PMID:6312414
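
    Finding elements such as the TATA box or the AATAAA polyadenylation signal in a sequenced region is, at its simplest, an exact motif search. The sketch below reports every occurrence, including overlapping ones, on toy sequences; real promoter analysis usually allows mismatches or uses position weight matrices, which are not shown.

```python
def find_motif(seq, motif):
    """0-based start positions of every (possibly overlapping) exact motif occurrence."""
    seq, motif = seq.upper(), motif.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)  # step one base to catch overlaps
    return positions
```

    For example, `find_motif("GGTATAAAAGG", "TATAAAA")` returns `[2]`, and `find_motif("AATAAATAAA", "AATAAA")` returns `[0, 4]` for two overlapping signals. Distances such as "29 nucleotides upstream" then follow from the positions, given a chosen reference point.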

  18. Phylogenetic analysis of beta-papillomaviruses as inferred from nucleotide and amino acid sequence data.

    PubMed

    Gottschling, Marc; Köhler, Anja; Stockfleth, Eggert; Nindl, Ingo

    2007-01-01

    Human papillomaviruses (HPV) of the beta-group seem to be involved in the pathogenesis of non-melanoma skin cancer. Papillomaviruses are host specific and are considered to co-evolve closely with their hosts. Evolutionary incongruence between early genes and late genes has been reported among oncogenic genital alpha-papillomaviruses and considerably challenges phylogenetic reconstructions. We investigated the relationships of 29 beta-HPV (25 types plus four putative new types, subtypes, or variants) as inferred from codon-aligned and amino acid sequence data of the genes E1, E2, E6, E7, L1, and L2 using likelihood, distance, and parsimony approaches. An analysis of an L1 fragment included additional nucleotide and amino acid sequences from seven non-human beta-papillomaviruses. Early and late gene evolution did not conflict significantly in beta-papillomaviruses based on partition homogeneity tests (p ≥ 0.001). As inferred from the complete genome analyses, beta-papillomaviruses were monophyletic and segregated into four highly supported monophyletic assemblages corresponding to the species 1, 2, 3, and fused 4/5. They basically split into species 1 and the remainder of beta-papillomaviruses, whose species 3, 4, and 5 constituted the sister group of species 2. beta-Papillomaviruses have been isolated from humans, apes, and monkeys, and phylogenetic analyses of the L1 fragment showed that the non-human papillomaviruses were highly polyphyletic, nesting within the HPV species. Thus, host and virus phylogenies were not congruent in beta-papillomaviruses, and multiple invasions across species borders may contribute (in addition to host-linked evolution) to their diversification.
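
    Codon-aligned nucleotide data are commonly produced by aligning the translated proteins and then threading each coding sequence back onto its aligned protein, so that every amino acid takes the next codon and every gap becomes a triplet gap. The abstract does not say how its codon alignments were built; the sketch below only illustrates this general back-translation step on a toy example, and ignores any trailing codons (for example a stop codon) beyond the last aligned residue.

```python
def codon_align(protein_aln, cds):
    """Thread an unaligned coding sequence onto its aligned protein ('-' = gap).

    Each amino acid consumes the next codon of cds; each gap becomes '---'.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    residues = [aa for aa in protein_aln if aa != "-"]
    if len(residues) > len(codons):
        raise ValueError("coding sequence is shorter than the protein")
    out, next_codon = [], 0
    for aa in protein_aln:
        if aa == "-":
            out.append("---")
        else:
            out.append(codons[next_codon])
            next_codon += 1
    return "".join(out)


aligned = codon_align("M-KV", "ATGAAAGTT")
```

    Here `aligned` is `"ATG---AAAGTT"`: the protein gap is carried over as a whole codon, which keeps every column of the nucleotide alignment in a consistent codon position.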

  19. The nucleotide sequence of tomato mottle virus, a new geminivirus isolated from tomatoes in Florida.

    PubMed

    Abouzid, A M; Polston, J E; Hiebert, E

    1992-12-01

    A new geminivirus, tomato mottle virus (TMoV), affecting tomato production in Florida has been cloned and sequenced. Sequence analysis of the cloned replicative forms of TMoV revealed four potential coding regions for the A component [2601 nucleotides (nt)] and two for the B component (2541 nt). Comparisons of the nucleotide sequence of the TMoV genome with those of other whitefly-transmitted geminiviruses indicate that TMoV is a typical bipartite geminivirus of the New World and is closely related to but distinct from abutilon mosaic virus.

  20. Constructing sequence alignments from a Markov decision model with estimated parameter values.

    PubMed

    Hunt, Fern Y; Kearsley, Anthony J; O'Gallagher, Agnes

    2004-01-01

    Current methods for aligning biological sequences are based on dynamic programming algorithms. If large numbers of sequences or a number of long sequences are to be aligned, the required computations are expensive in memory and central processing unit (CPU) time. In an attempt to bring the tools of large-scale linear programming (LP) methods to bear on this problem, we formulate the alignment process as a controlled Markov chain and construct a suggested alignment based on policies that minimise the expected total cost of the alignment. We discuss the LP associated with the total expected discounted cost and show the results of a solution of the problem based on a primal-dual interior point method. Model parameters, estimated from aligned sequences, along with cost function parameters are used to construct the objective and constraint conditions of the LP problem. This article concludes with a discussion of some alignments obtained from the LP solutions of problems with various cost function parameter values.
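
    For reference, the dynamic-programming baseline mentioned above fills an (m + 1) × (n + 1) score table, which is why memory and CPU time grow with the product of the sequence lengths. The sketch below computes a Needleman-Wunsch global alignment score with a linear gap penalty; the scoring values are arbitrary assumptions, and this is the conventional method, not the Markov decision/LP formulation proposed in the paper.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]
```

    With these parameters, `global_alignment_score("ACGT", "AGT")` is 1 (three matches and one gap), and aligning `""` against `"AC"` costs two gaps, giving -4.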

  1. Nucleotide sequences of 5S rRNAs from four jellyfishes.

    PubMed

    Hori, H; Ohama, T; Kumazaki, T; Osawa, S

    1982-11-25

    The nucleotide sequences of 5S rRNAs from four jellyfishes, Spirocodon saltatrix, Nemopsis dofleini, Aurelia aurita and Chrysaora quinquecirrha have been determined. The sequences are highly similar to each other. A fairly high similarity was also found between these jellyfishes and a sea anemone, Anthopleura japonica.

  2. Mayaro virus: complete nucleotide sequence and phylogenetic relationships with other alphaviruses.

    PubMed

    Lavergne, Anne; de Thoisy, Benoît; Lacoste, Vincent; Pascalis, Hervé; Pouliquen, Jean-François; Mercier, Véronique; Tolou, Hugues; Dussart, Philippe; Morvan, Jacques; Talarmin, Antoine; Kazanji, Mirdad

    2006-05-01

    Mayaro (MAY) virus is a member of the genus Alphavirus in the family Togaviridae. Alphaviruses are distributed throughout the world and cause a wide range of diseases in humans and animals. Here, we determined the complete nucleotide sequence of MAY from a viral strain isolated from a French Guianese patient. The deduced MAY genome was 11,429 nucleotides in length, excluding the 5' cap nucleotide and 3' poly(A) tail. Nucleotide and amino acid homologies, as well as phylogenetic analyses of the obtained sequence, confirmed that MAY is not a recombinant virus and belongs to the Semliki Forest complex according to the antigenic complex classification. Furthermore, analyses based on the E1 region revealed that MAY is closely related to Una virus, the only other South American virus clustering with the Old World viruses. On the basis of our results and of alphavirus diversity and pathogenicity, we suggest that alphaviruses may have an Old World origin.

  3. Nucleotide sequence conservation in paramyxoviruses; the concept of codon constellation.

    PubMed

    Rima, Bert K

    2015-05-01

    The stability and conservation of the sequences of RNA viruses in the field and the high error rates measured in vitro are paradoxical. The field stability indicates that there are very strong selective constraints on sequence diversity. The nature of these constraints is discussed. Apart from constraints on variation in cis-acting RNA and the amino acid sequences of viral proteins, there are others relating to the presence of specific dinucleotides such as CpG and UpA, as well as the importance of RNA secondary structures and RNA degradation rates. Other constraints recently identified in other RNA viruses, such as effects of secondary RNA structure on protein folding or modification of cellular tRNA complements, are also discussed. Using the family Paramyxoviridae, I show (i) that the codon usage pattern (CUP) is specific for each virus species and (ii) that it is markedly different from that of the host; it does not vary even in vaccine viruses that have been derived by passage in a number of inappropriate host cells. The CUP might thus be an additional constraint on variation, and I propose the concept of codon constellation to indicate the informational content of the sequences of RNA molecules relating not only to stability and structure but also to the efficiency of translation of a viral mRNA resulting from the CUP and the numbers and position of rare codons.
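
    Two of the quantities discussed above are easy to compute from a sequence: the relative frequency of each codon in an in-frame coding sequence, and the observed/expected ratio of a dinucleotide such as CpG or UpA, defined as the dinucleotide count divided by (count of first base × count of second base / sequence length). The exact CUP metric used in the paper is not stated; the sketch below shows only these two basic measures, and treats U and T as equivalent so that RNA and cDNA sequences give the same result.

```python
from collections import Counter


def codon_usage(cds):
    """Relative frequency of each codon in an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(codons)
    return {c: n / len(codons) for c, n in counts.items()}


def dinucleotide_obs_exp(seq, first="C", second="G"):
    """Observed/expected ratio of a dinucleotide, e.g. CpG (C then G) or UpA."""
    seq = seq.upper().replace("U", "T")
    first, second = first.upper().replace("U", "T"), second.upper().replace("U", "T")
    n = len(seq)
    observed = sum(seq[i] == first and seq[i + 1] == second for i in range(n - 1))
    expected = seq.count(first) * seq.count(second) / n
    return observed / expected if expected else float("nan")
```

    For `"CGCG"`, CpG occurs twice while base composition alone predicts one occurrence, so the ratio is 2.0; ratios well below 1 indicate a dinucleotide that is under-represented, as is often reported for CpG and UpA in RNA viruses.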

  4. Nucleotide sequence of a human tRNA gene heterocluster

    SciTech Connect

    Chang, Y.N.; Pirtle, I.L.; Pirtle, R.M.

    1986-05-01

    Leucine tRNA from bovine liver was used as a hybridization probe to screen a human gene library harbored in Charon-4A of bacteriophage lambda. The human DNA inserts from plaque-pure clones were characterized by restriction endonuclease mapping and Southern hybridization techniques, using both (3'-³²P)-labeled bovine liver leucine tRNA and total tRNA as hybridization probes. An 8-kb Hind III fragment of one of these γ-clones was subcloned into the Hind III site of pBR322. Subsequent fine restriction mapping and DNA sequence analysis of this plasmid DNA indicated the presence of four tRNA genes within the 8-kb DNA fragment. A leucine tRNA gene with an anticodon of AAG and a proline tRNA gene with an anticodon of AGG are in a 1.6-kb subfragment. A threonine tRNA gene with an anticodon of UGU and an as yet unidentified tRNA gene are located in a 1.1-kb subfragment. These two different subfragments are separated by 2.8 kb. The coding regions of the three sequenced genes contain characteristic internal split promoter sequences and do not have intervening sequences. The 3'-flanking regions of these three genes have typical RNA polymerase III termination sites of at least four consecutive T residues.
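
    The amino acid identities of these genes follow from their anticodons: with both written 5' to 3', the codon read by an anticodon is its reverse complement. The sketch below applies this to the three anticodons above, considering strict Watson-Crick pairing only; wobble pairing at the first anticodon position can extend decoding to further codons and is not modelled.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

# Only the three codons discussed here; a full table would list all 64 codons.
CODON_TO_AA = {"CUU": "Leu", "CCU": "Pro", "ACA": "Thr"}


def codon_for_anticodon(anticodon):
    """Codon read by an anticodon (both written 5'->3'): the reverse complement."""
    return anticodon.upper().replace("T", "U").translate(COMPLEMENT)[::-1]


decoded = {ac: CODON_TO_AA[codon_for_anticodon(ac)] for ac in ("AAG", "AGG", "UGU")}
```

    `decoded` maps AAG to Leu (codon CUU), AGG to Pro (CCU) and UGU to Thr (ACA), matching the leucine, proline and threonine assignments of the three genes.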

  5. Methods for making nucleotide probes for sequencing and synthesis

    DOEpatents

    Church, George M; Zhang, Kun; Chou, Joseph

    2014-07-08

    Compositions and methods for making a plurality of probes for analyzing a plurality of nucleic acid samples are provided. Compositions and methods for analyzing a plurality of nucleic acid samples to obtain sequence information in each nucleic acid sample are also provided.

  6. Nucleotide sequence and taxonomical distribution of the bacteriocin gene lin cloned from Brevibacterium linens M18.

    PubMed

    Valdes-Stauber, N; Scherer, S

    1996-04-01

    Linocin M18 is an antilisterial bacteriocin produced by the red smear cheese bacterium Brevibacterium linens M18. Oligonucleotide probes based on the N-terminal amino acid sequence were used to locate its single copy gene, lin, on the chromosomal DNA. The amino acid composition, N-terminal sequence, and molecular mass derived from the nucleotide sequence of an open reading frame of 798 nucleotides coding for 266 amino acids found on a 3-kb BamHI restriction fragment correspond closely to those obtained from the purified protein (N. Valdés-Stauber and S. Scherer, Appl. Environ. Microbiol. 60:3809-3814, 1994). No sequence homology to any protein or nucleotide sequences deposited in databases was found. Comparison of the nucleotide sequence and the N-terminal amino acid sequence derived from the protein suggests that B. linens M18 produces an N-formyl-methionyl-CAC tRNA. A wide taxonomical distribution of the gene within coryneform bacteria has been demonstrated by PCR amplification. The structural gene for linocin M18 is present in at least three Brevibacterium species, five Arthrobacter species, and five Corynebacterium species.

  7. Clusters of nucleotide substitutions and insertion/deletion mutations are associated with repeat sequences.

    PubMed

    McDonald, Michael J; Wang, Wei-Chi; Huang, Hsien-Da; Leu, Jun-Yi

    2011-06-01

    The genome-sequencing gold rush has facilitated the use of comparative genomics to uncover patterns of genome evolution, although their causal mechanisms remain elusive. One such trend, ubiquitous to prokarya and eukarya, is the association of insertion/deletion mutations (indels) with increases in the nucleotide substitution rate extending over hundreds of base pairs. The prevailing hypothesis is that indels are themselves mutagenic agents. Here, we employ population genomics data from Escherichia coli, Saccharomyces paradoxus, and Drosophila to provide evidence suggesting that it is not the indels per se but the sequence in which indels occur that causes the accumulation of nucleotide substitutions. We found that about two-thirds of indels are closely associated with repeat sequences and that repeat sequence abundance could be used to identify regions of elevated sequence diversity, independently of indels. Moreover, the mutational signature of indel-proximal nucleotide substitutions matches that of error-prone DNA polymerases. We propose that repeat sequences promote an increased probability of replication fork arrest, causing the persistent recruitment of error-prone DNA polymerases to specific sequence regions over evolutionary time scales. Experimental measures of the mutation rates of engineered DNA sequences and analyses of experimentally obtained collections of spontaneous mutations provide molecular evidence supporting our hypothesis. This study uncovers a new role for repeat sequences in genome evolution and provides an explanation of how fine-scale sequence contextual effects influence mutation rates and thereby evolution.

  8. A direct method for computing extreme value (Gumbel) parameters for gapped biological sequence alignments.

    PubMed

    Quinn, Terrance; Sinkala, Zachariah

    2014-01-01

    We develop a general method for computing extreme value distribution (Gumbel, 1958) parameters for gapped alignments. Our approach uses mixture distribution theory to obtain associated BLOSUM matrices for gapped alignments, which in turn are used for determining significance of gapped alignment scores for pairs of biological sequences. We compare our results with parameters already obtained in the literature.
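
    For context, the standard Karlin-Altschul form of the extreme value tail is P(S ≥ x) ≈ 1 − exp(−E), with E = K·m·n·e^(−λx), where λ and K are the Gumbel parameters and m, n are the lengths of the two sequences. The Python sketch below shows only how such parameters would be used once estimated; it does not reproduce the mixture-distribution estimation described above, and the values of `lam` and `k` in the usage lines are placeholders, not results from the paper.

```python
import math


def evalue(score: float, lam: float, k: float, m: int, n: int) -> float:
    """Expected number of chance alignments scoring >= score (Karlin-Altschul form)."""
    return k * m * n * math.exp(-lam * score)


def pvalue(score: float, lam: float, k: float, m: int, n: int) -> float:
    """Gumbel tail probability P(S >= score) = 1 - exp(-E)."""
    # expm1 keeps precision when E is very small: -(exp(-E) - 1) == 1 - exp(-E)
    return -math.expm1(-evalue(score, lam, k, m, n))


if __name__ == "__main__":
    # Placeholder parameters for illustration only (not estimated from real data).
    lam, k = 0.27, 0.04
    for s in (40, 60, 80):
        print(s, evalue(s, lam, k, 300, 300), pvalue(s, lam, k, 300, 300))
```

    Higher scores give exponentially smaller E-values; `math.expm1` is used so that very small E-values do not lose precision in 1 − exp(−E).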

  9. Complete nucleotide sequence of Alfalfa mosaic virus isolated from alfalfa (Medicago sativa L.) in Argentina.

    PubMed

    Trucco, Verónica; de Breuil, Soledad; Bejerman, Nicolás; Lenardon, Sergio; Giolitti, Fabián

    2014-06-01

    The complete nucleotide sequence of an Alfalfa mosaic virus (AMV) isolate infecting alfalfa (Medicago sativa L.) in Argentina, AMV-Arg, was determined. The virus genome has the typical organization described for AMV, and comprises 3,643, 2,593, and 2,038 nucleotides for RNA1, 2 and 3, respectively. The whole genome sequence and each coding region were compared with those of the four other isolates that have been completely sequenced, from China, Italy, Spain and the USA. The nucleotide identity percentages ranged from 95.9 to 99.1 % for the three RNAs and from 93.7 to 99 % for the protein 1 (P1), protein 2 (P2), movement protein and coat protein (CP) coding regions, whereas the amino acid identity percentages of these proteins ranged from 93.4 to 99.5 %, the lowest value corresponding to P2. CP sequences of AMV-Arg were compared with those of 25 other available isolates, and a phylogenetic analysis based on the CP gene was carried out. The highest nucleotide sequence identity of the CP gene was 98.3 %, with a Chinese isolate, and the highest amino acid identity was 98.6 %, with four isolates: two from Italy, one from Brazil and one from China. The phylogenetic analysis showed that AMV-Arg is closely related to subgroup I of AMV isolates. To our knowledge, this is the first report of a complete nucleotide sequence of AMV from South America and the first report worldwide of a complete nucleotide sequence of AMV isolated from alfalfa as a natural host.
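
    Identity percentages such as those above depend on how gaps are counted. A minimal Python sketch, assuming the two sequences are already aligned to equal length with `-` marking gaps and that identity is taken over gap-free columns only (a common convention, but not necessarily the one used by the authors; dividing by the full alignment length gives lower values):

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identical columns among the gap-free columns of a pairwise alignment."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # skip any column with a gap in either sequence
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared
```

    For example, `percent_identity("ACGT-ACGT", "ACGTTACCT")` compares 8 gap-free columns, 7 of which match, giving 87.5.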

  10. Nucleotide sequence of an Escherichia coli chromosomal hemolysin.

    PubMed Central

    Felmlee, T; Pellett, S; Welch, R A

    1985-01-01

    We determined the DNA sequence of an 8,211-base-pair region encompassing the chromosomal hemolysin, molecularly cloned from an O4 serotype strain of Escherichia coli. All four hemolysin cistrons (transcriptional order, C, A, B, and D) were encoded on the same DNA strand, and their predicted molecular masses were, respectively, 19.7, 109.8, 79.9, and 54.6 kilodaltons. The identification of pSF4000-encoded polypeptides in E. coli minicells corroborated the assignment of the predicted polypeptides for hlyC, hlyA, and hlyD. However, based on the minicell results, two polypeptides appeared to be encoded on the hlyB region, one similar in size to the predicted molecular mass of 79.9 kilodaltons, and the other a smaller 46-kilodalton polypeptide. The four hemolysin genes displayed similar codon usage, which is atypical for E. coli. This reflects the low guanine-plus-cytosine content (40.2%) of the hemolysin DNA sequence and suggests the non-E. coli origin of the hemolysin determinant. In vitro-derived deletions of the hemolysin recombinant plasmid pSF4000 indicated that a region between 433 and 301 base pairs upstream of the putative start of hlyC is necessary for hemolysin synthesis. Based on the DNA sequence, a stem-loop transcription terminator-like structure (a 16-base-pair stem followed by seven uridylates) in the mRNA was predicted distal to the C-terminal end of hlyA. A model for the general transcriptional organization of the E. coli hemolysin determinant is presented. PMID:3891743
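
    The 40.2% value is the guanine-plus-cytosine (GC) fraction of the sequenced region. A minimal Python sketch of that calculation, assuming ambiguous symbols such as N are excluded from the denominator (conventions vary, and the authors' handling is not stated):

```python
def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T, U)."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    total = sum(seq.count(base) for base in "ACGTU")  # N and other codes are ignored
    if total == 0:
        raise ValueError("sequence has no unambiguous bases")
    return 100.0 * gc / total
```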

  11. Complete nucleotide sequence of the polymerase 3 gene of human influenza virus A/WSN/33.

    PubMed Central

    Kaptein, J S; Nayak, D P

    1982-01-01

    The complete nucleotide sequence of the polymerase 3 (P3) gene of a human influenza virus (A/WSN/33) has been determined using cDNA clones, except for the last 11 nucleotides, which were obtained by direct RNA sequencing. The WSN P3 gene contains 2,341 nucleotides and codes for a protein of 759 amino acids (molecular weight 85,800). The WSN P3 protein, as deduced from the plus-strand DNA sequence, is basic and enriched in positively charged amino acids. In addition, it contains clusters of basic amino acids which may provide sites for the interaction of P3 protein with the capped primer, template, and/or other polymerase proteins during transcription and replication of influenza viral RNA. PMID:7045393

  12. Nucleotide sequence of the capsid protein gene of papaya leaf-distortion mosaic potyvirus.

    PubMed

    Maoka, T; Kashiwazaki, S; Tsuda, S; Usugi, T; Hibino, H

    1996-01-01

    The DNA complementary to the 3'-terminal 1,404 nucleotides [excluding the poly(A) tail] of papaya leaf-distortion mosaic potyvirus (PLDMV) RNA was cloned and sequenced. The sequence starts within a long open reading frame (ORF) of 1,195 nucleotides and is followed by a 3' non-coding region of 209 nucleotides. Capsid protein (CP) is encoded at the 3' terminus of the ORF. The CP contains 293 residues and has an Mr of 33,277. The CP of PLDMV exhibits 49 to 59% sequence similarity at the amino acid level to the CPs of papaya ringspot potyvirus (PRSV) and other potyviruses. This result is consistent with the absence of a serological relationship between PLDMV and PRSV or other potyviruses. The results support the assignment of PLDMV as a distinct member of the genus Potyvirus.

  13. Nucleotide Sequence of the Protective Antigen Gene of Bacillus Anthracis

    DTIC Science & Technology

    S. L. Welkos, J. R. Lowe, F. Eden-McCutchan, M. Vodkin, S. M...

    1988-02-02

    ...Bacillus anthracis and the 5' and 3' flanking sequences were determined. Protective antigen is one of three proteins comprising anthrax toxin. The open ...

  14. Intraspecific nucleotide sequence differences in the major noncoding region of human mitochondrial DNA.

    PubMed Central

    Horai, S; Hayasaka, K

    1990-01-01

    Nucleotide sequences of the major noncoding region of human mitochondrial DNA (mtDNA) from 95 human placentas have been determined. These sequences include at least a 482-bp-long region encompassing most of the D-loop-forming region. Comparisons of these sequences with those previously determined have revealed remarkable features of nucleotide substitutions and insertion/deletion events. The nucleotide diversity among the sequences is estimated as 1.45%, which is three- to fourfold higher than the corresponding value estimated from restriction-enzyme analysis of the whole mtDNA genome. A hypervariable region has also been defined: in this 14-bp region, 17 different sequences were detected. More than 97% of the base changes are transitions. A significantly nonrandom distribution of nucleotide substitutions and sequence length variations was also noted. The phylogenetic analysis indicates that diversity among the negroids is much larger than that among the caucasoids or the mongoloids. In fact, part of the negroids first diverged from other humans in the phylogenetic tree. A striking finding in the phylogenetic analysis is that the mongoloids can be separated into two distinct groups. Divergence of part of the mongoloids follows the earliest divergence of part of the negroids. The remainder of the mongoloids subsequently diverged together with the caucasoids. This observation confirmed our earlier study, which clearly demonstrated, by restriction-enzyme analysis, the existence of two distinct groups in the Japanese. PMID:2316527
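
    Nucleotide diversity is conventionally the average proportion of differing sites over all pairs of sequences, and a base change is a transition when it exchanges purine for purine (A↔G) or pyrimidine for pyrimidine (C↔T). The Python sketch below computes both under simplifying assumptions: the sequences are already aligned, gap-free and of equal length, which real mtDNA control-region data with length variation are not, and the estimator used in the paper is not specified here.

```python
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean pairwise proportion of differing sites (pi) for aligned, gap-free sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    total = sum(sum(x != y for x, y in zip(a, b)) / length for a, b in pairs)
    return total / len(pairs)


def is_transition(x: str, y: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine base changes."""
    x, y = x.upper(), y.upper()
    return x != y and ({x, y} <= PURINES or {x, y} <= PYRIMIDINES)
```

    For the toy alignment `["AAAA", "AAAT", "AATT"]`, the three pairwise proportions are 0.25, 0.5 and 0.25, so π = 1/3.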

  15. Structure-Based Sequence Alignment of the Transmembrane Domains of All Human GPCRs: Phylogenetic, Structural and Functional Implications

    PubMed Central

    Cvicek, Vaclav; Goddard, William A.; Abrol, Ravinder

    2016-01-01

    The understanding of G-protein coupled receptors (GPCRs) is undergoing a revolution due to increased information about their signaling and the experimental determination of structures for more than 25 receptors. The availability of at least one receptor structure for each of the GPCR classes, well separated in sequence space, enables an integrated superfamily-wide analysis to identify signatures involving the role of conserved residues, conserved contacts, and downstream signaling in the context of receptor structures. In this study, we align the transmembrane (TM) domains of all experimental GPCR structures to maximize the conserved inter-helical contacts. The resulting superfamily-wide GpcR Sequence-Structure (GRoSS) alignment of the TM domains for all human GPCR sequences is sufficient to generate a phylogenetic tree that correctly distinguishes all different GPCR classes, suggesting that the class-level differences in the GPCR superfamily are encoded at least partly in the TM domains. The inter-helical contacts conserved across all GPCR classes describe the evolutionarily conserved GPCR structural fold. The corresponding structural alignment of the inactive and active conformations, available for a few GPCRs, identifies activation hot-spot residues in the TM domains that get rewired upon activation. Many GPCR mutations, known to alter receptor signaling and cause disease, are located at these conserved contact and activation hot-spot residue positions. The GRoSS alignment places the chemosensory receptor subfamilies for bitter taste (TAS2R) and pheromones (Vomeronasal, VN1R) in the rhodopsin family, known to contain the chemosensory olfactory receptor subfamily. The GRoSS alignment also enables the quantification of the structural variability in the TM regions of experimental structures, useful for homology modeling and structure prediction of receptors. Furthermore, this alignment identifies structurally and functionally important residues in all human GPCRs.

  16. An Integrated System for DNA Sequencing by Synthesis Using Novel Nucleotide Analogues

    PubMed Central

    Guo, Jia; Yu, Lin; Turro, Nicholas J.; Ju, Jingyue

    2010-01-01

    The Human Genome Project has concluded, but its successful completion has increased, rather than decreased, the need for high-throughput DNA sequencing technologies. The possibility of clinically screening a full genome for an individual's mutations offers tremendous benefits, both for pursuing personalized medicine as well as uncovering the genomic contributions to diseases. The Sanger sequencing method, although enormously productive for more than 30 years, requires an electrophoretic separation step that, unfortunately, remains a key technical obstacle for achieving economically acceptable full-genome results. Alternative sequencing approaches thus focus on innovations that can reduce costs. The DNA sequencing by synthesis (SBS) approach has shown great promise as a new sequencing platform, with particular progress reported recently. The general fluorescent SBS approach involves (i) incorporation of nucleotide analogs bearing fluorescent reporters, (ii) identification of the incorporated nucleotide by its fluorescent emissions, and (iii) cleavage of the fluorophore, along with the reinitiation of the polymerase reaction for continuing sequence determination. In this Account, we review the construction of a DNA-immobilized chip and the development of novel nucleotide reporters for the SBS sequencing platform. Click chemistry, with its high selectivity and coupling efficiency, was explored for surface immobilization of DNA. The first generation (G-1) modified nucleotides for SBS feature a small chemical moiety capping the 3′-OH and a fluorophore tethered to the base through a chemically cleavable linker; the design ensures that the nucleotide reporters are good substrates for the polymerase. The 3′-capping moiety and the fluorophore on the DNA extension products, generated by the incorporation of the G-1 modified nucleotides, are cleaved simultaneously to reinitiate the polymerase reaction. The sequence of a DNA template immobilized on a surface ...

  17. The complete nucleotide sequence and genomic characterization of tropical soda apple mosaic virus.

    PubMed

    Fillmer, Kornelia; Adkins, Scott; Pongam, Patchara; D'Elia, Tom

    2016-08-01

    We report the first complete genome sequence of tropical soda apple mosaic virus (TSAMV), a tobamovirus originally isolated from tropical soda apple (Solanum viarum) collected in Okeechobee, Florida. The complete genome of TSAMV is 6,350 nucleotides long and contains four open reading frames encoding the following proteins: i) 126-kDa methyltransferase/helicase (3354 nt), ii) 183-kDa polymerase (4839 nt), iii) movement protein (771 nt) and iv) coat protein (483 nt). The complete genome sequence of TSAMV shares 80.4 % nucleotide sequence identity with pepper mild mottle virus (PMMoV) and 71.2-74.2 % identity with other tobamoviruses naturally infecting members of the Solanaceae plant family. Phylogenetic analysis of the deduced amino acid sequences of the 126-kDa and 183-kDa proteins and the complete genome sequence place TSAMV in a subcluster with PMMoV within the Solanaceae-infecting subgroup of tobamoviruses.

  18. Cloning and nucleotide sequence of wild type and a mutant histidine decarboxylase from Lactobacillus 30a.

    PubMed

    Vanderslice, P; Copeland, W C; Robertus, J D

    1986-11-15

    Prohistidine decarboxylase from Lactobacillus 30a is a protein that autoactivates to histidine decarboxylase by cleaving its peptide chain between serines 81 and 82 and converting Ser-82 to a pyruvoyl moiety. The pyruvoyl group serves as the prosthetic group for the decarboxylation reaction. We have cloned and determined the nucleotide sequence of the gene for this enzyme from a wild type strain and from a mutant with altered autoactivation properties. The nucleotide sequence modifies the previously determined amino acid sequence of the protein. A tripeptide missed in the chemical sequence is inserted, and three other amino acids show conservative changes. The activation mutant shows a single change of Gly-58 to an Asp. Sequence analysis up- and downstream from the gene suggests that histidine decarboxylase is part of a polycistronic message, and that the transcriptional promoter region is strongly homologous to those of other Gram-positive organisms.

  19. Population genetics and phylogenetic analysis of the vrs1 nucleotide sequence in wild and cultivated barley.

    PubMed

    Ren, Xifeng; Wang, Yonggang; Yan, Songxian; Sun, Dongfa; Sun, Genlou

    2014-04-01

    Spike morphology is a key characteristic in the study of barley genetics, breeding, and domestication. Variation at the six-rowed spike 1 (vrs1) locus is sufficient to control the development and fertility of the lateral spikelet of barley. To study the genetic variation of vrs1 in wild barley (Hordeum vulgare subsp. spontaneum) and cultivated barley (Hordeum vulgare subsp. vulgare), nucleotide sequences of vrs1 were examined in 84 wild barleys (including 10 six-rowed) and 20 cultivated barleys (including 10 six-rowed) from four populations. The length of the vrs1 sequence amplified was 1536 bp. A total of 40 haplotypes were identified in the four populations. The highest nucleotide diversity, haplotype diversity, and per-site nucleotide diversity were observed in the Southwest Asian wild barley population. The nucleotide diversity, number of haplotypes, haplotype diversity, and per-site nucleotide diversity in two-rowed barley were higher than those in six-rowed barley. The phylogenetic analysis of the vrs1 sequences partially separated the six-rowed and the two-rowed barley. The six-rowed barleys were divided into four groups.
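
    Haplotype diversity is commonly computed with Nei's unbiased estimator, Hd = n/(n − 1) · (1 − Σ p_i²), where p_i is the frequency of haplotype i among n sequences. A minimal Python sketch, assuming haplotypes are defined by exact sequence identity (the software and missing-data handling used in the study are not described here):

```python
from collections import Counter


def haplotype_diversity(seqs: list[str]) -> float:
    """Nei's unbiased haplotype diversity, treating identical sequences as one haplotype."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(seqs)  # haplotype -> number of sequences carrying it
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)
```

    `len(Counter(seqs))` in the same setting gives the number of distinct haplotypes.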

  20. Nucleotide composition of CO1 sequences in Chelicerata (Arthropoda): detecting new mitogenomic rearrangements.

    PubMed

    Arabi, Juliette; Judson, Mark L I; Deharveng, Louis; Lourenço, Wilson R; Cruaud, Corinne; Hassanin, Alexandre

    2012-02-01

    Here we study the evolution of nucleotide composition in third codon-positions of CO1 sequences of Chelicerata, using a phylogenetic framework, based on 180 taxa and three markers (CO1, 18S, and 28S rRNA; 5,218 nt). The analyses of nucleotide composition were also extended to all CO1 sequences of Chelicerata found in GenBank (1,701 taxa). The results show that most species of Chelicerata have a positive strand bias in CO1, i.e., in favor of C nucleotides, including all Amblypygi, Palpigradi, Ricinulei, Solifugae, Uropygi, and Xiphosura. However, several taxa show a negative strand bias, i.e., in favor of G nucleotides: all Scorpiones, Opisthothelae spiders and several taxa within Acari, Opiliones, Pseudoscorpiones, and Pycnogonida. Several reversals of strand-specific bias can be attributed to either a rearrangement of the control region or an inversion of a fragment containing the CO1 gene. Key taxa for which sequencing of complete mitochondrial genomes will be necessary to determine the origin and nature of mtDNA rearrangements involved in the reversals are identified. Acari, Opiliones, Pseudoscorpiones, and Pycnogonida were found to show strong variability in nucleotide composition. In addition, both mitochondrial and nuclear genomes have been affected by higher substitution rates in Acari and Pseudoscorpiones. The results therefore indicate that these two orders are more liable to fix mutations of all types, including base substitutions, indels, and genomic rearrangements.
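
    Strand bias of the kind described above is usually summarized as a skew index at third codon positions. A minimal Python sketch, assuming an in-frame coding sequence and a (C − G)/(C + G) index signed so that positive values mean a C-rich bias, matching the wording above; the exact skew definitions used in the paper are not given here, and the function name is hypothetical:

```python
def third_position_cg_skew(cds: str) -> float:
    """(C - G) / (C + G) over third codon positions of an in-frame coding sequence.

    Positive values indicate a C-rich bias, negative values a G-rich bias.
    """
    cds = cds.upper()
    third = cds[2::3]  # bases at codon positions 3, 6, 9, ...
    c, g = third.count("C"), third.count("G")
    if c + g == 0:
        raise ValueError("no C or G at third codon positions")
    return (c - g) / (c + g)
```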

  1. Nucleotide sequence and genome organization of a new proposed crinivirus, tetterwort vein chlorosis virus.

    PubMed

    Zhao, Fumei; Yoo, Ran Hee; Lim, Seungmo; Igori, Davaajargal; Lee, Su-Heon; Moon, Jae Sun

    2015-11-01

    The genome of tetterwort vein chlorosis virus (TVCV) from South Korea has been completely sequenced. Its genomic organization resembles those of other criniviruses, with several new features, indicating that TVCV is a member of a new species in the genus Crinivirus, family Closteroviridae. RNA1 contains 8467 nucleotides, with at least four open reading frames (ORFs). ORF1a encodes a protein with predicted papain-like protease, methyltransferase, and helicase activities. ORF1b encodes a putative RNA-dependent RNA polymerase that is apparently expressed through a +1 ribosomal frameshift. RNA2 contains 8113 nucleotides encoding at least nine proteins, similar to most crinivirus RNA2s. The 3' untranslated regions of the bipartite RNA genome share 82.1% nucleotide sequence identity.

  2. Complete nucleotide sequence of the new potexvirus "Alstroemeria virus X". Brief report.

    PubMed

    Fuji, S; Shinoda, K; Ikeda, M; Furuya, H; Naito, H; Fukumoto, F

    2005-11-01

    A flexuous virus was isolated in Japan from an alstroemeria plant showing mosaic symptoms. The virus had a broad host range but caused latent systemic infection in alstroemeria. The virus was assigned to the genus Potexvirus based on its morphology and physical properties and on an analysis of the complete nucleotide sequence. The genomic RNA of the virus was 7,009 nucleotides in length, excluding the 3'-terminal poly(A) tail. It contained five open reading frames (ORFs), consistent with other members of the genus Potexvirus. Although the nucleotide sequences of the ORFs differ from those of previously reported potexviruses, a phylogenetic analysis placed the virus close to Narcissus mosaic virus and Scallion virus X. Therefore, we propose that this virus be designated Alstroemeria virus X (AlsVX).

  3. Complete nucleotide sequence of a begomovirus and associated betasatellite infecting croton (Croton bonplandianus) in Pakistan.

    PubMed

    Hussain, Khadim; Hussain, Mazhar; Mansoor, Shahid; Briddon, Rob W

    2011-06-01

    The complete sequences of a begomovirus and an associated betasatellite isolated from Croton bonplandianus originating from Pakistan were determined. The sequence of the begomovirus showed the highest level of nucleotide sequence identity (88.9%) to an isolate of papaya leaf curl virus and thus represents a new species, for which we propose the name Croton yellow vein virus (CYVV). The sequence of the betasatellite showed the highest levels of sequence identity (82 to 98.4%) to six sequences in the databases that have yet to be reported, followed by isolates of tomato leaf curl Joydebpur betasatellite (48.7 to 52.5%). This indicates that the betasatellite identified here (and the six sequences in the databases) is an isolate of a newly identified species for which the name Croton yellow vein mosaic betasatellite (CroYVMB) is proposed. For the begomovirus, an analysis of the sequence indicates that it has a recombinant origin.

  4. Complete nucleotide sequence of a novel strain of fig fleck-associated virus from China.

    PubMed

    He, Zhen; Mijit, Mahmut; Li, Shifang; Zhang, Zhixiang

    2017-04-01

    The complete nucleotide sequence of fig fleck-associated virus from Xinjiang Uygur Autonomous Region of China (FFkaV-CN) was determined. The 6,723-nucleotide-long viral genome, excluding a terminal poly(A) tail, contains three open reading frames (ORFs). Pairwise comparisons showed that FFkaV-CN shares 83% and 92% sequence identity with FFkaV-Italy based on the complete genomic sequence and CP aa sequence, respectively, slightly higher than the species demarcation criterion for the genus Maculavirus. Phylogenetic analysis showed that FFkaV-CN and FFkaV-Italy clustered into one group. These results indicate that FFkaV-CN is a novel strain of FFkaV with a genome organization somewhat different from what was reported for FFkaV-Italy.

  5. Dependence of the E. coli promoter strength and physical parameters upon the nucleotide sequence

    PubMed Central

    Berezhnoy, Andrey Y.; Shckorbatov, Yuriy G.

    2005-01-01

    The energy of interaction between complementary nucleotides in E. coli promoter sequences was calculated and visualized, and a graphical method for presenting the energy properties of promoter sequences was developed. The data indicated that the energy distribution along the promoter sequence has minima at the −35, −8 and +7 regions, which correspond to areas of elevated AT (adenine-thymine) content. The most pronounced difference from random sequences occurs at the −8 region. Four promoter groups with distinct energy properties were identified. Promoters with the minimal and maximal energy of interaction between complementary nucleotides have low strength, whereas the strongest promoters belong to clusters characterized by intermediate energy values. PMID:16252339
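
    The abstract does not specify its energy model. As a rough illustration of why AT-rich regions appear as minima, the Python sketch below substitutes a much simpler proxy, the sliding-window count of Watson-Crick hydrogen bonds (2 per A·T pair, 3 per G·C pair); this is not the authors' calculation, and the window size is an arbitrary parameter.

```python
HBONDS = {"A": 2, "T": 2, "G": 3, "C": 3}  # hydrogen bonds per Watson-Crick pair


def hbond_profile(seq: str, window: int) -> list[int]:
    """Sliding-window sum of hydrogen bonds; AT-rich windows give low values.

    A crude stand-in for base-pairing interaction energy, not a thermodynamic model.
    """
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    values = [HBONDS.get(base, 0) for base in seq]  # ambiguous bases contribute 0
    current = sum(values[:window])
    profile = [current]
    for i in range(window, len(values)):
        current += values[i] - values[i - window]  # slide the window one base right
        profile.append(current)
    return profile
```

    For example, `hbond_profile("AAAGGG", 3)` gives `[6, 7, 8, 9]`; the lowest values mark the AT-richest windows.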

  6. On the feasibility of using the intrinsic fluorescence of nucleotides for DNA sequencing.

    SciTech Connect

    Chowdhury, M. H.; Ray, K.; Johnson, R. L.; Gray, S. K.; Pond, J.; Lakowicz, J. R.; Univ. of Maryland; Univ. of Virginia; Lumerical Solutions, Inc.

    2010-04-29

    There is presently a worldwide effort to increase the speed and decrease the cost of DNA sequencing, as exemplified by the goal of the National Human Genome Research Institute (NHGRI) to sequence a human genome for under $1000. Several high-throughput technologies are under development. Among these, single-strand sequencing using exonuclease appears very promising. However, this approach requires complete labeling of at least two bases at a time with extrinsic high-quantum-yield probes. This is necessary because nucleotides absorb in the deep ultraviolet (UV) and emit with extremely low quantum yields. Hence intrinsic emission from DNA and nucleotides is not being exploited for DNA sequencing. In the present paper we consider the possibility of identifying single nucleotides using their intrinsic emission. We used the finite-difference time-domain (FDTD) method to calculate the effects of aluminum nanoparticles on nearby fluorophores that emit in the UV. We find that the radiated power of UV fluorophores is significantly increased when they are in close proximity to aluminum nanostructures. We show that there will be increased localized excitation near aluminum particles at wavelengths used to excite intrinsic nucleotide emission. Using FDTD simulation we show that a typical DNA base coupled to appropriate aluminum nanostructures produces highly directional emission. Additionally, we present experimental results showing that a thin film of nucleotides shows enhanced emission when in close proximity to aluminum nanostructures. Finally, we provide Monte Carlo simulations that predict high levels of base-calling accuracy for an assumed number of photons derived from the emission spectra of the intrinsic fluorescence of the bases. Our results suggest that single nucleotides can be detected and identified using aluminum nanostructures that enhance their intrinsic emission. This capability would be valuable for the ongoing efforts toward the $1000 genome.

  7. Molecular cloning and nucleotide sequencing of human immunoglobulin epsilon chain cDNA.

    PubMed Central

    Seno, M; Kurokawa, T; Ono, Y; Onda, H; Sasada, R; Igarashi, K; Kikuchi, M; Sugino, Y; Nishida, Y; Honjo, T

    1983-01-01

    DNA complementary to mRNA of human immunoglobulin E heavy chain (epsilon chain) isolated and purified from U266 cells has been synthesized and inserted into the PstI site of pBR322 by G-C tailing. This recombinant plasmid was used to transform E. coli χ1776, and 1445 tetracycline-resistant colonies were screened. Nine clones (pGET1-9) containing cDNA coding for the human epsilon chain were recognized by colony hybridization and Southern blotting analysis with a nick-translated human IgE genome fragment. The nucleotide sequence of the longest cDNA, contained in pGET2, was determined. The results indicate that the sequence of 1657 nucleotides codes for 494 amino acids covering a part of the variable region and all of the constant region of the human epsilon chain. Most of the amino acid sequence deduced from the nucleotide sequence is in substantial agreement with that reported. Furthermore, a termination codon after the C-terminal amino acid marks the beginning of a 3' untranslated region of 125 nucleotides with a poly(A) tail. Taking this into account, the structure of the human epsilon chain mRNA, except for part of the 5' end, is fairly well conserved in the cDNA insert in pGET2. PMID:6300763

  8. Complete Nucleotide Sequence of a Citrobacter freundii Plasmid Carrying KPC-2 in a Unique Genetic Environment

    PubMed Central

    Yao, Yancheng; Imirzalioglu, Can; Hain, Torsten; Kaase, Martin; Gatermann, Soeren; Exner, Martin; Mielke, Martin; Hauri, Anja; Dragneva, Yolanta; Bill, Rita; Wendt, Constanze; Wirtz, Angela; Chakraborty, Trinad

    2014-01-01

    The complete and annotated nucleotide sequence of a 54,036-bp plasmid harboring a blaKPC-2 gene that is clonally present in Citrobacter isolates from different species is presented. The plasmid belongs to incompatibility group N (IncN) and harbors the class A carbapenemase KPC-2 in a unique genetic environment. PMID:25395635

  9. A Novel Method for Alignment-free DNA Sequence Similarity Analysis Based on the Characterization of Complex Networks

    PubMed Central

    Zhou, Jie; Zhong, Pianyu; Zhang, Tinghui

    2016-01-01

    Determination of sequence similarity is one of the major steps in computational phylogenetic studies. One of the major tasks of computational biologists is to develop novel mathematical descriptors for similarity analysis. DNA clustering is an important technology that automatically identifies inherent relationships among large-scale DNA sequences. The comparison between the DNA sequences of different species helps determine phylogenetic relationships among species. Alignment-free approaches have continuously gained interest in various sequence analysis applications such as phylogenetic inference and metagenomic classification/clustering, particularly for large-scale sequence datasets. Here, we construct a novel and simple mathematical descriptor based on the characterization of cis sequence complex DNA networks. This new approach is based on a code of three cis nucleotides in a gene that could code for an amino acid. In particular, for each DNA sequence, we set up a cis sequence complex network that is used to develop a characterization vector for the analysis of mitochondrial DNA sequence phylogenetic relationships among nine species. The resulting phylogenetic relationships among the nine species were found to agree with their known relationships. PMID:27746676

  10. A Novel Method for Alignment-free DNA Sequence Similarity Analysis Based on the Characterization of Complex Networks.

    PubMed

    Zhou, Jie; Zhong, Pianyu; Zhang, Tinghui

    2016-01-01

    Determination of sequence similarity is one of the major steps in computational phylogenetic studies. One of the major tasks of computational biologists is to develop novel mathematical descriptors for similarity analysis. DNA clustering is an important technology that automatically identifies inherent relationships among large-scale DNA sequences. The comparison between the DNA sequences of different species helps determine phylogenetic relationships among species. Alignment-free approaches have continuously gained interest in various sequence analysis applications such as phylogenetic inference and metagenomic classification/clustering, particularly for large-scale sequence datasets. Here, we construct a novel and simple mathematical descriptor based on the characterization of cis sequence complex DNA networks. This new approach is based on a code of three cis nucleotides in a gene that could code for an amino acid. In particular, for each DNA sequence, we set up a cis sequence complex network that is used to develop a characterization vector for the analysis of mitochondrial DNA sequence phylogenetic relationships among nine species. The resulting phylogenetic relationships among the nine species were found to agree with their known relationships.
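
    The complex-network descriptor is not described in enough detail here to reproduce. As a simpler, widely used alignment-free baseline built on the same three-nucleotide unit, the Python sketch below compares sequences by their overlapping trinucleotide frequency vectors; this is a swapped-in technique, not the authors' method, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]  # all 64 trinucleotides


def triplet_profile(seq: str) -> list[float]:
    """Relative frequencies of overlapping 3-mers over the 64 ACGT trinucleotides."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    total = sum(counts[t] for t in TRIPLETS)  # 3-mers with other symbols are ignored
    if total == 0:
        raise ValueError("sequence has no valid trinucleotides")
    return [counts[t] / total for t in TRIPLETS]


def triplet_distance(a: str, b: str) -> float:
    """Euclidean distance between two trinucleotide frequency profiles."""
    pa, pb = triplet_profile(a), triplet_profile(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))
```

    A matrix of such pairwise distances could then be passed to a standard distance-based tree method such as neighbor joining.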

  11. The nucleotide sequence and genome structure of mung bean yellow mosaic geminivirus.

    PubMed

    Morinaga, T; Ikegami, M; Miura, K

    1993-01-01

    Complete nucleotide sequences of the infectious cloned DNA components (DNA 1 and DNA 2) of mung bean yellow mosaic virus (MYMV) were determined. MYMV DNA 1 and DNA 2 consist of 2,723 and 2,675 nucleotides, respectively. DNA 1 and DNA 2 have little sequence similarity except for a region of approximately 200 bases that is almost identical in the two molecules. Analysis of open reading frames revealed nine potential coding regions for proteins of mol. wt. > 10,000, six in DNA 1 and three in DNA 2. The nucleotide sequence of MYMV DNA was compared with those of bean golden mosaic virus (BGMV), tomato golden mosaic virus (TGMV) and African cassava mosaic virus (ACMV). The 200-base region common to the two DNAs of each virus had little sequence similarity, except for a highly conserved 33-36 base sequence potentially capable of forming a stable hairpin structure. The potential coding regions in the MYMV DNAs had counterparts in BGMV, TGMV and ACMV, suggesting an overall similarity in genome organization, except for the absence of 1L3 in MYMV DNA 1. The most highly conserved ORFs, MYMV 1R1, BGMV 1R1, TGMV 1R1 and ACMV 1R1, are the putative genes for the coat proteins of MYMV, BGMV, TGMV and ACMV, respectively. MYMV 1L1 also has a high degree of sequence similarity with BGMV 1L1, TGMV 1L1 and ACMV 1L1.

  12. Nucleotide sequence of the 3'-terminal region of potato virus YN RNA.

    PubMed

    van der Vlugt, R; Allefs, S; de Haan, P; Goldbach, R

    1989-01-01

    The sequence of the 3'-terminal 1611 nucleotides of the genome of the tobacco veinal necrosis strain of potato virus Y (PVYN) was determined. The sequence revealed an open reading frame of 1285 nucleotides, of which the start was not identified, and an untranslated region of 316 nucleotides upstream of a poly(A) tract. Comparison of the open reading frame with the amino-terminal sequence of the viral coat protein enabled mapping of the start of the coat protein at amino acid -267, and indicated that maturation of this protein requires proteolytic processing from a larger polyprotein precursor at a glutamine/glycine dipeptide sequence. The coat protein of PVYN displayed significant (51 to 63%) sequence homology to the coat proteins of four other potyviruses, tobacco etch virus, tobacco vein mottling virus, plum pox virus and sugarcane mosaic virus. Even higher sequence homology (91%) was detected with the coat protein of a fifth potyvirus, pepper mottle virus (PeMV). This homology was of the same level as found between the coat proteins of PVYN and a second strain of this virus, PVYD. Since, moreover, PVYN and PeMV were the only potyviruses displaying homology in the 3'-terminal, non-translated regions of their genomes, we conclude that PeMV should be regarded as a strain of PVY.

  13. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment.

    PubMed

    Remmert, Michael; Biegert, Andreas; Hauser, Andreas; Söding, Johannes

    2011-12-25

    Sequence-based protein function and structure prediction depends crucially on sequence-search sensitivity and accuracy of the resulting sequence alignments. We present an open-source, general-purpose tool that represents both query and database sequences by profile hidden Markov models (HMMs): 'HMM-HMM-based lightning-fast iterative sequence search' (HHblits; http://toolkit.genzentrum.lmu.de/hhblits/). Compared to the sequence-search tool PSI-BLAST, HHblits is faster owing to its discretized-profile prefilter, has 50-100% higher sensitivity and generates more accurate alignments.

  14. Prediction of human rotavirus serotype by nucleotide sequence analysis of the VP7 protein gene.

    PubMed Central

    Green, K Y; Sears, J F; Taniguchi, K; Midthun, K; Hoshino, Y; Gorziglia, M; Nishikawa, K; Urasawa, S; Kapikian, A Z; Chanock, R M

    1988-01-01

    Human rotavirus field isolates were characterized by direct sequence analysis of the gene encoding the serotype-specific major neutralization protein (VP7). Single-stranded RNA transcripts were prepared from virus particles obtained directly from stool specimens or after two or three passages in MA-104 cells. Two regions of the gene (nucleotides 307 through 351 and 670 through 711) which had previously been shown to contain regions of sequence divergence among rotavirus serotypes were sequenced by the dideoxynucleotide method with two different synthetic oligonucleotide primers. The resulting nucleotide sequences were compared with the corresponding sequences from rotaviruses of known serotype (serotype 1, 2, 3, or 4). A total of 25 field isolates and 10 laboratory strains examined by this method exhibited marked sequence identity in both areas of the gene with the corresponding regions of 1 of the 4 reference strains. In addition, the predicted serotype from the sequence analysis correlated in each case with the serotype determined when the rotaviruses were examined by plaque reduction neutralization or reactivity with serotype-specific monoclonal antibodies. These data suggest that as a result of the high degree of sequence conservation observed among rotaviruses of the same serotype, it is possible to predict the serotype of a rotavirus isolate by direct sequence analysis of its VP7 gene. PMID:2833626

  15. AlexSys: a knowledge-based expert system for multiple sequence alignment construction and analysis.

    PubMed

    Aniba, Mohamed Radhouene; Poch, Olivier; Marchler-Bauer, Aron; Thompson, Julie Dawn

    2010-10-01

    Multiple sequence alignment (MSA) is a cornerstone of modern molecular biology and represents a unique means of investigating the patterns of conservation and diversity in complex biological systems. Many different algorithms have been developed to construct MSAs, but previous studies have shown that no single aligner consistently outperforms the rest. This has led to the development of a number of 'meta-methods' that systematically run several aligners and merge the output into one single solution. Although these methods generally produce more accurate alignments, they are inefficient because all the aligners need to be run first and the choice of the best solution is made a posteriori. Here, we describe the development of a new expert system, AlexSys, for the multiple alignment of protein sequences. AlexSys incorporates an intelligent inference engine to automatically select an appropriate aligner a priori, depending only on the nature of the input sequences. The inference engine was trained on a large set of reference multiple alignments, using a novel machine learning approach. Applying AlexSys to a test set of 178 alignments, we show that the expert system represents a good compromise between alignment quality and running time, making it suitable for high throughput projects. AlexSys is freely available from http://alnitak.u-strasbg.fr/~aniba/alexsys.

  16. Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments

    SciTech Connect

    Daily, Jeffrey A.

    2016-02-10

    Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. As a result, a faster intra-sequence pairwise alignment implementation is described and benchmarked. Using a 375-residue query sequence, a speed of 136 billion cell updates per second (136 GCUPS) was achieved on a dual Intel Xeon E5-2670 12-core processor system, the highest reported for an implementation based on Farrar's 'striped' approach. When using only a single thread, parasail was 1.7 times faster than Rognes's SWIPE. For many score matrices, parasail is faster than BLAST. The software library is designed for 64-bit Linux, OS X, or Windows on processors with SSE2, SSE4.1, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. In conclusion, applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.
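    The Smith-Waterman recurrence that parasail vectorizes can be written in scalar form in a few lines of Python. This is a minimal sketch assuming a linear gap penalty and a simple match/mismatch score; parasail itself supports substitution matrices and affine gaps and evaluates the recurrence with SIMD instructions, and nothing here reflects parasail's API. The `gcups` helper shows how the throughput figure is defined: one cell update per (query residue, database residue) pair, divided by run time.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score, linear gap penalty, O(len(a) * len(b)) time."""
    prev = [0] * (len(b) + 1)  # DP row i-1
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # DP row i; column 0 stays 0 (local alignment)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are clamped at 0 so an alignment can restart.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def gcups(query_len, db_residues, seconds):
    """Giga cell updates per second: DP cells computed / elapsed time / 1e9."""
    return query_len * db_residues / seconds / 1e9
```

    For example, a hypothetical run that scans a 375-residue query against 2,000,000 database residues in 1 second corresponds to `gcups(375, 2_000_000, 1.0)`, that is, 0.75 GCUPS.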

  17. BarraCUDA - a fast short read sequence aligner using graphics processing units

    PubMed Central

    2012-01-01

    Background: With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General-purpose computing on graphics processing units (GPGPU) extracts the computing power of hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy-efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment program based on BWA, which accelerates the alignment of sequencing reads generated by these instruments to a reference DNA sequence. Findings: Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computationally intensive alignment component of BWA to the GPU to take advantage of its massive parallelism. As a result, BarraCUDA offers an order-of-magnitude boost in alignment throughput compared with a single CPU core, while delivering the same level of alignment fidelity. The software can also use multiple CUDA devices in parallel to further accelerate alignment throughput. Conclusions: BarraCUDA is designed to take advantage of GPU parallelism to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part, streamline the current bioinformatics pipeline so that the wider scientific community can benefit from sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net PMID:22244497
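    BWA, on which BarraCUDA is based, locates reads with an FM-index built on the Burrows-Wheeler transform of the reference. The Python sketch below shows only the exact-match core of that idea (backward search). It is a toy under stated assumptions, not BWA's or BarraCUDA's code: the suffix array is built naively, there is no mismatch or gap handling, and nothing runs on a GPU. Because each read can be searched independently, this kind of step lends itself to processing many reads in parallel.

```python
def bwt_index(text):
    """Build a naive FM-index: suffix array, C table and occurrence table."""
    text += "$"  # unique terminator, sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0
    alphabet = sorted(set(text))
    # C[c]: number of characters in text that are lexicographically smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][k]: number of occurrences of c in bwt[:k].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for k, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][k + 1] = occ[c][k] + (ch == c)
    return sa, C, occ


def backward_search(pattern, sa, C, occ):
    """Return sorted start positions of exact occurrences of pattern."""
    lo, hi = 0, len(sa)  # current suffix-array interval [lo, hi)
    for c in reversed(pattern):  # extend the match one character to the left
        if c not in C:
            return []
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])
```

    For example, `backward_search("ATTA", *bwt_index("GATTACAGATTA"))` returns `[1, 8]`.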

  18. Identification of high-quality single-nucleotide polymorphisms in Glycine latifolia using a heterologous reference genome sequence.

    PubMed

    Chang, Sungyul; Hartman, Glen L; Singh, Ram J; Lambert, Kris N; Hobbs, Houston A; Domier, Leslie L

    2013-06-01

    Like many widely cultivated crops, soybean [Glycine max (L.) Merr.] has a relatively narrow genetic base, while its perennial distant relatives in the subgenus Glycine Willd. are more genetically diverse and display desirable traits not present in cultivated soybean. To identify single-nucleotide polymorphisms (SNPs) between a pair of G. latifolia accessions that were resistant or susceptible to Sclerotinia sclerotiorum (Lib.) de Bary, reduced representations of the DNA from each accession were sequenced. Approximately 30% of the 36 million 100-nt reads produced from each of the two G. latifolia accessions aligned primarily to gene-rich euchromatic regions on the distal arms of G. max chromosomes. Because a genome sequence was not available for G. latifolia, the G. max genome sequence was used as a reference to identify 9,303 G. latifolia SNPs that aligned to unique positions in the G. max genome with at least 98% identity and no insertions or deletions. To validate a subset of the SNPs, nine TaqMan and 384 GoldenGate allele-specific G. latifolia SNP assays were designed and analyzed in F2 G. latifolia populations derived from G. latifolia plant introductions (PI) 559298 and 559300. All nine TaqMan markers and 91% of the 291 polymorphic GoldenGate markers segregated in a 1:2:1 ratio. Genetic linkage maps were assembled for G. latifolia, nine of which were uninterrupted and nearly collinear with the homoeologous G. max chromosomes. These results made use of a heterologous reference genome sequence to identify more than 9,000 informative high-quality SNPs for G. latifolia, a subset of which was used to generate the first genetic maps for any perennial Glycine species.
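    The abstract does not say which test was used to check the 1:2:1 segregation ratio. A chi-square goodness-of-fit test with two degrees of freedom is one common choice for codominant markers in an F2 population. The Python sketch below assumes that test and a significance level of 0.05 (critical value 5.991); the genotype counts used with it are hypothetical.

```python
# Chi-square critical value for 2 degrees of freedom at alpha = 0.05 (assumed threshold).
CRITICAL_DF2_P05 = 5.991


def chi_square_121(n_aa, n_ab, n_bb):
    """Chi-square statistic for observed F2 genotype counts against a 1:2:1 ratio."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped individuals")
    expected = (n / 4, n / 2, n / 4)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def fits_121(n_aa, n_ab, n_bb):
    """True if a 1:2:1 ratio is not rejected at the 0.05 level."""
    return chi_square_121(n_aa, n_ab, n_bb) < CRITICAL_DF2_P05
```

    With 22, 53 and 25 individuals in the three genotype classes, the statistic is 0.54 and the marker would be scored as segregating 1:2:1.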

  19. Nucleotide sequence of miRNA precursor contributes to cleavage site selection by Dicer.

    PubMed

    Starega-Roslan, Julia; Galka-Marciniak, Paulina; Krzyzosiak, Wlodzimierz J

    2015-12-15

    The ribonuclease Dicer excises mature miRNAs from a diverse group of precursors (pre-miRNAs), most of which contain various secondary structure motifs in their hairpin stem. In this study, we analyzed Dicer cleavage in hairpin substrates lacking such motifs. We searched for factors other than secondary structure that may influence the length diversity and heterogeneity of miRNAs. We found that the nucleotide sequence at the Dicer cleavage site influences both of these miRNA characteristics. With regard to the cleavage mechanism, we demonstrate that the Dicer RNase IIIA domain, which cleaves within the 3' arm of the pre-miRNA, is more sensitive to the nucleotide sequence of its substrate than is the RNase IIIB domain. The RNase IIIA domain avoids releasing miRNAs with a G nucleotide at the 5' end and prefers to generate miRNAs with a U nucleotide at the 5' end. We also propose that sequence restrictions at the Dicer cleavage site might contribute to the generation of miRNA duplexes with 3' overhangs of atypical lengths. This finding implies that the two RNase III domains forming the single processing center of Dicer may exhibit some flexibility, which allows for the formation of these non-standard 3' overhangs.

  20. Nucleotide sequence of miRNA precursor contributes to cleavage site selection by Dicer

    PubMed Central

    Starega-Roslan, Julia; Galka-Marciniak, Paulina; Krzyzosiak, Wlodzimierz J.

    2015-01-01

    The ribonuclease Dicer excises mature miRNAs from a diverse group of precursors (pre-miRNAs), most of which contain various secondary structure motifs in their hairpin stem. In this study, we analyzed Dicer cleavage in hairpin substrates lacking such motifs. We searched for factors other than secondary structure that may influence the length diversity and heterogeneity of miRNAs. We found that the nucleotide sequence at the Dicer cleavage site influences both of these miRNA characteristics. With regard to the cleavage mechanism, we demonstrate that the Dicer RNase IIIA domain, which cleaves within the 3′ arm of the pre-miRNA, is more sensitive to the nucleotide sequence of its substrate than is the RNase IIIB domain. The RNase IIIA domain avoids releasing miRNAs with a G nucleotide at the 5′ end and prefers to generate miRNAs with a U nucleotide at the 5′ end. We also propose that sequence restrictions at the Dicer cleavage site might contribute to the generation of miRNA duplexes with 3′ overhangs of atypical lengths. This finding implies that the two RNase III domains forming the single processing center of Dicer may exhibit some flexibility, which allows for the formation of these non-standard 3′ overhangs. PMID:26424848

  1. Structure-based evaluation of sequence comparison and fold recognition alignment accuracy.

    PubMed

    Domingues, F S; Lackner, P; Andreeva, A; Sippl, M J

    2000-04-07

    The biological role, biochemical function, and structure of uncharacterized protein sequences are often inferred from their similarity to known proteins. A constant goal is to increase the reliability, sensitivity, and accuracy of alignment techniques to enable the detection of increasingly distant relationships. Development, tuning, and testing of these methods benefit from appropriate benchmarks for the assessment of alignment accuracy. Here, we describe a benchmark protocol to estimate sequence-to-sequence and sequence-to-structure alignment accuracy. The protocol consists of structurally related pairs of proteins and procedures to evaluate alignment accuracy over the whole set. The set of protein pairs covers all the currently known fold types. The benchmark is challenging in the sense that it consists of proteins lacking clear sequence similarity. Correct target alignments are derived from the three-dimensional structures of these pairs by rigid body superposition. An evaluation engine computes the accuracy of alignments obtained from a particular algorithm in terms of alignment shifts with respect to the structure-derived alignments. Using this benchmark, we estimate that the best results can be obtained from a combination of amino acid residue substitution matrices and knowledge-based potentials.
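    The exact accuracy measure used by this benchmark is not given in the abstract. As a simplified, hypothetical version of a shift-based score, the Python sketch below represents each alignment as (i, j) pairs of aligned residue indices. For each residue i of the first protein, it measures how far its partner in the test alignment is shifted from its partner in the structure-derived reference, then reports the fraction of reference pairs reproduced exactly (shift 0) and the mean shift over residues aligned in both.

```python
def shift_scores(test_pairs, ref_pairs):
    """Return (fraction of reference pairs with zero shift, mean shift).

    test_pairs, ref_pairs: iterables of (i, j) aligned residue indices,
    where i indexes protein 1 and j indexes protein 2. Residues of protein 1
    that are unaligned in the test alignment count as not reproduced.
    """
    ref_pairs = list(ref_pairs)
    if not ref_pairs:
        raise ValueError("reference alignment is empty")
    test = dict(test_pairs)  # residue i of protein 1 -> partner j in the test alignment
    shifts = [abs(test[i] - j) for i, j in ref_pairs if i in test]
    if not shifts:
        return 0.0, float("nan")
    exact = sum(1 for s in shifts if s == 0) / len(ref_pairs)
    mean_shift = sum(shifts) / len(shifts)
    return exact, mean_shift
```

    A real evaluation engine would also need to decide how to weight unaligned residues and large shifts; this sketch makes the simplest choices.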

  2. Isolation and nucleotide sequence of a cDNA clone encoding rat mitochondrial malate dehydrogenase.

    PubMed Central

    Grant, P M; Tellam, J; May, V L; Strauss, A W

    1986-01-01

    We have determined the complete sequence of the rat mitochondrial malate dehydrogenase (mMDH) precursor, derived from the nucleotide sequence of its cDNA. A single synthetic oligodeoxynucleotide probe was used to screen a rat atrial cDNA library constructed in lambda gt10. A 1.2 kb full-length cDNA clone provided the first complete amino acid sequence of pre-mMDH. The 1014-nucleotide open reading frame encodes the 314-residue mature mMDH protein and a 24-amino-acid NH2-terminal extension, which directs mitochondrial import and is cleaved from the precursor after import to generate mature mMDH. The amino acid composition of the transit peptide is polar and basic. The pre-mMDH transit peptide shows marked homology with those of two other enzymes targeted to the rat mitochondrial matrix. PMID:3755817
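    The lengths reported above are internally consistent: dividing the open reading frame length by the codon size gives the precursor length (transit peptide plus mature protein). The check below is our own arithmetic, not a statement from the paper; it suggests that the quoted 1014 nucleotides count sense codons only, without the stop codon.

```latex
\frac{1014\ \text{nt}}{3\ \text{nt per codon}} = 338\ \text{codons}
  = \underbrace{24}_{\text{transit peptide}} + \underbrace{314}_{\text{mature mMDH}}
```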

  3. Nucleotide sequence and genome organization of atractylodes mottle virus, a new member of the genus Carlavirus.

    PubMed

    Zhao, Fumei; Igori, Davaajargal; Lim, Seungmo; Yoo, Ran Hee; Lee, Su-Heon; Moon, Jae Sun

    2015-11-01

    The complete genome sequence of a member of a distinct species of the genus Carlavirus in the family Betaflexiviridae, tentatively named atractylodes mottle virus (AtrMoV), has been determined. Analysis of its genomic organization indicates that it has a single-stranded, positive-sense genomic RNA of 8866 nucleotides, excluding the poly(A) tail, and consists of six open reading frames typical of members of the genus Carlavirus. The individual open reading frames of AtrMoV show moderately low sequence similarity to those of other carlaviruses at the nucleotide and amino acid sequence levels. Pairwise comparison and phylogenetic analysis suggest that AtrMoV is most closely related to chrysanthemum virus B.

  4. A novel HLA-B*51 allele (B*5116) identified by nucleotide sequencing.

    PubMed

    Tamouza, R; Carbonnelle, E; Schaeffer, V; Sadki, K; Abed, Y; Marzais, F; Poirier, J C; Fortier, C; Toubert, A; Raffoux, C; Charron, D

    2000-02-01

    We report here an additional HLA-B*51 variant, designated HLA-B*5116. Detected by an abnormal serological reactivity pattern, this variant was identified as a B*51 allele by polymerase chain reaction using sequence-specific primers (PCR-SSP) and characterized by nucleotide sequencing. The new variant sequence closely matches the classical HLA-B*5101 except for two adjacent nucleotide substitutions at positions 216 and 217 of the third exon and the resulting leucine-to-glutamic acid change at codon 163 of the alpha2 domain (CTG-->GAG). This new variant was not detected in three different ethnic groups (French, Algerian and Lebanese), suggesting that it is very rare.

  5. Nucleotide sequence and genome organization of Dweet mottle virus and its relationship to members of the family Betaflexiviridae

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The nucleotide sequence of Dweet mottle virus (DMV) was determined and compared to sequences of members of the families Alphaflexiviridae and Betaflexiviridae. The DMV genome has 8747 nucleotides (nt), excluding the poly(A) tail at the 3’ end of the genome. The overall G+C content of DMV genomic RNA is 40%. D...

  6. Complete nucleotide sequence analysis of a Dengue-1 virus isolated on Easter Island, Chile.

    PubMed

    Cáceres, C; Yung, V; Araya, P; Tognarelli, J; Villagra, E; Vera, L; Fernández, J

    2008-01-01

    Dengue-1 viruses responsible for the dengue fever outbreak in Easter Island in 2002 were isolated from acute-phase sera of dengue fever patients. In order to analyze the complete genome sequence, we designed primers to amplify contiguous segments across the entire sequence of the viral genome. RT-PCR products obtained were cloned, and complete nucleotide and deduced amino acid sequences were determined. This report constitutes the first complete genetic characterization of a DENV-1 isolate from Chile. Phylogenetic analysis shows that an Easter Island isolate is most closely related to Pacific DENV-1 genotype IV viruses.

  7. Complete nucleotide sequence of a subviral DNA molecule of porcine circovirus type 2.

    PubMed

    Wen, Han

    2016-07-01

    Porcine circovirus type 2 (PCV2) is a member of the genus Circovirus in the family Circoviridae. Most subgenomic molecules of PCV2 have been mapped. Here, the first full-length sequence of a subviral molecule of PCV2 (CH-IVT12) containing a reverse complement sequence of the PCV2 genome was determined by sequencing DNA extracted from PK15 cells infected with PCV2. The circular CH-IVT12 DNA consists of 1136 nucleotides and contains one major open reading frame.

  8. Nucleotide sequence of a new isolate of ribgrass mosaic tobamovirus infecting Impatiens New Guinea.

    PubMed

    Wetzel, T; Njapo Ngangom, H O; Chotewutmontri, S; Krczal, G

    2006-04-01

    The complete nucleotide sequence of a tobamovirus isolated from Impatiens New Guinea was determined. The genome was 6302 nt long, and its genomic organisation was similar to those of other crucifer tobamoviruses. Sequence comparisons with the corresponding sequences of other crucifer tobamoviruses revealed the highest levels of identity with ribgrass mosaic virus (Shanghai isolate). A small open reading frame putatively encoding a 4.5-kDa protein with a low degree of similarity to the ORF6 of tobacco mosaic virus was found nested in the movement protein gene.

  9. Nucleotide sequences of the coat protein genes of two Japanese zucchini yellow mosaic virus isolates.

    PubMed

    Kundu, A K; Ohshima, K; Sako, N

    1997-10-01

    The nucleotide (nt) sequences of the coat protein (CP) genes of two Japanese zucchini yellow mosaic virus (ZYMV) isolates (ZYMV-169 and ZYMV-M) were determined. The CP genes of both isolates were 837 nt long and encoded 279 amino acids (aa). The nt and deduced aa sequence similarities between the two isolates were 92% and 94.6%, respectively. The deduced aa sequences of the CPs of the Japanese isolates were compared with those of previously reported ZYMV isolates by phylogenetic analysis. This comparison led us to divide all ZYMV isolates into 3 groups, in which ZYMV-169 formed its own distinct group.
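    Percent-identity figures such as those above depend on conventions (how gaps are treated and which length is used as the denominator) that the abstract does not specify. The Python sketch below implements one common convention over two already-aligned sequences, with '-' marking gaps and gap columns excluded; it also checks that an 837-nt coding region corresponds to 279 codons, matching the reported 279 amino acids.

```python
def percent_identity(seq1, seq2):
    """Percent identical columns in a pairwise alignment, ignoring gap columns."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns where neither sequence has a gap.
    columns = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    same = sum(1 for a, b in columns if a == b)
    return 100.0 * same / len(columns)


def codons_to_residues(nt_length):
    """Number of amino acids encoded by a coding region of nt_length nucleotides."""
    if nt_length % 3:
        raise ValueError("coding length must be a multiple of 3")
    return nt_length // 3
```

    For example, `percent_identity("ACGT", "ACGA")` gives 75.0, and `codons_to_residues(837)` gives 279.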

  10. Sequence selective naked-eye detection of DNA harnessing extension of oligonucleotide-modified nucleotides.

    PubMed

    Verga, Daniela; Welter, Moritz; Marx, Andreas

    2016-02-01

    DNA polymerases can efficiently and sequence-selectively incorporate oligonucleotide (ODN)-modified nucleotides, and the incorporated oligonucleotide strand can be employed as a primer in rolling circle amplification (RCA). The effective amplification of the DNA primer by Φ29 DNA polymerase allows the sequence-selective hybridisation of the amplified strand with a G-quadruplex DNA sequence that has horseradish peroxidase-like activity. Based on these findings, we developed a system that allows DNA detection with single-base resolution by the naked eye.

  11. The nucleotide sequence at the termini of adenovirus type 5 DNA.

    PubMed Central

    Steenbergh, P H; Maat, J; van Ormondt, H; Sussenbach, J S

    1977-01-01

    The sequences of the first 194 base pairs at both termini of adenovirus type 5 (Ad5) DNA have been determined using the chemical degradation technique developed by Maxam and Gilbert (Proc. Nat. Acad. Sci. USA 74 (1977), pp. 560-564). Nucleotides 1-75 of the sequence were confirmed by analysis of labeled RNA transcribed in vitro from the terminal HhaI fragments. The sequence data show that Ad5 DNA has a perfect inverted terminal repetition 103 base pairs long. PMID:600799
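    A perfect inverted terminal repetition of length k means that the first k nucleotides of one strand read the same as the reverse complement of its last k nucleotides. The Python sketch below measures that length for a sequence of unambiguous bases; the toy sequence in the usage note is invented for illustration and is not Ad5 sequence.

```python
# Watson-Crick complements for unambiguous DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def inverted_terminal_repeat_length(seq):
    """Largest k with seq[:k] == reverse_complement(seq[-k:]).

    Capped at half the sequence length so the two repeat copies cannot overlap.
    Any non-ACGT symbol ends the repeat.
    """
    seq = seq.upper()
    n = len(seq)
    k = 0
    # Position k from the left must pair with position k from the right.
    while k < n // 2 and seq[k] == COMPLEMENT.get(seq[n - 1 - k]):
        k += 1
    return k
```

    For example, with `left = "CATCATCAAT"`, `inverted_terminal_repeat_length(left + "GGGGG" + reverse_complement(left))` returns 10.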

  12. Flexible structural protein alignment by a sequence of local transformations

    PubMed Central

    Rocha, Jairo; Segura, Joan; Wilson, Richard C.; Dasgupta, Swagata

    2009-01-01

    Motivation: Throughout evolution, homologous proteins have common regions that stay semi-rigid relative to each other and other parts that vary in a more noticeable way. To compare the increasing number of structures in the PDB, reliable and easy-to-use flexible geometrical alignments are needed. Results: We present a protein structure alignment method whose main feature is the ability to consider different rigid transformations at different sites, allowing for deformations beyond a global rigid transformation. The performance of the method is comparable with that of the best of the 10 aligners tested, regarding both the quality of the alignments with respect to hand-curated ones and the classification ability. An analysis of some structure pairs from the literature that need to be matched in a flexible fashion is shown. The use of a series of local transformations can be exported to other classifiers, and a future golden protein similarity measure could benefit from it. Availability: A public server for the program is available at http://dmi.uib.es/ProtDeform/. Contact: jairo@uib.es Supplementary information: All data used, results and examples are available at http://dmi.uib.es/people/jairo/bio/ProtDeform. Supplementary data are available at Bioinformatics online. PMID:19417057

  13. PatMatch: a program for finding patterns in peptide and nucleotide sequences

    PubMed Central

    Yan, Thomas; Yoo, Danny; Berardini, Tanya Z.; Mueller, Lukas A.; Weems, Dan C.; Weng, Shuai; Cherry, J. Michael; Rhee, Seung Y.

    2005-01-01

    Here, we present PatMatch, an efficient, web-based pattern-matching program that enables searches for short nucleotide or peptide sequences such as cis-elements in nucleotide sequences or small domains and motifs in protein sequences. The program can be used to find matches to a user-specified sequence pattern that can be described using ambiguous sequence codes and a powerful and flexible pattern syntax based on regular expressions. A recent upgrade has improved performance and now supports both mismatches and wildcards in a single pattern. This enhancement has been achieved by replacing the previous searching algorithm, scan_for_matches [D'Souza et al. (1997), Trends in Genetics, 13, 497–498], with nondeterministic-reverse grep (NR-grep), a general pattern matching tool that allows for approximate string matching [Navarro (2001), Software Practice and Experience, 31, 1265–1312]. We have tailored NR-grep to be used for DNA and protein searches with PatMatch. The stand-alone version of the software can be adapted for use with any sequence dataset and is available for download at The Arabidopsis Information Resource (TAIR) at . The PatMatch server is available on the web at for searching Arabidopsis thaliana sequences. PMID:15980466
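    PatMatch accepts ambiguous sequence codes in its patterns. The Python sketch below shows only the simplest part of that idea: it expands IUPAC nucleotide ambiguity codes into a regular expression and reports all match positions, including overlapping ones, using the standard `re` module. It does not reproduce PatMatch's pattern syntax or NR-grep's approximate matching with mismatches; the example motif and sequence are illustrative.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(pattern):
    """Translate an IUPAC DNA pattern to a regex (KeyError on unknown codes)."""
    return "".join(IUPAC_DNA[c] for c in pattern.upper())


def find_motif(pattern, seq):
    """0-based start positions of all matches, including overlapping ones."""
    # A zero-width lookahead lets finditer report overlapping matches.
    regex = re.compile("(?=(" + iupac_to_regex(pattern) + "))")
    return [m.start() for m in regex.finditer(seq.upper())]
```

    For example, `find_motif("TATAWA", "GGTATAAAGGTATATA")` returns `[2, 10]`.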

  14. Pseudo nucleotide composition or PseKNC: an effective formulation for analyzing genomic sequences.

    PubMed

    Chen, Wei; Lin, Hao; Chou, Kuo-Chen

    2015-10-01

    With the avalanche of DNA/RNA sequences generated in the post-genomic age, there is an urgent need for automated methods to analyze the relationship between sequences and their functions. Towards this goal, a series of sequence-based methods have been proposed and applied to various uncharacterized DNA/RNA sequences to gain an in-depth understanding of their mechanisms of action and the processes they are involved in. Compared with the classical sequence-based methods, the recently developed pseudo nucleotide composition (PseKNC) approach has the following advantages: (1) it can convert DNA/RNA sequences of different lengths into digital vectors of fixed dimension that can be directly handled by all existing machine-learning algorithms or operation engines; (2) it can incorporate the desired features and properties according to the user's selection or definition; (3) it can cover considerable sequence pattern information, both local and global. This minireview focuses on the concept of pseudo nucleotide composition, its development and applications.
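    Advantage (1) above, turning sequences of different lengths into vectors of fixed dimension, comes from the k-mer ("KNC") part of PseKNC. The Python sketch below computes only that normalized k-mer composition (4^k values). The pseudo components that PseKNC adds from physicochemical properties of dinucleotides or trinucleotides are deliberately omitted, so this is a baseline rather than PseKNC itself.

```python
from itertools import product


def kmer_composition(seq, k=2):
    """Normalized k-mer frequencies: a vector of length 4**k for any input length."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]  # AA, AC, AG, ...
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])  # None for k-mers with non-ACGT symbols
        if j is not None:
            counts[j] += 1
    total = sum(counts) or 1  # avoid division by zero for very short inputs
    return [c / total for c in counts]
```

    For example, `kmer_composition("ACGT")` assigns 1/3 to each of AC, CG and GT and 0 to the other 13 dinucleotides, and a sequence of any other length still yields 16 values.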

  15. Coval: Improving Alignment Quality and Variant Calling Accuracy for Next-Generation Sequencing Data

    PubMed Central

    Kosugi, Shunichi; Natsume, Satoshi; Yoshida, Kentaro; MacLean, Daniel; Cano, Liliana; Kamoun, Sophien; Terauchi, Ryohei

    2013-01-01

    Accurate identification of DNA polymorphisms using next-generation sequencing technology is challenging because of a high rate of sequencing error and incorrect mapping of reads to reference genomes. Currently available short read aligners and DNA variant callers suffer from these problems. We developed the Coval software to improve the quality of short read alignments. Coval is designed to minimize the incidence of spurious alignment of short reads, by filtering mismatched reads that remained in alignments after local realignment and error correction of mismatched reads. The error correction is executed based on the base quality and allele frequency at the non-reference positions for an individual or pooled sample. We demonstrated the utility of Coval by applying it to simulated genomes and experimentally obtained short-read data of rice, nematode, and mouse. Moreover, we found an unexpectedly large number of incorrectly mapped reads in ‘targeted’ alignments, where the whole genome sequencing reads had been aligned to a local genomic segment, and showed that Coval effectively eliminated such spurious alignments. We conclude that Coval significantly improves the quality of short-read sequence alignments, thereby increasing the calling accuracy of currently available tools for SNP and indel identification. Coval is available at http://sourceforge.net/projects/coval105/. PMID:24116042

  16. Linking the human cytogenetic map with nucleotide sequence: the CCAP clone set.

    PubMed

    Jang, Wonhee; Yonescu, Raluca; Knutsen, Turid; Brown, Theresa; Reppert, Tricia; Sirotkin, Karl; Schuler, Gregory D; Ried, Thomas; Kirsch, Ilan R

    2006-07-15

    We present the completed dataset and clone repository of the Cancer Chromosome Aberration Project (CCAP), an initiative developed and funded through the intramural program of the U.S. National Cancer Institute, to provide seamless linkage of human cytogenetic markers with the primary nucleotide sequence of the human genome. Spaced at 1-2 Mb intervals across the human genome, 1,339 bacterial artificial chromosome (BAC) clones have been localized to chromosomal bands through high-resolution fluorescence in situ hybridization (FISH) mapping. Of these clones, 99.8% can be positioned on the primary human genome sequence and 95% are placed at or close to their precise nucleotide starts and stops. This dataset can be studied and manipulated within generally available public Web sites. The clones are available from a commercial repository. The CCAP BAC clone set provides anchors for the interrogation of gene and sequence involvement in oncogenic and developmental disorders when the starting point is the recognition of a structural, numerical, or interstitial chromosomal aberration. This dataset also provides a current view of the quality and coherence of the available genome sequence and insight into the nucleotide and three-dimensional structures that manifest as Giemsa light and dark chromosomal banding patterns.

  17. Nucleotide sequence and expression of the 14-3-3 from the halotolerant alga Dunaliella salina.

    PubMed

    Wang, Tian-yun; Jing, Chang-Qin; Dong, Wei-Hua; Zhang, Jun-He; Zhang, Yu

    2010-02-01

    Previously we reported the nucleotide sequence of a 14-3-3 cDNA cloned from the unicellular green alga Dunaliella salina; however, the genomic nucleotide sequence of this gene had not been reported. In the present study, the nucleotide sequence of the gene was cloned and characterized, and its copy number and expression were examined. The coding sequence of the gene was found to be interrupted by five introns of 132, 266, 153, 152 and 625 bp, respectively. Introns 3-5 were found in conserved positions compared with the Chlamydomonas reinhardtii 14-3-3 gene. D. salina 14-3-3 cDNA was inserted into the prokaryotic expression plasmid pET-28 and transformed into E. coli BL21, and the recombinant 14-3-3 protein was purified from E. coli and used to immunize a rabbit. Indirect ELISA with 14-3-3 as the coating antigen showed that the rabbit antiserum titer was 1:1,000,000. Western blotting confirmed that the prepared rabbit antibodies recognized the recombinant 14-3-3 protein. Southern blotting showed that only one copy of the 14-3-3 gene is present in the D. salina genome, and 14-3-3 expression did not change throughout the Dunaliella cell cycle.

  18. Nucleotide binding database NBDB – a collection of sequence motifs with specific protein-ligand interactions

    PubMed Central

    Zheng, Zejun; Goncearenco, Alexander; Berezovsky, Igor N.

    2016-01-01

    The NBDB database describes protein motifs, elementary functional loops (EFLs), that are involved in binding nucleotide-containing ligands and other biologically relevant cofactors/coenzymes, including ATP, AMP, ATP, GMP, GDP, GTP, CTP, PAP, PPS, FMN, FAD(H), NAD(H), NADP, cAMP, cGMP, c-di-AMP and c-di-GMP, ThPP, THD, F-420, ACO, CoA, PLP and SAM. The database is freely available online at http://nbdb.bii.a-star.edu.sg. In total, NBDB contains data on 249 motifs that interact with 24 ligands. Sequence profiles of EFL motifs were derived de novo from nonredundant UniProt proteome sequences. Conserved amino acid residues in the profiles interact specifically with distinct chemical parts of nucleotide-containing ligands, such as nitrogenous bases, phosphate groups, ribose, nicotinamide, and flavin moieties. Each EFL profile in the database is characterized by a pattern of corresponding ligand–protein interactions found in crystallized ligand–protein complexes. The NBDB database helps to explore the determinants of nucleotide and cofactor binding in different protein folds and families. NBDB can also detect fragments of a user-provided protein sequence that match the profiles of particular EFLs. Comprehensive information on the sequences, structures, and interactions of EFLs with ligands provides a foundation for experimental and computational efforts to design desired protein functions. PMID:26507856

  19. Comparative analysis of ITS1 nucleotide sequence reveals distinct genetic difference between Brugia malayi from Northeast Borneo and Thailand.

    PubMed

    Fong, Mun-Yik; Noordin, Rahmah; Lau, Yee-Ling; Cheong, Fei-Wen; Yunus, Muhammad Hafiznur; Idris, Zulkarnain Md

    2013-01-01

    Brugia malayi is one of the parasitic worms that cause lymphatic filariasis in humans. Its geographical distribution includes a large part of Asia. Despite its wide distribution, very little is known about the genetic variation and molecular epidemiology of this species. In this study, the internal transcribed spacer 1 (ITS1) nucleotide sequences of B. malayi from microfilaria-positive human blood samples in Northeast Borneo Island were determined and compared with published ITS1 sequences of B. malayi isolated from cats and humans in Thailand. Multiple alignment analysis revealed that B. malayi ITS1 sequences from Northeast Borneo were more similar to each other than to those from Thailand. Phylogenetic trees inferred using Neighbour-Joining and Maximum Parsimony methods showed similar topology, with 2 distinct B. malayi clusters. The first cluster consisted of Northeast Borneo B. malayi isolates, whereas the second consisted of the Thailand isolates. The findings of this study suggest that B. malayi in Borneo Island has diverged significantly from B. malayi in mainland Asia, and this has implications for the diagnosis of B. malayi infection across the region using ITS1-based molecular techniques.
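
The multiple alignment and tree-building steps above start from pairwise distances between aligned sequences. As a minimal sketch (not the authors' pipeline), the uncorrected p-distance, the kind of matrix a Neighbour-Joining step typically consumes, can be computed as follows. Skipping gapped columns pair by pair is an assumption, and any isolate names passed in are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences,
    skipping any column where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # pairwise deletion of gapped columns (assumption)
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differences / compared


def distance_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """p-distance for every unordered pair of named, aligned sequences."""
    return {
        (n1, n2): p_distance(s1, s2)
        for (n1, s1), (n2, s2) in combinations(aligned.items(), 2)
    }
```

Passing these distances to a Neighbour-Joining implementation would then yield a tree; that step is not shown here.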

  20. IP-MSA: Independent order of progressive multiple sequence alignments using different substitution matrices

    NASA Astrophysics Data System (ADS)

    Boraik, Aziz Nasser; Abdullah, Rosni; Venkat, Ibrahim

    2014-12-01

    Multiple sequence alignment (MSA) is an essential process for many biological sequence analyses. Many algorithms have been developed to solve MSA, but an efficient computation method with very high accuracy is still a challenge. Progressive alignment is the most widely used approach to compute the final MSA. In this paper, we present a simple and effective progressive approach. Building on the independent-order progressive alignment proposed in QOMA, we modified the method to align whole sequences so as to maximize the MSA score. To further improve accuracy, we estimate the similarity of each pair of input sequences from their percent identity and, based on this measure, choose different substitution matrices during the progressive alignment. In addition, we incorporate horizontal information into the alignment by adjusting the weights of amino acid residues based on their neighboring residues. The method was tested on the popular benchmarks BAliBASE 3.0 (global protein sequences) and IRMBASE 2.0 (local protein sequences). The proposed approach outperforms the original QOMA method in sum-of-pairs score and column score by up to 14% and 7%, respectively.
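
IP-MSA is evaluated with the sum-of-pairs (SP) and column scores. The sketch below shows one common way to compute both against a reference alignment: SP is the fraction of reference residue pairs reproduced by the test alignment, and the column score is the fraction of reference columns reproduced exactly. It is a simplification: benchmark suites such as BAliBASE restrict scoring to annotated core blocks, which this version ignores.

```python
from itertools import combinations


def _columns(alignment: list[str]) -> list[tuple]:
    """Columns as tuples of per-sequence residue indices (None marks a gap)."""
    counters = [0] * len(alignment)
    columns = []
    for col in zip(*alignment):
        entry = []
        for row, ch in enumerate(col):
            if ch == "-":
                entry.append(None)
            else:
                entry.append(counters[row])
                counters[row] += 1
        columns.append(tuple(entry))
    return columns


def _aligned_pairs(columns: list[tuple]) -> set[tuple]:
    """(seq_i, seq_j, residue_i, residue_j) for residues sharing a column."""
    pairs = set()
    for col in columns:
        for i, j in combinations(range(len(col)), 2):
            if col[i] is not None and col[j] is not None:
                pairs.add((i, j, col[i], col[j]))
    return pairs


def sp_and_column_scores(test: list[str], reference: list[str]) -> tuple[float, float]:
    """Return (SP score, column score) of `test` measured against `reference`."""
    for aln in (test, reference):
        if len({len(row) for row in aln}) != 1:
            raise ValueError("rows of an alignment must have equal length")
    if [r.replace("-", "") for r in test] != [r.replace("-", "") for r in reference]:
        raise ValueError("both alignments must contain the same sequences in the same order")
    ref_cols, test_cols = _columns(reference), _columns(test)
    ref_pairs = _aligned_pairs(ref_cols)
    sp = len(ref_pairs & _aligned_pairs(test_cols)) / len(ref_pairs)
    cs = len(set(ref_cols) & set(test_cols)) / len(ref_cols)
    return sp, cs
```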

  1. An assessment of the phylogenetic relationship among sugarcane and related taxa based on the nucleotide sequence of 5S rRNA intergenic spacers.

    PubMed

    Pan, Y B; Burner, D M; Legendre, B L

    2000-01-01

    5S rRNA intergenic spacers were amplified from two elite sugarcane (Saccharum hybrids) cultivars and their related taxa by polymerase chain reaction (PCR) with 5S rDNA consensus primers. The resulting PCR products were uniform in length within each accession but showed some length variation among the sugarcane accessions and related taxa. These PCR products did not always cross-hybridize in Southern blot hybridization experiments. The PCR products were cloned into the commercial plasmid vector pCR 2.1 and sequenced. Direct sequencing of the cloned PCR products revealed spacer lengths of 231-237 bp for S. officinarum, 233-237 bp for sugarcane cultivars, 228-238 bp for S. spontaneum, 239-252 bp for S. giganteum, 385-410 bp for Erianthus spp., 226-230 bp for Miscanthus sinensis Zebra, 206-207 bp for M. sinensis IMP 3057, 207-209 bp for Sorghum bicolor, and 247-249 bp for Zea mays. Nucleotide sequence polymorphisms were found at both the segment and the single-nucleotide level. A consensus sequence for each taxon was obtained with Align X. Multiple sequences were aligned and phylogenetic trees constructed using the Align X, CLUSTAL and DNAMAN programs. In general, accessions of the following taxa tended to group together to form distinct clusters: S. giganteum, Erianthus spp., M. sinensis, S. bicolor, and Z. mays. However, the two S. officinarum clones and two sugarcane cultivars did not form distinct clusters but fell within the S. spontaneum cluster. The disclosure of these 5S rRNA intergenic spacer sequences will facilitate marker-assisted breeding in sugarcane.
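
The abstract derives one consensus sequence per taxon with Align X, whose exact consensus rules are not stated. As a generic sketch, a simple majority-rule consensus over aligned sequences looks like this. The gap handling (gaps are not counted, all-gap columns are dropped) and the use of `N` for ties are assumptions.

```python
from collections import Counter


def majority_consensus(aligned: list[str], gap: str = "-", tie: str = "N") -> str:
    """Majority-rule consensus of equal-length aligned sequences.

    Gaps are ignored when counting, an all-gap column is dropped, and a tie
    between the most frequent bases is reported as `tie`."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    consensus = []
    for column in zip(*(s.upper() for s in aligned)):
        counts = Counter(base for base in column if base != gap)
        if not counts:
            continue  # all-gap column
        (top, n), *rest = counts.most_common()
        consensus.append(tie if rest and rest[0][1] == n else top)
    return "".join(consensus)
```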

  2. Quadfinder: server for identification and analysis of quadruplex-forming motifs in nucleotide sequences

    PubMed Central

    Scaria, Vinod; Hariharan, Manoj; Arora, Amit; Maiti, Souvik

    2006-01-01

    G-quadruplex secondary structures, which play a structural role in repetitive DNA such as telomeres, may also play a functional role at other genomic locations as targetable regulatory elements that control gene expression. The recent interest in applications of quadruplexes in biological systems prompted us to develop a tool for the identification and analysis of quadruplex-forming nucleotide sequences, especially in RNA. Here we present Quadfinder, an online server for the prediction and bioinformatic analysis of unimolecular quadruplex-forming nucleotide sequences. The server is designed to be user-friendly and needs minimal intervention by the user, while providing flexibility in defining variants of the motif. The server is freely available at URL . PMID:16845097
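
Quadfinder's exact motif definition is not given in the abstract. As a hedged sketch, the G-quadruplex consensus widely used in the literature (four runs of at least three G separated by loops of 1 to 7 nt) can be written as a parameterised regular expression, which mirrors the idea of user-defined motif variants. The parameter names and defaults are assumptions, not Quadfinder's settings. Only the given strand is scanned, and matches do not overlap.

```python
import re


def quadruplex_regex(min_tract: int = 3, min_loop: int = 1, max_loop: int = 7) -> re.Pattern:
    """Four G-tracts of >= min_tract G separated by three loops of
    min_loop..max_loop nucleotides (U allowed so RNA input also works)."""
    tract = f"G{{{min_tract},}}"
    loop = f"[ACGTUN]{{{min_loop},{max_loop}}}"
    return re.compile(tract + (loop + tract) * 3)


def find_quadruplexes(seq: str, **params) -> list[tuple[int, int, str]]:
    """(start, end, motif) for each non-overlapping match, 0-based, end-exclusive."""
    pattern = quadruplex_regex(**params)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq.upper())]
```

For example, the human telomeric repeat `GGGTTAGGGTTAGGGTTAGGG` matches with the defaults but not when `max_loop=2`.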

  3. Nucleotide sequence and replication properties of the Bacillus borstelensis cryptic plasmid pHT926.

    PubMed Central

    Ebisu, S; Murahashi, Y; Takagi, H; Kadowaki, K; Yamaguchi, K; Yamagata, H; Udaka, S

    1995-01-01

    The nucleotide sequence of pHT926, a cryptic plasmid found in Bacillus borstelensis HP926, was determined. pHT926 replicates by a rolling-circle mechanism and belongs to the pC194 plasmid family. The copy number of pHT926 was fourfold higher than that of pUB110, and pHT926 was very stably maintained in Bacillus choshinensis. PMID:7487045

  4. Nucleotide sequence of alkyl-dihydroxyacetonephosphate synthase cDNA from Dictyostelium discoideum.

    PubMed

    de Vet, E C; van den Bosch, H

    1998-11-27

    We report the nucleotide sequence of the alkyl-dihydroxyacetonephosphate synthase cDNA from the cellular slime mold Dictyostelium discoideum. The open reading frame encodes a protein of 611 amino acids that shares 33% amino acid identity with the human enzyme. This D. discoideum homolog carries a variant of the peroxisomal targeting signal type 1 at its C-terminus (PKL). Expression of the cDNA in Escherichia coli yielded an enzymatically active protein.

  5. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... nucleotide and/or amino acid sequence submissions in computer readable form. 1.824 Section 1.824 Patents... And/or Amino Acid Sequences § 1.824 Form and format for nucleotide and/or amino acid sequence... readable form may be created by any means, such as word processors, nucleotide/amino acid sequence...

  6. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... nucleotide and/or amino acid sequence submissions in computer readable form. 1.824 Section 1.824 Patents... And/or Amino Acid Sequences § 1.824 Form and format for nucleotide and/or amino acid sequence... readable form may be created by any means, such as word processors, nucleotide/amino acid sequence...

  7. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... nucleotide and/or amino acid sequence submissions in computer readable form. 1.824 Section 1.824 Patents... And/or Amino Acid Sequences § 1.824 Form and format for nucleotide and/or amino acid sequence... readable form may be created by any means, such as word processors, nucleotide/amino acid sequence...

  8. The complete nucleotide sequence of the mitochondrial genome of Phthonandria atrilineata (Lepidoptera: Geometridae).

    PubMed

    Yang, Ling; Wei, Zhao-Jun; Hong, Gui-Yun; Jiang, Shao-Tong; Wen, Long-Ping

    2009-07-01

    Using the long polymerase chain reaction (Long-PCR) method, we determined the complete nucleotide sequence of the mitochondrial genome (mitogenome) of Phthonandria atrilineata. The complete mtDNA of P. atrilineata was 15,499 base pairs in length and contained 13 protein-coding genes (PCGs), 2 rRNA genes, 22 tRNA genes, and a control region. The P. atrilineata genes were in the same order and orientation as in the completely sequenced mitogenomes of other lepidopteran species. The nucleotide composition of the P. atrilineata mitogenome was biased toward A + T nucleotides (81.02%), and the 13 PCGs showed different A + T contents, ranging from 73.25% (cox1) to 92.12% (atp8). Phthonandria had the canonical set of 22 tRNA genes, which fold into the typical cloverleaf structure described for metazoan mt tRNAs, with the unique exception of trnS(AGN). The phylogenetic relationships were reconstructed with the concatenated sequences of the 13 PCGs of the mitochondrial genome, which confirmed that P. atrilineata is most closely related to the superfamily Bombycoidea.
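
The A + T percentages quoted above (81.02% genome-wide, 73.25% to 92.12% per gene) are base-composition ratios. A minimal sketch of that calculation follows. Excluding ambiguous bases such as `N` from the denominator is an assumption, since the abstract does not say how they were treated.

```python
def at_content(seq: str) -> float:
    """Percentage of A and T among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc == 0:
        raise ValueError("no unambiguous bases in sequence")
    return 100.0 * at / (at + gc)
```

Applying it to each protein-coding gene separately (for example, `{name: at_content(s) for name, s in genes.items()}` with a hypothetical `genes` dictionary) gives the per-gene range.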

  9. Nucleotide sequence variation of the envelope protein gene identifies two distinct genotypes of yellow fever virus.

    PubMed Central

    Chang, G J; Cropp, B C; Kinney, R M; Trent, D W; Gubler, D J

    1995-01-01

    The evolution of yellow fever virus over 67 years was investigated by comparing the nucleotide sequences of the envelope (E) protein genes of 20 viruses isolated in Africa, the Caribbean, and South America. Uniformly weighted parsimony algorithm analysis defined two major evolutionary yellow fever virus lineages designated E genotypes I and II. E genotype I contained viruses isolated from East and Central Africa. E genotype II viruses were divided into two sublineages: IIA viruses from West Africa and IIB viruses from America, except for a 1979 virus isolated from Trinidad (TRINID79A). Unique signature patterns were identified at 111 nucleotide and 12 amino acid positions within the yellow fever virus E gene by signature pattern analysis. Yellow fever viruses from East and Central Africa contained unique signatures at 60 nucleotide and five amino acid positions, those from West Africa contained unique signatures at 25 nucleotide and two amino acid positions, and viruses from America contained such signatures at 30 nucleotide and five amino acid positions in the E gene. The dissemination of yellow fever viruses from Africa to the Americas is supported by the close genetic relatedness of genotype IIA and IIB viruses and genetic evidence of a possible second introduction of yellow fever virus from West Africa, as illustrated by the TRINID79A virus isolate. The E protein genes of American IIB yellow fever viruses had higher frequencies of amino acid substitutions than did genes of yellow fever viruses of genotypes I and IIA on the basis of comparisons with a consensus amino acid sequence for the yellow fever E gene. The great variation in the E proteins of American yellow fever virus probably results from positive selection imposed by virus interaction with different species of mosquitoes or nonhuman primates in the Americas. PMID:7637022
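
Signature pattern analysis looks for alignment positions that distinguish one group of sequences from the rest. The exact criteria used in the study are not given, so the following is a simplified sketch under an assumed definition: a position is a signature for a group when every member of the group carries the same residue and no sequence outside the group carries it. The group names are hypothetical.

```python
def signature_positions(groups: dict[str, list[str]]) -> dict[str, list[tuple[int, str]]]:
    """For each group, 0-based alignment columns where all members share one
    residue that no sequence in any other group carries (simplified rule)."""
    lengths = {len(s) for seqs in groups.values() for s in seqs}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    (n,) = lengths
    result = {}
    for name, members in groups.items():
        others = [s for other, seqs in groups.items() if other != name for s in seqs]
        hits = []
        for i in range(n):
            residues = {s[i] for s in members}
            if len(residues) == 1:
                (r,) = residues
                if r != "-" and all(s[i] != r for s in others):
                    hits.append((i, r))
        result[name] = hits
    return result
```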

  10. Nucleotide sequencing and characterization of the genes encoding benzene oxidation enzymes of Pseudomonas putida.

    PubMed Central

    Irie, S; Doi, S; Yorifuji, T; Takagi, M; Yano, K

    1987-01-01

    The nucleotide sequence of the Pseudomonas putida genes encoding the oxidation of benzene to catechol was determined. Five open reading frames were found in the sequence. Four corresponding protein molecules were detected by a DNA-directed in vitro translation system. Escherichia coli cells containing the fragment with the four open reading frames transformed benzene to cis-benzene glycol, an intermediate in the oxidation of benzene to catechol. The relation between the product of each cistron and the components of the benzene oxidation enzyme system is discussed. PMID:3667527

  11. Remote access to ACNUC nucleotide and protein sequence databases at PBIL.

    PubMed

    Gouy, Manolo; Delmotte, Stéphane

    2008-04-01

    The ACNUC biological sequence database system provides powerful and fast query and extraction capabilities to a variety of nucleotide and protein sequence databases. The collection of ACNUC databases served by the Pôle Bio-Informatique Lyonnais includes the EMBL, GenBank, RefSeq and UniProt nucleotide and protein sequence databases and a series of other sequence databases that support comparative genomics analyses: HOVERGEN and HOGENOM containing families of homologous protein-coding genes from vertebrate and prokaryotic genomes, respectively; Ensembl and Genome Reviews for analyses of prokaryotic and of selected eukaryotic genomes. This report describes the main features of the ACNUC system and the access to ACNUC databases from any internet-connected computer. Such access was made possible by the definition of a remote ACNUC access protocol and the implementation of Application Programming Interfaces between the C, Python and R languages and this communication protocol. Two retrieval programs for ACNUC databases, Query_win, with a graphical user interface and raa_query, with a command line interface, are also described. Altogether, these bioinformatics tools provide users with either ready-to-use means of querying remote sequence databases through a variety of selection criteria, or a simple way to endow application programs with an extensive access to these databases. Remote access to ACNUC databases is open to all and fully documented (http://pbil.univ-lyon1.fr/databases/acnuc/acnuc.html).

  12. Analysis of a cloned colicin Ib gene: complete nucleotide sequence and implications for regulation of expression.

    PubMed Central

    Varley, J M; Boulnois, G J

    1984-01-01

    The complete nucleotide sequence of a 2,971 base pair EcoRI fragment carrying the structural gene for colicin Ib has been determined. The gene is 1,881 nucleotides long and is predicted to encode a protein of 626 amino acids with a molecular weight of 71,364. The structural gene is flanked by likely promoter and terminator signals, and between the promoter and the ribosome binding site lies an inverted repeat sequence that resembles other sequences known to bind the LexA protein. Further analysis of the 5' flanking sequences revealed a second region that may act as a second LexA binding site and/or in the binding of cyclic AMP receptor protein. Comparison of the predicted amino acid sequence of colicin Ib with those of colicins A and E1 reveals localised homology. The implications of these protein similarities and of the regulation of the colicin Ib structural gene are discussed. PMID:6091036
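
The gene length and the predicted protein length agree if the 1,881 nucleotides are read as including the stop codon (our reading; the abstract does not state it explicitly):

```latex
\frac{1881\ \text{nt}}{3\ \text{nt/codon}} = 627\ \text{codons}
= 626\ \text{sense codons} + 1\ \text{stop codon}
```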

  13. Nucleotide sequences of immunoglobulin eta genes of chimpanzee and orangutan: DNA molecular clock and hominoid evolution

    SciTech Connect

    Sakoyama, Y.; Hong, K.J.; Byun, S.M.; Hisajima, H.; Ueda, S.; Yaoita, Y.; Hayashida, H.; Miyata, T.; Honjo, T.

    1987-02-01

    To determine the phylogenetic relationships among hominoids and the dates of their divergence, the complete nucleotide sequences of the constant region of the immunoglobulin eta-chain (Cη1) genes from chimpanzee and orangutan have been determined. These sequences were compared with the human eta-chain constant-region sequence. A molecular clock (silent molecular clock), measured by the degree of sequence divergence at the synonymous (silent) positions of protein-encoding regions, was introduced for the present study. The silent molecular clock was calibrated by comparing the nucleotide sequences of the α1-antitrypsin, β-globin and δ-globin genes between humans and Old World monkeys: the mean evolutionary rate of silent substitution was determined to be 1.56 × 10^-9 substitutions per site per year. Using the silent molecular clock, the mean divergence dates of chimpanzee and orangutan from the human lineage were estimated as 6.4 ± 2.6 million years and 17.3 ± 4.5 million years, respectively. It was also shown that the evolutionary rate of primate genes is considerably slower than that of other mammalian genes.
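
The abstract does not spell out the clock formula. A common form, assumed here, is K = 2rT: K is the silent divergence per silent site between two species (already corrected for multiple hits), r is the rate per lineage, and the factor 2 reflects substitutions accumulating on both branches since the split. The sketch converts in both directions using the calibrated rate from the abstract. Any divergence values it produces are implied by the abstract's numbers under this assumption, not values reported by the authors.

```python
SILENT_RATE = 1.56e-9  # silent substitutions per site per year (from the abstract)


def divergence_time(k_silent: float, rate: float = SILENT_RATE) -> float:
    """Years since two lineages split, assuming K = 2 * rate * T."""
    return k_silent / (2 * rate)


def expected_divergence(years: float, rate: float = SILENT_RATE) -> float:
    """Silent divergence per site expected after `years` of separation."""
    return 2 * rate * years
```

Under this assumption, the 6.4-million-year human-chimpanzee estimate corresponds to a silent divergence of about 0.020 substitutions per site, and the 17.3-million-year orangutan estimate to about 0.054.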

  14. Studying long 16S rDNA sequences with ultrafast-metagenomic sequence classification using exact alignments (Kraken).

    PubMed

    Valenzuela-González, Fabiola; Martínez-Porchas, Marcel; Villalpando-Canchola, Enrique; Vargas-Albores, Francisco

    2016-03-01

    Ultrafast metagenomic sequence classification using exact alignments (Kraken) is a novel approach that can be used to classify 16S rDNA sequences. The classifier is based on mapping short sequences to their lowest common ancestor and performing alignments to form subtrees with specific weights in each taxon node. This study aimed to evaluate the classification performance of Kraken with long 16S rDNA sequences from random environmental clones that were Sanger sequenced. A total of 480 clones were isolated and expanded, and 264 of these clones formed contigs (1352 ± 153 bp). The same sequences were also analyzed with the Ribosomal Database Project (RDP) classifier. Kraken achieved deeper classification than the RDP: 73% of the contigs were classified to the species or variety level by Kraken, whereas 67% of these contigs were classified no further than the genus level by the RDP. The results also demonstrated that unassembled sequences analyzed by Kraken provide similar or even deeper information. Moreover, sequences that did not form contigs, which are usually discarded by other programs, provided meaningful information when analyzed by Kraken. Finally, it appears that the assembly step for Sanger sequences can be eliminated when using Kraken. Kraken combines the information from both sequence strands, providing additional elements for the classification. In conclusion, the results demonstrate that Kraken is an excellent choice for the taxonomic assignment of sequences obtained by Sanger sequencing or by third-generation sequencing, whose main goal is to generate longer sequences.
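
The classifier described above maps each k-mer to the lowest common ancestor (LCA) of the reference taxa that contain it, then scores paths in the taxonomy. The following toy sketch mirrors that idea for illustration only and is not Kraken's implementation. Its simplifications are assumptions: an in-memory dictionary replaces Kraken's database, both strands are handled by looking up "canonical" k-mers (the smaller of a k-mer and its reverse complement), and ties go to the first taxon seen rather than to their LCA. The taxonomy and taxon names are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Smaller of a k-mer and its reverse complement, so both strands match."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def kmers(seq: str, k: int):
    seq = seq.upper()
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def lineage(taxon, parent: dict) -> list:
    """Taxon followed by its ancestors up to the root (parent[root] is None)."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = parent[taxon]
    return path


def lca(a, b, parent: dict):
    ancestors = set(lineage(a, parent))
    return next(t for t in lineage(b, parent) if t in ancestors)


def build_db(references: dict, k: int, parent: dict) -> dict:
    """Map each canonical k-mer to the LCA of all reference taxa containing it."""
    db = {}
    for taxon, seq in references.items():
        for km in kmers(seq, k):
            key = canonical(km)
            db[key] = lca(db[key], taxon, parent) if key in db else taxon
    return db


def classify(seq: str, k: int, db: dict, parent: dict):
    """Return the hit taxon whose root-to-node path collects the most k-mer hits."""
    hits = Counter()
    for km in kmers(seq, k):
        taxon = db.get(canonical(km))
        if taxon is not None:
            hits[taxon] += 1
    if not hits:
        return None  # unclassified
    return max(hits, key=lambda t: sum(hits[a] for a in lineage(t, parent)))
```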

  15. SparkBWA: Speeding Up the Alignment of High-Throughput DNA Sequencing Data

    PubMed Central

    2016-01-01

    Next-generation sequencing (NGS) technologies have produced a huge amount of genomic data that need to be analyzed and interpreted. This has a major impact on the DNA sequence alignment process, which nowadays requires mapping billions of short DNA sequences onto a reference genome. As a result, sequence alignment remains the most time-consuming stage in the sequence analysis workflow. To deal with this issue, state-of-the-art aligners take advantage of parallelization strategies. However, existing solutions show limited scalability and have complex implementations. In this work we introduce SparkBWA, a new tool that exploits the capabilities of a big data technology such as Spark to boost the performance of one of the most widely adopted aligners, the Burrows-Wheeler Aligner (BWA). SparkBWA is designed with two independent software layers so that no modifications to the original BWA source code are required, which ensures its compatibility with any BWA version (future or legacy). SparkBWA is evaluated in different scenarios, showing noticeable results in terms of performance and scalability. A comparison with other parallel BWA-based aligners validates the benefits of our approach. Finally, an intuitive and flexible API is provided to NGS professionals to facilitate the acceptance and adoption of the new tool. The source code of the software described in this paper is publicly available at https://github.com/citiususc/SparkBWA, with a GPL3 license. PMID:27182962

  16. Complete nucleotide sequence of a circular plasmid from the Lyme disease spirochete, Borrelia burgdorferi.

    PubMed Central

    Dunn, J J; Buchstein, S R; Butler, L L; Fisenne, S; Polin, D S; Lade, B N; Luft, B J

    1994-01-01

    We have determined the complete nucleotide sequence of a small circular plasmid from the spirochete Borrelia burgdorferi Ip21, the agent of Lyme disease. The plasmid (cp8.3/Ip21) is 8,303 bp long, has a 76.6% A+T content, and is unstable upon passage of cells in vitro. Analysis of the sequence revealed two nearly perfect copies of a 184-bp inverted repeat sequence separated by 2,675 bp containing three closely spaced, but nonoverlapping, open reading frames (ORFs). Each inverted repeat ends in sequences that may function as signals for the initiation of transcription and translation of flanking plasmid sequences. A unique oligonucleotide probe based on the repeated sequence showed that the DNA between the repeats is present predominantly in a single orientation. Additional copies of the repeat were not detected elsewhere in the Ip21 genome. An analysis of potential ORFs indicates that the plasmid has nine highly probable protein-coding ORFs and one that is less probable; together, they occupy almost 71% of the nucleotide sequence. Analysis of the deduced amino acid sequences of the ORFs revealed one (ORF-9) with features in common with Borrelia lipoproteins and another (ORF-2) with limited homology to a replication protein, RepC, from a gram-positive plasmid that replicates by a rolling circle (RC) mechanism. Known collectively as RC plasmids, such plasmids require a double-stranded origin at which the Rep protein nicks the DNA to generate a single-stranded replication intermediate. cp8.3/Ip21 has three copies of the heptameric motif characteristically found at the nick site of most RC plasmids. These observations suggest that cp8.3/Ip21 may replicate by an RC mechanism. PMID:8169221
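
The search for potential ORFs can be approximated with a simple scan for ATG-initiated open reading frames on both strands. This is a generic sketch, not the method the authors used. It treats the sequence as linear, so ORFs spanning the origin of a circular plasmid such as cp8.3/Ip21 are missed; it uses only the standard stop codons; and the 100-codon default threshold is an arbitrary assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int]]:
    """ATG-to-stop ORFs on both strands of a linear sequence.

    Returns (strand, start, end), 0-based and end-exclusive, measured on the
    strand that was scanned ('-' coordinates index the reverse complement).
    min_codons counts sense codons and excludes the stop codon."""
    orfs = []
    upper = seq.upper()
    for strand, s in (("+", upper), ("-", reverse_complement(upper))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG after the previous stop
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs
```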

  17. The mouse collagen X gene: complete nucleotide sequence, exon structure and expression pattern.

    PubMed Central

    Elima, K; Eerola, I; Rosati, R; Metsäranta, M; Garofalo, S; Perälä, M; De Crombrugghe, B; Vuorio, E

    1993-01-01

    Overlapping genomic clones covering the 7.2 kb mouse alpha 1(X) collagen gene, 0.86 kb of promoter and 1.25 kb of 3'-flanking sequences were isolated from two genomic libraries and characterized by nucleotide sequencing. Typical features of the gene include a unique three-exon structure, similar to that in the chick gene, with the entire triple-helical domain of 463 amino acids coded by a single large exon. The highest degree of amino acid and nucleotide sequence conservation between the mouse and the known chick, bovine and human collagen type X sequences was seen in the coding region for the collagenous and C-terminal non-collagenous domains. The sequences diverged more in the N-terminal non-collagenous domain. Similarity between the mammalian collagen X sequences extended into the 3'-untranslated sequence, particularly near the polyadenylation site. The promoter of the mouse collagen X gene was found to contain two TATAA boxes 159 bp apart; primer extension analyses of the transcription start site revealed that both were functional. The promoter has an unusual structure, with a very low G + C content of 28% between positions -220 and -1 relative to the upstream transcription start site. Northern and in situ hybridization analyses confirmed that expression of the alpha 1(X) collagen gene is restricted to hypertrophic chondrocytes in tissues undergoing endochondral calcification. The detailed sequence information of the gene will be useful for studies on the promoter activity of the gene and for the generation of transgenic mice. PMID:8424763

  18. Nucleotide sequence of the internal transcribed spacers and 5.8S region of ribosomal DNA in Pinus pinea L.

    PubMed

    Marrocco, R; Gelati, M T; Maggini, F

    1996-01-01

    The nucleotide sequences of the first internal transcribed spacer (ITS1) from different ribosomal RNA genes of Pinus pinea are reported. The analyzed ITS1 sequences can be distinguished by their length, one being 2631 bp and the other 271 bp long. A nucleotide comparison of these regions did not show appreciable sequence homology. The larger ITS1 contains five tandemly arranged subrepeats ranging in size from 219 bp to 237 bp. The nucleotide sequences of the 5.8S and ITS2 regions belonging to the larger ribosomal RNA gene are also reported.

  19. Nucleotide sequence and analysis of the mgl operon of Escherichia coli K12.

    PubMed

    Hogg, R W; Voelker, C; Von Carlowitz, I

    1991-10-01

    The nucleotide sequence of the Escherichia coli K12 beta-methylgalactoside transport operon, mgl, was determined. Primer extension analysis indicated that mRNA synthesis initiates at guanine residue 145 of the determined sequence. The operon contains three open reading frames (ORFs). The operator-proximal ORF, mglB, encodes the galactose-binding protein, a periplasmic protein of 332 amino acids including the 23-residue amino-terminal signal peptide. Following a 62-nucleotide spacer, the second ORF, mglA, is capable of encoding a protein of 506 amino acids. The amino-terminal and carboxyl-terminal halves of this protein are homologous to each other, and each half contains a putative nucleotide-binding site. The third ORF, mglC, is capable of encoding a hydrophobic protein of 336 amino acids that is thought to form the transmembrane pore. The overall organization of the mglBAC operon and its potential to encode three proteins are similar to those of the araFGH high-affinity transport operon, located approximately 1 min away on the E. coli K12 chromosome.

  20. A distributed system for fast alignment of next-generation sequencing data.

    PubMed

    Srimani, Jaydeep K; Wu, Po-Yen; Phan, John H; Wang, May D

    2010-12-01

    We developed a scalable distributed computing system using the Berkeley Open Infrastructure for Network Computing (BOINC) to align next-generation sequencing (NGS) data quickly and accurately. NGS technology is emerging as a promising platform for gene expression analysis because of its higher sensitivity compared with traditional genomic microarray technology. However, despite these benefits, NGS datasets can be prohibitively large, requiring significant computing resources to obtain sequence alignment results. Moreover, as these data and alignment algorithms become more prevalent, it will become necessary to examine the effect of the many alignment parameters on various NGS systems. We validate the distributed software system by (1) computing simple timing results to show the speed-up gained by using multiple computers, (2) optimizing alignment parameters using simulated NGS data, and (3) computing NGS expression levels for a single biological sample using optimal parameters and comparing these expression levels with those of a microarray sample. Results indicate that the distributed alignment system achieves approximately linear speed-up and correctly distributes sequence data to, and gathers alignment results from, multiple compute clients.
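
The system above follows a scatter/gather pattern: reads are split into chunks, the chunks are aligned independently on separate clients, and the results are collected in order. The sketch below shows that pattern locally with a thread pool. It is an analogy, not BOINC's API. `toy_align` is a hypothetical stand-in that reports the first exact-match position of each read, where a real system would call an actual short-read aligner.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(items: list, n_chunks: int) -> list[list]:
    """Split items into n_chunks contiguous slices of near-equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def toy_align(reads: list[str], reference: str) -> list[int]:
    """Hypothetical stand-in aligner: first exact-match offset, or -1."""
    return [reference.find(read) for read in reads]


def distributed_align(reads: list[str], reference: str, n_workers: int = 4) -> list[int]:
    """Scatter read chunks to workers, then gather results in input order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(toy_align, part, reference)
                   for part in chunk(reads, n_workers)]
        return [pos for f in futures for pos in f.result()]
```

Keeping the chunks contiguous and gathering futures in submission order means the output lines up with the input reads without extra bookkeeping.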

  1. iPBA: a tool for protein structure comparison using sequence alignment strategies

    PubMed Central

    Gelly, Jean-Christophe; Joseph, Agnel Praveen; Srinivasan, Narayanaswamy; de Brevern, Alexandre G.

    2011-01-01

    With the immense growth in the number of available protein structures, fast and accurate structure comparison has become essential. We propose an efficient method for structure comparison based on a structural alphabet. Protein Blocks (PBs) is a widely used structural alphabet with 16 pentapeptide conformations that can fairly approximate a complete protein chain. Thus a 3D structure can be translated into a 1D sequence of PBs. With a simple Needleman–Wunsch approach and a raw PB substitution matrix, PB-based structural alignments were better than those of many popular methods. The iPBA web server presents an improved alignment approach using (i) specialized PB substitution matrices (SM) and (ii) an anchor-based alignment methodology. With these developments, the quality of ∼88% of alignments was improved. iPBA alignments were also better than DALI, MUSTANG and GANGSTA+ in >80% of the cases. The web server is designed for both pairwise comparisons and database searches. Outputs are given as sequence alignments and superposed 3D structures displayed using PyMol and Jmol. A local alignment option for detecting sub-structural similarity is also embedded. As a fast and efficient ‘sequence-based’ structure comparison tool, we believe that it will be quite useful to the scientific community. iPBA can be accessed at http://www.dsimb.inserm.fr/dsimb_tools/ipba/. PMID:21586582
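
The baseline above aligns PB strings with a Needleman–Wunsch global alignment. A minimal sketch with a linear gap penalty and traceback follows. The PB substitution matrix is not reproduced here, so `score` is a caller-supplied function and the match/mismatch scorer used in the example is an assumption. Replacing it with a PB matrix lookup would give the PB variant.

```python
from typing import Callable


def needleman_wunsch(a: str, b: str, score: Callable[[str, str], int],
                     gap: int = -2) -> tuple[int, str, str]:
    """Global alignment with a linear gap penalty; returns (score, row_a, row_b)."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    # Traceback from the bottom-right cell, following any optimal predecessor.
    top, bottom, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def simple_score(x: str, y: str) -> int:
    """Illustrative scorer: +1 for a match, -1 for a mismatch."""
    return 1 if x == y else -1
```

For example, `needleman_wunsch("ACGT", "AGT", simple_score)` returns `(1, "ACGT", "A-GT")`.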

  2. Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments

    DOE PAGES

    Daily, Jeffrey A.

    2016-02-10

    Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. As a result, a faster intra-sequence pairwise alignment implementation is described and benchmarked. Using a 375-residue query sequence, a speed of 136 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon E5-2670 12-core processor system, the highest reported for an implementation based on Farrar’s ‘striped’ approach. When using only a single thread, parasail was 1.7 times faster than Rognes’s SWIPE. For many score matrices, parasail is faster than BLAST. The software library is designed for 64-bit Linux, OS X, or Windows on processors with SSE2, SSE4.1, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. In conclusion, applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.
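    For reference, the recurrence that SIMD libraries of this kind vectorize can be written as a short scalar loop. This is a minimal sketch of Smith-Waterman local alignment with affine gaps (Gotoh's formulation). It is not parasail's API and does none of the striped SIMD work described above. The scoring values and the gap convention (a gap of length k costs `gap_open + (k - 1) * gap_extend`) are illustrative assumptions. The `gcups` helper shows what the GCUPS figure measures: dynamic-programming cells filled per second, in billions.

```python
NEG_INF = float("-inf")


def sw_affine_score(query, target, match=2, mismatch=-1, gap_open=3, gap_extend=1):
    """Best local alignment score; a gap of length k costs gap_open + (k - 1) * gap_extend."""
    m = len(target)
    h_prev = [0] * (m + 1)        # H for row i-1: best local score ending at (i-1, j)
    f_prev = [NEG_INF] * (m + 1)  # F for row i-1: best score ending with a gap in the target
    best = 0
    for i in range(1, len(query) + 1):
        h_cur = [0] * (m + 1)
        f_cur = [NEG_INF] * (m + 1)
        e = NEG_INF               # E: best score ending with a gap in the query (this row)
        for j in range(1, m + 1):
            e = max(h_cur[j - 1] - gap_open, e - gap_extend)
            f_cur[j] = max(h_prev[j] - gap_open, f_prev[j] - gap_extend)
            s = match if query[i - 1] == target[j - 1] else mismatch
            h_cur[j] = max(0, h_prev[j - 1] + s, e, f_cur[j])  # 0 allows a fresh local start
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


def gcups(query_len, db_residues, seconds):
    """Giga cell updates per second: DP cells filled divided by runtime, in billions."""
    return query_len * db_residues / seconds / 1e9


print(sw_affine_score("ACGTTTACGT", "ACGTACGT"))  # 8 matches minus a length-2 gap
```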

  3. A Comprehensive Benchmark Study of Multiple Sequence Alignment Methods: Current Challenges and Future Perspectives

    PubMed Central

    Thompson, Julie D.; Linard, Benjamin; Lecompte, Odile; Poch, Olivier

    2011-01-01

    Multiple comparison or alignment of protein sequences has become a fundamental tool in many different domains in modern molecular biology, from evolutionary studies to prediction of 2D/3D structure, molecular function and inter-molecular interactions etc. By placing the sequence in the framework of the overall family, multiple alignments can be used to identify conserved features and to highlight differences or specificities. In this paper, we describe a comprehensive evaluation of many of the most popular methods for multiple sequence alignment (MSA), based on a new benchmark test set. The benchmark is designed to represent typical problems encountered when aligning the large protein sequence sets that result from today's high throughput biotechnologies. We show that alignment methods have significantly progressed and can now identify most of the shared sequence features that determine the broad molecular function(s) of a protein family, even for divergent sequences. However, we have identified a number of important challenges. First, the locally conserved regions, that reflect functional specificities or that modulate a protein's function in a given cellular context, are less well aligned. Second, motifs in natively disordered regions are often misaligned. Third, the badly predicted or fragmentary protein sequences, which make up a large proportion of today's databases, lead to a significant number of alignment errors. Based on this study, we demonstrate that the existing MSA methods can be exploited in combination to improve alignment accuracy, although novel approaches will still be needed to fully explore the most difficult regions. We then propose knowledge-enabled, dynamic solutions that will hopefully pave the way to enhanced alignment construction and exploitation in future evolutionary systems biology studies. PMID:21483869
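    Benchmarks like this one typically score a test alignment against a reference alignment. Two common measures are the sum-of-pairs (SP) score and the total-column (TC) score. The abstract does not say which measures were used, so treat this as a generic sketch. It assumes both alignments contain the same sequences in the same order. Here a reference column counts for TC only if an identical column (same residues, same gap pattern) appears in the test alignment. Published benchmarks often restrict scoring to reliable "core" regions.

```python
from itertools import combinations


def residue_columns(alignment):
    """For each column, a tuple giving each sequence's residue index, or None for a gap."""
    counters = [0] * len(alignment)
    cols = []
    for c in range(len(alignment[0])):
        col = []
        for s, row in enumerate(alignment):
            if row[c] == "-":
                col.append(None)
            else:
                col.append(counters[s])
                counters[s] += 1
        cols.append(tuple(col))
    return cols


def aligned_pairs(cols):
    """All (seq1, residue1, seq2, residue2) pairs placed in the same column."""
    pairs = set()
    for col in cols:
        for (s1, r1), (s2, r2) in combinations(enumerate(col), 2):
            if r1 is not None and r2 is not None:
                pairs.add((s1, r1, s2, r2))
    return pairs


def sp_tc_scores(test, reference):
    """Fraction of reference residue pairs (SP) and reference columns (TC) recovered by test."""
    ref_cols, test_cols = residue_columns(reference), residue_columns(test)
    ref_pairs = aligned_pairs(ref_cols)
    sp = len(ref_pairs & aligned_pairs(test_cols)) / len(ref_pairs)
    test_col_set = set(test_cols)
    tc = sum(col in test_col_set for col in ref_cols) / len(ref_cols)
    return sp, tc


print(sp_tc_scores(["ACDE", "A-CE"], ["ACDE", "AC-E"]))
```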

  4. Mouse mammary tumor virus-like nucleotide sequences in canine and feline mammary tumors.

    PubMed

    Hsu, Wei-Li; Lin, Hsing-Yi; Chiou, Shyan-Song; Chang, Chao-Chin; Wang, Szu-Pong; Lin, Kuan-Hsun; Chulakasian, Songkhla; Wong, Min-Liang; Chang, Shih-Chieh

    2010-12-01

    Mouse mammary tumor virus (MMTV) has been suggested to be involved in human breast cancer. Companion animals such as dogs and cats, which have close contact with humans, may contribute to the transmission of MMTV between mice and humans. The aim of this study was to detect MMTV-like nucleotide sequences in canine and feline mammary tumors by nested PCR. Results showed that the prevalence of MMTV-like env and LTR sequences in canine malignant mammary tumors was 3.49% (3/86) and 18.60% (16/86), respectively. In feline malignant mammary tumors, both env and LTR sequences were found in 22.22% (2/9). Nevertheless, MMTV-like LTR and env sequences were also detected in normal mammary glands of dogs and cats. Compared with the MMTV-like DNA sequences of NIH 3T3 cells (an MMTV-positive murine cell line) and human breast cancer cells, our sequences showed similarities ranging from 94 to 98%. Phylogenetic analysis revealed intermixing among sequences identified from tissues of different hosts (mouse, dog, cat, and human), indicating that MMTV-like DNA exists in these hosts. Moreover, the env transcript was detected by reverse transcription-PCR in 1 of the 19 MMTV-positive samples. Taken together, our study provides evidence for the existence and expression of MMTV-like sequences in neoplastic and normal mammary glands of dogs and cats.

  5. Mouse Mammary Tumor Virus-Like Nucleotide Sequences in Canine and Feline Mammary Tumors

    PubMed Central

    Hsu, Wei-Li; Lin, Hsing-Yi; Chiou, Shyan-Song; Chang, Chao-Chin; Wang, Szu-Pong; Lin, Kuan-Hsun; Chulakasian, Songkhla; Wong, Min-Liang; Chang, Shih-Chieh

    2010-01-01

    Mouse mammary tumor virus (MMTV) has been suggested to be involved in human breast cancer. Companion animals such as dogs and cats, which have close contact with humans, may contribute to the transmission of MMTV between mice and humans. The aim of this study was to detect MMTV-like nucleotide sequences in canine and feline mammary tumors by nested PCR. Results showed that the prevalence of MMTV-like env and LTR sequences in canine malignant mammary tumors was 3.49% (3/86) and 18.60% (16/86), respectively. In feline malignant mammary tumors, both env and LTR sequences were found in 22.22% (2/9). Nevertheless, MMTV-like LTR and env sequences were also detected in normal mammary glands of dogs and cats. Compared with the MMTV-like DNA sequences of NIH 3T3 cells (an MMTV-positive murine cell line) and human breast cancer cells, our sequences showed similarities ranging from 94 to 98%. Phylogenetic analysis revealed intermixing among sequences identified from tissues of different hosts (mouse, dog, cat, and human), indicating that MMTV-like DNA exists in these hosts. Moreover, the env transcript was detected by reverse transcription-PCR in 1 of the 19 MMTV-positive samples. Taken together, our study provides evidence for the existence and expression of MMTV-like sequences in neoplastic and normal mammary glands of dogs and cats. PMID:20881168

  6. Nucleotide sequence of the cell wall proteinase gene of Streptococcus cremoris Wg2.

    PubMed Central

    Kok, J; Leenhouts, K J; Haandrikman, A J; Ledeboer, A M; Venema, G

    1988-01-01

    A 6.5-kilobase HindIII fragment that specifies the proteolytic activity of Streptococcus cremoris Wg2 was sequenced entirely. The nucleotide sequence revealed two open reading frames (ORFs), a small ORF1 with 295 codons and a large ORF2 containing 1,772 codons. For both ORFs, there was no stop codon on the HindIII fragment. A partially overlapping PstI fragment was used to locate the translation stop of the large ORF2. The entire ORF2 contained 1,902 coding triplets, followed by an apparently rho-independent terminator sequence. The inferred amino acid sequence would result in a protein of 200 kilodaltons. Both ORFs have their putative transcription and translation signals in a 345-base-pair ClaI fragment. ORF2 is preceded by a promoter region containing a 15-base-pair complementary direct repeat. Both the truncated 33- and the 200-kilodalton proteins have a signal peptide-like N-terminal amino acid sequence. The protein specified by ORF2 contained regions of extensive homology with serine proteases of the subtilisin family. Specifically, amino acid sequences involved in the formation of the active site (viz., Asp-32, His-64, and Ser-221 of the subtilisins) are well conserved in the S. cremoris Wg2 proteinase. The homologous sequences are separated by nonhomologous regions which contain several inserts, most notably a sequence of approximately 200 amino acids between the His and Ser residues of the active site. PMID:3278687

  7. Total chemical synthesis of a 77-nucleotide-long RNA sequence having methionine-acceptance activity.

    PubMed Central

    Ogilvie, K K; Usman, N; Nicoghosian, K; Cedergren, R J

    1988-01-01

    The chemical synthesis of a 77-nucleotide-long RNA molecule is described. The molecule has the sequence of an Escherichia coli Ado-47-containing tRNA(fMet) species in which the modified nucleosides have been substituted by their unmodified parent nucleosides. The sequence was assembled stepwise on a solid-phase, controlled-pore glass support with an automated DNA synthesizer. The ribonucleotide building blocks used were fully protected 5'-monomethoxytrityl-2'-silyl-3'-N,N-diisopropylaminophosphoramidites. p-Nitrophenylethyl groups were used to protect the O6 of guanine residues. The fully deprotected tRNA analogue was characterized by polyacrylamide gel electrophoresis (sizing), terminal nucleotide analysis, sequencing, and total enzyme degradation, all of which indicated that the sequence was correct and contained only 3'-5' linkages. The 77-mer was then assayed for amino acid acceptor activity by using E. coli methionyl-tRNA synthetase. The results indicated that the synthetic product, lacking modified bases, is a substrate for the enzyme and has an amino acid acceptance 11% of that of the major native species, tRNA(fMet) containing 7-methylguanosine at position 47. PMID:3413059

  8. Mitochondrial DNA in the sea urchin Arbacia lixula: evolutionary inferences from nucleotide sequence analysis.

    PubMed

    De Giorgi, C; Lanave, C; Musci, M D; Saccone, C

    1991-07-01

    From the stirodont Arbacia lixula we determined the sequence of 5,127 nucleotides of mitochondrial DNA (mtDNA) encompassing 18 tRNAs, two complete coding genes, parts of three other coding genes, and part of the 12S ribosomal RNA (rRNA). The sequence confirms that the organization of mtDNA is conserved within echinoids. Furthermore, it underlines the following peculiar features of sea urchin mtDNA: the clustering of tRNAs, the short noncoding regulatory sequence, and the separation by the ND1 and ND2 genes of the two rRNA genes. Comparison with the orthologous sequences from the camarodont species Paracentrotus lividus and Strongylocentrotus purpuratus revealed that (1) echinoids have an extra piece on the amino terminus of the ND5 gene that is probably the remnant of an old leucine tRNA gene; (2) third-position codon nucleotide usage has diverged between A. lixula and the camarodont species to a significant extent, implying different directional mutational pressures; and (3) the stirodont-camarodont divergence occurred twice as long ago as did the P. lividus-S. purpuratus divergence.
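    Third-position codon usage, which the abstract reports diverging between A. lixula and the camarodonts, is usually summarized as base composition at the third position of each codon in a coding sequence. The sketch below computes these fractions for one in-frame coding sequence. It assumes the sequence starts at the first codon position. It is a generic measure, not the authors' statistical comparison. Slicing with `[2::3]` automatically skips a trailing partial codon.

```python
from collections import Counter


def third_position_composition(cds):
    """Fraction of A, C, G and T at third codon positions of an in-frame coding sequence."""
    thirds = cds.upper()[2::3]          # positions 3, 6, 9, ... (1-based)
    counts = Counter(thirds)
    total = sum(counts[b] for b in "ACGT")
    return {b: counts[b] / total for b in "ACGT"}


print(third_position_composition("ATGGCCAAA"))  # thirds are G, C, A
```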

  9. Cloning, nucleotide sequence, and expression of the Pasteurella haemolytica A1 glycoprotease gene.

    PubMed Central

    Abdullah, K M; Lo, R Y; Mellors, A

    1991-01-01

    Pasteurella haemolytica serotype A1 secretes a glycoprotease which is specific for O-sialoglycoproteins such as glycophorin A. The gene encoding the glycoprotease enzyme has been cloned in the recombinant plasmid pPH1, and its nucleotide sequence has been determined. The gene (designated gcp) codes for a protein of 35.2 kDa, and an active enzyme protein of this molecular mass can be observed in Escherichia coli clones carrying pPH1. In vivo labeling of plasmid-encoded proteins in E. coli maxicells demonstrated the expression of a 35-kDa protein from pPH1. The amino-terminal sequence of the heterologously expressed protein corresponds to that predicted from the nucleotide sequence. The glycoprotease is a neutral metalloprotease, and the predicted amino acid sequence of the glycoprotease contains a putative zinc-binding site. The gene shows no significant homology with the genes for other proteases of procaryotic or eucaryotic origin. However, there is substantial homology between gcp and an E. coli gene, orfX, whose product is believed to function in the regulation of macromolecule biosynthesis. PMID:1885539

  10. A novel multi-alignment pipeline for high-throughput sequencing data.

    PubMed

    Huang, Shunping; Holt, James; Kao, Chia-Yu; McMillan, Leonard; Wang, Wei

    2014-01-01

    Mapping reads to a reference sequence is a common step when analyzing allele effects in high-throughput sequencing data. The choice of reference is critical because its effect on quantitative sequence analysis is non-negligible. Recent studies suggest that aligning to a single standard reference sequence, as is common practice, can introduce a bias that depends on the genetic distances of the target sequences from the reference. To avoid this bias, researchers have resorted to using modified reference sequences. Even with this improvement, various limitations and problems remain unsolved, including reduced mapping ratios, shifts in read mappings and the selection of which variants to include to remove biases. To address these issues, we propose a novel and generic multi-alignment pipeline. Our pipeline integrates the genomic variations from known or suspected founders into separate reference sequences and performs alignments to each one. By mapping reads to multiple reference sequences and merging the results afterward, we are able to rescue more reads and diminish the bias caused by using a single common reference. Moreover, the genomic origin of each read is determined and annotated during the merging process, providing a better source of information to assess differential expression than simple allele queries at known variant positions. Using RNA-seq of a diallel cross, we compare our pipeline with the single-reference pipeline and demonstrate its advantages: more aligned reads and a higher percentage of reads with assigned origins. Database URL: http://csbio.unc.edu/CCstatus/index.py?run=Pseudo.
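    The merging step can be pictured as keeping, for each read, its best alignment across all founder references and recording which references tie for that best score as the read's origin. This minimal sketch implements that rule on precomputed scores. The rule, the function name `merge_alignments` and the input layout are assumptions for illustration. The abstract does not specify the pipeline's actual merge criteria or file formats.

```python
def merge_alignments(per_reference):
    """Merge per-reference alignment scores (hypothetical layout).

    per_reference: {reference_name: {read_id: alignment_score}}
    Returns {read_id: (best_score, frozenset of references achieving it)}.
    """
    merged = {}
    for ref, hits in per_reference.items():
        for read_id, score in hits.items():
            best = merged.get(read_id)
            if best is None or score > best[0]:
                merged[read_id] = (score, {ref})   # new best: origin is this reference
            elif score == best[0]:
                best[1].add(ref)                   # tie: read is compatible with several founders
    return {r: (s, frozenset(o)) for r, (s, o) in merged.items()}


print(merge_alignments({"A": {"r1": 100, "r2": 90}, "B": {"r1": 95, "r2": 90, "r3": 80}}))
```

    Reads such as `r3`, which align to only one founder, are the ones a single-reference pipeline might lose. Ties like `r2` leave the origin ambiguous.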

  11. Support for linguistic macrofamilies from weighted sequence alignment.

    PubMed

    Jäger, Gerhard

    2015-10-13

    Computational phylogenetics is in the process of revolutionizing historical linguistics. Recent applications have shed new light on controversial issues, such as the location and time depth of language families and the dynamics of their spread. So far, these approaches have been limited to single-language families because they rely on a large body of expert cognacy judgments or grammatical classifications, which is currently unavailable for most language families. The present study pursues a different approach. Starting from raw phonetic transcription of core vocabulary items from very diverse languages, it applies weighted string alignment to track both phonetic and lexical change. Applied to a collection of ∼1,000 Eurasian languages and dialects, this method, combined with phylogenetic inference, leads to a classification in excellent agreement with established findings of historical linguistics. Furthermore, it provides strong statistical support for several putative macrofamilies contested in current historical linguistics. In particular, there is a solid signal for the Nostratic/Eurasiatic macrofamily.

  12. The complete nucleotide sequence and genome organization of a novel carmovirus - Honeysuckle ringspot virus isolated from honeysuckle.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A virus associated with yellow to purple ringspot on honeysuckle plants has been detected and tentatively named as Honeysuckle ringspot virus (HnRSV). The complete nucleotide sequence of HnRSV has been determined from infected honeysuckle. The genomic RNA of HnRSV is 3,956 nucleotides in length and ...

  13. Skeleton-based human action recognition using multiple sequence alignment

    NASA Astrophysics Data System (ADS)

    Ding, Wenwen; Liu, Kai; Cheng, Fei; Zhang, Jin; Li, YunSong

    2015-05-01

    Human action recognition and analysis has been an active research topic in computer vision for many years. This paper presents a method to represent human actions based on trajectories consisting of 3D joint positions. The method first decomposes an action into a sequence of meaningful atomic actions (actionlets) and then labels the actionlets with letters of the English alphabet according to the Davies-Bouldin index value. An action can thus be represented as a sequence of actionlet symbols that preserves the temporal order in which the actionlets occur. Finally, we classify multiple actions by sequence comparison, using a string matching algorithm (Needleman-Wunsch). The effectiveness of the proposed method is evaluated on datasets captured by commodity depth cameras. Experiments on three challenging 3D action datasets show promising results.

  14. A resource of genome-wide single-nucleotide polymorphisms generated by RAD tag sequencing in the critically endangered European eel.

    PubMed

    Pujolar, J M; Jacobsen, M W; Frydenberg, J; Als, T D; Larsen, P F; Maes, G E; Zane, L; Jian, J B; Cheng, L; Hansen, M M

    2013-07-01

    Reduced representation genome sequencing such as restriction-site-associated DNA (RAD) sequencing is finding increased use to identify and genotype large numbers of single-nucleotide polymorphisms (SNPs) in model and nonmodel species. Using the RAD sequencing approach, we generated a unique resource of novel SNP markers for the European eel, which were simultaneously identified and scored in a genome-wide scan of 30 individuals. Although genomic resources are increasingly becoming available for this species, including the recent release of a draft genome, no genome-wide set of SNP markers was available until now. The generated SNPs were widely distributed across the eel genome, aligning to 4,779 different contigs and 19,703 different scaffolds. Significant variation was identified, with an average nucleotide diversity of 0.00529 across individuals. Results varied widely across the genome, ranging from 0.00048 to 0.00737 per locus. Based on the average nucleotide diversity across all loci, long-term effective population size was estimated to range between 132,000 and 1,320,000, which is much higher than previous estimates based on microsatellite loci. The generated SNP resource, consisting of 82,425 loci and 376,918 associated SNPs, provides a valuable tool for future population genetics and genomics studies and allows for targeting specific genes and particularly interesting regions of the eel genome.
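    Nucleotide diversity (π) is the mean number of pairwise differences per site, and under the standard neutral model for a diploid population π ≈ 4·Ne·μ. The abstract does not state the mutation rate behind its Ne range. As an assumption: per-site, per-generation rates μ of 1e-8 and 1e-9 give Ne ≈ 132,250 and ≈ 1,322,500 for π = 0.00529. Those values are close to the reported 132,000 to 1,320,000, but they are an inference, not values taken from the paper. The sketch assumes equal-length aligned sequences with no missing data.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site (pi) for equal-length aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


def effective_size(pi, mu):
    """Ne from pi ~ 4 * Ne * mu (neutral, equilibrium, diploid assumption)."""
    return pi / (4 * mu)


print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))
for mu in (1e-8, 1e-9):  # assumed mutation rates, not from the paper
    print(mu, round(effective_size(0.00529, mu)))
```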

  15. SARA-Coffee web server, a tool for the computation of RNA sequence and structure multiple alignments

    PubMed Central

    Di Tommaso, Paolo; Bussotti, Giovanni; Kemena, Carsten; Capriotti, Emidio; Chatzou, Maria; Prieto, Pablo; Notredame, Cedric

    2014-01-01

    This article introduces the SARA-Coffee web server, a service allowing the online computation of 3D structure-based multiple RNA sequence alignments. The server makes it possible to combine sequences with and without known 3D structures. Given a set of sequences, SARA-Coffee outputs a multiple sequence alignment along with a reliability index for every sequence, column and aligned residue. SARA-Coffee combines SARA, a pairwise structural RNA aligner, with the R-Coffee multiple RNA aligner in a way that has been shown to improve alignment accuracy over most sequence aligners when enough structural data is available. The server can be accessed from http://tcoffee.crg.cat/apps/tcoffee/do:saracoffee. PMID:24972831

  16. High-affinity L-arabinose transport operon. Nucleotide sequence and analysis of gene products.

    PubMed

    Scripture, J B; Voelker, C; Miller, S; O'Donnell, R T; Polgar, L; Rade, J; Horazdovsky, B F; Hogg, R W

    1987-09-05

    The nucleotide sequence of the "high-affinity" L-arabinose transport operon has been determined 3' from the regulatory region and found to contain three open reading frames designated araF, araG and araH. The first gene 3' to the regulatory region, araF, encodes the 23-residue signal peptide and the 306-residue mature form of the L-arabinose binding protein (33,200 Mr). The binding protein, which has been described elsewhere, is hydrophilic, soluble and found in the periplasm of Escherichia coli. This gene is followed by an intergenic space of 72 nucleotides, which contains a region of dyad symmetry 23 nucleotides long capable of forming an 11-member stem-loop. The second gene, designated araG, contains an open reading frame capable of encoding an equally hydrophilic protein containing 504 residues (55,000 Mr). Following a 14-nucleotide spacer, which does not appear to have any secondary structure, the third open reading frame, herein designated araH, is capable of encoding a hydrophobic protein containing 329 residues (34,000 Mr) that can only be envisioned as having an integral membrane location. 3' to araH there is a T-rich region containing a 24-nucleotide area of dyad symmetry centered 55 nucleotides from the termination codon. Analysis of the derived primary sequences of the araG and araH products indicates the nature and potential features of these components. The araG protein was found to possess internal homology between its amino- and carboxyl-terminal halves, suggesting a common origin. The araG gene product has been shown to be homologous to the rbsA gene product, the hisP product, the ptsB product and the malK product, all of which presumably play similar roles in their respective transport systems. Putative ATP binding sites are observed within the regions of homology. The araH gene product has been shown to be homologous to the rbsC gene product, which is the first observed homology between two purported membrane proteins.

  17. PyMod: sequence similarity searches, multiple sequence-structure alignments, and homology modeling within PyMOL

    PubMed Central

    2012-01-01

    Background In recent years, an exponentially growing number of tools for protein sequence analysis, editing and modeling tasks have been made available to the scientific community. Although the vast majority of these tools have been released as open source software, their steep learning curves often discourage even the most experienced users. Results A simple and intuitive interface, PyMod, between the popular molecular graphics system PyMOL and several other tools (i.e., [PSI-]BLAST, ClustalW, MUSCLE, CEalign and MODELLER) has been developed to show how integrating the individual steps required for homology modeling and sequence/structure analysis within the PyMOL framework can greatly simplify these tasks. Sequence similarity searches, generation and editing of multiple sequence and structural alignments, and even the possibility to merge sequence and structure alignments have been implemented in PyMod, with the aim of creating a simple yet powerful tool for sequence and structure analysis and for building homology models. Conclusions PyMod represents a new tool for the analysis and manipulation of protein sequences and structures. Its ease of use and its integration with many sequence retrieval and alignment tools and with PyMOL, one of the most widely used molecular visualization systems, are the key features of this tool. Source code, installation instructions, video tutorials and a user's guide are freely available at the URL http://schubert.bio.uniroma1.it/pymod/index.html PMID:22536966

  18. Nucleotide sequence of the bean strain of southern bean mosaic virus.

    PubMed

    Othman, Y; Hull, R

    1995-01-10

    The genome of the bean strain of southern bean mosaic virus (SBMV-B) comprises 4109 nucleotides and thus is slightly shorter than those of the two other sequenced sobemoviruses (southern bean mosaic virus, cowpea strain (SBMV-C) and rice yellow mottle virus (RYMV)). SBMV-B has an overall sequence similarity with SBMV-C of 55% and with RYMV of 45%. Three potential open reading frames (ORFs) were recognized in SBMV-B which were in similar positions in the genomes of SBMV-C and RYMV. However, there was no analog of SBMV-C and RYMV ORF 3. From a comparison of the predicted sequences of the ORFs of these three sobemoviruses and of the noncoding regions, it is suggested that the two SBMV strains differ from one another as much as they do from RYMV and that they should be considered as different viruses.

  19. Nucleotide sequence of a satellite RNA associated with carrot motley dwarf in parsley and carrot.

    PubMed

    Menzel, Wulf; Maiss, Edgar; Vetten, H Josef

    2009-02-01

    Carrot motley dwarf (CMD) is known to result from a mixed infection by two viruses, the polerovirus Carrot red leaf virus and one of the umbraviruses Carrot mottle mimic virus or Carrot mottle virus. Some umbraviruses have been shown to be associated with small satellite (sat) RNAs, but none have been reported for the latter two. A CMD-affected parsley plant was used for sap transmission to test plants, which were then used for dsRNA isolation. The presence of a 0.8-kbp dsRNA indicated the occurrence of a hitherto unrecognized satRNA associated with CMD. The satRNAs of the CMD isolate from parsley and of an isolate from carrot were sequenced and showed 94% sequence identity. Nucleotide sequences and putative translation products had no significant similarities to GenBank entries. To our knowledge, this is the first report of satRNAs associated with CMD.

  20. Conservation of nucleotide sequences for molecular diagnosis of Middle East respiratory syndrome coronavirus, 2015.

    PubMed

    Furuse, Yuki; Okamoto, Michiko; Oshitani, Hitoshi

    2015-11-01

    Infection due to the Middle East respiratory syndrome coronavirus (MERS-CoV) is widespread. The present study was performed to assess the protocols used for the molecular diagnosis of MERS-CoV by analyzing the nucleotide sequences of viruses detected between 2012 and 2015, including sequences from the large outbreak in eastern Asia in 2015. Although the diagnostic protocols were established only 2 years ago, mismatches between the sequences of primers/probes and viruses were found for several of the assays. Such mismatches could lead to a lower sensitivity of the assay, thereby leading to false-negative diagnosis. A slight modification in the primer design is suggested. Protocols for the molecular diagnosis of viral infections should be reviewed regularly after they are established, particularly for viruses that pose a great threat to public health such as MERS-CoV.
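    Checking an assay against newly sequenced viruses comes down to comparing each primer or probe with the aligned target site and flagging positions the primer cannot pair with. This is a minimal sketch that treats degenerate primer bases using the standard IUPAC nucleotide codes. It assumes the target site has already been extracted from an alignment in the primer's orientation. It is not the authors' analysis pipeline. Ambiguous bases in the target itself are simply counted as mismatches here. Mismatches near the primer's 3' end are generally the most damaging to PCR efficiency.

```python
# Standard IUPAC nucleotide codes: each primer symbol and the bases it matches
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_mismatches(primer, target):
    """0-based positions where a (possibly degenerate) primer fails to match the aligned target site."""
    if len(primer) != len(target):
        raise ValueError("primer and target site must be aligned to the same length")
    return [i for i, (p, t) in enumerate(zip(primer.upper(), target.upper()))
            if t not in IUPAC[p]]


print(primer_mismatches("ACGRT", "ACGGA"))  # R matches G; T vs A at the 3' end does not
```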

  1. Single nucleotide polymorphisms from Theobroma cacao expressed sequence tags associated with witches' broom disease in cacao.

    PubMed

    Lima, L S; Gramacho, K P; Carels, N; Novais, R; Gaiotto, F A; Lopes, U V; Gesteira, A S; Zaidan, H A; Cascardo, J C M; Pires, J L; Micheli, F

    2009-07-14

    In order to increase the efficiency of cacao tree resistance to witches' broom disease, which is caused by Moniliophthora perniciosa (Tricholomataceae), we looked for molecular markers that could help in the selection of resistant cacao genotypes. Among the different markers useful for developing marker-assisted selection, single nucleotide polymorphisms (SNPs) constitute the most common type of sequence difference between alleles and can be easily detected by in silico analysis from expressed sequence tag libraries. We report the first detection and analysis of SNPs from cacao-M. perniciosa interaction expressed sequence tags, using bioinformatics. Selection based on analysis of these SNPs should be useful for developing cacao varieties resistant to this devastating disease.
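    In silico SNP detection from EST libraries starts from a multiple alignment of overlapping ESTs. Columns where a second base appears often enough to be unlikely as a sequencing error are flagged as candidates. The sketch below applies only a minimum minor-allele count. The function name, the threshold and the input format are illustrative assumptions. It is not the authors' pipeline, which the abstract does not describe. Practical tools also weigh base quality and try to exclude paralogous sequences.

```python
from collections import Counter


def candidate_snps(aligned_reads, min_minor_count=2):
    """Flag alignment columns where a second allele occurs at least min_minor_count times.

    aligned_reads: equal-length gapped strings ('-' and non-ACGT symbols are ignored).
    Returns (column, major_allele, minor_allele, minor_count) tuples.
    """
    snps = []
    for col in range(len(aligned_reads[0])):
        counts = Counter(r[col] for r in aligned_reads if r[col] in "ACGT")
        if len(counts) >= 2:
            (major, _), (minor, minor_n) = counts.most_common(2)
            if minor_n >= min_minor_count:
                snps.append((col, major, minor, minor_n))
    return snps


reads = ["ACGTA", "ACGTA", "ACTTA", "ACTTA", "ACTTC", "AC-TA"]
print(candidate_snps(reads))  # column 4 has only one 'C', so it is treated as noise
```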

  2. Nucleotide sequence of yeast GDH1 encoding nicotinamide adenine dinucleotide phosphate-dependent glutamate dehydrogenase.

    PubMed

    Moye, W S; Amuro, N; Rao, J K; Zalkin, H

    1985-07-15

    The yeast GDH1 gene encodes NADP-dependent glutamate dehydrogenase. This gene was isolated by complementation of an Escherichia coli glutamate auxotroph. NADP-dependent glutamate dehydrogenase was overproduced 6-10-fold in Saccharomyces cerevisiae bearing GDH1 on a multicopy plasmid. The nucleotide sequence of the 1362-base pair coding region and 5' and 3' flanking sequences were determined. Transcription start sites were located by S1 nuclease mapping. Regulation of GDH1 was not maintained when the gene was present on a multicopy plasmid. Protein secondary structure predictions identified a region with potential to form the dinucleotide-binding domain. The amino acid sequences of the yeast and Neurospora crassa enzymes are 63% conserved. Unlike the N. crassa gene, yeast GDH1 has no introns.

  3. Nucleotide-sequence-specific de novo methylation in a somatic murine cell line.

    PubMed Central

    Szyf, M; Schimmer, B P; Seidman, J G

    1989-01-01

    DNA fragments encoding the mouse steroid 21-hydroxylase (C21 or Cyp21A1) gene are de novo methylated when introduced into the mouse adrenocortical tumor cell line Y1 by DNA-mediated gene transfer. Although CCGG sequences within the C21 gene are de novo methylated, CCGG sites within flanking vector sequences, other mammalian gene sequences driven by the C21 promoter, and the neomycin-resistance gene, which was cotransfected with the C21 gene, do not become methylated. At least two separate signals for de novo methylation are encoded within the gene since three fragments derived from the C21 gene were methylated de novo. Specific de novo methylation of C21-derived sequences does not occur in L cells or Y1 kin8 cells; this suggests that the cellular factors needed for de novo methylation of the C21 gene are not ubiquitous. Most DNA sequences are not de novo methylated when introduced into somatic cells and DNA sequences other than the C21 gene are not de novo methylated when introduced into Y1 cells. Several groups have suggested that de novo methylation occurs in early embryonic cells and that somatic cells strictly maintain their methylation pattern by a semiconservative methyltransferase. Our results suggest that de novo methylation of specific nucleotide sequences can occur in some mammalian somatic cells. PMID:2789380

  4. Flexible, Fast and Accurate Sequence Alignment Profiling on GPGPU with PaSWAS

    PubMed Central

    Warris, Sven; Yalcin, Feyruz; Jackson, Katherine J. L.; Nap, Jan Peter

    2015-01-01

    Motivation Obtaining large-scale sequence alignments quickly and flexibly is an important step in the analysis of next-generation sequencing data. Applications based on the Smith-Waterman (SW) algorithm are often not fast enough, limited to dedicated tasks, or not sufficiently accurate due to statistical issues. Current SW implementations that run on graphics hardware do not report the alignment details necessary for further analysis. Results With the Parallel SW Alignment Software (PaSWAS) it is possible (a) to have easy access to the computational power of NVIDIA-based general purpose graphics processing units (GPGPUs) to perform high-speed sequence alignments, and (b) to retrieve relevant information such as score, number of gaps and mismatches. The software reports multiple hits per alignment. The added value of the new SW implementation is demonstrated with two test cases: (1) tag recovery in next-generation sequence data and (2) isotype assignment within an immunoglobulin 454 sequence data set. Both cases show the usability and versatility of the new parallel Smith-Waterman implementation. PMID:25830241
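    The alignment details PaSWAS reports (score, mismatches, gaps) come from tracing back through the Smith-Waterman matrix after it is filled. This is a minimal CPU sketch with a linear gap penalty that returns those quantities plus the aligned strings. The scoring values are assumptions. It is not PaSWAS code and has no GPU parallelism. Tie-breaking (diagonal first) decides which of several equally good alignments is returned.

```python
def smith_waterman(q, d, match=2, mismatch=-1, gap=-2):
    """Local alignment with a linear gap penalty.

    Returns (score, mismatches, gaps, aligned_q, aligned_d) for one best-scoring local alignment.
    """
    n, m = len(q), len(d)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if q[i - 1] == d[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Trace back from the best cell until a zero cell ends the local alignment
    aq, ad = [], []
    mismatches = gaps = 0
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if q[i - 1] == d[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            aq.append(q[i - 1]); ad.append(d[j - 1])
            mismatches += q[i - 1] != d[j - 1]
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            aq.append(q[i - 1]); ad.append("-"); gaps += 1; i -= 1
        else:
            aq.append("-"); ad.append(d[j - 1]); gaps += 1; j -= 1
    return best, mismatches, gaps, "".join(reversed(aq)), "".join(reversed(ad))


print(smith_waterman("ACGTTACC", "ACGTACC"))
```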

  5. Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers.

    PubMed

    Church, Philip C; Goscinski, Andrzej; Holt, Kathryn; Inouye, Michael; Ghoting, Amol; Makarychev, Konstantin; Reumann, Matthias

    2011-01-01

    The challenge of comparing two or more genomes that have undergone recombination and substantial amounts of segmental loss and gain has recently been addressed for small numbers of genomes. However, datasets of hundreds of genomes are now common and their sizes will only increase in the future. Multiple sequence alignment of hundreds of genomes remains an intractable problem due to quadratic increases in compute time and memory footprint. To date, most alignment algorithms are designed for commodity clusters without parallelism. Hence, we propose the design of a multiple sequence alignment algorithm on massively parallel, distributed memory supercomputers to enable research into comparative genomics on large data sets. Following the methodology of the sequential progressiveMauve algorithm, we design data structures, including sequences and sorted k-mer lists, on the IBM Blue Gene/P supercomputer (BG/P). Preliminary results show that we can reduce the memory footprint enough to potentially align over 250 bacterial genomes on a single BG/P compute node. We verify our results on a dataset of E. coli, Shigella and S. pneumoniae genomes. Our implementation returns results matching those of the original algorithm, but in half the time and with a quarter of the memory footprint for scaffold building. In this study, we have laid the basis for multiple sequence alignment of large-scale datasets on a massively parallel, distributed memory supercomputer, thus enabling comparison of hundreds instead of a few genome sequences within reasonable time.
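
    A sorted k-mer list stores every (k-mer, position) pair of a sequence in lexicographic order, so identical k-mers sit next to each other and can be found by binary search. The sketch below is a minimal, assumption-laden illustration (exact matches only, small k, one process); it does not reproduce the progressiveMauve or BG/P data structures, and the function names are hypothetical.

```python
from bisect import bisect_left


def sorted_kmer_list(seq, k):
    """Return (k-mer, start position) pairs sorted lexicographically."""
    return sorted((seq[i:i + k], i) for i in range(len(seq) - k + 1))


def shared_kmers(seq_a, seq_b, k):
    """Return (k-mer, pos_in_a, pos_in_b) for every exact k-mer match."""
    list_b = sorted_kmer_list(seq_b, k)
    keys_b = [kmer for kmer, _ in list_b]
    hits = []
    for kmer, pos_a in sorted_kmer_list(seq_a, k):
        # Binary search; equal k-mers are contiguous in the sorted list.
        idx = bisect_left(keys_b, kmer)
        while idx < len(keys_b) and keys_b[idx] == kmer:
            hits.append((kmer, pos_a, list_b[idx][1]))
            idx += 1
    return hits
```

    Shared k-mers found this way are the kind of seeds from which larger matching regions between genomes can be extended.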

  6. Developing Single Nucleotide Polymorphism (SNP) markers from transcriptome sequences for the identification of longan (Dimocarpus longan) germplasm

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Longan (Dimocarpus longan Lour.) is an important tropical fruit tree crop. Accurate varietal identification is essential for germplasm management and breeding. Using longan transcriptome sequences from public databases, we developed single nucleotide polymorphism (SNP) markers; validated 60 SNPs in...

  7. Complete Nucleotide Sequence of an Australian Isolate of Turnip mosaic virus before and after Seven Years of Serial Passaging

    PubMed Central

    Pretorius, Lara; Moyle, Richard L.; Dalton-Morgan, Jessica; Hussein, Nasser

    2016-01-01

    The complete genome sequence of an Australian isolate of Turnip mosaic virus was determined by Sanger sequencing. After seven years of serial passaging by mechanical inoculation, the isolate was resequenced by RNA sequencing (RNA-Seq). Eighteen single nucleotide polymorphisms were identified between the isolates. Both isolates had 96% identity to isolate AUST10. PMID:27856582

  8. Distant neighbor base sequence context effects in human nucleotide excision repair of a benzo[a]pyrene-derived DNA lesion.

    PubMed

    Cai, Yuqin; Kropachev, Konstantin; Xu, Rong; Tang, Yijin; Kolbanovskii, Marina; Kolbanovskii, Alexander; Amin, Shantu; Patel, Dinshaw J; Broyde, Suse; Geacintov, Nicholas E

    2010-06-11

    The effects of non-nearest base sequences, beyond the nucleotides flanking a DNA lesion on either side, on nucleotide excision repair (NER) in extracts from human cells were investigated. We constructed two duplexes containing the same minor groove-aligned 10S (+)-trans-anti-B[a]P-N(2)-dG (G*) DNA adduct, derived from the environmental carcinogen benzo[a]pyrene (B[a]P): 5'-C-C-A-T-C-G*-C-T-A-C-C-3' (CG*C-I), and 5'-C-A-C3-A4-C5-G*-C-A-C-A-C-3' (CG*C-II). We used polyacrylamide gel electrophoresis to compare the extent of DNA bending, and molecular dynamics simulations to analyze the structural characteristics of these two DNA duplexes. The NER efficiencies are 1.6(+/-0.2)-fold greater in the case of the CG*C-II than the CG*C-I sequence context in 135-mer duplexes. Gel electrophoresis and self-ligation circularization experiments revealed that the CG*C-II duplex is more bent than the CG*C-I duplex, while molecular dynamics simulations showed that the unique -C3-A4-C5- segment in the CG*C-II duplex plays a key role. The presence of a minor groove-positioned guanine amino group, the Watson-Crick partner to C3, acts as a wedge; facilitated by a highly deformable local -C3-A4- base step, this amino group allows the B[a]P ring system to produce a more enlarged minor groove in CG*C-II than in CG*C-I, as well as a local untwisting and enlarged and flexible Roll only in the CG*C-II sequence. These structural properties fit well with our earlier findings that in the case of the family of minor groove 10S (+)-trans-anti-B[a]P-N(2)-dG lesions, flexible bends and enlarged minor groove widths constitute NER recognition signals, and extend our understanding of sequence context effects on NER to the neighbors that are distant to the lesion.

  9. Nucleotide sequence and expression of the Enterobacter aerogenes alpha-acetolactate decarboxylase gene in brewer's yeast.

    PubMed Central

    Sone, H; Fujii, T; Kondo, K; Shimizu, F; Tanaka, J; Inoue, T

    1988-01-01

    The nucleotide sequence of a 1.4-kilobase DNA fragment containing the alpha-acetolactate decarboxylase gene of Enterobacter aerogenes was determined. The sequence contains an entire protein-coding region of 780 nucleotides which encodes an alpha-acetolactate decarboxylase of 260 amino acids. The DNA sequence coding for alpha-acetolactate decarboxylase was placed under the control of the alcohol dehydrogenase I promoter of the yeast Saccharomyces cerevisiae in a plasmid capable of autonomous replication in both S. cerevisiae and Escherichia coli. Brewer's yeast cells transformed by this plasmid showed alpha-acetolactate decarboxylase activity and were used in laboratory-scale fermentation experiments. These experiments revealed that the diacetyl concentration in wort fermented by the plasmid-containing yeast strain was significantly lower than that in wort fermented by the parental strain. These results indicated that the alpha-acetolactate decarboxylase activity produced by brewer's yeast cells degraded alpha-acetolactate and that this degradation caused a decrease in diacetyl production. PMID:3278689

  10. The nucleotide sequence of sacbrood virus of the honey bee: an insect picorna-like virus.

    PubMed

    Ghosh, R C; Ball, B V; Willcocks, M M; Carter, M J

    1999-06-01

    We have determined the nucleotide sequence of sacbrood virus (SBV), which causes a fatal infection of honey bee larvae. The genomic RNA of SBV (8832 nucleotides) is longer than that of typical mammalian picornaviruses and contains a single, large open reading frame (nucleotides 179-8752) encoding a polyprotein of 2858 amino acids. Sequence comparison with other virus polyproteins revealed regions of similarity to characterized helicase, protease and RNA-dependent RNA polymerase domains; structural genes were located at the 5' terminus, with non-structural genes at the 3' end. Picornavirus-like agents of insects have two distinct genomic organizations: some resemble mammalian picornaviruses, with structural genes at the 5' end and non-structural genes at the 3' end, and others resemble caliciviruses, in which this order is reversed; SBV thus belongs to the former type. Sequence comparison suggested that SBV is distantly related to infectious flacherie virus (IFV) of the silkworm, which possesses an RNA of similar size and gene order.

  11. Infectivity and complete nucleotide sequence of cucumber fruit mottle mosaic virus isolate Cm cDNA.

    PubMed

    Rhee, Sun-Ju; Hong, Jin-Sung; Lee, Gung Pyo

    2014-07-01

    Three isolates of cucumber fruit mottle mosaic virus (CFMMV) were collected from melon, cucumber, and pumpkin plants in Korea. A full-length cDNA clone of CFMMV-Cm (melon isolate) was produced and evaluated for infectivity after T7 transcription in vitro (pT7CF-Cmflc). The complete CFMMV genome sequence of the infectious clone pT7CF-Cmflc was determined. The genome of CFMMV-Cm consisted of 6,571 nucleotides and shared high nucleotide sequence identity (98.8 %) with the Israel isolate of CFMMV. Based on the infectious clone pT7CF-Cmflc, a CaMV 35S-promoter driven cDNA clone (p35SCF-Cmflc) was subsequently constructed and sequenced. Mechanical inoculation with RNA transcripts of pT7CF-Cmflc and agro-inoculation with p35SCF-Cmflc resulted in systemic infection of cucumber and melon, producing symptoms similar to those produced by CFMMV-Cm. Progeny virus in infected plants was detected by RT-PCR, western blot assay, and transmission electron microscopy.

  12. Structure and nucleotide sequence of the rat intestinal vitamin D-dependent calcium binding protein gene.

    PubMed Central

    Krisinger, J; Darwish, H; Maeda, N; DeLuca, H F

    1988-01-01

    The vitamin D-dependent intestinal calcium binding protein (ICaBP, 9 kDa) is under transcriptional regulation by 1,25-dihydroxyvitamin D3 [1,25-(OH)2D3], the hormonally active form of the vitamin. To study the mechanism of gene regulation by 1,25-(OH)2D3, we isolated the rat ICaBP gene by using a cDNA probe. Its nucleotide sequence revealed 3 exons separated by 2 introns within approximately 3 kilobases. The first exon contains only noncoding sequences, while the second and third encode the two calcium binding domains of the protein. The gene contains a 15-base-pair imperfect palindrome in the first intron that shows high homology to the estrogen-responsive element. This sequence may represent the vitamin D-responsive element involved in the regulation of the ICaBP gene. The second intron contains an 84-base-pair simple nucleotide repeat that suggests Z-DNA formation. Genomic Southern analysis shows that the rat gene is present as a single copy. PMID:3194402

  13. Complete nucleotide sequence and genome organization of a Cactus virus X strain from Hylocereus undatus (Cactaceae).

    PubMed

    Liou, M R; Chen, Y R; Liou, R F

    2004-05-01

    The complete nucleotide sequence of a strain of Cactus virus X (CVX-Hu) isolated from Hylocereus undatus (Cactaceae) has been determined. Excluding the poly(A) tail, the sequence is 6614 nucleotides long and contains seven open reading frames (ORFs). The genome organization of CVX is similar to that of other potexviruses. ORF1 encodes the putative viral replicase with conserved methyltransferase, helicase, and polymerase motifs. Two other ORFs, which we call ORF6 and ORF7, are located separately within ORF1 in the +2 reading frame. ORF2, 3, and 4, which form the "triple gene block" characteristic of the potexviruses, encode proteins with molecular masses of 25, 12, and 7 kDa, respectively. ORF5 encodes the coat protein with an estimated molecular mass of 24 kDa. Sequence analysis indicated that the proteins encoded by ORF1-5 display a certain degree of homology to the corresponding proteins of other potexviruses. The putative product of ORF6, however, shows no significant similarity to those of other potexviruses. Phylogenetic analyses based on the replicase (the methyltransferase, helicase, and polymerase domains) and coat protein demonstrated a closer relationship of CVX with Bamboo mosaic virus, Cassava common mosaic virus, Foxtail mosaic virus, Papaya mosaic virus, and Plantago asiatica mosaic virus.

  14. Nucleotide sequence conservation of novel and established cis-regulatory sites within the tyrosine hydroxylase gene promoter

    PubMed Central

    Wang, Meng; Banerjee, Kasturi; Baker, Harriet; Cave, John W.

    2015-01-01

    Tyrosine hydroxylase (TH) is the rate-limiting enzyme in catecholamine biosynthesis, and its gene proximal promoter (<1 kb upstream of the transcription start site) is essential for regulating transcription in both the developing and adult nervous systems. Several putative regulatory elements within the TH proximal promoter have been reported, but evolutionary conservation of these elements has not been thoroughly investigated. Since many vertebrate species are used to model development, function and disorders of human catecholaminergic neurons, identifying evolutionarily conserved transcription regulatory mechanisms is a high priority. In this study, we align TH proximal promoter nucleotide sequences from several vertebrate species to identify evolutionarily conserved motifs. This analysis identified three elements (a TATA box, cyclic AMP response element (CRE) and a 5′-GGTGG-3′ site) that constitute the core of an ancient vertebrate TH promoter. Focusing on only eutherian mammals, two regions of high conservation within the proximal promoter were identified: a ∼250 bp region adjacent to the transcription start site and a ∼85 bp region located approximately 350 bp further upstream. Within both regions, conservation of previously reported cis-regulatory motifs and human single nucleotide variants was evaluated. Transcription reporter assays in a TH-expressing cell line demonstrated the functionality of highly conserved motifs in the proximal promoter regions, and electromobility shift assays showed that brain-region specific complexes assemble on these motifs. These studies also identified a non-canonical CRE binding (CREB) protein recognition element in the proximal promoter. Together, these studies provide a detailed analysis of evolutionary conservation within the TH promoter and identify potential cis-regulatory motifs that underlie a core set of regulatory mechanisms in mammals. PMID:25774193
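
    Once promoter sequences are aligned, the simplest way to flag conserved motifs is to look for runs of columns where every species carries the same base. The sketch below assumes a gapped multiple alignment given as equal-length strings and a hypothetical minimum run length; it is an illustration of column-wise conservation, not the analysis pipeline used in the study, and the toy sequences in the test are invented.

```python
def conserved_blocks(alignment, min_len=5):
    """Return (start, end, motif) for runs of identical, ungapped alignment columns.

    alignment: list of equal-length aligned sequences ('-' marks a gap).
    Columns are 0-based and `end` is exclusive.
    """
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("aligned sequences must have equal length")
    blocks, start = [], None
    for col in range(length + 1):
        # col == length acts as a sentinel that closes any open run.
        conserved = (
            col < length
            and alignment[0][col] != "-"
            and len({s[col].upper() for s in alignment}) == 1
        )
        if conserved and start is None:
            start = col
        elif not conserved and start is not None:
            if col - start >= min_len:
                blocks.append((start, col, alignment[0][start:col].upper()))
            start = None
    return blocks
```

    Real conservation analyses usually tolerate occasional substitutions (for example by scoring columns with a per-column identity threshold); strict identity keeps the example short.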

  15. Self-consistently optimized statistical mechanical energy functions for sequence structure alignment.

    PubMed Central

    Koretke, K. K.; Luthey-Schulten, Z.; Wolynes, P. G.

    1996-01-01

    A quantitative form of the principle of minimal frustration is used to obtain, from a database analysis, statistical mechanical energy functions and gap parameters for aligning sequences to three-dimensional structures. The analysis, which partially takes into account correlations in the energy landscape, improves upon the previous approximations of Goldstein et al. (1994, 1995) (Goldstein R, Luthey-Schulten Z, Wolynes P, 1994, Proceedings of the 27th Hawaii International Conference on System Sciences. Los Alamitos, California: IEEE Computer Society Press. pp 306-315; Goldstein R, Luthey-Schulten Z, Wolynes P, 1995, In: Elber R, ed. New developments in theoretical studies of proteins. Singapore: World Scientific). The energy function allows alignments to be ordered by the compatibility of a sequence with a given structure (i.e., lowest energy) and therefore removes the need to use percent identity or similarity as scoring parameters. The alignments produced by the energy function on distant homologues with low percent identity (less than 21%) are generally better than those generated with evolutionary information. For sequences containing PROSITE signatures but of unknown structure, the lowest energy alignment generated with the energy function is to a structure containing the same PROSITE signature, providing a check on the robustness of the algorithm. Finally, the energy function can make use of known experimental evidence as constraints within the alignment algorithm to aid in finding the correct structural alignment. PMID:8762136

  16. Alignment-free Transcriptomic and Metatranscriptomic Comparison Using Sequencing Signatures with Variable Length Markov Chains

    PubMed Central

    Liao, Weinan; Ren, Jie; Wang, Kun; Wang, Shun; Zeng, Feng; Wang, Ying; Sun, Fengzhu

    2016-01-01

    Comparing microbial sequencing data is critical to understanding the dynamics of microbial communities. Alignment-based tools for analyzing metagenomic datasets require reference sequences and read alignments. The available alignment-free dissimilarity approaches model the background sequences with a Fixed Order Markov Chain (FOMC), yielding promising results for the comparison of microbial communities. However, in FOMC, the number of parameters grows exponentially as the order of the Markov Chain (MC) increases. Under a fixed high order of MC, the parameters might not be accurately estimated owing to limited sequencing depth. In our study, we investigate an alternative to FOMC that models background sequences with the data-driven Variable Length Markov Chain (VLMC) in metatranscriptomic data. VLMC, originally designed for long sequences, was extended to high-throughput sequencing reads, and strategies for estimating the corresponding parameters were developed. The flexible number of parameters in VLMC avoids estimating the vast number of parameters of a high-order MC under limited sequencing depth. Unlike FOMC, in which the order is selected manually, VLMC determines the MC order adaptively. Several beta diversity measures based on VLMC were applied to compare the bacterial RNA-Seq and metatranscriptomic datasets. Experiments show that VLMC outperforms FOMC in modeling the background sequences of transcriptomic and metatranscriptomic samples. A software pipeline is available at https://d2vlmc.codeplex.com. PMID:27876823
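
    The exponential growth is easy to quantify: a fixed-order Markov chain of order k on {A, C, G, T} has 4^k contexts, each with a distribution over 4 next bases (3 free values), so it has 3·4^k free parameters (3,072 at order 5, over 3 million at order 10). The sketch below estimates such a FOMC from reads by pooled counting; it illustrates the FOMC baseline only, not VLMC, and the pseudocount of 1 is an assumption.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def fomc_free_parameters(order):
    """Number of free parameters of a fixed-order Markov chain on DNA."""
    return 3 * 4 ** order


def fit_fomc(reads, order, pseudocount=1.0):
    """Estimate P(next base | preceding `order` bases), pooling counts over reads.

    A pseudocount keeps transitions never seen in the reads above zero.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - order):
            context, nxt = read[i:i + order], read[i + order]
            if set(context + nxt) <= set(BASES):  # skip windows with N or other symbols
                counts[(context, nxt)] += 1
    probs = {}
    for ctx in product(BASES, repeat=order):
        context = "".join(ctx)
        total = sum(counts[(context, b)] for b in BASES) + 4 * pseudocount
        probs[context] = {b: (counts[(context, b)] + pseudocount) / total for b in BASES}
    return probs
```

    With shallow sequencing most of the 4^k contexts receive few or no counts, so their estimates are dominated by the pseudocount, which is the estimation problem the abstract describes.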

  17. Nucleotide sequence of the transforming gene of m1 murine sarcoma virus.

    PubMed Central

    Brow, M A; Sen, A; Sutcliffe, J G

    1984-01-01

    The v-mosm1 nucleotide sequence codes for a protein that is 376 amino acids long. Although the N-terminus is homologous with that of the v-mos124 protein, the C-terminus is substantially different from the C-termini of all other examined mos proteins, suggesting that this region is nonessential and perhaps cleaved. Overall, v-mosm1 has greater homology with c-mos than does v-mos124, but mutually exclusive differences between c-mos and each of the v-mos genes preclude linear descent and suggest a common ancestral murine sarcoma virus. PMID:6319757

  18. The Complete Nucleotide Sequence of the Mitochondrial Genome of Bactrocera minax (Diptera: Tephritidae)

    PubMed Central

    Zhang, Bin; Nardi, Francesco; Hull-Sanders, Helen; Wan, Xuanwu; Liu, Yinghong

    2014-01-01

    The complete 16,043 bp mitochondrial genome (mitogenome) of Bactrocera minax (Diptera: Tephritidae) has been sequenced. The genome encodes the 37 genes usually found in insect mitogenomes. The mitogenome of B. minax was compared with the homologous sequences of Bactrocera oleae, Bactrocera tryoni, Bactrocera philippinensis, Bactrocera carambolae, Bactrocera papayae, Bactrocera dorsalis, Bactrocera correcta, Bactrocera cucurbitae and Ceratitis capitata. The analysis indicated that its structure and organization are typical of, and similar to, those of the nine closely related species mentioned above, although it has the lowest genome-wide A+T content (67.3%). Four short intergenic spacers that are highly conserved among B. minax and the nine other tephritid species were observed; these also have clear counterparts in the control regions (CRs). Correlation analysis among these ten tephritid species revealed close positive correlations among the A+T content of zero-fold degenerate sites (P0FD), the ratio of nucleotide substitution frequency at P0FD sites to all degenerate sites (zero-fold, two-fold and four-fold degenerate sites), and amino acid sequence distance (ASD). Further, a significant positive correlation was observed between the A+T content of four-fold degenerate sites (P4FD) and the ratio of nucleotide substitution frequency at P4FD sites to all degenerate sites; however, ASD was significantly negatively correlated with both the A+T content of P4FD and the ratio of nucleotide substitution frequency at P4FD sites to all degenerate sites. A higher nucleotide substitution frequency at non-synonymous sites than at synonymous sites was observed in nad4, the first time this has been observed in an insect mitogenome. A poly(T) stretch at the 5′ end of the CR followed by a [TA(A)]n-like stretch was also found. In addition, a highly conserved G+A-rich sequence block was observed in front of the
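
    A+T content, as reported above (67.3% genome-wide), is the fraction of A and T among the unambiguous bases. A minimal sketch follows; the per-codon-position helper is an illustrative extra and assumes an in-frame coding sequence. Note that third codon positions are not the same as four-fold degenerate sites, which require a genetic-code lookup (and the invertebrate mitochondrial code differs from the standard one).

```python
def at_content(seq):
    """Percentage of A and T among the unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * at / (at + gc) if at + gc else 0.0


def at_by_codon_position(cds):
    """A+T percentage at codon positions 1, 2 and 3 of an in-frame coding sequence."""
    return [at_content(cds[i::3]) for i in range(3)]
```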

  20. Within-Host Nucleotide Diversity of Virus Populations: Insights from Next-Generation Sequencing

    PubMed Central

    Nelson, Chase W.; Hughes, Austin L.

    2014-01-01

    Next-generation sequencing (NGS) technology offers new opportunities for understanding the evolution and dynamics of viral populations within individual hosts over the course of infection. We review simple methods for estimating synonymous and nonsynonymous nucleotide diversity in viral genes from NGS data without the need for inferring linkage. We discuss the potential usefulness of these data for addressing questions of both practical and theoretical interest, including fundamental questions regarding the effective population sizes of within-host viral populations and the modes of natural selection acting on them. PMID:25481279
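
    One common way to estimate nucleotide diversity at a single site from NGS data without inferring linkage is to treat the reads covering the site as a sample: π = n/(n − 1) · (1 − Σ p_i²), where n is the coverage and p_i the frequency of base i. This equals the probability that two reads drawn without replacement differ at that site. The sketch below assumes per-site base counts are already available; partitioning sites into synonymous and nonsynonymous classes, as the review discusses, is not shown.

```python
def site_diversity(counts):
    """Nucleotide diversity at one site from per-base read counts.

    counts: mapping base -> number of reads. Uses pi = n/(n-1) * (1 - sum p_i^2),
    the probability that two reads drawn without replacement differ.
    """
    n = sum(counts.values())
    if n < 2:
        return 0.0
    homozygosity = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - homozygosity)


def mean_diversity(pileup):
    """Average per-site diversity over a list of per-site count mappings."""
    if not pileup:
        return 0.0
    return sum(site_diversity(site) for site in pileup) / len(pileup)
```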

  1. Nanoparticle-Based Discrimination of Single-Nucleotide Polymorphism in Long DNA Sequences.

    PubMed

    Sanromán-Iglesias, María; Lawrie, Charles H; Liz-Marzán, Luis M; Grzelczak, Marek

    2017-03-01

    Circulating tumor DNA (ctDNA), and specifically the detection of cancer-associated mutations in liquid biopsies, promises to revolutionize cancer detection. The main difficulty, however, is that ctDNA fragments of typical length (∼150 bases) can form secondary structures that potentially obscure the mutated fragment from detection. We show that an assay based on gold nanoparticles (65 nm) stabilized with DNA (Au@DNA) can discriminate single nucleotide polymorphisms in clinically relevant ssDNA sequences (70-140 bases). The preincubation step was crucial to this process, allowing sequential bridging of Au@DNA so that single-base mutations can be discriminated at concentrations down to 100 pM.

  2. Complete nucleotide sequence of a virus associated with rusty mottle disease of sweet cherry (Prunus avium).

    PubMed

    Villamor, D V; Druffel, K L; Eastwell, K C

    2013-08-01

    Cherry rusty mottle is a disease of sweet cherries first described in 1940 in western North America. Because the disease is graft-transmissible, a viral etiology was assumed. Here, the complete genomic nucleotide sequences of virus isolates from two trees expressing cherry rusty mottle disease symptoms are characterized; the virus is designated cherry rusty mottle associated virus (CRMaV). The biological and molecular characteristics of this virus are described and compared with those of cherry necrotic rusty mottle virus (CNRMV) and cherry green ring mottle virus (CGRMV). CRMaV was subsequently detected in additional sweet cherry trees expressing symptoms of cherry rusty mottle disease.

  3. CATO: The Clone Alignment Tool.

    PubMed

    Henstock, Peter V; LaPan, Peter

    2016-01-01

    High-throughput cloning efforts produce large numbers of sequences that need to be aligned, edited, compared with reference sequences, and organized as files and selected clones. Different pieces of software are typically required to perform each of these tasks. We have designed a single piece of software, CATO, the Clone Alignment Tool, that allows a user to align, evaluate, edit, and select clone sequences based on comparisons to reference sequences. The input and output are designed to be compatible with standard data formats, and thus suitable for integration into a clone processing pipeline. CATO provides both sequence alignment and visualizations to facilitate the analysis of cloning experiments. The alignment algorithm matches each of the relevant candidate sequences against each reference sequence. The visualization portion displays three levels of matching: 1) a top-level summary of the top candidate sequences aligned to each reference sequence, 2) a focused alignment view with the nucleotides of matched sequences displayed against one reference sequence, and 3) a pair-wise alignment of a single reference and candidate sequence pair. Users can select the minimum matching criteria for valid clones, edit or swap reference sequences, and export the results to a summary file as part of the high-throughput cloning workflow.
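
    The matching step ("each of the relevant candidate sequences against each reference sequence") can be sketched as an all-against-all comparison followed by a minimum-matching filter. This is not CATO's algorithm: the similarity measure here is Python's `difflib.SequenceMatcher.ratio()` (2·matches / total length) used as a stand-in for a true alignment identity, and `min_similarity=0.95` is a hypothetical threshold.

```python
from difflib import SequenceMatcher


def similarity(candidate, reference):
    """Similarity in [0, 1] between two sequences (stand-in for alignment identity)."""
    # autojunk=False matters for DNA: with only four letters, the default heuristic
    # would mark every base as "junk" in sequences of 200 or more characters.
    return SequenceMatcher(None, candidate, reference, autojunk=False).ratio()


def match_clones(candidates, references, min_similarity=0.95):
    """For each reference, list passing candidates as (name, score), best first.

    candidates, references: dicts mapping name -> sequence.
    """
    results = {}
    for ref_name, ref_seq in references.items():
        scored = [(cand_name, similarity(cand_seq, ref_seq))
                  for cand_name, cand_seq in candidates.items()]
        scored.sort(key=lambda item: item[1], reverse=True)
        results[ref_name] = [(name, s) for name, s in scored if s >= min_similarity]
    return results
```

    The per-reference, best-first lists correspond loosely to the "top candidate sequences aligned to each reference sequence" summary view described above.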

  4. Malakite: an automatic tool for characterisation of structure of reliable blocks in multiple alignments of protein sequences.

    PubMed

    Burkov, Boris; Nagaev, Boris; Spirin, Sergei; Alexeevski, Andrei

    2010-06-01

    It makes sense to speak of an alignment of protein sequences only within the regions where the sequences are related to each other. This simple consideration is often disregarded by multiple alignment programs. A package for alignment analysis, MAlAKiTE (Multiple Alignment Automatic Kinship Tiling Engine), is introduced. It aims to find the blocks of reliable alignment, which contain related regions only, within the whole alignment and provides tools for working with them. The validity of the reliable-block detection was verified by comparison with structural data.

  5. Optimizing nucleotide sequence ensembles for combinatorial protein libraries using a genetic algorithm.

    PubMed

    Craig, Roger A; Lu, Jin; Luo, Jinquan; Shi, Lei; Liao, Li

    2010-01-01

    Protein libraries are essential to the field of protein engineering. Increasingly, probabilistic protein design is being used to synthesize combinatorial protein libraries, which allow the protein engineer to explore a vast space of amino acid sequences while at the same time placing restrictions on the amino acid distributions. To this end, if site-specific amino acid probabilities are input as the target, then the codon nucleotide distributions that match this target distribution can be used to generate a partially randomized gene library. However, finding the codon nucleotide distributions that exactly match a given target distribution of amino acids turns out to be a highly nontrivial computational task. We first show that for a given target distribution an exact solution may not exist at all. Formulating the task as a constrained optimization problem, we then developed a genetic algorithm-based approach to find codon nucleotide distributions that match the target amino acid distribution as closely as possible. Compared with the previous gradient descent method on various objective functions, the new method consistently gave more optimized distributions, as measured by the relative entropy between the calculated and the target distributions. To simulate actual lab solutions, new objective functions were designed that allow two separate sets of codons in seeking a better match to the target amino acid distribution.
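
    The objective being optimized can be made concrete: given independent base distributions for the three codon positions, the implied amino acid distribution follows from the standard genetic code, and its distance from the target can be measured by relative entropy. The sketch below shows only this evaluation step, not the genetic algorithm; it assumes the standard code, counts stop codons as '*', and takes the direction D(target ‖ calculated), which is an assumption since the abstract does not state it.

```python
import math
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def amino_acid_distribution(pos1, pos2, pos3):
    """Amino acid probabilities implied by independent per-position base distributions.

    Each posN maps base -> probability; '*' collects the stop codons.
    """
    dist = {}
    for codon, aa in CODON_TABLE.items():
        p = pos1.get(codon[0], 0.0) * pos2.get(codon[1], 0.0) * pos3.get(codon[2], 0.0)
        dist[aa] = dist.get(aa, 0.0) + p
    return dist


def relative_entropy(target, calculated, eps=1e-12):
    """D(target || calculated) in nats; eps avoids log(0) for residues never produced."""
    return sum(t * math.log(t / max(calculated.get(aa, 0.0), eps))
               for aa, t in target.items() if t > 0)
```

    A fully randomized codon (each base at 0.25) yields leucine with probability 6/64 and stops with probability 3/64, which illustrates why an arbitrary target distribution often cannot be matched exactly.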

  6. A simple ABO genotyping by PCR using sequence-specific primers with mismatched nucleotides.

    PubMed

    Taki, Takashi; Kibayashi, Kazuhiko

    2014-05-01

    In forensics, the ABO blood group is often determined by analyzing the ABO gene. Among the various methods used, PCR employing sequence-specific primers (PCR-SSP) is simpler than other methods for ABO typing. When performing PCR-SSP, pseudo-positive signals often lead to errors in ABO typing. We introduced mismatched nucleotides at the second and third positions from the 3'-end of the primers for the PCR-SSP method and examined whether reliable typing could be achieved by suppressing pseudo-positive signals. Genomic DNA was extracted from nail clippings of 27 volunteers, and the ABO gene was examined with PCR-SSP employing primers with and without mismatched nucleotides. The ABO blood group of the nail clippings was also analyzed serologically, and these results were compared with those obtained using PCR-SSP. When mismatched primers were employed for amplification, the ABO typing results matched those obtained by the serological method. When primers without mismatched nucleotides were used for PCR-SSP, pseudo-positive signals were observed. Thus, our method may be used to achieve more reliable ABO typing.

  7. Complete nucleotide sequences of two begomoviruses infecting Madagascar periwinkle (Catharanthus roseus) from Pakistan.

    PubMed

    Ilyas, Muhammad; Nawaz, Kiran; Shafiq, Muhammad; Haider, Muhammad Saleem; Shahid, Ahmad Ali

    2013-02-01

    Though Catharanthus roseus (Madagascar periwinkle) is an ornamental plant, it is famous for its medicinal value. Its alkaloids are known for anti-cancerous properties, and this plant is studied mainly for its alkaloids. Here, this plant has been studied for its viral diseases. Complete DNA sequences of two begomoviruses infecting C. roseus originating from Pakistan were determined. The sequence of one begomovirus (clone KN4) shows the highest level of nucleotide sequence identity (86.5 %) to an unpublished virus, chili leaf curl India virus (ChiLCIV), and then (84.4 % identity) to papaya leaf curl virus (PaLCV), and thus represents a new species, for which the name "Catharanthus yellow mosaic virus" (CYMV) is proposed. The sequence of another begomovirus (clone KN6) shows the highest level of sequence identity (95.9 % to 99 %) to a newly reported virus from India, papaya leaf crumple virus (PaLCrV). Sequence analysis shows that KN4 and KN6 are recombinants of Pedilanthus leaf curl virus (PedLCV) and croton yellow vein mosaic virus (CrYVMV).

  8. Nucleotide sequence of the leukotoxin gene from Actinobacillus actinomycetemcomitans: homology to the alpha-hemolysin/leukotoxin gene family.

    PubMed Central

    Kraig, E; Dailey, T; Kolodrubetz, D

    1990-01-01

    The leukotoxin produced by Actinobacillus actinomycetemcomitans has been implicated in the etiology of localized juvenile periodontitis. To initiate a genetic analysis of the role of this protein in disease, we have cloned its gene, lktA. We now present the complete nucleotide sequence of the lktA gene from A. actinomycetemcomitans. When the deduced amino acid sequence of the leukotoxin protein was compared with those of other proteins, it was found to be homologous to the leukotoxin from Pasteurella haemolytica and to the alpha-hemolysins from Escherichia coli and Actinobacillus pleuropneumoniae. Each alignment showed at least 42% identity. As in the other organisms, the lktA gene of A. actinomycetemcomitans was linked to another gene, lktC, which is thought to be involved in the activation of the leukotoxin. The predicted LktC protein was related to the leukotoxin/hemolysin C proteins from the other bacteria, since they shared a minimum of 49% amino acid identity. Surprisingly, although actinobacillus species are more closely related to pasteurellae than to members of the family Enterobacteriaceae, LktA and LktC from A. actinomycetemcomitans shared significantly greater sequence identity with the E. coli alpha-hemolysin proteins than with the P. haemolytica leukotoxin proteins. Despite the overall homology to the other leukotoxin/hemolysin proteins, the LktA protein from A. actinomycetemcomitans has several unique properties. Most strikingly, it is a very basic protein with a calculated pI of 9.7; the other toxins have estimated pIs around 6.2. The unusual features of the A. actinomycetemcomitans protein are discussed in light of the different species and target-cell specificities of the hemolysins and the leukotoxins. PMID:2318535

  9. The eSNV-detect: a computational system to identify expressed single nucleotide variants from transcriptome sequencing data.

    PubMed

    Tang, Xiaojia; Baheti, Saurabh; Shameer, Khader; Thompson, Kevin J; Wills, Quin; Niu, Nifang; Holcomb, Ilona N; Boutet, Stephane C; Ramakrishnan, Ramesh; Kachergus, Jennifer M; Kocher, Jean-Pierre A; Weinshilboum, Richard M; Wang, Liewei; Thompson, E Aubrey; Kalari, Krishna R

    2014-12-16

    Rapid development of next-generation sequencing technology has enabled the identification of genomic alterations from short sequencing reads. There are a number of software pipelines available for calling single nucleotide variants from genomic DNA, but no comprehensive pipelines to identify, annotate and prioritize expressed SNVs (eSNVs) from non-directional paired-end RNA-Seq data. We have developed eSNV-Detect, a novel computational system that uses data from multiple aligners to call and rank variants from RNA-Seq, even at low read depths. Multi-platform comparisons with the eSNV-Detect variant candidates were performed. The method was first applied to RNA-Seq from a lymphoblastoid cell line, achieving 99.7% precision and 91.0% sensitivity for expressed SNPs relative to the matching HumanOmni2.5 BeadChip data. Comparison of RNA-Seq eSNV candidates from 25 ER+ breast tumors from The Cancer Genome Atlas (TCGA) project with whole-exome coding data showed 90.6-96.8% precision and 91.6-95.7% sensitivity. Contrasting single-cell mRNA-Seq variants with matching traditional multicellular RNA-Seq data for the MDA-MB-231 breast cancer cell line delineated variant heterogeneity among the single cells. Further, Sanger sequencing validation was performed for an ER+ breast tumor with paired normal adjacent tissue, validating 29 of 31 candidate eSNVs. The source code and user manuals of the eSNV-Detect pipeline for Sun Grid Engine and virtual machine are available at http://bioinformaticstools.mayo.edu/research/esnv-detect/.
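    Precision and sensitivity, as reported for eSNV-Detect against the BeadChip and exome data, reduce to set overlaps between called and reference variants. The sketch below assumes variants are keyed exactly by (chromosome, position, ref, alt) and that both sets have already been restricted to the same assayable sites; the helper name and example records are hypothetical and are not part of the eSNV-Detect code.

```python
from typing import Iterable, Set, Tuple

# Hypothetical variant key: (chromosome, 1-based position, ref allele, alt allele)
Variant = Tuple[str, int, str, str]


def precision_sensitivity(called: Iterable[Variant],
                          truth: Iterable[Variant]) -> Tuple[float, float]:
    """Return (precision, sensitivity) of called variants against a truth set.

    Assumes both inputs cover the same assayable sites and that a match
    requires identical chromosome, position and alleles.
    """
    called_set: Set[Variant] = set(called)
    truth_set: Set[Variant] = set(truth)
    tp = len(called_set & truth_set)  # called and confirmed
    fp = len(called_set - truth_set)  # called but absent from truth set
    fn = len(truth_set - called_set)  # in truth set but missed
    precision = tp / (tp + fp) if called_set else 0.0
    sensitivity = tp / (tp + fn) if truth_set else 0.0
    return precision, sensitivity


if __name__ == "__main__":
    calls = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")]
    chip = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
            ("chr3", 10, "T", "C"), ("chr3", 20, "A", "C")]
    print(precision_sensitivity(calls, chip))  # precision 2/3, sensitivity 0.5
```

    Sensitivity here is the same quantity often called recall; a real comparison would also need to handle multi-allelic sites and coordinate normalization, which this sketch ignores.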

  10. Complete nucleotide sequence of a South African isolate of Grapevine fanleaf virus and its associated satellite RNA.

    PubMed

    Lamprecht, Renate L; Spaltman, Monique; Stephan, Dirk; Wetzel, Thierry; Burger, Johan T

    2013-07-17

    The complete sequences of RNA1, RNA2 and satellite RNA have been determined for a South African isolate of Grapevine fanleaf virus (GFLV-SACH44). The two RNAs of GFLV-SACH44 are 7,341 nucleotides (nt) and 3,816 nt in length, respectively, and its satellite RNA (satRNA) is 1,104 nt in length, all excluding the poly(A) tail. Multiple sequence alignment of these sequences showed that GFLV-SACH44 RNA1 and RNA2 were the closest to the South African isolate, GFLV-SAPCS3 (98.2% and 98.6% nt identity, respectively), followed by the French isolate, GFLV-F13 (87.3% and 90.1% nt identity, respectively). Interestingly, the GFLV-SACH44 satRNA is more similar to three Arabis mosaic virus satRNAs (85%-87.4% nt identity) than to the satRNA of GFLV-F13 (81.8% nt identity) and was most distantly related to the satRNA of GFLV-R2 (71.0% nt identity). Full-length infectious clones of GFLV-SACH44 satRNA were constructed. The infectivity of the clones was tested with three nepovirus isolates, GFLV-NW, Arabis mosaic virus (ArMV)-NW and GFLV-SAPCS3. The clones were mechanically inoculated in Chenopodium quinoa and were infectious when co-inoculated with the two GFLV helper viruses, but not when co-inoculated with ArMV-NW.

  11. Simultaneous Detection of Both Single Nucleotide Variations and Copy Number Alterations by Next-Generation Sequencing in Gorlin Syndrome

    PubMed Central

    Morita, Kei-ichi; Naruto, Takuya; Tanimoto, Kousuke; Yasukawa, Chisato; Oikawa, Yu; Masuda, Kiyoshi; Imoto, Issei; Inazawa, Johji; Omura, Ken; Harada, Hiroyuki

    2015-01-01

    Gorlin syndrome (GS) is an autosomal dominant disorder that predisposes affected individuals to developmental defects and tumorigenesis, and is caused mainly by heterozygous germline PTCH1 mutations. Despite exhaustive analysis, PTCH1 mutations remain unidentifiable in some patients; the failure to detect mutations is presumably because mutations occurred in other causative genes or outside the analyzed regions of PTCH1, or because of copy number alterations (CNAs). In this study, we subjected a cohort of GS-affected individuals from six unrelated families to next-generation sequencing (NGS) analysis for the combined screening of causative alterations in Hedgehog signaling pathway-related genes. Specific single nucleotide variations (SNVs) of PTCH1 causing inferred amino acid changes were identified in four families (seven affected individuals), whereas CNAs within or around PTCH1 were found in two families in whom possible causative SNVs were not detected. Through targeted resequencing of all coding exons, together with simultaneous evaluation of copy number status using the alignment map files obtained via NGS, we found that GS phenotypes could be explained by PTCH1 mutations or deletions in all affected patients. Because it is advisable to evaluate CNAs of candidate causative genes in point mutation-negative cases, NGS methodology appears to be useful for improving molecular diagnosis through the simultaneous detection of both SNVs and CNAs in the targeted genes/regions. PMID:26544948

  12. Simultaneous Detection of Both Single Nucleotide Variations and Copy Number Alterations by Next-Generation Sequencing in Gorlin Syndrome.

    PubMed

    Morita, Kei-ichi; Naruto, Takuya; Tanimoto, Kousuke; Yasukawa, Chisato; Oikawa, Yu; Masuda, Kiyoshi; Imoto, Issei; Inazawa, Johji; Omura, Ken; Harada, Hiroyuki

    2015-01-01

    Gorlin syndrome (GS) is an autosomal dominant disorder that predisposes affected individuals to developmental defects and tumorigenesis, and is caused mainly by heterozygous germline PTCH1 mutations. Despite exhaustive analysis, PTCH1 mutations remain unidentifiable in some patients; the failure to detect mutations is presumably because mutations occurred in other causative genes or outside the analyzed regions of PTCH1, or because of copy number alterations (CNAs). In this study, we subjected a cohort of GS-affected individuals from six unrelated families to next-generation sequencing (NGS) analysis for the combined screening of causative alterations in Hedgehog signaling pathway-related genes. Specific single nucleotide variations (SNVs) of PTCH1 causing inferred amino acid changes were identified in four families (seven affected individuals), whereas CNAs within or around PTCH1 were found in two families in whom possible causative SNVs were not detected. Through targeted resequencing of all coding exons, together with simultaneous evaluation of copy number status using the alignment map files obtained via NGS, we found that GS phenotypes could be explained by PTCH1 mutations or deletions in all affected patients. Because it is advisable to evaluate CNAs of candidate causative genes in point mutation-negative cases, NGS methodology appears to be useful for improving molecular diagnosis through the simultaneous detection of both SNVs and CNAs in the targeted genes/regions.

  13. Molecular cloning of the Clostridium botulinum structural gene encoding the type B neurotoxin and determination of its entire nucleotide sequence.

    PubMed Central

    Whelan, S M; Elmore, M J; Bodsworth, N J; Brehm, J K; Atkinson, T; Minton, N P

    1992-01-01

    DNA fragments derived from the Clostridium botulinum type A neurotoxin (BoNT/A) gene (botA) were used in DNA-DNA hybridization reactions to derive a restriction map of the region of the C. botulinum type B strain Danish chromosome encoding botB. As one probe encoded part of the BoNT/A heavy (H) chain and the other encoded part of the light (L) chain, the position and orientation of botB relative to this map were established. The temperature at which hybridization occurred indicated a higher degree of DNA homology between the two genes in the H-chain-encoding region. By using the derived restriction map data, a 2.1-kb BglII-XbaI fragment encoding the entire BoNT/B L chain and 108 amino acids of the H chain was cloned and characterized by nucleotide sequencing. A contiguous 1.8-kb XbaI fragment encoding a further 623 amino acids of the H chain was also cloned. The 3' end of the gene was obtained by cloning a 1.6-kb fragment amplified from genomic DNA by inverse polymerase chain reaction. Translation of the nucleotide sequence derived from all three clones demonstrated that BoNT/B is composed of 1,291 amino acids. Comparative alignment of its sequence with all currently characterized BoNTs (A, C, D, and E) and tetanus toxin (TeTx) showed that percent homology varied widely depending on which component of the dichain was compared. Thus, the L chain of BoNT/B exhibits the greatest degree of homology (50% identity) with the TeTx L chain, whereas its H chain is most homologous (48% identity) with the BoNT/A H chain. Overall, the six neurotoxins were shown to be composed of highly conserved amino acid domains interspersed with amino acid tracts exhibiting little overall similarity. In total, 68 of an average of 442 amino acids are absolutely conserved between L chains, and 110 of 845 amino acids are conserved between H chains. Conservation of Trp residues (one in the L chain and nine in the H chain) was particularly striking. The most

  14. Mulan: Multiple-Sequence Local Alignment and Visualization for Studying Function and Evolution

    SciTech Connect

    Ovcharenko, I; Loots, G; Giardine, B; Hou, M; Ma, J; Hardison, R; Stubbs, L; Miller, W

    2004-07-14

    Multiple sequence alignment analysis is a powerful approach for understanding phylogenetic relationships, annotating genes and detecting functional regulatory elements. With a growing number of partly or fully sequenced vertebrate genomes, effective tools for performing multiple comparisons are required to accurately and efficiently assist biological discoveries. Here we introduce Mulan (http://mulan.dcode.org/), a novel method and a network server for comparing multiple draft and finished-quality sequences to identify functional elements conserved over evolutionary time. Mulan brings together several novel algorithms: the tba multi-aligner program for rapid identification of local sequence conservation and the multiTF program for detecting evolutionarily conserved transcription factor binding sites in multiple alignments. In addition, Mulan supports two-way communication with the GALA database; alignments of multiple species dynamically generated in GALA can be viewed in Mulan, and conserved transcription factor binding sites identified with Mulan/multiTF can be integrated and overlaid with extensive genome annotation data using GALA. Local multiple alignments computed by Mulan ensure reliable representation of short- and large-scale genomic rearrangements in distant organisms. Mulan allows for interactive modification of critical conservation parameters to differentially predict conserved regions in comparisons of both closely and distantly related species. We illustrate the uses and applications of the Mulan tool through multi-species comparisons of the GATA3 gene locus and the identification of elements that are conserved differently in avians than in other genomes, allowing speculation on the evolution of birds. Source code for the aligners and the aligner-evaluation software can be freely downloaded from http://bio.cse.psu.edu/.

  15. Overproduction and nucleotide sequence of the respiratory D-lactate dehydrogenase of Escherichia coli.

    PubMed Central

    Rule, G S; Pratt, E A; Chin, C C; Wold, F; Ho, C

    1985-01-01

    Recombinant DNA plasmids containing the gene for the membrane-bound D-lactate dehydrogenase (D-LDH) of Escherichia coli linked to the promoter PL from lambda were constructed. After induction, the levels of D-LDH were elevated 300-fold over that of the wild type and amounted to 35% of the total cellular protein. The nucleotide sequence of the D-LDH gene was determined and shown to agree with the amino acid composition and the amino-terminal sequence of the purified enzyme. Removal of the amino-terminal formyl-Met from D-LDH was not inhibited in cells which contained these high levels of D-LDH. PMID:3882663

  16. Using mitochondrial nucleotide sequences to investigate diversity and genealogical relationships within common carp (Cyprinus carpio L.).

    PubMed

    Thai, B T; Burridge, C P; Pham, T A; Austin, C M

    2005-02-01

    Direct sequencing of mitochondrial DNA (mtDNA) D-loop (745 bp) and MTATPase6/MTATPase8 (857 bp) regions was used to investigate genetic variation within common carp and develop a global genealogy of common carp strains. The D-loop region was more variable than the MTATPase6/MTATPase8 region, but given the wide distribution of carp the overall levels of sequence divergence were low. Levels of haplotype diversity varied widely among countries with Chinese, Indonesian and Vietnamese carp showing the greatest diversity whereas Japanese Koi and European carp had undetectable nucleotide variation. A genealogical analysis supports a close relationship between Vietnamese, Koi and Chinese Color carp strains and to a lesser extent, European carp. Chinese and Indonesian carp strains were the most divergent, and their relationships do not support the evolution of independent Asian and European lineages and current taxonomic treatments.
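    Haplotype diversity of the kind compared across countries here is commonly estimated with Nei's unbiased formula h = n/(n-1) * (1 - sum of p_i^2), where p_i is the frequency of haplotype i among n sequences. The abstract does not state which estimator the authors used, so this is a sketch under that assumption; the function name and toy sequences are hypothetical.

```python
from collections import Counter
from typing import Sequence


def haplotype_diversity(haplotypes: Sequence[str]) -> float:
    """Nei's unbiased haplotype (gene) diversity: n/(n-1) * (1 - sum(p_i^2)).

    Each distinct aligned sequence string is treated as one haplotype.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


if __name__ == "__main__":
    # Toy data: three haplotypes among four sequences.
    print(haplotype_diversity(["AACG", "AACG", "AATG", "GACG"]))  # about 0.833
    # No variation at all (as reported for Koi) gives zero diversity.
    print(haplotype_diversity(["AACG"] * 5))  # 0.0
```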

  17. Complete nucleotide sequence of a new variant of grapevine fanleaf virus from northeastern China.

    PubMed

    Zhou, Jun; Fan, Xudong; Dong, Yafeng; Zhang, Zunping; Ren, Fang; Hu, Guojun; Li, Zhengnan

    2017-02-01

    The complete RNA1 and RNA2 sequences of a new grapevine fanleaf virus isolate (GFLV-SDHN) from northeastern China were determined. The two RNAs are 7,367 and 3,788 nucleotides (nt) in length, respectively, excluding the poly(A) tails. Compared to other GFLV isolates, GFLV-SDHN has a 22- to 24-nt insertion in the RNA1 5' untranslated region, and shows 19.1-20.1% and 11.7-13.0% sequence divergence in RNA1, and 15.5-20.5% and 8.5-13.5% in RNA2, at the nt and amino acid levels, respectively. Phylogenetic analysis revealed that the origins of GFLV-SDHN are distinct from those of other GFLV isolates. One recombination event was identified in the 2A(HP) region of RNA2 in GFLV-SDHN.

  18. The complete nucleotide sequence and genome organization of pea streak virus (genus Carlavirus).

    PubMed

    Su, Li; Li, Zhengnan; Bernardy, Mike; Wiersma, Paul A; Cheng, Zhihui; Xiang, Yu

    2015-10-01

    Pea streak virus (PeSV) is a member of the genus Carlavirus in the family Betaflexiviridae. Here, the first complete genome sequence of PeSV was determined by deep sequencing of a cDNA library constructed from dsRNA extracted from a PeSV-infected sample and Rapid Amplification of cDNA Ends (RACE) PCR. The PeSV genome consists of 8041 nucleotides excluding the poly(A) tail and contains six open reading frames (ORFs). The putative peptide encoded by the PeSV ORF6 has an estimated molecular mass of 6.6 kDa and shows no similarity to any known proteins. This differs from typical carlaviruses, whose ORF6 encodes a 12- to 18-kDa cysteine-rich nucleic-acid-binding protein.

  19. [Nucleotide sequence of HLA-DQA1 promoter region (QAP) in a lung cancer patient].

    PubMed

    Qiu, C; Zhou, W; Song, C

    1996-06-01

    The HLA-DQA1 alleles and the nucleotide sequence of the HLA-DQA1 promoter region (QAP) in a patient with IDDM complicated by lung cancer have been identified by PCR/SSCP and PCR/sequencing. The results showed that: (1) the lung cancer patient and all of his family members carried the HLA-DQA1*0301/0501 alleles; (2) a single base substitution G-->A at position -155 and a deletion of CAA at positions -161 to -163 occurred in the patient. These results suggest that mutation of the HLA-DQA1 promoter region may modulate HLA-DQA1 gene expression by altering the binding of trans-acting factors to variant cis-acting elements, and may be responsible for the pathogenesis of lung cancer.

  20. Molecular detection and nucleotide sequence analysis of a new Aichi virus closely related to canine kobuvirus in sewage samples.

    PubMed

    Yamashita, Teruo; Adachi, Hirokazu; Hirose, Emi; Nakamura, Noriko; Ito, Miyabi; Yasui, Yoshihiro; Kobayashi, Shinichi; Minagawa, Hiroko

    2014-05-01

    Between 2001 and 2005, 207 raw sewage samples were collected at the inflow of a sewage treatment plant in Aichi Prefecture, Japan. Of the 207 sewage samples, 137 (66.2%) were positive for amplification of Aichi virus (AiV) sequences by reverse transcription (RT)-PCR with 10 forward and 10 reverse primers in the 3D region, corresponding to the nucleotide sequences of all kobuviruses. AiV genotype A sequences were detected in all 137 samples. New AiV sequences were detected in nine samples, showing 83% similarity with AiV A846/88 but 95% similarity with canine kobuvirus (CKV) US-PC0082 in this region. The nucleotide sequences from the VP3 region to the 3' untranslated region (UTR) of sewage sample Y12/2004 were determined. The number of nucleotides in each region was the same as that of CKV. The nucleotide (amino acid) identity over the complete VP1 region was 90.5% (94.8%) between Y12/2004 and CKV US-PC0082. Phylogenetic analyses based on the nucleotide and deduced amino acid sequences of VP1 and 3D showed that Y12/2004 was distinct from AiV but closely related to CKV. These results suggest that CKV is present in Aichi Prefecture, Japan.

  1. The nucleotide sequence surrounding the replication origin of the cop3 mutant of the bacteriocinogenic plasmid Clo DF13.

    PubMed Central

    Stuitje, A R; Veltkamp, E; Maat, J; Heyneker, H L

    1980-01-01

    The nucleotide sequence from about 100 base pairs downstream to about 600 base pairs upstream of the CloDF13 replication origin has been determined. A comparison of this sequence with the corresponding ColE1 origin sequence reveals that the sequence at the origin of replication is conserved. There are large differences in the nucleotide sequence downstream of the replication origin, whereas there is extensive homology in the region of about 410 base pairs upstream of the replication origin. This conserved region might code for a largely homologous basic, arginine-rich polypeptide of about 45 amino acids in both ColE1 and CloDF13. Although there are large differences in the primary structure of the region coding for the 100-nucleotide RNA, the secondary structure of this region seems to be conserved. PMID:6253936

  2. The nucleotide sequence surrounding the replication origin of the cop3 mutant of the bacteriocinogenic plasmid Clo DF13.

    PubMed

    Stuitje, A R; Veltkamp, E; Maat, J; Heyneker, H L

    1980-04-11

    The nucleotide sequence from about 100 base pairs downstream to about 600 base pairs upstream of the CloDF13 replication origin has been determined. A comparison of this sequence with the corresponding ColE1 origin sequence reveals that the sequence at the origin of replication is conserved. There are large differences in the nucleotide sequence downstream of the replication origin, whereas there is extensive homology in the region of about 410 base pairs upstream of the replication origin. This conserved region might code for a largely homologous basic, arginine-rich polypeptide of about 45 amino acids in both ColE1 and CloDF13. Although there are large differences in the primary structure of the region coding for the 100-nucleotide RNA, the secondary structure of this region seems to be conserved.

  3. Nucleotide sequence of ermA, a macrolide-lincosamide-streptogramin B determinant in Staphylococcus aureus.

    PubMed Central

    Murphy, E

    1985-01-01

    The complete nucleotide sequence of ermA, the prototype macrolide-lincosamide-streptogramin B resistance gene from Staphylococcus aureus, has been determined. The sequence predicts a 243-amino-acid protein that is homologous to those specified by ermC, ermAM, and ermD, resistance determinants from Staphylococcus aureus, Streptococcus sanguis, and Bacillus licheniformis, respectively. The ermA transcript, identified by Northern analysis and S1 mapping, contains a 5' leader sequence of 211 bases which has the potential to encode two short peptides of 15 and 19 amino acids; the second, longer peptide has 13 amino acids in common with the putative regulatory leader peptide of ermC. The coding sequence for this peptide is deleted in several mutants in which macrolide-lincosamide-streptogramin B resistance is constitutively expressed. Potential secondary structures available to the leader sequence of the wild-type (inducible) transcript and to constitutive deletion, insertion, and point mutations provide additional support for the translational attenuation model for induction of macrolide-lincosamide-streptogramin B resistance. PMID:2985541
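    A predicted 243-amino-acid protein corresponds to 243 codons plus a stop codon, i.e. 732 nt of coding sequence. One simple way to read such a prediction off a nucleotide sequence is to take the longest ATG-initiated open reading frame. The sketch below is a simplification, not the author's method: it scans only the forward strand, treats only ATG as a start (bacteria also use GTG and TTG), uses the standard stop codons, and ignores ORFs that run off the end. `longest_orf_length` is a hypothetical helper.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def longest_orf_length(seq: str) -> int:
    """Codon count (stop excluded) of the longest ATG-to-stop ORF on the forward strand.

    Simplifying assumptions: only ATG starts, reverse strand not scanned,
    ORFs lacking an in-frame stop before the end of the sequence are ignored.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None  # index of the first ATG in the currently open frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


if __name__ == "__main__":
    # Synthetic ORF (not ermA): ATG + 242 GCT codons + TAA encodes 243 residues.
    toy = "ATG" + "GCT" * 242 + "TAA"
    print(longest_orf_length(toy), len(toy))  # 243 732
```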

  4. Nucleotide sequence analysis of beta tubulin gene in a wide range of dermatophytes.

    PubMed

    Rezaei-Matehkolaei, Ali; Mirhendi, Hossein; Makimura, Koichi; de Hoog, G Sybren; Satoh, Kazuo; Najafzadeh, Mohammad Javad; Shidfar, Mohammad Reza

    2014-10-01

    We investigated the resolving power of the beta tubulin protein-coding gene (BT2) for systematic study of dermatophyte fungi. Initially, 144 standard and clinical strains belonging to 26 species in the genera Trichophyton, Microsporum, and Epidermophyton were identified by internal transcribed spacer (ITS) sequencing. Subsequently, BT2 was partially amplified in all strains, and sequence analysis was performed after construction of a BT2 database; the amplified length ranged from approximately 723 (T. ajelloi) to 808 nucleotides (M. persicolor) in different species. Intraspecific sequence variation was found in some species, but T. tonsurans, T. equinum, T. concentricum, T. verrucosum, T. rubrum, T. violaceum, T. eriotrephon, E. floccosum, M. canis, M. ferrugineum, and M. audouinii were invariant. The sequences were relatively conserved among different strains of the same species. The most similar species pairs were Arthroderma benhamiae and T. concentricum (100% identity) and T. tonsurans and T. equinum (99.8% identity); the most distant species were M. persicolor and M. amazonicum. The dendrogram obtained from BT2 topology was largely compatible with the species concept based on ITS sequencing, and similar clades and species were distinguished in the BT2 tree. Here, beta tubulin was characterized in a wide range of dermatophytes in order to assess intra- and interspecies variation and resolution, and was found to be a taxonomically valuable gene.

  5. Computational generation and screening of RNA motifs in large nucleotide sequence pools

    PubMed Central

    Kim, Namhee; Izzo, Joseph A.; Elmetwaly, Shereef; Gan, Hin Hark; Schlick, Tamar

    2010-01-01

    Although identification of active motifs in large random sequence pools is central to RNA in vitro selection, no systematic computational equivalent of this process has yet been developed. We develop a computational approach that combines target pool generation, motif scanning and motif screening using secondary structure analysis for applications to 10^12–10^14-sequence pools; large pool sizes are made possible using program redesign and supercomputing resources. We use the new protocol to search for aptamer and ribozyme motifs in pools up to experimental pool size (10^14 sequences). We show that motif scanning, structure matching and flanking sequence analysis, respectively, reduce the initial sequence pool by 6–8, 1–2 and 1 orders of magnitude, consistent with the rare occurrence of active motifs in random pools. The final yields match the theoretical yields from probability theory for simple motifs and overestimate experimental yields, which constitute lower bounds, for aptamers because screening analyses beyond secondary structure information are not considered systematically. We also show that designed pools using our nucleotide transition probability matrices can produce higher yields for RNA ligase motifs than random pools. Our methods for generating, analyzing and designing large pools can help improve RNA design via simulation of aspects of in vitro selection. PMID:20448026
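    The "theoretical yields from probability theory" for simple motifs rest on a basic expectation: in a random sequence of length L, a fully specified motif of length k is expected to occur (L - k + 1) times the product of its per-base probabilities, and linearity of expectation makes this hold even when occurrences overlap. The sketch below assumes independent positions and a uniform ACGU composition unless another composition is passed; it does not model the structure-matching or flanking-sequence screens, and the function name and example motif are hypothetical.

```python
from math import prod
from typing import Mapping, Optional


def expected_motif_hits(motif: str, seq_len: int, pool_size: float,
                        base_freqs: Optional[Mapping[str, float]] = None) -> float:
    """Expected occurrences of a fixed motif across a pool of random sequences.

    Assumes independent, identically distributed positions. By linearity of
    expectation, overlapping windows need no special treatment.
    """
    if base_freqs is None:
        base_freqs = {b: 0.25 for b in "ACGU"}  # uniform RNA composition
    motif = motif.upper()
    k = len(motif)
    if k > seq_len:
        return 0.0
    p_window = prod(base_freqs[b] for b in motif)  # P(one window matches)
    windows = seq_len - k + 1
    return pool_size * windows * p_window


if __name__ == "__main__":
    # Hypothetical 12-nt motif in a 10^14-member pool of 40-nt random regions.
    print(f"{expected_motif_hits('GGAUCCGAUACG', 40, 1e14):.3g}")  # 1.73e+08
```

    Each additional fully specified base divides the expectation by 4, which is why scanning alone can remove many orders of magnitude from a pool.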

  6. Organization and nucleotide sequence analysis of a ribosomal RNA gene cluster from Streptomyces ambofaciens.

    PubMed

    Pernodet, J L; Boccard, F; Alegre, M T; Gagnat, J; Guérineau, M

    1989-06-30

    The Streptomyces ambofaciens genome contains four rRNA gene clusters. These copies are called rrnA, B, C and D. The complete nucleotide (nt) sequence of rrnD has been determined. These genes possess striking similarity with other eubacterial rRNA genes. Comparison with other rRNA sequences allowed the putative localization of the sequences encoding mature rRNAs. The structural genes are arranged in the order 16S-23S-5S and are tightly linked. The mature rRNAs are predicted to contain 1528, 3120 and 120 nt, for the 16S, 23S and 5S rRNAs, respectively. The 23S rRNA is, to our knowledge, the longest of all sequenced prokaryotic 23S rRNAs. When compared to other large rRNAs it shows insertions at positions where they are also present in archaebacterial and in eukaryotic large rRNAs. Secondary structure models of S. ambofaciens rRNAs are proposed, based upon those existing for other bacterial rRNAs. Positions of putative transcription start points and of a termination signal are suggested. The corresponding putative primary transcript, containing the 16S, 23S and 5S rRNAs plus flanking regions, was folded into a secondary structure, and sequences possibly involved in rRNA maturation are described. The G + C content of the rRNA gene cluster is low (57%) compared with the overall G + C content of Streptomyces DNA (73%).
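    The G + C comparison above (57% for the rRNA cluster versus 73% for Streptomyces DNA overall) is a simple base-composition ratio. A minimal sketch follows; as an assumption, it skips ambiguity codes such as N rather than counting them in the denominator, and `gc_content` is a hypothetical helper name.

```python
def gc_content(seq: str) -> float:
    """Percent G + C among unambiguous bases (A, C, G, T/U); other symbols are skipped."""
    seq = seq.upper()
    unambiguous = sum(seq.count(b) for b in "ACGTU")
    if unambiguous == 0:
        raise ValueError("sequence has no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / unambiguous


if __name__ == "__main__":
    print(gc_content("GCNNAT"))  # 50.0
```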

  7. Unique nucleotide sequence-guided assembly of repetitive DNA parts for synthetic biology applications

    SciTech Connect

    Torella, JP; Lienert, F; Boehm, CR; Chen, JH; Way, JC; Silver, PA

    2014-08-07

    Recombination-based DNA construction methods, such as Gibson assembly, have made it possible to easily and simultaneously assemble multiple DNA parts, and they hold promise for the development and optimization of metabolic pathways and functional genetic circuits. Over time, however, these pathways and circuits have become more complex, and the increasing need for standardization and insulation of genetic parts has resulted in sequence redundancies (for example, repeated terminator and insulator sequences) that complicate recombination-based assembly. We and others have recently developed DNA assembly methods, which we refer to collectively as unique nucleotide sequence (UNS)-guided assembly, in which individual DNA parts are flanked with UNSs to facilitate the ordered, recombination-based assembly of repetitive sequences. Here we present a detailed protocol for UNS-guided assembly that enables researchers to convert multiple DNA parts into sequenced, correctly assembled constructs, or into high-quality combinatorial libraries in only 2-3 d. If the DNA parts must be generated from scratch, an additional 2-5 d are necessary. This protocol requires no specialized equipment and can easily be implemented by a student with experience in basic cloning techniques.

  8. Unique nucleotide sequence (UNS)-guided assembly of repetitive DNA parts for synthetic biology applications

    PubMed Central

    Torella, Joseph P.; Lienert, Florian; Boehm, Christian R.; Chen, Jan-Hung; Way, Jeffrey C.; Silver, Pamela A.

    2016-01-01

    Recombination-based DNA construction methods, such as Gibson assembly, have made it possible to easily and simultaneously assemble multiple DNA parts and hold promise for the development and optimization of metabolic pathways and functional genetic circuits. Over time, however, these pathways and circuits have become more complex, and the increasing need for standardization and insulation of genetic parts has resulted in sequence redundancies (for example, repeated terminator and insulator sequences) that complicate recombination-based assembly. We and others have recently developed DNA assembly methods that we refer to collectively as unique nucleotide sequence (UNS)-guided assembly, in which individual DNA parts are flanked with UNSs to facilitate the ordered, recombination-based assembly of repetitive sequences. Here we present a detailed protocol for UNS-guided assembly that enables researchers to convert multiple DNA parts into sequenced, correctly assembled constructs, or into high-quality combinatorial libraries in only 2–3 days. If the DNA parts must be generated from scratch, an additional 2–5 days are necessary. This protocol requires no specialized equipment and can easily be implemented by a student with experience in basic cloning techniques. PMID:25101822

  9. Evidence for Balancing Selection from Nucleotide Sequence Analyses of Human G6PD

    PubMed Central

    Verrelli, Brian C.; McDonald, John H.; Argyropoulos, George; Destro-Bisol, Giovanni; Froment, Alain; Drousiotou, Anthi; Lefranc, Gerard; Helal, Ahmed N.; Loiselet, Jacques; Tishkoff, Sarah A.

    2002-01-01

    Glucose-6-phosphate dehydrogenase (G6PD) mutations that result in reduced enzyme activity have been implicated in malarial resistance and constitute one of the best examples of selection in the human genome. In the present study, we characterize the nucleotide diversity across a 5.2-kb region of G6PD in a sample of 160 Africans and 56 non-Africans, to determine how selection has shaped patterns of DNA variation at this gene. Our global sample of enzymatically normal B alleles and A, A−, and Med alleles with reduced enzyme activities reveals many previously uncharacterized silent-site polymorphisms. In comparison with the absence of amino acid divergence between human and chimpanzee G6PD sequences, we find that the number of G6PD amino acid polymorphisms in human populations is significantly high. Unlike many other G6PD-activity alleles with reduced activity, we find that the age of the A variant, which is common in Africa, may not be consistent with the recent emergence of severe malaria and therefore may have originally had a historically different adaptive function. Overall, our observations strongly support previous genotype-phenotype association studies that proposed that balancing selection maintains G6PD deficiencies within human populations. The present study demonstrates that nucleotide sequence analyses can reveal signatures of both historical and recent selection in the genome and may elucidate the impact that infectious disease has had during human evolution. PMID:12378426

  10. Nucleotide sequence at the termini of the DNA of Bacillus subtilis phage phi 29.

    PubMed Central

    Escarmís, C; Salas, M

    1981-01-01

    Phage phi 29 DNA cannot be phosphorylated with polynucleotide kinase and [gamma-32P]ATP because of the presence of a viral protein covalently linked to the 5' termini. The 5' ends can, however, be made susceptible to phosphorylation by treatment with alkali and alkaline phosphatase. Restriction fragments Hpa II C and Hpa II F, corresponding to the right and left ends of phi 29 DNA, respectively, were labeled at the 5' ends with polynucleotide kinase and [gamma-32P]ATP or at the 3' ends with terminal transferase and [alpha-32P]ATP or [alpha-32P]cordycepin 5'-triphosphate. After a secondary cleavage of the labeled fragments, the sequence of the first 150-180 nucleotides at the termini of phi 29 DNA was determined by the method of Maxam and Gilbert. The ends of phi 29 DNA are flush, and a six-nucleotide-long inverted terminal repetition was found. The functional implications of the sequences determined are discussed. PMID:6262800
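    An inverted terminal repetition means that the first n bases of one strand equal the reverse complement of its last n bases, so both 5' ends of the duplex read the same. A minimal sketch for finding the longest such n in a linear sequence follows; it assumes a plain ACGT string, the helper names are hypothetical, and the toy sequence is constructed for illustration rather than taken from phi 29.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def inverted_terminal_repeat_length(seq: str) -> int:
    """Longest n such that the first n bases equal the reverse complement of the last n."""
    seq = seq.upper()
    best = 0
    for n in range(1, len(seq) // 2 + 1):
        if seq[:n] == reverse_complement(seq[-n:]):
            best = n
        else:
            break  # any longer repeat would have to contain this shorter one
    return best


if __name__ == "__main__":
    left = "ACGTTA"
    toy = left + "GGG" + reverse_complement(left)  # constructed, not phi 29 DNA
    print(inverted_terminal_repeat_length(toy))  # 6
```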

  11. Mapping DNA methylation by transverse current sequencing: Reduction of noise from neighboring nucleotides

    NASA Astrophysics Data System (ADS)

    Alvarez, Jose; Massey, Steven; Kalitsov, Alan; Velev, Julian

    Nanopore sequencing via transverse current has emerged as a competitive candidate for mapping DNA methylation without the need for bisulfite treatment, fluorescent tags, or PCR amplification. By eliminating the error-producing amplification step, long read lengths become feasible, which greatly simplifies the assembly process and reduces the time and cost inherent in current technologies. However, because of the large error rates of nanopore sequencing, single-base resolution has not yet been reached. A very important source of noise is the intrinsic structural noise in the electric signature of a nucleotide arising from the influence of neighboring nucleotides. In this work we calculate the tunneling current through DNA molecules in nanopores using the non-equilibrium electron transport method within an effective multi-orbital tight-binding model derived from first-principles calculations. We develop a base-calling algorithm that accounts for the correlations of the current through neighboring bases and that, in principle, can reduce the error rate below any desired threshold. Using this method, we show that we can clearly distinguish DNA methylation and other base modifications based on the reading of the tunneling current.

  12. Haplotype structure and population genetic inferences from nucleotide-sequence variation in human lipoprotein lipase.

    PubMed Central

    Clark, A G; Weiss, K M; Nickerson, D A; Taylor, S L; Buchanan, A; Stengård, J; Salomaa, V; Vartiainen, E; Perola, M; Boerwinkle, E; Sing, C F

    1998-01-01

    Allelic variation in 9.7 kb of genomic DNA sequence from the human lipoprotein lipase gene (LPL) was scored in 71 healthy individuals (142 chromosomes) from three populations: African Americans (24) from Jackson, MS; Finns (24) from North Karelia, Finland; and non-Hispanic Whites (23) from Rochester, MN. The sequences had a total of 88 variable sites, with a nucleotide diversity (site-specific heterozygosity) of 0.002 ± 0.001 across this 9.7-kb region. The frequency spectrum of nucleotide variation exhibited a slight excess of heterozygosity, but, in general, the data fit the expectations of the infinite-sites model of mutation and genetic drift. Allele-specific PCR helped resolve linkage phases, and a total of 88 distinct haplotypes were identified. For 1,410 (64%) of the 2,211 site pairs, all four possible gametes were present in these haplotypes, reflecting a rich history of past recombination. Despite the strong evidence for recombination, extensive linkage disequilibrium was observed. The number of haplotypes is generally much greater than expected under the infinite-sites model, but there was sufficient multisite linkage disequilibrium to reveal two major clades, which appear to be very old. Variation in this region of LPL may depart from that expected under a simple neutral model, owing to complex historical patterns of population founding, drift, selection, and recombination. These data suggest that the design and interpretation of disease-association studies may not be as straightforward as is often assumed. PMID:9683608
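
    Under the infinite-sites model, a pair of biallelic sites can show all four possible gametes only if at least one recombination (or recurrent mutation) event has occurred. This is why the proportion of such site pairs is read as evidence of past recombination. The sketch below computes that count. It assumes phase-resolved haplotypes given as equal-length strings; the function name and toy haplotypes are hypothetical.

```python
from itertools import combinations


def four_gamete_pairs(haplotypes):
    """Return (pairs showing all four gametes, total biallelic site pairs)."""
    n_sites = len(haplotypes[0])
    # Keep only sites with exactly two alleles in the sample.
    biallelic = [i for i in range(n_sites)
                 if len({h[i] for h in haplotypes}) == 2]
    with_all_four = 0
    total = 0
    for i, j in combinations(biallelic, 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        total += 1
        if len(gametes) == 4:
            with_all_four += 1
    return with_all_four, total


# Hypothetical phased haplotypes, three sites each.
haps = ["AAA", "AGA", "CAT", "CGT"]
print(four_gamete_pairs(haps))  # (2, 3)
```

The proportion reported in the abstract follows the same ratio: 1,410 / 2,211 ≈ 0.64.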

  13. Differentiation of Campylobacter coli, Campylobacter jejuni, Campylobacter lari, and Campylobacter upsaliensis by a Multiplex PCR Developed from the Nucleotide Sequence of the Lipid A Gene lpxA

    PubMed Central

    Klena, John D.; Parker, Craig T.; Knibb, Krista; Ibbitt, J. Claire; Devane, Phillippa M. L.; Horn, Sharon T.; Miller, William G.; Konkel, Michael E.

    2004-01-01

    We describe a multiplex PCR assay to identify and discriminate between isolates of Campylobacter coli, Campylobacter jejuni, Campylobacter lari, and Campylobacter upsaliensis. The C. jejuni isolate F38011 lpxA gene, encoding a UDP-N-acetylglucosamine acyltransferase, was identified by sequence analysis of an expression plasmid that restored wild-type lipopolysaccharide levels in Escherichia coli strain SM105 [lpxA(Ts)]. With oligonucleotide primers developed to the C. jejuni lpxA gene, nearly full-length lpxA amplicons were amplified from an additional 11 isolates of C. jejuni, 20 isolates of C. coli, 16 isolates of C. lari, and five isolates of C. upsaliensis. The nucleotide sequence of each amplicon was determined, and sequence alignment revealed a high level of species discrimination. Oligonucleotide primers were constructed to exploit species differences, and a multiplex PCR assay was developed to positively identify isolates of C. coli, C. jejuni, C. lari, and C. upsaliensis. We characterized an additional set of 41 thermotolerant isolates by partial nucleotide sequence analysis to further demonstrate the uniqueness of each species-specific region. The multiplex PCR assay was validated with 105 genetically defined isolates of C. coli, C. jejuni, C. lari, and C. upsaliensis, 34 strains representing 12 additional Campylobacter species, and 24 strains representing 19 non-Campylobacter species. Application of the multiplex PCR method to whole-cell lysates obtained from 108 clinical and environmental thermotolerant Campylobacter isolates resulted in 100% correlation with biochemical typing methods. PMID:15583280

  14. SATCHMO-JS: a webserver for simultaneous protein multiple sequence alignment and phylogenetic tree construction.

    PubMed

    Hagopian, Raffi; Davidson, John R; Datta, Ruchira S; Samad, Bushra; Jarvis, Glen R; Sjölander, Kimmen

    2010-07-01

    We present the jump-start simultaneous alignment and tree construction using hidden Markov models (SATCHMO-JS) web server for simultaneous estimation of protein multiple sequence alignments (MSAs) and phylogenetic trees. The server takes as input a set of sequences in FASTA format and outputs a phylogenetic tree and MSA; these can be viewed online or downloaded from the website. SATCHMO-JS is an extension of the SATCHMO algorithm and employs a divide-and-conquer strategy to jump-start SATCHMO at a higher point in the phylogenetic tree, reducing the computational complexity of the progressive all-versus-all HMM-HMM scoring and alignment. Results on 983 structurally aligned pairs from the PREFAB benchmark dataset show that SATCHMO-JS provides a statistically significant improvement in alignment accuracy over MUSCLE, Multiple Alignment using Fast Fourier Transform (MAFFT), ClustalW and the original SATCHMO algorithm. The SATCHMO-JS webserver is available at http://phylogenomics.berkeley.edu/satchmo-js. The datasets used in these experiments are available for download at http://phylogenomics.berkeley.edu/satchmo-js/supplementary/.

  15. Identification of protein-interacting nucleotides in a RNA sequence using composition profile of tri-nucleotides.

    PubMed

    Panwar, Bharat; Raghava, Gajendra P S

    2015-04-01

    RNA-protein interactions play diverse roles in cells, so identifying RNA-protein interfaces is essential for biologists seeking to understand their function. Several methods have been developed for predicting RNA-interacting residues in proteins, but limited efforts have been made to identify protein-interacting nucleotides in RNAs. To discriminate protein-interacting from non-interacting nucleotides, we developed prediction models from various features using various classifiers (NaiveBayes, NaiveBayesMultinomial, BayesNet, ComplementNaiveBayes, MultilayerPerceptron, J48, SMO, RandomForest and SVM(light)). SVM(light)-based models performed best, achieving 83.92% sensitivity, 84.82% specificity, 84.62% accuracy and a Matthews correlation coefficient of 0.62. We observed that certain tri-nucleotides, such as ACA, ACC, AGA, CAC, CCA, GAG, UGA and UUU, are preferred in protein interaction. All models were developed using a non-redundant dataset and evaluated using five-fold cross-validation. A web server called RNApin has been developed for the scientific community (http://crdd.osdd.net/raghava/rnapin/).
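
    A composition profile of tri-nucleotides can be sketched as the frequency of each of the 64 possible tri-nucleotides in a window around the nucleotide being classified. The window size (`flank`) and the function names below are hypothetical, because the abstract does not specify them. This only illustrates the feature encoding, not the RNApin models.

```python
from itertools import product

# All 64 RNA tri-nucleotides in a fixed order (AAA, AAC, ..., UUU).
TRINUCLEOTIDES = ["".join(p) for p in product("ACGU", repeat=3)]


def trinucleotide_composition(seq):
    """Frequencies of the 64 tri-nucleotides over overlapping positions."""
    seq = seq.upper().replace("T", "U")
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:  # skip windows containing N or other symbols
            counts[tri] += 1
            total += 1
    return [counts[t] / total if total else 0.0 for t in TRINUCLEOTIDES]


def window_features(seq, pos, flank=8):
    """Composition profile of the window centred on position pos.

    flank=8 is a hypothetical choice, not a value from the paper.
    """
    start, end = max(0, pos - flank), min(len(seq), pos + flank + 1)
    return trinucleotide_composition(seq[start:end])


profile = window_features("ACGUACGUACGU", 5, flank=2)
# The window is "UACGU", so ACG accounts for one of its three tri-nucleotides.
print(profile[TRINUCLEOTIDES.index("ACG")])
```

The resulting 64-value vector is the kind of input a classifier such as an SVM would receive for each nucleotide.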

  16. Multiple sequence alignment with arbitrary gap costs: computing an optimal solution using polyhedral combinatorics.

    PubMed

    Althaus, Ernst; Caprara, Alberto; Lenhof, Hans-Peter; Reinert, Knut

    2002-01-01

    Multiple sequence alignment is one of the dominant problems in computational molecular biology. Numerous scoring functions and methods have been proposed, most of which lead to NP-hard problems. In this paper we propose, for the first time, a general formulation for multiple alignment with arbitrary gap costs based on an integer linear program (ILP). In addition, we describe a branch-and-cut algorithm that effectively solves the ILP to optimality. We evaluate the performance of our approach in terms of running time and alignment quality using the BAliBase database of reference alignments. The results show that our implementation ranks among the best programs developed so far.
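
    The ILP formulation and branch-and-cut algorithm are beyond a short example, but the kind of objective being optimized can be illustrated. The sketch evaluates a sum-of-pairs score with affine gap costs, one special case of "arbitrary gap costs", for a given alignment. The scoring values and function names are hypothetical, and this is not the authors' method.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Score two aligned rows with affine gap costs.

    Columns where both rows have a gap are dropped (pairwise projection).
    """
    score = 0
    state = None  # None, "gap_in_a", or "gap_in_b"
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        if x == "-":
            score += gap_extend if state == "gap_in_a" else gap_open + gap_extend
            state = "gap_in_a"
        elif y == "-":
            score += gap_extend if state == "gap_in_b" else gap_open + gap_extend
            state = "gap_in_b"
        else:
            score += match if x == y else mismatch
            state = None
    return score


def sum_of_pairs(alignment, **costs):
    """Sum of pair_score over all pairs of rows (hypothetical scoring values)."""
    return sum(pair_score(a, b, **costs) for a, b in combinations(alignment, 2))


msa = ["AC-GT", "ACAGT", "A--GT"]
print(sum_of_pairs(msa))  # 0 + (-1) + (-2) = -3
```

An exact method such as the ILP searches over all alignments for the one maximizing an objective of this kind. The sketch only scores one fixed alignment.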

  17. ALVIS: interactive non-aggregative visualization and explorative analysis of multiple sequence alignments.

    PubMed

    Schwarz, Roland F; Tamuri, Asif U; Kultys, Marek; King, James; Godwin, James; Florescu, Ana M; Schultz, Jörg; Goldman, Nick

    2016-05-05

    Sequence logos and their variants are the most commonly used methods for visualizing multiple sequence alignments (MSAs) and sequence motifs. They provide consensus-based summaries of the sequences in the alignment; consequently, individual sequences cannot be identified in the visualization, and covariant sites are not easily discernible. We recently proposed Sequence Bundles, a motif visualization technique that maintains a one-to-one relationship between sequences and their graphical representation and visualizes covariant sites. Here we present Alvis, an open-source platform for the joint explorative analysis of MSAs and phylogenetic trees that employs Sequence Bundles as its main visualization method. Alvis combines the power of this visualization method with an interactive toolkit that allows detection of covariant sites, annotation of trees with synapomorphies and homoplasies, and motif detection. It also offers numerical analysis functionality, such as dimension reduction and classification. Alvis is user-friendly and highly customizable, and can export results as publication-quality figures. It is available as a full-featured standalone version (http://www.bitbucket.org/rfs/alvis), and its Sequence Bundles visualization module is also available as a web application (http://science-practice.com/projects/sequence-bundles).
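
    In a classic sequence logo, the height of each letter stack is the column's information content: log2 of the alphabet size minus the Shannon entropy of the residues in that column. The sketch below makes this calculation explicit (gaps ignored, no small-sample correction). It shows why a logo is a per-column summary that cannot track individual sequences. The function name and toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_information(alignment, alphabet_size=4):
    """Information content (bits) per column: log2(alphabet) minus entropy.

    Gap characters are ignored; no small-sample correction is applied.
    """
    info = []
    for j in range(len(alignment[0])):
        residues = [row[j] for row in alignment if row[j] != "-"]
        if not residues:
            info.append(0.0)
            continue
        counts = Counter(residues)
        total = len(residues)
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        info.append(math.log2(alphabet_size) - entropy)
    return info


# Hypothetical four-sequence DNA alignment.
print(column_information(["ACGT", "ACGA", "ACTT", "ACTA"]))  # [2.0, 2.0, 1.0, 1.0]
```

Columns 3 and 4 each score 1 bit whether or not their G/T and T/A states co-vary across sequences. A covariant pair like this is exactly what a logo hides.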

  18. The bioinformatics of nucleotide sequence coding for proteins requiring metal coenzymes and proteins embedded with metals

    NASA Astrophysics Data System (ADS)

    Tremberger, G.; Dehipawala, Sunil; Cheung, E.; Holden, T.; Sullivan, R.; Nguyen, A.; Lieberman, D.; Cheung, T.

    2015-09-01

    All metalloproteins require post-translational metal incorporation. In fact, the isotope ratios of Fe, Cu, and Zn in physiology and oncology have emerged as an important tool. The nickel-containing F430 is the prosthetic group of the enzyme methyl coenzyme M reductase, which catalyzes the release of methane in the final step of methanogenesis, a prime energy-metabolism candidate for life-exploration space missions in the solar system. The 3.5-Gyr-old early-life sulfite reductase, as a life-switch energy metabolism, had Fe-Mo clusters. The nitrogenase for nitrogen fixation 3 billion years ago had Mo. The early-life arsenite oxidase needed for anoxygenic photosynthesis energy metabolism 2.8 billion years ago had Mo and Fe. The selection pressure on metal incorporation inside a protein should be quantifiable in terms of the complexity of the related nucleotide sequence, using fractal dimension and entropy values. A simulation model showed that the studied metal-requiring energy-metabolism sequences had at least ten times more relative selection pressure than the horizontally transferred sequences in mealybug, as guided by the outcome histogram of the correlation R-sq values. The metal energy-metabolism sequence group was compared with the circadian clock KaiC sequence group using the magnesium atomic-level bond-shifting mechanism in the protein, and the simulation model suggested a much higher selection pressure for the energy life-switch sequence group. The possibility of using Kepler 444 and its associated exoplanets as an example of ancient life in the Galaxy has been proposed and is further discussed in this report. Examples of arsenic metal bonding shifts probed by synchrotron-based X-ray spectroscopy data, and of Zn-controlled, FOXP2-regulated pathways in human and chimpanzee brain tissue samples, are studied in relation to the sequence bioinformatics. The analysis results suggest that a relatively large metal bonding shift is associated with low probability correlation R

  19. Power Spectrum and Mutual Information Analyses of DNA Base (Nucleotide) Sequences

    NASA Astrophysics Data System (ADS)

    Isohata, Yasuhiko; Hayashi, Masaki

    2003-03-01

    On the basis of power spectrum analyses of the base (nucleotide) sequences of various genes, we have studied long-range correlations in whole base sequences, which take the form $1/f^{\alpha}$. We have also studied the behaviour of the exponent $\alpha$ for accumulated base sequences, as well as short-range periodicities. In particular, from the analysis of the content-rate distributions of $\alpha$ we obtained average values of $\bar{\alpha} = 0.40 \pm 0.01$ for human genes and $\bar{\alpha} = 0.20 \pm 0.01$ for S. cerevisiae genes. We have also performed analyses using the mutual information function. We show that there is a clear difference between the content-rate distributions of correlation lengths for the sampled human genes and the S. cerevisiae genes. This leads us to conjecture that the elongation of the correlation length in gene base sequences, from the early eukaryote (S. cerevisiae) to the late eukaryote (human), is a reflection of the evolutionary process.
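
    Both analyses can be sketched on a short sequence. The power spectrum below uses a binary indicator sequence for one base and a direct discrete Fourier transform. The mutual information function measures the statistical dependence between bases k positions apart. This is an illustrative sketch under those assumptions, not the authors' exact procedure; for example, it does no averaging over genes or windows and no fitting of $\alpha$.

```python
import cmath
import math
from collections import Counter


def power_spectrum(seq, base="A"):
    """|DFT|^2 of the 0/1 indicator sequence for one base, frequencies 1..N/2."""
    x = [1.0 if c == base else 0.0 for c in seq.upper()]
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))) ** 2
            for f in range(1, n // 2 + 1)]


def mutual_information(seq, k):
    """Mutual information (bits) between bases k positions apart."""
    pairs = [(seq[i], seq[i + k]) for i in range(len(seq) - k)]
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum((c / n) * math.log2((c / n) / ((left[a] / n) * (right[b] / n)))
               for (a, b), c in joint.items())


toy = "AT" * 50  # hypothetical period-2 sequence
spectrum = power_spectrum(toy)
print(spectrum.index(max(spectrum)) + 1)  # peak at frequency 50 (period 2)
print(round(mutual_information(toy, 1), 3))
```

A $1/f^{\alpha}$ trend would appear as a straight line with slope $-\alpha$ when the spectrum of a long, natural sequence is plotted on log-log axes.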

  20. Feasibility of mini-sequencing schemes based on nucleotide polymorphisms for microbial identification and population analyses.

    PubMed

    Araujo, Ricardo; Eusebio, Nadia; Caramalho, Rita

    2015-03-01

    Practical schemes based on single nucleotide polymorphisms (SNPs) have been proposed as alternatives that simplify and replace molecular methodologies based on extensive sequencing of genes. SNaPshot mini-sequencing has been increasingly used over the last decade and represents a fast and robust strategy for analyzing critical polymorphisms. Such assays have been proposed to characterize some bacteria and microbial eukaryotes, and their feasibility is reviewed in the present manuscript. The mini-sequencing schemes showed high discriminatory power and competence for identifying microorganisms, but some specificity errors were still found, particularly for species of the Burkholderia cepacia complex and mycobacteria. SNP assays designed for other goals, e.g., comparison of strains or detection of serotypes and of virulence-, epidemic- and phylogeny-related subgroups of isolates, can be very useful because they facilitate the investigation of large collections of isolates. The next generation of SNP assays might include large numbers of markers to fully characterize microbial taxonomy and strains; nevertheless, these new technologies are still prone to errors and can benefit greatly from integration with well-established mini-sequencing assays. Newly proposed molecular tools should be systematically tested on collections of isolates with high diversity indexes and should undergo interlaboratory validation.

  1. The nucleotide sequence of a Polish isolate of Tomato torrado virus.

    PubMed

    Budziszewska, Marta; Obrepalska-Steplowska, Aleksandra; Wieczorek, Przemysław; Pospieszny, Henryk

    2008-12-01

    A new virus was isolated from greenhouse tomato plants showing symptoms of leaf and apex necrosis in Wielkopolska province, Poland, in 2003. The observed symptoms and the virus morphology resembled viruses previously reported in Spain (Tomato torrado virus, ToTV) and in Mexico (Tomato marchitez virus, ToMarV). The complete genome of the Polish isolate Wal'03 was determined by RT-PCR amplification with oligonucleotide primers designed against the ToTV sequences deposited in GenBank, followed by cloning, sequencing, and comparison with the sequence of the type isolate. Phylogenetic analyses performed on fragments of polyprotein sequences established the relationship of the Polish isolate Wal'03 with Spanish ToTV and Mexican ToMarV, as well as with other viruses from the genera Sequivirus, Sadwavirus, and Cheravirus, which are reported to be the most similar to the new tomato viruses. The Wal'03 genome has the same organization as, and very high homology with, the ToTV type isolate, showing only some nucleotide and deduced amino acid changes, in contrast to ToMarV, which was significantly different. The phylogenetic tree clustered the aforementioned viruses in the same group, indicating that they have a common origin.

  2. Increased functional protein expression using nucleotide sequence features enriched in highly expressed genes in zebrafish.

    PubMed

    Horstick, Eric J; Jordan, Diana C; Bergeron, Sadie A; Tabor, Kathryn M; Serpe, Mihaela; Feldman, Benjamin; Burgess, Harold A

    2015-04-20

    Many genetic manipulations are limited by the difficulty of obtaining adequate levels of protein expression. Bioinformatic and experimental studies have identified nucleotide sequence features that may increase expression; however, it is difficult to assess the relative influence of these features. Zebrafish embryos can be rapidly injected with calibrated doses of mRNA, enabling the effects of multiple sequence changes to be compared in vivo. Using RNAseq and microarray data, we identified a set of genes that are highly expressed in zebrafish embryos and systematically analyzed them for enrichment of sequence features correlated with levels of protein expression. We then tested enriched features by embryo microinjection and functional tests of multiple protein reporters. Codon selection, the release factor recognition sequence, and specific introns and 3' untranslated regions each increased protein expression between 1.5- and 3-fold. These results suggested principles for increasing protein yield in zebrafish through biomolecular engineering. We implemented these principles for rational gene design in software for codon selection (CodonZ) and in plasmid vectors incorporating the most active non-coding elements. Rational gene design thus significantly boosts expression in zebrafish, and a similar approach will likely elevate expression in other animal models.

  3. Nucleotide sequence of the glucoamylase gene GLU1 in the yeast Saccharomycopsis fibuligera.

    PubMed Central

    Itoh, T; Ohtsuki, I; Yamashita, I; Fukui, S

    1987-01-01

    The complete nucleotide sequence of the glucoamylase gene GLU1 from the yeast Saccharomycopsis fibuligera has been determined. The GLU1 DNA hybridized to a polyadenylated RNA of 2.1 kilobases. A single open reading frame codes for a 519-amino-acid protein that contains four potential N-glycosylation sites. The putative precursor begins with a hydrophobic segment that presumably acts as a signal sequence for secretion. Glucoamylase was purified from the culture fluid of the yeast Saccharomyces cerevisiae transformed with a plasmid carrying GLU1. The molecular weight of the protein was 57,000 as determined by both gel filtration and acrylamide gel electrophoresis. The protein was glycosylated with asparagine-linked glycosides with a molecular weight of 2,000. The amino-terminal sequence of the protein began at the 28th amino acid residue from the first methionine of the putative precursor. The amino acid composition of the purified protein matched the predicted amino acid composition. These results confirmed that GLU1 encodes glucoamylase. A comparison of the amino acid sequences of glucoamylases from several fungi and yeast shows five highly conserved regions. One homology region is absent from the yeast enzyme and so may not be essential to glucoamylase function. PMID:3114236
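
    A 519-amino-acid product implies a coding region of 519 × 3 = 1,557 nt plus a 3-nt stop codon, 1,560 nt in total. The sketch below locates the longest ATG-initiated open reading frame on the forward strand. It ignores introns and the reverse strand, and the function name and toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """(start, end) of the longest ATG...stop ORF on the forward strand.

    end is exclusive and includes the stop codon; introns and the
    reverse strand are not considered.
    """
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None
    return best


start, end = longest_orf("CCATGAAATTTGGGTAACC")  # hypothetical sequence
print(start, end, (end - start) // 3 - 1)  # 2 17 4 (amino acids, excluding stop)
```

For GLU1, the same arithmetic gives 1,560 // 3 - 1 = 519 residues.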

  4. The Complete Nucleotide Sequence and Biotype Variability of Papaya leaf distortion mosaic virus.

    PubMed

    Maoka, Tetsuo; Hataya, Tatsuji

    2005-02-01

    The complete nucleotide sequence of the genome of Papaya leaf distortion mosaic virus (PLDMV) was determined. The viral RNA genome of strain LDM (leaf distortion mosaic) comprised 10,153 nucleotides, excluding the poly(A) tail, and contained one long open reading frame encoding a polyprotein of 3,269 amino acids (molecular weight 373,347). The polyprotein contained nine putative proteolytic cleavage sites and some motifs conserved in other potyviral polyproteins, with which it shared 44 to 50% identity, indicating that PLDMV is a distinct species in the genus Potyvirus. Like the W biotype of Papaya ringspot virus (PRSV), a non-papaya-infecting biotype of PLDMV (PLDMV-C) was found in plants of the family Cucurbitaceae. The coat protein (CP) sequence of PLDMV-C from naturally infected Trichosanthes bracteata was compared with those of three strains of the P biotype (PLDMV-P): LDM and two additional strains, M (mosaic) and YM (yellow mosaic), which are biologically different from each other. The CP sequences of the three PLDMV-P strains share high identities of 95 to 97%, whereas they share lower identities of 88 to 89% with that of PLDMV-C. Significant changes in hydrophobicity and a deletion of two amino acids in the N-terminal region of the PLDMV-C CP were observed. The finding of two biotypes of PLDMV implies that the papaya-infecting biotype may have evolved from a Cucurbitaceae-infecting potyvirus, as previously suggested for PRSV. In addition, a similar evolutionary event conferring infectivity to papaya may occur frequently in viruses infecting plants of the family Cucurbitaceae.

  5. Cloning and genomic nucleotide sequence of the matrix attachment region binding protein from the halotolerant alga Dunaliella salina.

    PubMed

    Wang, Peng-Ju; Wang, Tian-Yun; Wang, Ya-Feng; Yang, Rui; Li, Zhao-Xi

    2013-07-01

    In our previous study, a matrix attachment region binding protein (MBP) cDNA was cloned from the unicellular green alga Dunaliella salina. However, the genomic nucleotide sequence of this gene had not been reported. In this paper, the genomic sequence of MBP was cloned and characterized, and its gene copy number was determined. The MBP genomic sequence is 5641 bp long and is interrupted by 12 introns ranging from 132 to 562 bp. All introns in the D. salina MBP gene have canonical splice sites, with GT at the 5' end and AG at the 3' end. Southern blot analysis showed that MBP is present as a single copy in the D. salina genome.
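
    The GT-AG rule described above can be checked mechanically once intron coordinates are known. The sketch below does this. The coordinate convention (0-based, end-exclusive), the function names and the toy gene are hypothetical.

```python
def has_canonical_splice_sites(intron):
    """True if the intron starts with GT (5' end) and ends with AG (3' end)."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def check_introns(gene, intron_coords):
    """Apply the GT-AG check to each (start, end) interval, 0-based, end-exclusive."""
    return [has_canonical_splice_sites(gene[s:e]) for s, e in intron_coords]


# Hypothetical gene: exon, 11-nt intron, exon.
gene = "ATGCC" + "GTAAGTTTCAG" + "GGTAA"
print(check_introns(gene, [(5, 16), (4, 16)]))  # [True, False]
```

The second interval is shifted by one nucleotide, so it starts with CG and fails the check. This is how mis-annotated intron boundaries would show up.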

  6. Complete nucleotide sequences of two isolates of cherry green ring mottle virus from peach (Prunus persica) in China.

    PubMed

    Wang, Lihui; Jiang, Dongmei; Niu, Feiqing; Lu, Meiguang; Wang, Hongqing; Li, Shifang

    2013-03-01

    Two complete nucleotide sequences of cherry green ring mottle virus (CGRMV) isolated from peach in Hebei (Hs10) and Fujian (F9) Provinces, China, were determined. Five open reading frames (ORFs) were found in the genomes of both isolates. The F9 and Hs10 isolates shared 82.2 % and 83.4-94.4 % nucleotide sequence identity, respectively, with two CGRMV isolates from cherry. Analysis of the nucleotide and amino acid sequences from the five ORFs of both isolates showed that Hs10 shares the greatest sequence identity with P1A (GenBank AJ291761) from cherry. Phylogenetic analysis indicated that CGRMV isolates from peach and cherry are closely related to members of the genus Foveavirus.
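
    Percent nucleotide identity depends on the denominator chosen. The sketch below compares only aligned columns where neither sequence has a gap. That is one common convention among several, and this abstract does not state which was used. The function name and example strings are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


print(percent_identity("ACGT-A", "ACGAAA"))  # 4 of 5 ungapped columns match: 80.0
```

Counting gapped columns in the denominator instead would give 4/6 ≈ 66.7% for the same pair. Identity values from different tools are therefore comparable only under the same convention.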

  7. Parametric and non-parametric masking of randomness in sequence alignments can be improved and leads to better resolved trees

    PubMed Central

    2010-01-01

    Background Methods of alignment masking, which refers to the technique of excluding alignment blocks prior to tree reconstruction, have been successful in improving the signal-to-noise ratio in sequence alignments. However, the lack of formally well-defined methods to identify randomness in sequence alignments has prevented routine application of alignment masking. In this study, using different data sets and alignment methods, we compared the effects on tree reconstruction of two profiling methods. The first is the most commonly used method (GBLOCKS), which applies a predefined set of rules in combination with alignment masking. The second is a new profiling approach (ALISCORE) based on Monte Carlo resampling within a sliding window. Whereas GBLOCKS excludes variable sections above a threshold whose choice is arbitrary, the ALISCORE algorithm requires no a priori rating of parameter space and is therefore more objective. Results ALISCORE was successfully extended to amino acids using a proportional model and empirical substitution matrices to score randomness in multiple sequence alignments. A complex bootstrap resampling leads to an even distribution of scores of randomly similar sequences, against which the randomness of the observed sequence similarity is assessed. When tested on real data, both masking methods, GBLOCKS and ALISCORE, helped to improve tree resolution. The sliding window approach was less sensitive to different alignments of identical data sets and performed equally well on all data sets. Moreover, ALISCORE is capable of dealing with different substitution patterns and heterogeneous base composition. ALISCORE and the most relaxed GBLOCKS gap parameter setting performed best on all data sets. Correspondingly, Neighbor-Net analyses showed the greatest decrease in conflict. Conclusions Alignment masking improves the signal-to-noise ratio in multiple sequence alignments prior to phylogenetic reconstruction. Given the robust performance of alignment profiling, alignment masking

  8. The nucleotide sequence of the human int-1 mammary oncogene; evolutionary conservation of coding and non-coding sequences.

    PubMed Central

    van Ooyen, A; Kwee, V; Nusse, R

    1985-01-01

    The mouse mammary tumor virus can induce mammary tumors in mice by proviral activation of an evolutionarily conserved cellular oncogene called int-1. Here we present the nucleotide sequence of the human homologue of int-1 and compare it with the mouse gene. Like the mouse gene, the human homologue contains a reading frame of 370 amino acids, and the two predicted proteins differ by only four amino acid substitutions. The amino acid changes all lie in the hydrophobic leader domain of the int-1-encoded protein and do not significantly alter its hydropathic index. The conservation between the mouse and human int-1 genes is not restricted to exons; extensive parts of the introns are also homologous. Thus, int-1 ranks among the most conserved genes known, a property shared with other oncogenes. PMID:2998762

  9. DNAAlignEditor: DNA alignment editor tool

    PubMed Central

    Sanchez-Villeda, Hector; Schroeder, Steven; Flint-Garcia, Sherry; Guill, Katherine E; Yamasaki, Masanori; McMullen, Michael D

    2008-01-01

    Background With advances in DNA re-sequencing methods and next-generation parallel sequencing approaches, there has been a large increase in genomic efforts to define and analyze the sequence variability present among individuals within a species. For very polymorphic species such as maize, this has led to a need for intuitive, user-friendly software that aids biologists, who often have limited programming experience, in tracking, editing, displaying, and exporting multiple individual sequence alignments. To fill this need we have developed a novel DNA alignment editor. Results We have generated a nucleotide sequence alignment editor (DNAAlignEditor) that provides an intuitive, user-friendly interface for manual editing of multiple sequence alignments, with functions for input, editing, and output of sequence alignments. The color-coding of nucleotide identity and the display of associated quality scores aid the manual alignment editing process. DNAAlignEditor works as a client/server tool with two main components: a relational database that collects the processed alignments and a user interface connected to the database through universal data access connectivity drivers. DNAAlignEditor can be used either as a stand-alone application or as a network application with multiple users connected concurrently. Conclusion We anticipate that this software will be of general interest to biologists and population geneticists for editing DNA sequence alignments and analyzing natural sequence variation regardless of species, and will be particularly useful for manual alignment editing of sequences in species with high levels of polymorphism. PMID:18366684

  10. Nucleotide sequences related to the transforming gene of avian sarcoma virus are present in DNA of uninfected vertebrates.

    PubMed

    Spector, D H; Varmus, H E; Bishop, J M

    1978-09-01

    We have detected nucleotide sequences related to the transforming gene of avian sarcoma virus (ASV) in the DNA of uninfected vertebrates. Purified radioactive DNA (cDNAsarc) complementary to most or all of the gene (src) required for transformation of fibroblasts by ASV was annealed with DNA from a variety of normal species. Under conditions that facilitate pairing of partially matched nucleotide sequences (1.5 M NaCl, 59 °C), cDNAsarc formed duplexes with chicken, human, calf, mouse, and salmon DNA but not with DNA from sea urchin, Drosophila, or Escherichia coli. The kinetics of duplex formation indicated that cDNAsarc was reacting with nucleotide sequences present in a single copy or at most a few copies per cell. In contrast, nucleotide sequences complementary to the remainder of the ASV genome were observed only in chicken DNA. Thermal denaturation studies of the duplexes formed with cDNAsarc indicated a high degree of conservation of the src-related nucleotide sequences in vertebrate DNAs; the reductions in melting temperature suggested about 3-4% mismatching of cDNAsarc with chicken DNA and 8-10% mismatching with the other vertebrate DNAs.

  11. Probabilistic sequence alignment of Late Pleistocene benthic δ18O data

    NASA Astrophysics Data System (ADS)

    Lawrence, C.; Lin, L.; Lisiecki, L. E.; Stern, J.

    2013-12-01

    The stratigraphic alignment of ocean sediment cores plays a vital role in paleoceanographic research because it is used to develop mutually consistent age models for climate proxies measured in these cores. The most common proxy used for alignment is the δ18O of calcite from benthic or planktonic foraminifera, because a large fraction of δ18O variance derives from the global signal of ice volume. To date, alignment has been performed either by manual, qualitative comparison or by deterministic algorithms (Martinson, Pisias et al. Quat. Res. 27, 1987; Lisiecki and Lisiecki, Paleoceanography 17, 2002; Huybers and Wunsch, Paleoceanography 19, 2004). Here we present a probabilistic sequence alignment algorithm that provides 95% confidence bands for the alignment of pairs of benthic δ18O records. The algorithm is based on a hidden Markov model (HMM) (Levinson, Rabiner et al. Bell Systems Technical Journal, 62, 1983) similar to those used extensively to align DNA and protein sequences (Durbin, Eddy et al. Biological Sequence Analysis, Ch. 4, 1998). Here, however, the need for alignment stems from expansion and/or contraction of the records due to changes in sedimentation rate rather than from the insertion or deletion of residues. The transition probabilities used in this HMM to model changes in sedimentation rate are based on radiocarbon estimates of sedimentation rates. The probabilistic algorithm considers all possible alignments with these predefined sedimentation rates. Exact calculations are completed using dynamic programming recursions. The algorithm yields the probability distributions of the age at each point in the record, which are probabilistically

  12. Plastid sequence evolution: a new pattern of nucleotide substitutions in the Cucurbitaceae.

    PubMed

    Decker-Walters, Deena S; Chung, Sang-Min; Staub, Jack E

    2004-05-01

    Nucleotide substitutions (i.e., point mutations) are the primary driving force in generating DNA variation upon which selection can act. Substitutions called transitions, which entail exchanges between purines (A = adenine, G = guanine) or between pyrimidines (C = cytosine, T = thymine), typically outnumber transversions (i.e., exchanges between a purine and a pyrimidine) in a DNA strand. With an increasing number of plant studies revealing a transversion rather than transition bias, we chose to perform a detailed substitution analysis for the plant family Cucurbitaceae using data from several short plastid DNA sequences. We generated a phylogenetic tree for 19 taxa of the tribe Benincaseae and related genera and then scored conservative substitution changes (i.e., those not exhibiting homoplasy or reversals) from the unambiguous branches of the tree. Neither the transition nor the (A+T)/(G+C) biases found in previous studies were supported by our overall data. More importantly, we found a novel and symmetrical substitution bias in which Gs had been preferentially replaced by A, As by C, Cs by T, and Ts by G, resulting in the G→A→C→T→G substitution series. Understanding this pattern will lead to new hypotheses concerning plastid evolution, which in turn will affect the choice of substitution models and tree-building algorithms for phylogenetic analyses based on nucleotide data.
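The transition/transversion distinction and the directional tally behind a series such as G→A→C→T→G can be sketched as follows. This is an illustrative sketch, not the authors' scoring procedure: it assumes ancestral and descendant states are already available as aligned strings (the study scored changes on unambiguous tree branches), and the function names are hypothetical.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_class(ancestral, derived):
    """Classify a single-base change as a transition or a transversion."""
    a, d = ancestral.upper(), derived.upper()
    if {a, d} - (PURINES | PYRIMIDINES):
        raise ValueError("only unambiguous A, C, G, T bases are supported")
    if a == d:
        raise ValueError("not a substitution")
    same_class = {a, d} <= PURINES or {a, d} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

def count_substitutions(ancestor, descendant):
    """Tally directed substitutions (e.g. 'G>A') between two aligned sequences.

    Columns with gaps or ambiguous bases are skipped.
    """
    counts = Counter()
    for a, d in zip(ancestor.upper(), descendant.upper()):
        if a != d and a in PURINES | PYRIMIDINES and d in PURINES | PYRIMIDINES:
            counts[f"{a}>{d}"] += 1
    return counts

# One step of each change in the G->A->C->T->G series:
print(count_substitutions("GACT", "ACTG"))  # G>A, A>C, C>T, T>G once each
```

Note that the series alternates classes: G→A and C→T are transitions, while A→C and T→G are transversions.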

  13. Analysis of the genome sequence of the pathogenic Muscovy duck parvovirus strain YY reveals a 14-nucleotide-pair deletion in the inverted terminal repeats.

    PubMed

    Wang, Jianye; Huang, Yu; Zhou, Mingxu; Zhu, Guoqiang

    2016-09-01

    Genomic information about Muscovy duck parvovirus (MDPV) is still limited. In this study, the genome of the pathogenic MDPV strain YY was sequenced. The full-length genome of YY is 5075 nucleotides (nt) long, 57 nt shorter than that of strain FM. Sequence alignment indicates that the 5' and 3' inverted terminal repeats (ITR) of strain YY contain a 14-nucleotide-pair deletion in the stem of the palindromic hairpin structure in comparison to strains FM and FZ91-30. The deleted region contains one "E-box" site and one repeated motif with the sequence "TTCCGGT" or "ACCGGAA". Phylogenetic trees constructed based on the protein-coding genes concordantly showed that YY, together with nine other MDPV isolates from various places, clustered in a separate branch, distinct from the branch formed by goose parvovirus (GPV) strains. These results demonstrate that, despite the distinctive deletion, the YY strain still belongs to the classical MDPV group. Moreover, the ITR deletion may contribute to the genome evolution of MDPV under immunization pressure.
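The two motif spellings quoted above, "TTCCGGT" and "ACCGGAA", are reverse complements of each other, i.e. the same motif read on opposite strands. A minimal sketch for checking this and locating either orientation in a sequence; the function names and the toy sequence are hypothetical, and only exact matches are found.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (complement each base, then reverse)."""
    return seq.translate(COMPLEMENT)[::-1]

def find_motif_both_strands(sequence, motif):
    """0-based forward-strand start positions of motif in either orientation."""
    hits = set()
    for pattern in {motif, reverse_complement(motif)}:
        start = sequence.find(pattern)
        while start != -1:
            hits.add(start)
            start = sequence.find(pattern, start + 1)
    return sorted(hits)

print(reverse_complement("TTCCGGT"))  # ACCGGAA
print(find_motif_both_strands("GGTTCCGGTAAACCGGAACC", "TTCCGGT"))  # [2, 11]
```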

  14. Nucleotide sequence of the gene ereA encoding the erythromycin esterase in Escherichia coli.

    PubMed

    Ounissi, H; Courvalin, P

    1985-01-01

    We have cloned and determined the nucleotide sequence of the gene ereA of plasmid pIP1100, which confers high-level resistance to erythromycin (Em) in Escherichia coli. The gene was defined by initiation and termination codons and by in vitro insertion-inactivation into an open reading frame (ORF) of 1032 bp corresponding to a product with an Mr of 37 765. However, the enzyme, an Em esterase, displayed an apparent Mr of 43 000 upon electrophoresis of a minicell extract on SDS-polyacrylamide gels. The G + C content (50.5%) of the gene ereA and the preferential codon usage in its ORF suggest that this resistance determinant is likely indigenous to E. coli.
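The two sequence properties cited as evidence, G + C content and codon usage, can be computed as in the sketch below. This is a minimal illustration with hypothetical function names; judging whether a gene is indigenous would additionally require genome-wide E. coli reference values, which are not included here.

```python
def gc_content(seq):
    """Percentage of G + C among unambiguous bases (A, C, G, T) in a DNA sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt

def codon_usage(orf):
    """Count in-frame codons of an ORF; a trailing partial codon is ignored."""
    counts = {}
    usable = len(orf) - len(orf) % 3
    for i in range(0, usable, 3):
        codon = orf[i:i + 3].upper()
        counts[codon] = counts.get(codon, 0) + 1
    return counts

print(gc_content("ATGGCGAAATAA"))
print(codon_usage("ATGGCGAAATAA"))  # {'ATG': 1, 'GCG': 1, 'AAA': 1, 'TAA': 1}
```

For scale, a 1032-bp ORF holds 344 codons, one of which is the stop codon.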

  15. Developing single nucleotide polymorphism (SNP) markers from transcriptome sequences for identification of longan (Dimocarpus longan) germplasm

    PubMed Central

    Wang, Boyi; Tan, Hua-Wei; Fang, Wanping; Meinhardt, Lyndel W; Mischke, Sue; Matsumoto, Tracie; Zhang, Dapeng

    2015-01-01

    Longan (Dimocarpus longan Lour.) is an important tropical fruit tree crop. Accurate varietal identification is essential for germplasm management and breeding. Using longan transcriptome sequences from public databases, we developed single nucleotide polymorphism (SNP) markers; validated 60 SNPs in 50 longan germplasm accessions, including cultivated varieties and wild germplasm; and designated 25 SNP markers that unambiguously identified all tested longan varieties with high statistical rigor (P<0.0001). Multiple trees from the same clone were verified and off-type trees were identified. Diversity analysis revealed genetic relationships among analyzed accessions. Cultivated varieties differed significantly from wild populations (Fst=0.300; P<0.001), demonstrating untapped genetic diversity for germplasm conservation and utilization. Within cultivated varieties, apparent differences between varieties from China and those from Thailand and Hawaii indicated geographic patterns of genetic differentiation. These SNP markers provide a powerful tool to manage longan genetic resources and breeding, with accurate and efficient genotype identification. PMID:26504559
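The abstract reports Fst without naming the estimator. As a rough illustration only, a Nei-style F_ST for a single biallelic SNP can be computed from allele frequencies as (H_T - H_S) / H_T. This sketch assumes equal population weights and ignores sample-size corrections (estimators such as Weir and Cockerham's θ add them), so it will not reproduce the published value; the function names are hypothetical.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) for a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)

def fst_biallelic(freqs):
    """Nei-style F_ST = (H_T - H_S) / H_T for one biallelic locus.

    freqs: frequency of the same allele in each population (equal weights assumed).
    H_S is the mean within-population heterozygosity; H_T uses the pooled frequency.
    """
    h_s = sum(expected_heterozygosity(p) for p in freqs) / len(freqs)
    p_bar = sum(freqs) / len(freqs)
    h_t = expected_heterozygosity(p_bar)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t

print(fst_biallelic([0.9, 0.2]))  # hypothetical cultivated vs wild frequencies
```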

  16. Cloning, overexpression and nucleotide sequence of a thermostable DNA ligase-encoding gene.

    PubMed

    Barany, F; Gelfand, D H

    1991-12-20

    Thermostable DNA ligase has been harnessed for the detection of single-base genetic diseases using the ligase chain reaction [Barany, Proc. Natl. Acad. Sci. USA 88 (1991) 189-193]. The Thermus thermophilus (Tth) DNA ligase-encoding gene (ligT) was cloned in Escherichia coli by genetic complementation of a ligts 7 defect in an E. coli host. Nucleotide sequence analysis of the gene revealed a single chain of 676 amino acid residues with 47% identity to the E. coli ligase. Under phoA promoter control, Tth ligase was overproduced to greater than 10% of E. coli cellular proteins. Adenylated and deadenylated forms of the purified enzyme were distinguished by apparent molecular weights of 81 kDa and 78 kDa, respectively, after separation via sodium dodecyl sulfate-polyacrylamide-gel electrophoresis.

  17. MGAlignIt: A web service for the alignment of mRNA/EST and genomic sequences.

    PubMed

    Lee, Bernett T K; Tan, Tin Wee; Ranganathan, Shoba

    2003-07-01

    Splicing is a biological process that removes non-coding sequences (introns) from transcripts to produce a mature transcript suitable for translation. To study this process, information on the intron-exon arrangement of a gene is essential; it is usually obtained by aligning mRNA/EST sequences to their cognate genomic sequences. MGAlign is a novel, rapid, memory-efficient and practical method for aligning mRNA/EST and genome sequences. We present here a freely available web service, MGAlignIt (http://origin.bic.nus.edu.sg/mgalign/mgalignit), based on MGAlign. Besides the alignment itself, this web service allows users to effectively visualize the alignment in a graphical manner and to perform limited analysis on the alignment output. The server also permits the alignment to be saved in several forms, both graphical and text, suitable for further processing and analysis by other programs.

  18. Nucleotide sequence of the genetic loci encoding subunits of Bradyrhizobium japonicum uptake hydrogenase.

    PubMed Central

    Sayavedra-Soto, L A; Powell, G K; Evans, H J; Morris, R O

    1988-01-01

    An indispensable part of the hydrogen-recycling system in Bradyrhizobium japonicum is the uptake hydrogenase, which is composed of 34.5- and 65.9-kDa subunits. The gene encoding the large subunit is located on a 5.9-kilobase fragment of the H2-uptake-complementing cosmid pHU52 [Zuber, M., Harker, A.R., Sultana, M.A. & Evans, H.J. (1986) Proc. Natl. Acad. Sci. USA 83, 7668-7672]. We have now determined that the structural genes for both subunits are present on this fragment. Two open reading frames are present that correspond in size and deduced amino acid sequence to the hydrogenase subunits, except that the small-subunit coding region contains a leader peptide of 46 amino acids. The two genes are separated by a 32-nucleotide intergenic region and likely constitute an operon. Comparison of the deduced amino acid sequences of the B. japonicum genes with those from Desulfovibrio gigas, Desulfovibrio baculatus, and Rhodobacter capsulatus indicates significant sequence identity. PMID:3054886

  19. Mining for single nucleotide polymorphisms and insertions / deletions in expressed sequence tag libraries of oil palm.

    PubMed

    Riju, Aykkal; Chandrasekar, Arumugam; Arunachalam, Vadivel

    2007-01-01

    The oil palm is a tropical oil-bearing tree. EST-derived SNPs and SSRs are a free by-product of the currently expanding EST (Expressed Sequence Tag) databases. The development of high-throughput methods for detecting SNPs (Single Nucleotide Polymorphisms) and small indels (insertions/deletions) has revolutionized their use as molecular markers. The 5452 available oil palm EST sequences were mined from NCBI dbEST, and the CAP3 program was used to assemble them into contigs. Candidate SNP and indel polymorphisms were detected using the Perl script auto_snip version 1.0, which used 576 ESTs to detect SNP and indel sites. We found 1180 SNP sites and 137 indel polymorphisms, at a frequency of 1.36 SNPs per 100 bp. Among the six tissues from which the EST libraries had been generated, mesocarp had a high frequency of 2.91 SNPs and indels per 100 bp, whereas zygotic embryos had the lowest frequency, 0.15 per 100 bp. We also used the Shannon index to analyze the proportions of the ten possible types of SNPs/indels. ESTs from normal apex tissue showed the highest Shannon index (0.60), whereas abnormal apex had the lowest (0.02). This report describes the use of the Shannon index for comparing SNP/indel frequencies mined from EST libraries and confirms that SNPs occur in oil palm at a frequency sufficient for their use as markers in genetic studies.
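The two summary statistics used here, polymorphism frequency per 100 bp and the Shannon index over polymorphism types, can be sketched as below. The abstract gives neither the logarithm base nor how types were tallied, so the natural logarithm and a simple type-to-count mapping are assumptions, and the example counts are made up.

```python
import math

def per_100_bp(count, total_bp):
    """Polymorphism frequency expressed per 100 bp of surveyed sequence."""
    return 100.0 * count / total_bp

def shannon_index(type_counts):
    """Shannon index H = -sum(p_i * ln p_i) over observed SNP/indel types."""
    total = sum(type_counts.values())
    return -sum((c / total) * math.log(c / total) for c in type_counts.values() if c > 0)

# Hypothetical tallies for one tissue library.
counts = {"A/G": 12, "C/T": 10, "A/C": 3, "indel": 2}
print(per_100_bp(sum(counts.values()), 2500))
print(shannon_index(counts))
```

A library dominated by a single polymorphism type gives an index near 0, which is how a value such as 0.02 should be read.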

  20. Cloning and nucleotide sequence of a specific DNA fragment from Paracoccidioides brasiliensis.

    PubMed

    Goldani, L Z; Maia, A L; Sugar, A M

    1995-06-01

    We cloned and sequenced a species-specific 110-bp DNA fragment from Paracoccidioides brasiliensis. The DNA fragment was generated by PCR with primers complementary to the rat beta-actin gene under a low annealing temperature. Comparison of the nucleotide sequence, after excluding the primers, with those in the GenBank database identified approximately 60% homology with an exon of a major surface glycoprotein gene from Pneumocystis carinii and a fragment of unknown function in Saccharomyces cerevisiae chromosome VIII. By Southern hybridization analysis, the 32P-labelled fragment detected 1.0- and 1.9-kb restriction fragments within whole-cell genomic DNA of P. brasiliensis digested with HindIII and PstI, respectively, but failed to hybridize to genomic DNAs from Candida albicans, Blastomyces dermatitidis, Cryptococcus neoformans, Aspergillus fumigatus, Saccharomyces cerevisiae, Pneumocystis carinii, rat tissue, or humans under low-stringency hybridization conditions. Additionally, the specific DNA fragment from three different P. brasiliensis isolates (Pb18, RP18, RP17) was amplified by PCR with primers mostly complementary to nonactin sequences of the 110-bp DNA fragment. In contrast, there were no amplified products from other fungal genomic DNAs previously tested, including Histoplasma capsulatum. To date, this is the first species-specific DNA fragment cloned from P. brasiliensis which might be useful as a diagnostic marker for the identification and classification of different P. brasiliensis isolates.

  1. Modulation of base excision repair of 8-oxoguanine by the nucleotide sequence.

    PubMed

    Allgayer, Julia; Kitsera, Nataliya; von der Lippen, Carina; Epe, Bernd; Khobta, Andriy

    2013-10-01

    8-Oxoguanine (8-oxoG) is a major product of oxidative DNA damage; it induces replication errors and interferes with transcription. By varying the position of a single 8-oxoG in a functional gene and manipulating the nucleotide sequence surrounding the lesion, we found that the degree of transcriptional inhibition is independent of the distance from the transcription start and of whether the lesion lies in the transcribed or the non-transcribed DNA strand. However, it is strongly dependent on the sequence context and also proportional to the cellular expression of 8-oxoguanine DNA glycosylase (OGG1), demonstrating that transcriptional arrest does not take place at unrepaired 8-oxoG and proving a causal connection between 8-oxoG excision and the inhibition of transcription. We identified the 5'-CAGGGC[8-oxoG]GACTG-3' motif as having only minimal transcription-inhibitory potential in cells, from which we predicted that 8-oxoG excision is particularly inefficient in this sequence context. This prediction was fully confirmed by direct biochemical assays. Furthermore, in DNA containing a bistranded Cp[8-oxoG]/Cp[8-oxoG] clustered lesion, the excision rates in the two strands differed by at least a factor of 9, clearly demonstrating that excision preference is defined by DNA strand asymmetry rather than by the overall geometry of the double helix or local duplex stability.

  2. Cloning and nucleotide sequence of a specific DNA fragment from Paracoccidioides brasiliensis.

    PubMed Central

    Goldani, L Z; Maia, A L; Sugar, A M

    1995-01-01

    We cloned and sequenced a species-specific 110-bp DNA fragment from Paracoccidioides brasiliensis. The DNA fragment was generated by PCR with primers complementary to the rat beta-actin gene under a low annealing temperature. Comparison of the nucleotide sequence, after excluding the primers, with those in the GenBank database identified approximately 60% homology with an exon of a major surface glycoprotein gene from Pneumocystis carinii and a fragment of unknown function in Saccharomyces cerevisiae chromosome VIII. By Southern hybridization analysis, the 32P-labelled fragment detected 1.0- and 1.9-kb restriction fragments within whole-cell genomic DNA of P. brasiliensis digested with HindIII and PstI, respectively, but failed to hybridize to genomic DNAs from Candida albicans, Blastomyces dermatitidis, Cryptococcus neoformans, Aspergillus fumigatus, Saccharomyces cerevisiae, Pneumocystis carinii, rat tissue, or humans under low-stringency hybridization conditions. Additionally, the specific DNA fragment from three different P. brasiliensis isolates (Pb18, RP18, RP17) was amplified by PCR with primers mostly complementary to nonactin sequences of the 110-bp DNA fragment. In contrast, there were no amplified products from other fungal genomic DNAs previously tested, including Histoplasma capsulatum. To date, this is the first species-specific DNA fragment cloned from P. brasiliensis which might be useful as a diagnostic marker for the identification and classification of different P. brasiliensis isolates. PMID:7650207

  3. Complete nucleotide sequence of the mitochondrial genome of a salamander, Mertensiella luschani.

    PubMed

    Zardoya, Rafael; Malaga-Trillo, Edward; Veith, Michael; Meyer, Axel

    2003-10-23

    The complete nucleotide sequence (16,650 bp) of the mitochondrial genome of the salamander Mertensiella luschani (Caudata, Amphibia) was determined. This molecule conforms to the consensus vertebrate mitochondrial gene order. However, it is characterized by a long non-coding intervening sequence with two 124-bp repeats between the tRNA(Thr) and tRNA(Pro) genes. The new sequence data were used to reconstruct a phylogeny of jawed vertebrates. Phylogenetic analyses of all mitochondrial protein-coding genes at the amino acid level recovered a robust vertebrate tree in which lungfishes are the closest living relatives of tetrapods, salamanders and frogs are grouped together to the exclusion of caecilians (the Batrachia hypothesis) in a monophyletic amphibian clade, turtles show diapsid affinities and are placed as sister group of crocodiles+birds, and the marsupials are grouped together with monotremes and basal to placental mammals. The deduced phylogeny was used to characterize the molecular evolution of vertebrate mitochondrial proteins. Amino acid frequencies were analyzed across the main lineages of jawed vertebrates, and leucine and cysteine were found to be the most and least abundant amino acids in mitochondrial proteins, respectively. Patterns of amino acid replacements were conserved among vertebrates. Overall, cartilaginous fishes showed the least variation in amino acid frequencies and replacements. Constancy of rates of evolution among the main lineages of jawed vertebrates was rejected.

  4. Complete Nucleotide Sequence of Watermelon Chlorotic Stunt Virus Originating from Oman

    PubMed Central

    Khan, Akhtar J.; Akhtar, Sohail; Briddon, Rob W.; Ammara, Um; Al-Matrooshi, Abdulrahman M.; Mansoor, Shahid

    2012-01-01

    Watermelon chlorotic stunt virus (WmCSV) is a bipartite begomovirus (genus Begomovirus, family Geminiviridae) that causes economic losses to cucurbits, particularly watermelon, across the Middle East and North Africa. Recently, squash (Cucurbita moschata) grown in an experimental field in Oman was found to display symptoms such as leaf curling, yellowing and stunting, typical of a begomovirus infection. Sequence analysis of the virus isolated from squash showed 97.6–99.9% nucleotide sequence identity to previously described WmCSV isolates for the DNA A component and 93–98% identity for the DNA B component. Agrobacterium-mediated inoculation of Nicotiana benthamiana resulted in the development of symptoms fifteen days post-inoculation. This is the first bipartite begomovirus identified in Oman. Overall, the Oman isolate showed the highest levels of sequence identity to a WmCSV isolate originating from Iran, which was confirmed by phylogenetic analysis. This suggests that WmCSV present in Oman was introduced from Iran. The significance of this finding is discussed. PMID:22852046
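Percent nucleotide identity figures such as those above are computed from aligned sequences. The abstract does not state how gaps were treated, so the sketch below uses one common convention (gapped columns are excluded); the function name and toy sequences are hypothetical, and other conventions will give different numbers for the same alignment.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two aligned sequences over columns where neither has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # this convention ignores gapped columns entirely
        compared += 1
        identical += x == y
    return 100.0 * identical / compared if compared else 0.0

print(percent_identity("ACGTTAGC-A", "ACGTCAGCTA"))
```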

  5. High-throughput nucleotide sequence analysis of diverse bacterial communities in leachates of decomposing pig carcasses.

    PubMed

    Yang, Seung Hak; Lim, Joung Soo; Khan, Modabber Ahmed; Kim, Bong Soo; Choi, Dong Yoon; Lee, Eun Young; Ahn, Hee Kwon

    2015-01-01

    The leachate generated by the decomposition of animal carcasses has been implicated as an environmental contaminant surrounding the burial site. High-throughput nucleotide sequencing was conducted to investigate the bacterial communities in leachates from the decomposition of pig carcasses. We acquired 51,230 reads from six different samples (1, 2, 3, 4, 6 and 14 week-old carcasses) and found that sequences representing the phylum Firmicutes predominated. The diversity of bacterial 16S rRNA gene sequences in the leachate was the highest at 6 weeks, in contrast to those at 2 and 14 weeks. The relative abundance of Firmicutes was reduced, while the proportion of Bacteroidetes and Proteobacteria increased from 3-6 weeks. The representation of phyla was restored after 14 weeks. However, the community structures between the samples taken at 1-2 and 14 weeks differed at the bacterial classification level. The trend in pH was similar to the changes seen in bacterial communities, indicating that the pH of the leachate could be related to the shift in the microbial community. The results indicate that the composition of bacterial communities in leachates of decomposing pig carcasses shifted continuously during the study period and might be influenced by the burial site.

  6. Nucleotide sequence and the encoded amino acids of human apolipoprotein A-I mRNA.

    PubMed Central

    Law, S W; Brewer, H B

    1984-01-01

    The cDNA clones encoding the precursor form of human liver apolipoprotein A-I (apoA-I), preproapoA-I, have been isolated from a cDNA library. A 17-base synthetic oligonucleotide based on residues 108-113 of apoA-I and a 26-base primer-extended, dideoxynucleotide-terminated cDNA were used as hybridization probes to select for recombinant plasmids bearing the apoA-I sequence. The complete nucleic acid sequence of human liver preproapoA-I has been determined by analysis of the cloned cDNA. The sequence is composed of 801 nucleotides encoding 267 amino acid residues. PreproapoA-I contains an 18-amino-acid prepeptide and a 6-amino-acid propeptide connected to the amino terminus of the 243-amino acid mature apoA-I. Southern blotting analysis of chromosomal DNA obtained from peripheral blood indicated the apoA-I gene is contained in a 2.1-kilobase-pair Pst I fragment and there is no gross difference in structural organization between the normal apoA-I gene and the Tangier disease apoA-I gene. PMID:6198645

  7. High-throughput nucleotide sequence analysis of diverse bacterial communities in leachates of decomposing pig carcasses

    PubMed Central

    Yang, Seung Hak; Lim, Joung Soo; Khan, Modabber Ahmed; Kim, Bong Soo; Choi, Dong Yoon; Lee, Eun Young; Ahn, Hee Kwon

    2015-01-01

    The leachate generated by the decomposition of animal carcasses has been implicated as an environmental contaminant surrounding the burial site. High-throughput nucleotide sequencing was conducted to investigate the bacterial communities in leachates from the decomposition of pig carcasses. We acquired 51,230 reads from six different samples (1, 2, 3, 4, 6 and 14 week-old carcasses) and found that sequences representing the phylum Firmicutes predominated. The diversity of bacterial 16S rRNA gene sequences in the leachate was the highest at 6 weeks, in contrast to those at 2 and 14 weeks. The relative abundance of Firmicutes was reduced, while the proportion of Bacteroidetes and Proteobacteria increased from 3–6 weeks. The representation of phyla was restored after 14 weeks. However, the community structures between the samples taken at 1–2 and 14 weeks differed at the bacterial classification level. The trend in pH was similar to the changes seen in bacterial communities, indicating that the pH of the leachate could be related to the shift in the microbial community. The results indicate that the composition of bacterial communities in leachates of decomposing pig carcasses shifted continuously during the study period and might be influenced by the burial site. PMID:26500442

  8. The complete nucleotide sequence and genome organization of tomato chlorosis virus.

    PubMed

    Wintermantel, W M; Wisler, G C; Anchieta, A G; Liu, H-Y; Karasev, A V; Tzanetakis, I E

    2005-11-01

    The crinivirus tomato chlorosis virus (ToCV) was discovered initially in diseased tomato and has since been identified as a serious problem for tomato production in many parts of the world, particularly in the United States, Europe and Southeast Asia. The complete nucleotide sequence of ToCV was determined and compared with related crinivirus species. RNA 1 is organized into four open reading frames (ORFs) and, based on homology to other viral replication factors, encodes proteins involved in replication. RNA 2 is composed of nine ORFs, including genes that encode an HSP70 homolog and two proteins involved in encapsidation of viral RNA, referred to as the coat protein and minor coat protein. Sequence homology between ToCV and other criniviruses varies throughout the viral genome. The minor coat protein (CPm) of ToCV, which forms part of the "rattlesnake tail" of virions and may be involved in determining the unique, broad vector transmissibility of ToCV, is larger than the CPm of lettuce infectious yellows virus (LIYV) by 217 amino acids. Among sequenced criniviruses, considerable variability exists in the size of some viral proteins. Analysis of these differences with respect to biological function may provide insight into the role crinivirus proteins play in virus infection and transmission.

  9. Human ribosomal RNA gene: nucleotide sequence of the transcription initiation region and comparison of three mammalian genes.

    PubMed Central

    Financsek, I; Mizumoto, K; Mishima, Y; Muramatsu, M

    1982-01-01

    The transcription initiation site of the human ribosomal RNA gene (rDNA) was located by using the single-strand specific nuclease protection method and by determining the first nucleotide of the in vitro capped 45S preribosomal RNA. The sequence of 1,211 nucleotides surrounding the initiation site was determined. The sequenced region was found to consist of 75% G and C and to contain a number of short direct and inverted repeats and palindromes. By comparison of the corresponding initiation regions of three mammalian species, several conserved sequences were found upstream and downstream from the transcription starting point. Two short A + T-rich sequences are present on human, mouse, and rat ribosomal RNA genes between the initiation site and 40 nucleotides upstream, and a C + T cluster is located at a position around -60. At and downstream from the initiation site, a common sequence, T-AG-C-T-G-A-C-A-C-G-C-T-G-T-C-C-T-CT-T, was found in the three genes from position -1 through +18. The strong conservation of these sequences suggests their functional significance in rDNA. The S1 nuclease protection experiments with cloned rDNA fragments indicated the presence in human 45S RNA of molecules several hundred nucleotides shorter than the supposed primary transcript. The first 19 nucleotides of these molecules appear identical (except for one mismatch) to the nucleotide sequence of the 5' end of a supposed early processing product of the mouse 45S RNA. PMID:6954460

  10. EvalMSA: A Program to Evaluate Multiple Sequence Alignments and Detect Outliers.

    PubMed

    Chiner-Oms, Alvaro; González-Candelas, Fernando

    2016-01-01

    We present EvalMSA, a software tool for evaluating and detecting outliers in multiple sequence alignments (MSAs). This tool allows the identification of divergent sequences in MSAs by scoring the contribution of each row in the alignment to its quality using a sum-of-pairs-based method and additional analyses. Our main goal is to provide users with objective data so that they can make informed decisions about the relevance and/or pertinence of including or retaining a particular sequence in an MSA. EvalMSA is written in standard Perl and also uses some routines from the statistical language R; therefore, the R-base package must be installed for full functionality. Binary packages are freely available from http://sourceforge.net/projects/evalmsa/ for Linux and Windows.
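The row-level sum-of-pairs idea can be sketched as follows: score every pair of rows column by column, credit each row with the scores of all pairs it belongs to, and treat the lowest-scoring row as the most divergent candidate. The match, mismatch and gap values and the function names are hypothetical; EvalMSA's own scoring scheme and its additional analyses are not described in the abstract and are not reproduced.

```python
from itertools import combinations

def column_pair_score(x, y, match=1, mismatch=-1, gap=-2):
    """Score one aligned character pair; gap-gap pairs contribute nothing."""
    if x == "-" and y == "-":
        return 0
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch

def row_contributions(msa):
    """Sum-of-pairs score each row accumulates over all its pairings with other rows."""
    contrib = [0] * len(msa)
    for i, j in combinations(range(len(msa)), 2):
        s = sum(column_pair_score(x, y) for x, y in zip(msa[i], msa[j]))
        contrib[i] += s
        contrib[j] += s
    return contrib

# Toy alignment in which the last row is clearly divergent.
msa = ["ACGTAC", "ACGTAC", "ACGTTC", "TTAC-G"]
scores = row_contributions(msa)
outlier = min(range(len(msa)), key=scores.__getitem__)
print(scores, outlier)  # [3, 3, 1, -21] 3
```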

  11. EvalMSA: A Program to Evaluate Multiple Sequence Alignments and Detect Outliers

    PubMed Central

    Chiner-Oms, Alvaro; González-Candelas, Fernando

    2016-01-01

    We present EvalMSA, a software tool for evaluating and detecting outliers in multiple sequence alignments (MSAs). This tool allows the identification of divergent sequences in MSAs by scoring the contribution of each row in the alignment to its quality using a sum-of-pairs-based method and additional analyses. Our main goal is to provide users with objective data so that they can make informed decisions about the relevance and/or pertinence of including or retaining a particular sequence in an MSA. EvalMSA is written in standard Perl and also uses some routines from the statistical language R; therefore, the R-base package must be installed for full functionality. Binary packages are freely available from http://sourceforge.net/projects/evalmsa/ for Linux and Windows. PMID:27920488

  12. Evaluation of Ancestral Sequence Reconstruction Methods to Infer Nonstationary Patterns of Nucleotide Substitution.

    PubMed

    Matsumoto, Tomotaka; Akashi, Hiroshi; Yang, Ziheng

    2015-07-01

    Inference of gene sequences in ancestral species has been widely used to test hypotheses concerning the process of molecular sequence evolution. However, the approach may produce spurious results, mainly because using the single best reconstruction while ignoring the suboptimal ones creates systematic biases. Here we implement methods to correct for such biases and use computer simulation to evaluate their performance when the substitution process is nonstationary. The methods we evaluated include parsimony and likelihood using the single best reconstruction (SBR), averaging over reconstructions weighted by the posterior probabilities (AWP), and a new method called expected Markov counting (EMC) that produces maximum-likelihood estimates of substitution counts for any branch under a nonstationary Markov model. We simulated base composition evolution on a phylogeny for six species, with different selective pressures on G+C content among lineages, and compared the counts of nucleotide substitutions recorded during simulation with the inference by different methods. We found that large systematic biases resulted from (i) the use of parsimony or likelihood with SBR, (ii) the use of a stationary model when the substitution process is nonstationary, and (iii) the use of the Hasegawa-Kishino-Yano (HKY) model, which is too simple to adequately describe the substitution process. The nonstationary general time reversible (GTR) model, used with AWP or EMC, accurately recovered the substitution counts, even in cases of complex parameter fluctuations. We discuss model complexity and the compromise between bias and variance and suggest that the new methods may be useful for studying complex patterns of nucleotide substitution in large genomic data sets.

  13. A parallel approach of COFFEE objective function to multiple sequence alignment

    NASA Astrophysics Data System (ADS)

    Zafalon, G. F. D.; Visotaky, J. M. V.; Amorim, A. R.; Valêncio, C. R.; Neves, L. A.; de Souza, R. C. G.; Machado, J. M.

    2015-09-01

    Computational tools to assist genomic analyses are becoming ever more necessary as the amount of available data grows rapidly. Because deterministic sequence alignment algorithms have high computational costs, much work has concentrated on developing heuristic approaches to multiple sequence alignment. However, selecting an approach that offers solutions with good biological significance and a feasible execution time is a great challenge. This work therefore parallelizes the processing steps of the MSA-GA tool, using a multithreaded paradigm to execute the COFFEE objective function. The standard objective function implemented in the tool is the Weighted Sum of Pairs (WSP), which produces some distortions in the final alignments when sets of sequences with low similarity are aligned. In previous studies, we implemented the COFFEE objective function in the tool to smooth these distortions. Although the COFFEE objective function inherently increases execution time, parts of its computation can be executed in parallel. With the improvements implemented in this work, the new approach is 24% faster than the sequential approach with COFFEE. Moreover, the multithreaded COFFEE approach is more efficient than WSP: besides being slightly faster, it produces better biological results.
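A simplified, COFFEE-style consistency score and its pairwise decomposition can be sketched as follows. The assumptions are labeled as such: the library maps each sequence pair to the residue-index pairs it supports, the score is normalized by the weighted library size (the original COFFEE normalization may differ), and the function names are hypothetical. The thread pool only shows that pair scores are independent work items. In CPython, threads will not speed up this pure-Python arithmetic because of the global interpreter lock, and the abstract does not describe MSA-GA's implementation language or threading model.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def induced_pairs(row_a, row_b):
    """Residue-index pairs (ia, ib) placed in the same column of two MSA rows."""
    pairs, ia, ib = set(), 0, 0
    for ca, cb in zip(row_a, row_b):
        if ca != "-" and cb != "-":
            pairs.add((ia, ib))
        if ca != "-":
            ia += 1
        if cb != "-":
            ib += 1
    return pairs

def pair_score(task):
    """Weighted count of library pairs recovered for one sequence pair, and its maximum."""
    msa, library, weights, i, j = task
    lib = library.get((i, j), set())
    w = weights.get((i, j), 1.0)
    shared = len(induced_pairs(msa[i], msa[j]) & lib)
    return w * shared, w * len(lib)

def coffee_like_score(msa, library, weights=None, workers=4):
    """Fraction of weighted library pairs that the MSA reproduces (1.0 = fully consistent)."""
    weights = weights or {}
    tasks = [(msa, library, weights, i, j) for i, j in combinations(range(len(msa)), 2)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(pair_score, tasks))  # each sequence pair scored independently
    num = sum(r[0] for r in results)
    den = sum(r[1] for r in results)
    return num / den if den else 0.0

msa = ["AC-GT", "ACTGT"]
library = {(0, 1): {(0, 0), (1, 1), (2, 2), (3, 4)}}  # hypothetical pairwise library
print(coffee_like_score(msa, library))  # 0.75
```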

  14. The nucleotide sequence of cysteine transfer ribonucleic acid from baker's yeast. Identification of the products from partial degradation of the molecule and derivation of the complete sequence.

    PubMed Central

    Holness, N J; Atfield, G

    1976-01-01

    1. A series of large oligonucleotide fragments derived from tRNA Cys were separated chromatographically, and the sequence of each was deduced by examination of the products of digestion with pancreatic and T1 ribonucleases. 2. The location of the specific cleavage points in the nucleotide chain was similar to that produced by brief treatment with pancreatic ribonuclease. 3. The fragments could be arranged into two alternative sequences. The correct sequence was deduced by the sequential removal and identification of the first nine nucleotides from the 3'-end of the terminal half of the molecules. PMID:819006

  15. Greene SCPrimer: a rapid comprehensive tool for designing degenerate primers from multiple sequence alignments

    PubMed Central

    Jabado, Omar J.; Palacios, Gustavo; Kapoor, Vishal; Hui, Jeffrey; Renwick, Neil; Zhai, Junhui; Briese, Thomas; Lipkin, W. Ian

    2006-01-01

    Polymerase chain reaction (PCR) is widely applied in clinical and environmental microbiology. Primer design is key to the development of successful assays and is often performed manually by using multiple nucleic acid alignments. Few public software tools exist that allow comprehensive design of degenerate primers for large groups of related targets based on complex multiple sequence alignments. Here we present a method for designing such primers based on tree building followed by application of a set covering algorithm, and demonstrate its utility in compiling multiplex PCR primer panels for detection and differentiation of viral pathogens. PMID:17135211
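    A set-covering step can be illustrated with the classic greedy approximation: repeatedly choose the primer that detects the most targets not yet covered. This is a minimal Python sketch under assumed inputs, namely a mapping from hypothetical primer names to the set of targets each one detects. It is not the published tool's implementation, and it omits the tree-building step and any degeneracy constraints.

```python
def greedy_primer_cover(targets, candidates):
    """Greedy set cover.

    targets: iterable of target names (e.g. viral strains).
    candidates: dict mapping a primer name to the set of targets it detects.
    Returns (chosen primers in pick order, targets no candidate can detect).
    """
    uncovered = set(targets)
    chosen = []
    while uncovered and candidates:
        best = max(candidates, key=lambda p: len(candidates[p] & uncovered))
        gained = candidates[best] & uncovered
        if not gained:
            break  # no remaining primer detects any uncovered target
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered


panel, missed = greedy_primer_cover(
    ["v1", "v2", "v3", "v4", "v5"],
    {"P1": {"v1", "v2", "v3"}, "P2": {"v3", "v4"}, "P3": {"v4", "v5"}, "P4": {"v1"}},
)
print(panel, missed)  # ['P1', 'P3'] set()
```

    Greedy selection is not guaranteed to find the smallest panel, but it is a standard, simple approximation for set cover.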

  16. Comparative Topological Analysis of Neuronal Arbors via Sequence Representation and Alignment

    NASA Astrophysics Data System (ADS)

    Gillette, Todd Aaron

    Neuronal morphology is a key mediator of neuronal function, defining the profile of connectivity and shaping signal integration and propagation. Reconstructing neurite processes is technically challenging, and thus data have historically been relatively sparse. Data collection and curation, along with more efficient and reliable data production methods, provide opportunities for the application of informatics to find new relationships and more effectively explore the field. This dissertation presents a method for aiding the development of data production as well as a novel representation and set of analyses for extracting morphological patterns. The DIADEM Challenge was organized to determine the state of the art in automated neuronal reconstruction and which challenges remained. As one of the co-organizers of the Challenge, I developed the DIADEM metric, a tool designed to measure the effectiveness of automated reconstruction algorithms by comparing resulting reconstructions to expert-produced gold standards and identifying errors of various types. It has been used in the DIADEM Challenge and in the testing of several algorithms since. Further, this dissertation describes a topological sequence representation of neuronal trees amenable to various forms of sequence analysis, notably motif analysis, global pairwise alignment, clustering, and multiple sequence alignment. Motif analysis of neuronal arbors shows a large difference in bifurcation type proportions between axons and dendrites, but also shows that relatively simple growth mechanisms account for most higher-order motifs. Pairwise global alignment of topological sequences, modified from traditional sequence alignment to preserve tree relationships, enabled a cluster analysis that displayed strong correspondence with known cell classes by cell type, species, and brain region. Multiple alignment of sequences in selected clusters enabled the extraction of conserved features, revealing mouse
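    Global pairwise alignment of symbol sequences is classically done with Needleman-Wunsch dynamic programming. The sketch below is the standard, unmodified algorithm with assumed linear gap and match/mismatch scores. It does not include the tree-preserving modifications described above. The two-letter encoding in the example (B for a branch point, T for a tip) is hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("BBTT", "BTT"))
```

    For "BBTT" against "BTT", the optimal score is 2: three matches and one gap.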

  17. Detection and quantitation of single nucleotide polymorphisms, DNA sequence variations, DNA mutations, DNA damage and DNA mismatches

    DOEpatents

    McCutchen-Maloney, Sandra L.

    2002-01-01

    DNA mutation binding proteins alone and as chimeric proteins with nucleases are used with solid supports to detect DNA sequence variations, DNA mutations and single nucleotide polymorphisms. The solid supports may be flow cytometry beads, DNA chips, glass slides or DNA dipsticks. DNA molecules are coupled to solid supports to form DNA-support complexes. Labeled DNA is used with unlabeled DNA mutation binding proteins such as TthMutS to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by binding which gives an increase in signal. Unlabeled DNA is utilized with labeled chimeras to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by nuclease activity of the chimera which gives a decrease in signal.

  18. Nucleotide sequence of the capsid protein gene and 3' non-coding region of papaya mosaic virus RNA.

    PubMed

    Abouhaidar, M G

    1988-01-01

    The nucleotide sequences of cDNA clones corresponding to the 3' OH end of papaya mosaic virus RNA have been determined. The 3'-terminal sequence obtained was 900 nucleotides in length, excluding the poly(A) tail, and contained an open reading frame capable of giving rise to a protein of 214 amino acid residues with an Mr of 22930. This protein was identified as the viral capsid protein. The 3' non-coding region of PMV genome RNA was about 121 nucleotides long [excluding the poly(A) tail] and homologous to the complementary sequence of the non-coding region at the 5' end of PMV RNA. A long open reading frame was also found in the predicted 5' end region of the negative strand.

  19. Real-time single-molecule electronic DNA sequencing by synthesis using polymer-tagged nucleotides on a nanopore array

    PubMed Central

    Fuller, Carl W.; Kumar, Shiv; Porel, Mintu; Chien, Minchen; Bibillo, Arek; Stranges, P. Benjamin; Dorwart, Michael; Tao, Chuanjuan; Li, Zengmin; Guo, Wenjing; Shi, Shundi; Korenblum, Daniel; Trans, Andrew; Aguirre, Anne; Liu, Edward; Harada, Eric T.; Pollard, James; Bhat, Ashwini; Cech, Cynthia; Yang, Alexander; Arnold, Cleoma; Palla, Mirkó; Hovis, Jennifer; Chen, Roger; Morozova, Irina; Kalachikov, Sergey; Russo, James J.; Kasianowicz, John J.; Davis, Randy; Roever, Stefan; Church, George M.; Ju, Jingyue

    2016-01-01

    DNA sequencing by synthesis (SBS) offers a robust platform to decipher nucleic acid sequences. Recently, we reported a single-molecule nanopore-based SBS strategy that accurately distinguishes four bases by electronically detecting and differentiating four different polymer tags attached to the 5′-phosphate of the nucleotides during their incorporation into a growing DNA strand catalyzed by DNA polymerase. Further developing this approach, we report here the use of nucleotides tagged at the terminal phosphate with oligonucleotide-based polymers to perform nanopore SBS on an α-hemolysin nanopore array platform. We designed and synthesized several polymer-tagged nucleotides using tags that produce different electrical current blockade levels and verified they are active substrates for DNA polymerase. A highly processive DNA polymerase was conjugated to the nanopore, and the conjugates were complexed with primer/template DNA and inserted into lipid bilayers over individually addressable electrodes of the nanopore chip. When an incoming complementary-tagged nucleotide forms a tight ternary complex with the primer/template and polymerase, the tag enters the pore, and the current blockade level is measured. The levels displayed by the four nucleotides tagged with four different polymers captured in the nanopore in such ternary complexes were clearly distinguishable and sequence-specific, enabling continuous sequence determination during the polymerase reaction. Thus, real-time single-molecule electronic DNA sequencing data with single-base resolution were obtained. The use of these polymer-tagged nucleotides, combined with polymerase tethering to nanopores and multiplexed nanopore sensors, should lead to new high-throughput sequencing methods. PMID:27091962

  20. Real-time single-molecule electronic DNA sequencing by synthesis using polymer-tagged nucleotides on a nanopore array.

    PubMed

    Fuller, Carl W; Kumar, Shiv; Porel, Mintu; Chien, Minchen; Bibillo, Arek; Stranges, P Benjamin; Dorwart, Michael; Tao, Chuanjuan; Li, Zengmin; Guo, Wenjing; Shi, Shundi; Korenblum, Daniel; Trans, Andrew; Aguirre, Anne; Liu, Edward; Harada, Eric T; Pollard, James; Bhat, Ashwini; Cech, Cynthia; Yang, Alexander; Arnold, Cleoma; Palla, Mirkó; Hovis, Jennifer; Chen, Roger; Morozova, Irina; Kalachikov, Sergey; Russo, James J; Kasianowicz, John J; Davis, Randy; Roever, Stefan; Church, George M; Ju, Jingyue

    2016-05-10

    DNA sequencing by synthesis (SBS) offers a robust platform to decipher nucleic acid sequences. Recently, we reported a single-molecule nanopore-based SBS strategy that accurately distinguishes four bases by electronically detecting and differentiating four different polymer tags attached to the 5'-phosphate of the nucleotides during their incorporation into a growing DNA strand catalyzed by DNA polymerase. Further developing this approach, we report here the use of nucleotides tagged at the terminal phosphate with oligonucleotide-based polymers to perform nanopore SBS on an α-hemolysin nanopore array platform. We designed and synthesized several polymer-tagged nucleotides using tags that produce different electrical current blockade levels and verified they are active substrates for DNA polymerase. A highly processive DNA polymerase was conjugated to the nanopore, and the conjugates were complexed with primer/template DNA and inserted into lipid bilayers over individually addressable electrodes of the nanopore chip. When an incoming complementary-tagged nucleotide forms a tight ternary complex with the primer/template and polymerase, the tag enters the pore, and the current blockade level is measured. The levels displayed by the four nucleotides tagged with four different polymers captured in the nanopore in such ternary complexes were clearly distinguishable and sequence-specific, enabling continuous sequence determination during the polymerase reaction. Thus, real-time single-molecule electronic DNA sequencing data with single-base resolution were obtained. The use of these polymer-tagged nucleotides, combined with polymerase tethering to nanopores and multiplexed nanopore sensors, should lead to new high-throughput sequencing methods.
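    The principle that each tag gives a distinguishable blockade level can be sketched as nearest-level classification. The reference levels below are invented placeholders, expressed as fractions of open-pore current, and are not values from the paper. `call_bases` is a hypothetical helper. Real signal processing would also need event segmentation and noise handling.

```python
# Hypothetical reference blockade levels (fraction of open-pore current) for the four tags.
LEVELS = {"A": 0.30, "C": 0.45, "G": 0.60, "T": 0.75}


def call_bases(blockades, reference_levels, tolerance=None):
    """Call each blockade as the base with the nearest reference level.

    If tolerance is given, events farther than that from every level are called 'N'.
    """
    calls = []
    for x in blockades:
        base = min(reference_levels, key=lambda b: abs(reference_levels[b] - x))
        if tolerance is not None and abs(reference_levels[base] - x) > tolerance:
            base = "N"  # too far from every reference level to call confidently
        calls.append(base)
    return "".join(calls)


print(call_bases([0.31, 0.58, 0.74, 0.44], LEVELS))  # AGTC
```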

  1. Nucleotide sequence analysis and DNA hybridization studies of the ant(4')-IIa gene from Pseudomonas aeruginosa.

    PubMed Central

    Shaw, K J; Munayyer, H; Rather, P N; Hare, R S; Miller, G H

    1993-01-01

    The ant(4')-IIa gene was previously cloned from Pseudomonas aeruginosa on a 1.6-kb DNA fragment (G. A. Jacoby, M. J. Blaser, P. Santanam, H. Hächler, F. H. Kayser, R. S. Hare, and G. H. Miller, Antimicrob. Agents Chemother. 34:2381-2386, 1990). In the current study, the ant(4')-IIa gene was localized by gamma-delta mutagenesis. A region of approximately 600 nucleotides which contained the ant(4')-IIa gene was identified, and DNA sequence analysis revealed two overlapping open reading frames (ORFs) within this region. Northern (RNA) blot analysis demonstrated expression of both ORFs in P. aeruginosa; therefore, site-directed mutagenesis was used to identify the ORF which encodes the ant(4')-IIa gene. No homology was found between ant(4')-IIa and ant(4')-Ia DNA sequences. Hybridization experiments confirmed that the ant(4')-Ia probe hybridized only to gram-positive presumptive ANT(4')-I strains and that the ant(4')-IIa probe hybridized only to gram-negative strains presumed to carry ANT(4')-II. Seven gram-negative strains which had been classified as having ANT(4')-II resistance profiles did not hybridize with probes for either ant(4')-Ia or ant(4')-IIa, suggesting that at least one additional ant(4') gene may exist. The predicted amino-terminal sequences of the ANT(4')-Ia and ANT(4')-IIa proteins showed significant sequence similarity between residues 38 and 63 of the ANT(4')-Ia protein and residues 26 and 51 of the ANT(4')-IIa protein. PMID:8494365

  2. OrthoSelect: a web server for selecting orthologous gene alignments from EST sequences.

    PubMed

    Schreiber, Fabian; Wörheide, Gert; Morgenstern, Burkhard

    2009-07-01

    In the absence of whole genome sequences for many organisms, the use of expressed sequence tags (EST) offers an affordable approach for researchers conducting phylogenetic analyses to gain insight about the evolutionary history of organisms. Reliable alignments for phylogenomic analyses are based on orthologous gene sequences from different taxa. So far, researchers have not sufficiently tackled the problem of the completely automated construction of such datasets. Existing software tools are either semi-automated, covering only part of the necessary data processing, or implemented as a pipeline, requiring the installation and configuration of a cascade of external tools, which may be time-consuming and hard to manage. To simplify data set construction for phylogenomic studies, we set up a web server that uses our recently developed OrthoSelect approach. To the best of our knowledge, our web server is the first web-based EST analysis pipeline that allows the detection of orthologous gene sequences in EST libraries and outputs orthologous gene alignments. Additionally, OrthoSelect provides the user with an extensive results section that lists and visualizes all important results, such as annotations, data matrices for each gene/taxon and orthologous gene alignments. The web server is available at http://orthoselect.gobics.de.

  3. Rice pseudomolecule-anchored cross-species DNA sequence alignments indicate regional genomic variation in expressed sequence conservation

    PubMed Central

    Armstead, Ian; Huang, Lin; King, Julie; Ougham, Helen; Thomas, Howard; King, Ian

    2007-01-01

    Background Various methods have been developed to explore inter-genomic relationships among plant species. Here, we present a sequence similarity analysis based upon comparison of transcript-assembly and methylation-filtered databases from five plant species and physically anchored rice coding sequences. Results A comparison of the frequency of sequence alignments, determined by MegaBLAST, between rice coding sequences in TIGR pseudomolecules and annotation version 4.0 and comprehensive transcript-assembly and methylation-filtered databases from Lolium perenne (ryegrass), Zea mays (maize), Hordeum vulgare (barley), Glycine max (soybean) and Arabidopsis thaliana (thale cress) was undertaken. Each rice pseudomolecule was divided into 10 segments, each containing 10% of the functionally annotated, expressed genes. This indicated a correlation between relative segment position in the rice genome and numbers of alignments with all the queried monocot and dicot plant databases. Colour-coded moving windows of 100 functionally annotated, expressed genes along each pseudomolecule were used to generate 'heat-maps'. These revealed consistent intra- and inter-pseudomolecule variation in the relative concentrations of significant alignments with the tested plant databases. Analysis of the annotations and derived putative expression patterns of rice genes from 'hot-spots' and 'cold-spots' within the heat maps indicated possible functional differences. A similar comparison relating to ancestral duplications of the rice genome indicated that duplications were often associated with 'hot-spots'. Conclusion Physical positions of expressed genes in the rice genome are correlated with the degree of conservation of similar sequences in the transcriptomes of other plant species. This relative conservation is associated with the distribution of different sized gene families and segmentally duplicated loci and may have functional and evolutionary implications. PMID:17708759
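    The segment and moving-window summaries described above can be sketched as follows. The sketch assumes that genes are already ordered by position along a pseudomolecule and flagged True when they have a significant MegaBLAST alignment. The function names, the one-gene step and the boolean input are illustrative assumptions. The paper's exact windowing and colour scale are not reproduced.

```python
def decile_segments(genes):
    """Split a position-ordered gene list into 10 consecutive, near-equal segments."""
    n = len(genes)
    return [genes[k * n // 10:(k + 1) * n // 10] for k in range(10)]


def moving_window_fraction(has_alignment, window=100):
    """Fraction of genes with a significant alignment in each run of `window`
    consecutive genes, sliding one gene at a time."""
    flags = [1 if f else 0 for f in has_alignment]
    if len(flags) < window:
        return []
    total = sum(flags[:window])
    fractions = [total / window]
    for i in range(window, len(flags)):
        total += flags[i] - flags[i - window]  # add the gene entering, drop the one leaving
        fractions.append(total / window)
    return fractions


print(moving_window_fraction([True, False, True, True, False], window=2))  # [0.5, 0.5, 1.0, 0.5]
```

    Mapping each window fraction to a colour would give a heat-map track, with high-fraction windows corresponding to 'hot-spots'.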

  4. Comparison of alignment software for genome-wide bisulphite sequence data

    PubMed Central

    Chatterjee, Aniruddha; Stockwell, Peter A.; Rodger, Euan J.; Morison, Ian M.

    2012-01-01

    Recent advances in next generation sequencing (NGS) technology now provide the opportunity to rapidly interrogate the methylation status of the genome. However, there are challenges in handling and interpretation of the methylation sequence data because of its large volume and the consequences of bisulphite modification. We sequenced reduced representation human genomes on the Illumina platform and efficiently mapped and visualized the data with different pipelines and software packages. We examined three pipelines for aligning bisulphite converted sequencing reads and compared their performance. We also comment on pre-processing and quality control of Illumina data. This comparison highlights differences in methods for NGS data processing and provides guidance to advance sequence-based methylation data analysis for molecular biologists. PMID:22344695
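    One common way for aligners to cope with bisulphite modification is to compare reads and reference in a reduced alphabet after in silico C-to-T conversion, then read methylation back from the unconverted bases. The sketch below assumes forward-strand reads, exact matching via `str.find` and hypothetical helper names. It is not one of the three pipelines compared in the study. It also ignores the reverse strand (where conversion appears as G to A), mismatches and multi-mapping reads.

```python
def bisulfite_convert(seq):
    """In silico bisulphite conversion of a forward-strand sequence: every C becomes T."""
    return seq.upper().replace("C", "T")


def place_read(reference, read):
    """Exact-match position of the read on the reference in converted space (-1 if absent)."""
    return bisulfite_convert(reference).find(bisulfite_convert(read))


def methylation_calls(reference, read, offset):
    """At each reference C covered by the read: read C means methylated
    (protected from conversion), read T means unmethylated (converted)."""
    calls = {}
    for k, base in enumerate(read.upper()):
        pos = offset + k
        if reference[pos].upper() == "C":
            if base == "C":
                calls[pos] = "methylated"
            elif base == "T":
                calls[pos] = "unmethylated"
    return calls


reference = "TTACGTACGA"
read = "ACGTATGA"  # C at reference position 3 retained, C at position 7 read as T
offset = place_read(reference, read)
print(offset, methylation_calls(reference, read, offset))
# 2 {3: 'methylated', 7: 'unmethylated'}
```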

  5. Complete nucleotide sequence of rose yellow leaf virus, a new member of the family Tombusviridae.

    PubMed

    Mollov, Dimitre; Lockhart, Ben; Zlesak, David C

    2014-10-01

    The genome of the rose yellow leaf virus (RYLV) has been determined to be 3918 nucleotides long and to contain seven open reading frames (ORFs). ORF1 encodes a 27-kDa peptide (p27). ORF2 shares a common start codon with ORF1 and continues through the amber stop codon of p27 to encode an 87-kDa (p87) protein that has amino acid similarity to the RNA-dependent RNA polymerase (RdRp) of members of the family Tombusviridae. ORFs 3 and 4 have no significant amino acid similarity to known functional viral ORFs. ORF5 encodes a 6-kDa (p6) protein that has similarity to movement proteins of members of the Tombusviridae. ORF5A has no conventional start codon and overlaps with p6. A putative +1 frameshift mechanism allows p6 translation to continue through the stop codon and results in a 12-kDa protein that has high homology to the carmovirus p13 movement protein. The 37-kDa protein encoded by ORF6 has amino acid sequence similarity to coat proteins (CP) of members of the Tombusviridae. ORF7 has no significant amino acid similarity to known viral ORFs. Phylogenetic analysis of the RdRp amino acid sequences grouped RYLV together with the unclassified Rosa rugosa leaf distortion virus (RrLDV), pelargonium line pattern virus (PLPV), and pelargonium chlorotic ring pattern virus (PCRPV) in a distinct subgroup of the family Tombusviridae.
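    A basic three-frame scan for ATG-initiated open reading frames on the sense strand can be sketched as follows. `find_orfs` and its `min_codons` threshold are hypothetical. The scan will not detect ORFs that lack a conventional start codon (like ORF5A above), nor products that arise from stop-codon readthrough or +1 frameshifting. It also does not scan the complementary strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # TAG is the amber stop codon


def find_orfs(seq, min_codons=50):
    """Return (frame, start, end) for ATG-initiated ORFs in the three forward frames.

    `end` is exclusive and includes the stop codon; `min_codons` counts the stop codon.
    RNA input (U) is treated as DNA (T).
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


print(find_orfs("CATGCCCTGA", min_codons=2))  # [(1, 1, 10)]
```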

  6. Nucleotide sequence and phylogenetic analysis of a new potexvirus: Malva mosaic virus.

    PubMed

    Côté, Fabien; Paré, Christine; Majeau, Nathalie; Bolduc, Marilène; Leblanc, Eric; Bergeron, Michel G; Bernardy, Michael G; Leclerc, Denis

    2008-01-01

    A filamentous virus isolated from Malva neglecta Wallr. (common mallow) and propagated in Chenopodium quinoa was grown and cloned, and its complete nucleotide sequence was determined (GenBank accession # DQ660333). The genomic RNA is 6858 nt in length and contains five major open reading frames (ORFs). The genomic organization is similar to that of members of the genus Potexvirus in the family Flexiviridae, and the virus-encoded proteins share homology with those of this group. Phylogenetic analysis revealed a close relationship with narcissus mosaic virus (NMV), scallion virus X (ScaVX) and, to a lesser extent, Alstroemeria virus X (AlsVX) and pepino mosaic virus (PepMV). A novel putative pseudoknot structure is predicted in the 3'-UTR of a subgroup of potexviruses, including this newly described virus. The consensus GAAAA sequence is detected at the 5'-end of the genomic RNA, and experimental data strongly suggest that this motif could be a distinctive hallmark of this genus. The name Malva mosaic virus is proposed.

  7. Complete nucleotide sequence analysis of the norovirus GII.4 Sydney variant in South Korea.

    PubMed

    Park, Ji-Sun; Lee, Sung-Geun; Jin, Ji-Young; Cho, Han-Gil; Jheong, Weon-Hwa; Paik, Soon-Young

    2015-01-01

    Norovirus is the primary cause of acute gastroenteritis in individuals of all ages. In Australia, a new strain of norovirus (GII.4) was identified in March 2012, and this strain has spread rapidly around the world. In August 2012, this new GII.4 strain was identified in patients in South Korea. Therefore, to examine the characteristics of the epidemic norovirus GII.4 2012 variant in South Korea, we conducted a full-length genomic analysis (KM272334). The genome of the gg-12-08-04 strain consisted of 7,558 bp and contained three open reading frames (ORFs): ORF1 (5,100 bp), ORF2 (1,623 bp), and ORF3 (807 bp). Phylogenetic analyses showed that gg-12-08-04 belonged to the GII.4 Sydney 2012 variant, sharing 98.92% nucleotide similarity with this variant strain. According to SimPlot analysis, the gg-12-08-04 strain was a recombinant strain with a breakpoint at the ORF1/2 junction between the Osaka 2007 and Apeldoorn 2008 strains. This study is the first report of the complete sequence of the GII.4 Sydney 2012 strain in South Korea. This sequence may therefore represent the standard sequence of the norovirus GII.4 2012 variant in South Korea and could be useful for the development of norovirus vaccines.
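    Nucleotide similarity figures and SimPlot-style recombination screens both rest on percent identity between aligned sequences. SimPlot computes that identity in windows slid along the alignment, and a breakpoint shows up where the most similar parental strain switches. The following is a minimal sketch. It assumes pre-aligned, equal-length strings and excludes gap columns from the denominator. The window and step values are hypothetical. SimPlot itself offers further options (such as distance models) that are not reproduced here.

```python
def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def sliding_identity(query, reference, window=200, step=20):
    """(window_start, percent_identity) for windows slid along an alignment."""
    last_start = max(len(query) - window, 0)
    return [(s, percent_identity(query[s:s + window], reference[s:s + window]))
            for s in range(0, last_start + 1, step)]


print(sliding_identity("AAAATTTT", "AAAAAAAA", window=4, step=4))  # [(0, 100.0), (4, 0.0)]
```

    Plotting one such curve per candidate parent and looking for the point where the curves cross is the usual way to localize a recombination breakpoint.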

  8. Nucleotide sequence and transcriptional analysis of the type A2 neurotoxin gene cluster in Clostridium botulinum.

    PubMed

    Dineen, Sean S; Bradshaw, Marite; Karasek, Charles E; Johnson, Eric A

    2004-06-01

    The nucleotide sequences of the upstream regions of the botulinum neurotoxin type A1 (BoNT/A1) cluster of Clostridium botulinum strain NCTC 2916 and the BoNT/A2 cluster of strain Kyoto-F were determined. A novel gene, designated orfx3, was identified following the orfx2 gene in both clusters. ORF-X2 and ORF-X3 exhibit similarity to the BoNT cluster-associated P-47 protein. The BoNT/A1 and BoNT/A2 clusters share a similar gene arrangement, but exhibit differences in the spacing between certain genes. Sequences with similarity to transposases were identified in these intergenic regions, suggesting that these differences arose from an ancestral insertion event. Transcriptional analysis of the BoNT/A2 cluster revealed that the genes of the cluster are primarily transcribed as three polycistronic transcripts. Two divergent polycistronic transcripts, one encoding the orfx1, orfx2, and orfx3 genes, the second encoding the p47, ntnh, and bont/a2 genes, are transcribed from conserved BoNT cluster promoters. The third polycistronic transcript, expressed at low levels, encodes the positive regulatory botR gene and the orfx genes. This is the first complete analysis of a botulinum toxin A2 cluster.

  9. Complete nucleotide sequence of a Spanish isolate of Parietaria mottle virus infecting tomato.

    PubMed

    Galipienso, Luis; Rubio, Luis; López, Luis; Soler, Salvador; Aramburu, José

    2009-10-01

    The genome of a Spanish isolate of Parietaria mottle virus (PMoV) obtained from tomato (strain PMoV-T) was completely sequenced. Protein motifs conserved among RNA viruses were identified: the p1 protein contained a methyltransferase domain in its N-terminal half and a triphosphatase/helicase domain in its C-terminal half; the p2 protein contained an RNA polymerase domain; and the 3a protein contained an RNA-binding domain with α-helix and β-sheet secondary structures. In addition, stem-loop structures with potential capacity for protein interactions were predicted in the untranslated terminal regions. Comparison with the other sequenced PMoV isolate showed nucleotide identities of 93, 90, and 93% for genomic RNAs 1, 2 and 3, respectively, and amino acid identities ranging from 88 to 97% for the different proteins. A cytosine deletion was detected at position 1,366 of RNA 3, resulting in a coat protein (CP) gene start codon different from that of the other PMoV isolate and a CP 16 amino acids shorter. Comparison of synonymous and nonsynonymous mutations revealed different selective constraints along the genome.
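    Classifying codon differences as synonymous or nonsynonymous needs only a genetic code table. The sketch below assumes the standard genetic code, in-frame aligned coding sequences and a hypothetical `classify_differences` helper. It simply counts differing codons. Formal selection analyses (for example dN/dS estimates) also normalize by the number of synonymous and nonsynonymous sites. A codon differing at more than one position is counted once here, without resolving mutational pathways.

```python
# Standard genetic code, built from the usual TCAG-ordered 64-letter string ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def classify_differences(seq1, seq2):
    """Count (synonymous, nonsynonymous) codon differences between two aligned,
    in-frame coding sequences; codons with gaps or ambiguous bases are skipped."""
    syn = nonsyn = 0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 == c2 or c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


print(classify_differences("CTGGAA", "CTAGAT"))  # (1, 1): Leu->Leu, Glu->Asp
```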

  10. Nucleotide sequence and structural organization of the human vasopressin pituitary receptor (V3) gene.

    PubMed

    René, P; Lenne, F; Ventura, M A; Bertagna, X; de Keyzer, Y

    2000-01-04

    In the pituitary, vasopressin triggers ACTH release through a specific receptor subtype, termed V3 or V1b. We cloned the V3 cDNA and showed that its expression was almost exclusive to pituitary corticotrophs and some corticotroph tumors. To study the determinants of this tissue specificity, we have now cloned the gene for the human (h) V3 receptor and characterized its structure. It is composed of two exons spanning 10 kb, with the coding region interrupted between transmembrane domains 6 and 7. We established that the transcription initiation site is located 498 nucleotides upstream of the initiator codon and showed that two polyadenylation sites may be used, the more downstream one being used more frequently. Sequence analysis of the promoter region showed no TATA box but identified consensus binding motifs for Sp1 and CREB, and half-sites of the estrogen receptor binding site. However, comparison with another corticotroph-specific gene, proopiomelanocortin, did not identify common regulatory elements in the two promoters except for a short GC-rich region. Unexpectedly, hV3 gene analysis revealed that a formerly cloned 'artifactual' hV3 cDNA in fact corresponded to a spliced antisense transcript, overlapping the 5' part of the coding sequence in exon 1 and the promoter region. This transcript, hV3rev, was detected in normal pituitary and in many corticotroph tumors expressing hV3 sense mRNA and may therefore play a role in hV3 gene expression.

  11. A survey of chromosomal and nucleotide sequence variation in Drosophila miranda.

    PubMed Central

    Yi, Soojin; Bachtrog, Doris; Charlesworth, Brian

    2003-01-01

    There have recently been several studies of the evolution of Y chromosome degeneration and dosage compensation using the neo-sex chromosomes of Drosophila miranda as a model system. To understand these evolutionary processes more fully, it is necessary to document the general pattern of genetic variation in this species. Here we report a survey of chromosomal variation, as well as polymorphism and divergence data, for 12 nuclear genes of D. miranda. These genes exhibit varying levels of DNA sequence polymorphism. Compared to its well-studied sibling species D. pseudoobscura, D. miranda has much less nucleotide sequence variation, and the effective population size of this species is inferred to be several-fold lower. Nevertheless, it harbors a few inversion polymorphisms, one of which involves the neo-X chromosome. There is no convincing evidence for a recent population expansion in D. miranda, in contrast to D. pseudoobscura. The pattern of population subdivision previously observed for the X-linked gene period is not seen for the other loci, suggesting that there is no general population subdivision in D. miranda. However, data on an additional region of period confirm population subdivision for this gene, suggesting that local selection is operating at or near period to promote differentiation between populations. PMID:12930746
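    Levels of nucleotide sequence variation like those compared between D. miranda and D. pseudoobscura are commonly summarized as nucleotide diversity (π), the mean number of pairwise differences per site. For autosomal loci under the standard neutral model, π estimates 4Neμ. That is why a lower π at a similar mutation rate points to a smaller effective population size. The sketch below assumes aligned uppercase sequences and excludes gaps and ambiguous bases per pair. That convention is an assumption, not necessarily the study's, and `nucleotide_diversity` is a hypothetical name.

```python
from itertools import combinations

VALID = set("ACGT")


def nucleotide_diversity(alignment):
    """Mean per-site proportion of differences over all sequence pairs (pi).
    For each pair, only columns where both bases are A, C, G or T are counted."""
    per_pair = []
    for a, b in combinations(alignment, 2):
        sites = [(x, y) for x, y in zip(a, b) if x in VALID and y in VALID]
        if sites:
            per_pair.append(sum(x != y for x, y in sites) / len(sites))
    return sum(per_pair) / len(per_pair) if per_pair else 0.0


print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))  # pairwise 0.25, 0.5, 0.25 -> about 0.333
```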

  12. Alignment editing and identification of consensus secondary structures for nucleic acid sequences: interactive use of dot matrix representations.

    PubMed Central

    Davis, J P; Janjić, N; Pribnow, D; Zichi, D A

    1995-01-01

    We present a computer-aided approach for identifying and aligning consensus secondary structure within a set of functionally related oligonucleotide sequences aligned by sequence. The method relies on visualization of secondary structure using a generalization of the dot matrix representation appropriate for consensus sequence data sets. An interactive computer program implementing such a visualization of consensus structure has been developed. The program allows for alignment editing, data and display filtering and various modes of base pair representation, including co-variation. The utility of this approach is demonstrated with four sample data sets derived from in vitro selection experiments and one data set comprising tRNA sequences. PMID:7501472
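    A base-pairing dot matrix of the kind described above marks every pair of positions whose bases could pair. Helices then appear as runs along anti-diagonals, and summing the matrices over aligned sequences gives a crude consensus view. The sketch assumes Watson-Crick plus G-U wobble pairs, a minimum hairpin loop of three nucleotides and hypothetical function names. It does not reproduce the published program's co-variation display or filtering modes.

```python
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def pairing_dot_matrix(seq):
    """m[i][j] = 1 if i < j and bases i and j can pair (DNA T is read as U)."""
    s = seq.upper().replace("T", "U")
    n = len(s)
    return [[1 if i < j and (s[i], s[j]) in PAIRS else 0 for j in range(n)] for i in range(n)]


def consensus_dot_matrix(aligned):
    """Sum single-sequence matrices over equal-length aligned sequences;
    cell (i, j) counts how many sequences can pair columns i and j (gaps never pair)."""
    mats = [pairing_dot_matrix(s) for s in aligned]
    n = len(aligned[0])
    return [[sum(m[i][j] for m in mats) for j in range(n)] for i in range(n)]


def find_helices(seq, min_len=3, min_loop=3):
    """Stacked-pair runs (i, j), (i+1, j-1), ... of at least min_len pairs,
    reported as (i, j, length) for the outermost pair."""
    m = pairing_dot_matrix(seq)
    n = len(m)
    helices = []
    for i in range(n):
        for j in range(n):
            # Start only at the outermost pair of a run along the anti-diagonal.
            if not m[i][j] or (i > 0 and j + 1 < n and m[i - 1][j + 1]):
                continue
            length = 0
            # Keep at least min_loop unpaired bases between the innermost pair.
            while i + length < j - length - min_loop and m[i + length][j - length]:
                length += 1
            if length >= min_len:
                helices.append((i, j, length))
    return helices


print(find_helices("GGGAAAACCC"))  # [(0, 9, 3)]
```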

  13. Enzyme sequence similarity improves the reaction alignment method for cross-species pathway comparison

    SciTech Connect

    Ovacik, Meric A.; Androulakis, Ioannis P.

    2013-09-15

    Pathway-based information has become an important source of information both for establishing evolutionary relationships and for understanding the mode of action of a chemical or pharmaceutical across species. Cross-species comparison of pathways can address two broad questions: informing evolutionary relationships, and extrapolating species differences for a number of applications, including drug and toxicity testing. Cross-species comparison of metabolic pathways is complex because a pathway has multiple features that can be modeled and compared. Among the various methods that have been proposed, reaction alignment has emerged as the most successful at predicting phylogenetic relationships based on NCBI taxonomy. We propose an improvement of the reaction alignment method that accounts for enzyme sequence similarity in addition to reaction alignment. Using nine species, including human, several model organisms and test species, we evaluate the standard and improved comparison methods by analyzing the conservation of the glycolysis and citrate cycle pathways. In addition, we demonstrate how organisms can be compared using the cumulative information retrieved from nine pathways in central metabolism, as well as in a more complete study involving 36 pathways common to all nine species. Our results indicate that reaction alignment with enzyme sequence similarity gives a more accurate representation of pathway-specific cross-species similarities and differences based on NCBI taxonomy.

  14. A Probabilistic Model for Sequence Alignment with Context-Sensitive Indels

    NASA Astrophysics Data System (ADS)

    Hickey, Glenn; Blanchette, Mathieu

    Probabilistic approaches for sequence alignment are usually based on pair Hidden Markov Models (HMMs) or Stochastic Context Free Grammars (SCFGs). Recent studies have shown a significant correlation between the content of short indels and their flanking regions, which by definition cannot be modelled by the above two approaches. In this work, we present a context-sensitive indel model based on a pair Tree-Adjoining Grammar (TAG), along with accompanying algorithms for efficient alignment and parameter estimation. The increased precision and statistical power of this model are shown on simulated and real genomic data. As the cost of sequencing plummets, the usefulness of comparative analysis is becoming limited by alignment accuracy rather than data availability. Our results will therefore have an impact on any type of downstream comparative genomics analyses that rely on alignments. Fine-grained studies of small functional regions or disease markers, for example, could be significantly improved by our method. The implementation is available at http://www.mcb.mcgill.ca/~blanchem/software.html

  15. Nucleotide sequence of the 3'-noncoding region of alfalfa mosaic virus RNA 4 and its homology with the genomic RNAs.

    PubMed Central

    Koper-Zwarthoff, E C; Brederode, F T; Walstra, P; Bol, J F

    1979-01-01

    A 226-nucleotide fragment was derived from alfalfa mosaic virus RNA 4 (AlMV RNA 4), the subgenomic messenger for viral coat protein, and its sequence was deduced by in vitro labeling with polynucleotide kinase and application of RNA sequencing techniques. The fragment contains the 3'-terminal 45 nucleotides of the coat protein cistron and the complete 3'-noncoding region of 182 nucleotides. The total length of RNA 4 was calculated to be 881 nucleotides. AlMV RNAs 1, 2 and 3 were elongated with a 3'-terminal poly(A) stretch and subjected to sequence analysis by using a specific primer, reverse transcriptase and chain terminators. This revealed an extensive homology between the 3'-terminal 140 to 150 nucleotides of all four AlMV RNAs. Despite a number of base substitutions, the secondary structure of the homologous region is highly conserved. The observed homology indicates that, as with RNA 4, the sites with a high affinity for the viral coat protein are located at the 3'-termini of the genomic RNAs. PMID:537914

  16. A Convex Atomic-Norm Approach to Multiple Sequence Alignment and Motif Discovery

    PubMed Central

    Yen, Ian E. H.; Lin, Xin; Zhang, Jiong; Ravikumar, Pradeep; Dhillon, Inderjit S.

    2016-01-01

    Multiple Sequence Alignment and Motif Discovery, both known to be NP-hard, are two fundamental tasks in Bioinformatics. Existing approaches to these two problems are based either on local search methods, such as Expectation Maximization (EM) and Gibbs Sampling, or on greedy heuristics. In this work, we develop a convex relaxation approach to both problems based on the recent concept of the atomic norm, and develop a new algorithm, termed the Greedy Direction Method of Multiplier, for solving the convex relaxation with two convex atomic constraints. Experiments show that our convex relaxation approach produces solutions of higher quality than the standard tools widely used in the Bioinformatics community for the Multiple Sequence Alignment and Motif Discovery problems. PMID:27559428

  17. RBT-GA: a novel metaheuristic for solving the multiple sequence alignment problem

    PubMed Central

    Taheri, Javid; Zomaya, Albert Y

    2009-01-01

    Background Multiple Sequence Alignment (MSA) has always been an active area of research in Bioinformatics. MSA is mainly focused on discovering biologically meaningful relationships among different sequences or proteins in order to investigate their underlying main characteristics/functions. This information is also used to generate phylogenetic trees. Results This paper presents a novel approach, namely RBT-GA, to solve the MSA problem using a hybrid solution methodology combining the Rubber Band Technique (RBT) and the Genetic Algorithm (GA) metaheuristic. RBT is inspired by the behavior of an elastic Rubber Band (RB) on a plate with several poles, which are analogous to locations in the input sequences that could potentially be biologically related. A GA attempts to mimic the evolutionary processes of life in order to locate optimal solutions in an often very complex landscape. RBT-GA is a population-based optimization algorithm designed to find the optimal alignment for a set of input protein sequences. In this novel technique, each alignment answer is modeled as a chromosome consisting of several poles in the RBT framework. These poles resemble locations in the input sequences that are most likely to be correlated and/or biologically related. A GA-based optimization process gradually improves these chromosomes, yielding a set of mostly optimal answers for the MSA problem. Conclusion RBT-GA is tested with one of the well-known benchmark suites (BALiBASE 2.0) in this area. The obtained results show the superiority of the proposed technique even in the case of formidable sequences. PMID:19594869

  18. rasbhari: Optimizing Spaced Seeds for Database Searching, Read Mapping and Alignment-Free Sequence Comparison

    PubMed Central

    Hahn, Lars; Leimeister, Chris-André; Morgenstern, Burkhard

    2016-01-01

    Many algorithms for sequence analysis rely on word matching or word statistics. Often, these approaches can be improved if binary patterns representing match and don’t-care positions are used as a filter, such that only those positions of words are considered that correspond to the match positions of the patterns. The performance of these approaches, however, depends on the underlying patterns. Herein, we show that the overlap complexity of a pattern set that was introduced by Ilie and Ilie is closely related to the variance of the number of matches between two evolutionarily related sequences with respect to this pattern set. We propose a modified hill-climbing algorithm to optimize pattern sets for database searching, read mapping and alignment-free sequence comparison of nucleic-acid sequences; our implementation of this algorithm is called rasbhari. Depending on the application at hand, rasbhari can either minimize the overlap complexity of pattern sets, maximize their sensitivity in database searching or minimize the variance of the number of pattern-based matches in alignment-free sequence comparison. We show that, for database searching, rasbhari generates pattern sets with slightly higher sensitivity than existing approaches. In our Spaced Words approach to alignment-free sequence comparison, pattern sets calculated with rasbhari led to more accurate estimates of phylogenetic distances than the randomly generated pattern sets that we previously used. Finally, we used rasbhari to generate patterns for short read classification with CLARK-S. Here too, the sensitivity of the results could be improved, compared to the default patterns of the program. We integrated rasbhari into Spaced Words; the source code of rasbhari is freely available at http://rasbhari.gobics.de/ PMID:27760124
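
    The match/don't-care filtering described above can be illustrated with a single binary pattern: only the characters under '1' positions form a spaced word, so mismatches at '0' positions are ignored. The sketch below counts spaced-word matches between two DNA sequences; the pattern "1101" and the function names are hypothetical, and rasbhari's actual contribution, optimizing whole pattern sets by hill climbing, is not reproduced here.

```python
from collections import Counter


def spaced_words(seq, pattern):
    """Spaced words of seq under a binary pattern ('1' = match, '0' = don't care)."""
    match_positions = [k for k, c in enumerate(pattern) if c == "1"]
    span = len(pattern)
    return [
        "".join(seq[i + k] for k in match_positions)
        for i in range(len(seq) - span + 1)
    ]


def spaced_word_matches(s1, s2, pattern):
    """Number of position pairs (i, j) whose spaced words in s1 and s2 are identical."""
    c1 = Counter(spaced_words(s1, pattern))
    c2 = Counter(spaced_words(s2, pattern))
    # Counter returns 0 for absent words, so unmatched words contribute nothing.
    return sum(count * c2[word] for word, count in c1.items())


print(spaced_words("ACGTAC", "1101"))  # ['ACT', 'CGA', 'GTC']
# "ACGTAC" and "ACCTAC" differ only at index 2, a don't-care position for the
# first window, so that window still produces a match.
print(spaced_word_matches("ACGTAC", "ACCTAC", "1101"))  # 1
```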

  19. BuddySuite: Command-line toolkits for manipulating sequences, alignments, and phylogenetic trees.

    PubMed

    Bond, Stephen R; Keat, Karl E; Barreira, Sofia N; Baxevanis, Andreas D

    2017-02-25

    The ability to manipulate sequence, alignment, and phylogenetic tree files has become an increasingly important skill in the life sciences, whether to generate summary information or to prepare data for further downstream analysis. The command line can be an extremely powerful environment for interacting with these resources, but only if the user has the appropriate general-purpose tools on hand. BuddySuite is a collection of four independent yet interrelated command-line toolkits that facilitate each step in the workflow of sequence discovery, curation, alignment, and phylogenetic reconstruction. Most common sequence, alignment, and tree file formats are automatically detected and parsed, and over 100 tools have been implemented for manipulating these data. The project has been engineered to easily accommodate the addition of new tools, it is written in the popular programming language Python, and is hosted on the Python Package Index and GitHub to maximize accessibility. Documentation for each BuddySuite tool, including usage examples, is available at http://tiny.cc/buddysuite wiki. All software is open source and freely available through http://research.nhgri.nih.gov/software/BuddySuite.

  20. Detection of conserved segments in proteins: iterative scanning of sequence databases with alignment blocks.

    PubMed Central

    Tatusov, R L; Altschul, S F; Koonin, E V

    1994-01-01

    We describe an approach to analyzing protein sequence databases that, starting from a single uncharacterized sequence or group of related sequences, generates blocks of conserved segments. The procedure involves iterative database scans with an evolving position-dependent weight matrix constructed from a coevolving set of aligned conserved segments. For each iteration, the expected distribution of matrix scores under a random model is used to set a cutoff score for the inclusion of a segment in the next iteration. This cutoff may be calculated to allow the chance inclusion of either a fixed number or a fixed proportion of false positive segments. With sufficiently high cutoff scores, the procedure converged for all alignment blocks studied, with varying numbers of iterations required. Different methods for calculating weight matrices from alignment blocks were compared. The most effective of those tested was a logarithm-of-odds, Bayesian-based approach that used prior residue probabilities calculated from a mixture of Dirichlet distributions. The procedure described was used to detect novel conserved motifs of potential biological importance. PMID:7991589
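
    A position-dependent log-odds weight matrix of the kind described above can be sketched from an ungapped block of aligned segments. For simplicity this sketch assumes a uniform background over the 20 amino acids and a single background-weighted pseudocount; it does not implement the Dirichlet-mixture prior that the authors found most effective, nor the iterative database scan or the random-model cutoff. The function names and the example block are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_odds_matrix(block, background=None, pseudocount=1.0):
    """Position-specific log-odds scores (bits) from equal-length aligned segments."""
    if background is None:
        # Assumed uniform background frequencies.
        background = {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}
    n = len(block)
    matrix = []
    for col in range(len(block[0])):
        counts = {a: 0 for a in AMINO_ACIDS}
        for segment in block:
            counts[segment[col]] += 1
        column = {}
        for a in AMINO_ACIDS:
            # Observed count plus a pseudocount distributed by background frequency.
            freq = (counts[a] + pseudocount * background[a]) / (n + pseudocount)
            column[a] = math.log2(freq / background[a])
        matrix.append(column)
    return matrix


def score_segment(matrix, segment):
    """Sum of per-position log-odds scores for a candidate segment."""
    return sum(column[a] for column, a in zip(matrix, segment))


block = ["LKVAG", "LRVAG", "IKVSG"]
pssm = log_odds_matrix(block)
print(score_segment(pssm, "LKVAG") > score_segment(pssm, "WWWWW"))  # True
```

    In the iterative procedure, database segments scoring above a cutoff derived from the score distribution under a random model would be added to the block and the matrix rebuilt.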

  1. Objective method for estimating asymptotic parameters, with an application to sequence alignment

    NASA Astrophysics Data System (ADS)

    Sheetlin, Sergey; Park, Yonil; Spouge, John L.

    2011-09-01

    Sequence alignment is an indispensable computational tool in modern molecular biology. The model underlying biological sequence alignment is of interest to physicists because it approximates the statistical mechanics of DNA and protein annealing, while bearing an intimate relationship to models of directed polymers in random media. Recent methods for determining the statistics of random sequence alignments have reduced the computation time to less than 1 s, opening up some interesting possibilities for online computation with biological search engines. Before implementation, however, the methods required an objective technique for computing regression coefficients pertinent to an asymptotic regime. Typically, physicists estimate parameters pertinent to an asymptotic regime subjectively: They eyeball their data; estimate the asymptotic regime where the regression model holds with reasonable accuracy; and then regress data only within the estimated asymptotic regime. Our publicly available computer program arrp replaces the subjective assessment of the asymptotic regime with an objective change-point detection method, increasing confidence in the scientific objectivity of the parameter estimates. Asymptotic regression has potential applications across most of physics.

  2. Molecular cloning, nucleotide sequencing, and expression of genes encoding alcohol dehydrogenases from the thermophile Thermoanaerobacter brockii and the mesophile Clostridium beijerinckii.

    PubMed

    Peretz, M; Bogin, O; Tel-Or, S; Cohen, A; Li, G; Chen, J S; Burstein, Y

    1997-08-01

    Proteins play a pivotal role in thermophily. Comparing the molecular properties of homologous proteins from thermophilic and mesophilic bacteria is important for understanding the mechanisms of microbial adaptation to extreme environments. The thermophile Thermoanaerobacter (Thermoanaerobium) brockii and the mesophile Clostridium beijerinckii each contain an NADP(H)-linked, zinc-containing secondary alcohol dehydrogenase (TBADH and CBADH, respectively) with a similarly broad substrate range. The structural genes encoding TBADH and CBADH were cloned, sequenced, and highly expressed in Escherichia coli. The coding sequences of the TB adh and CB adh genes are 1056 and 1053 nucleotides long, respectively. The TB adh gene encoded an amino acid sequence identical to that of the purified TBADH. Alignment of the deduced amino acid sequences of the TB and CB adh genes showed 76% identity and 86% similarity, and the two genes had a similar preference for codons with A or T in the third position. Multiple sequence alignment of ADHs from different sources revealed that two (Cys-46 and His-67) of the three ligands for the catalytic Zn atom of horse-liver ADH are preserved in TBADH and CBADH. Both TBADH and CBADH were homotetramers. The substrate specificities and thermostabilities of TBADH and CBADH expressed in E. coli were identical to those of the enzymes isolated from T. brockii and C. beijerinckii, respectively. A comparison of the amino acid compositions of the two ADHs suggests that the presence of eight more proline residues in TBADH than in CBADH, and the exchange of hydrophilic and large hydrophobic residues in CBADH for the small hydrophobic amino acids Pro, Ala, and Val in TBADH, might contribute to the higher thermostability of the T. brockii enzyme.
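
    Percent identity figures such as the 76% quoted above depend on how gapped columns are counted, which the abstract does not specify. One common convention, sketched below as an assumption, divides the number of identical positions by the number of aligned columns in which neither sequence has a gap; computing similarity would additionally require a substitution matrix or residue grouping, which is omitted here. The function name is hypothetical.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity of two aligned sequences over gap-free columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap character.
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


print(percent_identity("MKV-L", "MRVAL"))  # 75.0
```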

  3. Distant homology detection using a LEngth and STructure-based sequence Alignment Tool (LESTAT).

    PubMed

    Lee, Marianne M; Bundschuh, Ralf; Chan, Michael K

    2008-05-15

    A new machine learning algorithm, LESTAT (LEngth and STructure-based sequence Alignment Tool) has been developed for detecting protein homologs having low-sequence identity. LESTAT is an iterative profile-based method that runs without reliance on a predefined library and incorporates several novel features that enhance its ability to identify remote sequences. To overcome the inherent bias associated with a single starting model, LESTAT utilizes three structural homologs to create a profile consisting of structurally conserved positions and block separation distances. Subsequent profiles are refined iteratively using sequence information obtained from previous cycles. Additionally, the refinement process incorporates a "lock-in" feature to retain the high-scoring sequences involved in previous alignments for subsequent model building and an enhancement factor to complement the weighting scheme used to build the position specific scoring matrix. A comparison of the performance of LESTAT against PSI-BLAST for seven systems reveals that LESTAT exhibits increased sensitivity and specificity over PSI-BLAST in six of these systems, based on the number of true homologs detected and the number of families these homologs covered. Notably, many of the hits identified are unique to each method, presumably resulting from the distinct differences in the two approaches. Taken together, these findings suggest that LESTAT is a useful complementary method to PSI-BLAST in the detection of distant homologs.

  4. Complete Nucleotide Sequences and Genome Organization of Two Pepper Mild Mottle Virus Isolates from Capsicum annuum in South Korea

    PubMed Central

    Choi, Seung-Kook; Choi, Gug-Seoun; Kwon, Sun-Jung

    2016-01-01

    The complete genome sequences of pepper mild mottle virus (PMMoV)-P2 and -P3 were determined by the Sanger sequencing method. Although PMMoV-P2 and PMMoV-P3 differ in pathogenicity in some pepper cultivars, both complete genome sequences are 6,356 nucleotides (nt) long. In this study, we report the complete genome sequences and genome organization of the PMMoV-P2 and -P3 isolates from pepper species in South Korea. PMID:27198033

  5. [Polymorphism of DNA nucleotide sequence as a source of enhancement of the discrimination potential of the STR-markers].

    PubMed

    Zemskova, E Yu; Timoshenko, T V; Leonov, S N; Ivanov, P L

    2016-01-01

    The objective of the present pilot investigation was to reveal and study polymorphism of the nucleotide sequence in the alleles of STR loci of human autosomal DNA, with special reference to the role of this phenomenon as a source of differences between homonymous allelic variants. The secondary objective was to evaluate the possibility of using the data thus obtained to enhance the informative value of forensic medical genotyping of STR loci by identifying single nucleotide polymorphisms (SNPs) in order to extend their allelic spectrum. The methodological basis of the study was comprehensive amplified fragment length polymorphism (AFLP) and amplified fragment sequence polymorphism (AFSP) analysis of DNA using the PLEX-ID™ analytical mass-spectrometry platform (Abbott Molecular, USA). The study demonstrated that polymorphism of the DNA nucleotide sequence can be regarded as a possible source of enhancement of the discriminating potential of STR markers. This means that analysis of DNA nucleotide sequence polymorphism when genotyping AFLP-type markers of chromosomal DNA can considerably increase their effectiveness as individualizing markers in molecular genetic forensic examinations.

  6. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... is DNA, RNA, or PRT (protein). If a nucleotide sequence contains both DNA and RNA fragments, the type shall be “DNA.” In addition, the combined DNA/RNA molecule shall be further described in the to feature... combined DNA/RNA” Name/Key Provide appropriate identifier for feature, preferably from WIPO Standard...

  7. Complete nucleotide sequence of a begomovirus associated with satellites molecules infecting a new host Tagetes patula in India.

    PubMed

    Marwal, Avinash; Sahu, Anurag Kumar; Choudhary, Devendra Kumar; Gaur, R K

    2013-08-01

    In 2012, leaf curl disease was observed on marigold (Tagetes patula) in Lakshmangrh, Sikar province of India. Affected plants were severely stunted, with apical leaf curl and crinkled leaves, symptoms typical of begomovirus infection. This is the first report of the complete nucleotide sequence of a begomovirus associated with satellite molecules infecting a new host, Tagetes patula, in India.

  8. A high-density simple sequence repeat and single nucleotide polymorphism genetic map of the tetraploid cotton genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Cotton genome complexity was investigated with a saturated molecular genetic map that combined several sets of microsatellites or simple sequence repeats (SSR) and the first major public set of single nucleotide polymorphism (SNP) markers in cotton genomes (Gossypium spp.), and that was constructed ...

  9. Comparing genotyping-by-sequencing and Single Nucleotide Polymorphism chip genotyping in Quantitative Trait Loci mapping in wheat

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Array- or chip-based single nucleotide polymorphism (SNP) markers are widely used in genomic studies because they are abundant in the genome and cost less per data point than older marker technologies. Genotyping by sequencing (GBS), a relatively newer approach to genotyping, suggests equal or...

  10. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... is DNA, RNA, or PRT (protein). If a nucleotide sequence contains both DNA and RNA fragments, the type shall be “DNA.” In addition, the combined DNA/RNA molecule shall be further described in the to feature... combined DNA/RNA” Name/Key Provide appropriate identifier for feature, preferably from WIPO Standard...

  11. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... is DNA, RNA, or PRT (protein). If a nucleotide sequence contains both DNA and RNA fragments, the type shall be “DNA.” In addition, the combined DNA/RNA molecule shall be further described in the to feature... combined DNA/RNA” Name/Key Provide appropriate identifier for feature, preferably from WIPO Standard...

  12. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... is DNA, RNA, or PRT (protein). If a nucleotide sequence contains both DNA and RNA fragments, the type shall be “DNA.” In addition, the combined DNA/RNA molecule shall be further described in the to feature... combined DNA/RNA” Name/Key Provide appropriate identifier for feature, preferably from WIPO Standard...

  13. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... is DNA, RNA, or PRT (protein). If a nucleotide sequence contains both DNA and RNA fragments, the type shall be “DNA.” In addition, the combined DNA/RNA molecule shall be further described in the to feature... combined DNA/RNA” Name/Key Provide appropriate identifier for feature, preferably from WIPO Standard...

  14. Genomic signal processing methods for computation of alignment-free distances from DNA sequences.

    PubMed

    Borrayo, Ernesto; Mendizabal-Ruiz, E Gerardo; Vélez-Pérez, Hugo; Romo-Vázquez, Rebeca; Mendizabal, Adriana P; Morales, J Alejandro

    2014-01-01

    Genomic signal processing (GSP) refers to the use of digital signal processing (DSP) tools for analyzing genomic data such as DNA sequences. A possible application of GSP that has not been fully explored is the computation of the distance between a pair of sequences. In this work we present GAFD, a novel GSP alignment-free distance computation method. We introduce a DNA sequence-to-signal mapping function based on the employment of doublet values, which increases the number of possible amplitude values for the generated signal. Additionally, we explore the use of three DSP distance metrics as descriptors for categorizing DNA signal fragments. Our results indicate the feasibility of employing GAFD for computing sequence distances and the use of descriptors for characterizing DNA fragments.
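
    The abstract does not give GAFD's actual doublet values, so the sketch below uses a hypothetical assignment: each overlapping dinucleotide is mapped to an integer amplitude 4*i + j (with A, C, G, T indexed 0 to 3), giving 16 possible levels instead of the 4 of a single-nucleotide mapping. Euclidean distance stands in for one of the DSP distance metrics; the three metrics the authors explored are not named in the abstract, and the function names are illustrative.

```python
import math

BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def doublet_signal(seq):
    """Map each overlapping dinucleotide to one of 16 amplitudes (hypothetical 4*i + j scheme)."""
    s = seq.upper()
    return [4 * BASE_INDEX[s[k]] + BASE_INDEX[s[k + 1]] for k in range(len(s) - 1)]


def euclidean_distance(sig1, sig2):
    """Euclidean distance between two equal-length signals."""
    if len(sig1) != len(sig2):
        raise ValueError("this sketch compares equal-length fragments only")
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(sig1, sig2)))


print(doublet_signal("ACGT"))  # [1, 6, 11]
print(euclidean_distance(doublet_signal("AAAA"), doublet_signal("AAAC")))  # 1.0
```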

  15. Genomic Signal Processing Methods for Computation of Alignment-Free Distances from DNA Sequences

    PubMed Central

    Borrayo, Ernesto; Mendizabal-Ruiz, E. Gerardo; Vélez-Pérez, Hugo; Romo-Vázquez, Rebeca; Mendizabal, Adriana P.; Morales, J. Alejandro

    2014-01-01

    Genomic signal processing (GSP) refers to the use of digital signal processing (DSP) tools for analyzing genomic data such as DNA sequences. A possible application of GSP that has not been fully explored is the computation of the distance between a pair of sequences. In this work we present GAFD, a novel GSP alignment-free distance computation method. We introduce a DNA sequence-to-signal mapping function based on the employment of doublet values, which increases the number of possible amplitude values for the generated signal. Additionally, we explore the use of three DSP distance metrics as descriptors for categorizing DNA signal fragments. Our results indicate the feasibility of employing GAFD for computing sequence distances and the use of descriptors for characterizing DNA fragments. PMID:25393409

  16. Molecular cloning and nucleotide sequence of a transforming gene detected by transfection of chicken B-cell lymphoma DNA

    NASA Astrophysics Data System (ADS)

    Goubin, Gerard; Goldman, Debra S.; Luce, Judith; Neiman, Paul E.; Cooper, Geoffrey M.

    1983-03-01

    A transforming gene detected by transfection of chicken B-cell lymphoma DNA has been isolated by molecular cloning. It is homologous to a conserved family of sequences present in normal chicken and human DNAs but is not related to transforming genes of acutely transforming retroviruses. The nucleotide sequence of the cloned transforming gene suggests that it encodes a protein that is partially homologous to the amino terminus of transferrin and related proteins although only about one tenth the size of transferrin.

  17. [Molecular phylogenetic analysis of the genus Abies (Pinaceae) based on the nucleotide sequence of chloroplast DNA].

    PubMed

    Semerikova, S A; Semerikov, V L

    2014-01-01

    A phylogenetic study of firs (Abies Mill.) was conducted using nucleotide sequences of several chloroplast DNA regions with a total length of 5580 bp. The analysis included 37 taxa, which represented the main evolutionary lineages of the genus, and Keteleeria davidiana. According to the phylogenetic reconstruction, the Abies species were subdivided into six main groups, generally corresponding to their geographic distribution. The phylogenetic tree had three basal clades. All of these clades contained American species, and only one of them contained Eurasian species. The divergence time calibrations, based on paleobotanical data and chloroplast DNA mutation rate estimates in Pinaceae, produced similar results. The age of diversification among the clades of present-day Abies was estimated as the end of the Oligocene to the beginning of the Miocene. The separation of the Mediterranean firs from the Asian-North American branch dates to the Miocene. The age of diversification within the young groups of Mediterranean, Asian, and boreal American firs (A. lasiocarpa, A. balsamea, A. fraseri) was estimated as Pliocene-Pleistocene. Based on the phylogenetic reconstruction obtained, the most plausible biogeographic scenarios were suggested. It is noted that the existing systematic classification of the genus Abies strongly contradicts the phylogenetic reconstruction and requires revision.

  18. Nucleotide sequence of a lysine tRNA from Bacillus subtilis.

    PubMed Central

    Yamada, Y; Ishikura, H

    1977-01-01

    A lysine tRNA (tRNA1Lys) was purified from Bacillus subtilis W168 by successive use of several column chromatographic systems. The nucleotide sequence was determined to be pG-A-G-C-C-A-U-U-A-G-C-U-C-A-G-U-D-G-G-D-A-G-A-G-C-A-U-C-U-G-A-C-U-U(U*)-U-U-K-A-psi-C-A-G-A-G-G-m7G(G)-U-C-G-A-A-G-G-T-psi-C-G-A-G-U-C-C-U-U-C-A-U-G-G-C-U-C-A-C-C-AOH, where K and U* are unidentified nucleosides. The nucleosides of U34 and m7G46 were partially substituted with U* and G, respectively. The binding ability of lysyl-tRNA1Lys to Escherichia coli ribosomes was stimulated with ApApA as well as ApApG. PMID:414208

  19. Whole-genome sequencing identifies genomic heterogeneity at a nucleotide and chromosomal level in bladder cancer.

    PubMed

    Morrison, Carl D; Liu, Pengyuan; Woloszynska-Read, Anna; Zhang, Jianmin; Luo, Wei; Qin, Maochun; Bshara, Wiam; Conroy, Jeffrey M; Sabatini, Linda; Vedell, Peter; Xiong, Donghai; Liu, Song; Wang, Jianmin; Shen, He; Li, Yinwei; Omilian, Angela R; Hill, Annette; Head, Karen; Guru, Khurshid; Kunnev, Dimiter; Leach, Robert; Eng, Kevin H; Darlak, Christopher; Hoeflich, Christopher; Veeranki, Srividya; Glenn, Sean; You, Ming; Pruitt, Steven C; Johnson, Candace S; Trump, Donald L

    2014-02-11

    Using complete genome analysis, we sequenced five bladder tumors accrued from patients with muscle-invasive transitional cell carcinoma of the urinary bladder (TCC-UB) and identified a spectrum of genomic aberrations. In three tumors, complex genotype changes were noted. All three had tumor protein p53 mutations and a relatively large number of single-nucleotide variants (SNVs; average of 11.2 per megabase), structural variants (SVs; average of 46), or both. This group was best characterized by chromothripsis and the presence of subclonal populations of neoplastic cells or intratumoral mutational heterogeneity. Here, we provide evidence that the process of chromothripsis in TCC-UB is mediated by nonhomologous end-joining using kilobase, rather than megabase, fragments of DNA, which we refer to as "stitchers," to repair this process. We postulate that a potential unifying theme among tumors in the more complex genotype group is a defective replication-licensing complex. A second group (two bladder tumors) had no chromothripsis, a simpler genotype with wild-type (WT) tumor protein p53, relatively few SNVs (average of 5.9 per megabase), and only a single SV. There was no evidence of a subclonal population of neoplastic cells. In this group, we used a preclinical model of bladder carcinoma cell lines to study a unique SV (translocation and amplification) of the gene glutamate receptor ionotropic N-methyl D-aspartate as a potential new therapeutic target in bladder cancer.

  20. Whole-genome sequencing identifies genomic heterogeneity at a nucleotide and chromosomal level in bladder cancer

    PubMed Central

    Morrison, Carl D.; Liu, Pengyuan; Woloszynska-Read, Anna; Zhang, Jianmin; Luo, Wei; Qin, Maochun; Bshara, Wiam; Conroy, Jeffrey M.; Sabatini, Linda; Vedell, Peter; Xiong, Donghai; Liu, Song; Wang, Jianmin; Shen, He; Li, Yinwei; Omilian, Angela R.; Hill, Annette; Head, Karen; Guru, Khurshid; Kunnev, Dimiter; Leach, Robert; Eng, Kevin H.; Darlak, Christopher; Hoeflich, Christopher; Veeranki, Srividya; Glenn, Sean; You, Ming; Pruitt, Steven C.; Johnson, Candace S.; Trump, Donald L.

    2014-01-01

    Using complete genome analysis, we sequenced five bladder tumors accrued from patients with muscle-invasive transitional cell carcinoma of the urinary bladder (TCC-UB) and identified a spectrum of genomic aberrations. In three tumors, complex genotype changes were noted. All three had tumor protein p53 mutations and a relatively large number of single-nucleotide variants (SNVs; average of 11.2 per megabase), structural variants (SVs; average of 46), or both. This group was best characterized by chromothripsis and the presence of subclonal populations of neoplastic cells or intratumoral mutational heterogeneity. Here, we provide evidence that the process of chromothripsis in TCC-UB is mediated by nonhomologous end-joining using kilobase, rather than megabase, fragments of DNA, which we refer to as “stitchers,” to repair this process. We postulate that a potential unifying theme among tumors in the more complex genotype group is a defective replication–licensing complex. A second group (two bladder tumors) had no chromothripsis, a simpler genotype with wild-type (WT) tumor protein p53, relatively few SNVs (average of 5.9 per megabase), and only a single SV. There was no evidence of a subclonal population of neoplastic cells. In this group, we used a preclinical model of bladder carcinoma cell lines to study a unique SV (translocation and amplification) of the gene glutamate receptor ionotropic N-methyl D-aspartate as a potential new therapeutic target in bladder cancer. PMID:24469795

  1. Quantitative theory of entropic forces acting on constrained nucleotide sequences applied to viruses.

    PubMed

    Greenbaum, Benjamin D; Cocco, Simona; Levine, Arnold J; Monasson, Rémi

    2014-04-01

    We outline a theory to quantify the interplay of entropic and selective forces on nucleotide organization and apply it to the genomes of single-stranded RNA viruses. We quantify these forces as intensive variables that can easily be compared between sequences, outline a computationally efficient transfer-matrix method for their calculation, and apply this method to influenza and HIV viruses. We find viruses altering their dinucleotide motif use under selective forces, with these forces on CpG dinucleotides growing stronger in influenza the longer it replicates in humans. For a subset of genes in the human genome, many involved in antiviral innate immunity, the forces acting on CpG dinucleotides are even greater than the forces observed in viruses, suggesting that both effects are in response to similar selective forces involving the innate immune system. We further find that the dynamics of entropic forces balancing selective forces can be used to predict how long it will take a virus to adapt to a new host, and that it would take H1N1 several centuries to adapt to humans from birds, typically contributing many of its synonymous substitutions to the forcible removal of CpG dinucleotides. By examining the probability landscape of dinucleotide motifs, we predict where motifs are likely to appear using only a single-force parameter and uncover the localization of UpU motifs in HIV. Essentially, we extend the natural language and concepts of statistical physics, such as entropy and conjugated forces, to understanding viral sequences and, more generally, constrained genome evolution.
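
    The transfer-matrix idea can be sketched for a simplified model in which each sequence s of length L has weight exp(sum_i h(s_i) + x * n_CG(s)), where h holds per-nucleotide fields and x is a single force on CpG dinucleotides (x < 0 penalizes CpG). The partition function Z is built one position at a time, with rescaling to avoid overflow, and the expected CpG count is d(log Z)/dx, approximated here by a central finite difference. This parametrization and all names are assumptions for illustration, not the authors' exact formulation.

```python
import math

BASES = "ACGT"


def log_partition(length, h, x):
    """log Z over all sequences of the given length (assumed CpG-force model)."""
    # v[b] holds the (rescaled) summed weight of all prefixes ending in base b.
    v = [math.exp(h[b]) for b in BASES]
    log_z = 0.0
    for _ in range(length - 1):
        new = []
        for b in BASES:
            total = 0.0
            for a, va in zip(BASES, v):
                # Transfer-matrix element: extra weight exp(x) only for the step C -> G.
                coupling = x if (a == "C" and b == "G") else 0.0
                total += va * math.exp(coupling)
            new.append(total * math.exp(h[b]))
        scale = sum(new)
        log_z += math.log(scale)
        v = [val / scale for val in new]
    return log_z + math.log(sum(v))


def expected_cpg(length, h, x, eps=1e-5):
    """Expected number of CpG dinucleotides, d(log Z)/dx by central difference."""
    return (log_partition(length, h, x + eps) - log_partition(length, h, x - eps)) / (2 * eps)


uniform = {b: 0.0 for b in BASES}
print(expected_cpg(5, uniform, 0.0))  # close to 0.25, i.e. (5 - 1) / 16
print(expected_cpg(5, uniform, -2.0) < expected_cpg(5, uniform, 0.0))  # True
```

    One natural way to use such a model is to choose x so that the expected CpG count matches the count observed in a given sequence; the fitted force then allows comparison between sequences of different lengths.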

  2. Postzygotic single-nucleotide mosaicisms in whole-genome sequences of clinically unremarkable individuals

    PubMed Central

    Huang, August Y; Xu, Xiaojing; Ye, Adam Y; Wu, Qixi; Yan, Linlin; Zhao, Boxun; Yang, Xiaoxu; He, Yao; Wang, Sheng; Zhang, Zheng; Gu, Bowen; Zhao, Han-Qing; Wang, Meng; Gao, Hua; Gao, Ge; Zhang, Zhichao; Yang, Xiaoling; Wu, Xiru; Zhang, Yuehua; Wei, Liping

    2014-01-01

    Postzygotic single-nucleotide mutations (pSNMs) have been studied at whole-genome scale in cancer and a few other human overgrowth disorders, where they were found to play critical roles. However, in clinically unremarkable individuals, pSNMs have never been identified at whole-genome scale, largely because of technical difficulties and a lack of matched control tissue samples, so the genome-wide characteristics of pSNMs remain unknown. We developed a new Bayesian-based mosaic genotyper and a series of effective error filters, with which we identified 17 SNM sites from ∼80× whole-genome sequencing of peripheral blood DNA from three clinically unremarkable adults. The pSNMs were thoroughly validated using pyrosequencing, Sanger sequencing of individual cloned fragments, and multiplex ligation-dependent probe amplification. The mutant allele fraction ranged from 5% to 31%. We found that C→T and C→A were the predominant types of postzygotic mutations, similar to the somatic mutation profile in tumor tissues. Simulation data showed that the overall mutation rate was an order of magnitude lower than that in cancer. We detected varied allele fractions of the pSNMs among multiple samples obtained from the same individuals, including blood, saliva, hair follicle, buccal mucosa, urine, and semen samples, indicating that pSNMs could affect multiple sources of somatic cells as well as germ cells. Two of the adults have children who were diagnosed with Dravet syndrome. We identified two non-synonymous pSNMs in SCN1A, a causal gene for Dravet syndrome, in these two unrelated adults and found that the mutant alleles were transmitted to their children, highlighting the clinical importance of detecting pSNMs in genetic counseling. PMID:25312340

  3. Evaluation of the flanking nucleotide sequences of sarcomeric hypertrophic cardiomyopathy substitution mutations.

    PubMed

    Meurs, Kathryn M; Mealey, Katrina L

    2008-07-03

    Hypertrophic cardiomyopathy (HCM) is a familial myocardial disease with a prevalence of 1 in 500. More than 400 causative mutations have been identified in 13 sarcomeric and myofilament-related genes, 350 of which are substitution mutations within eight sarcomeric genes. Within a population, examples of recurring identical disease-causing mutations that appear to have arisen independently have been noted, as well as mutations that appear to have been inherited from a common ancestor. The large number of novel HCM mutations could suggest a mechanism of increased mutability within the sarcomeric genes. The objective of this study was to evaluate the most commonly reported HCM genes, beta myosin heavy chain (MYH7), myosin binding protein C, troponin I, troponin T, cardiac regulatory myosin light chain, cardiac essential myosin light chain, alpha tropomyosin and cardiac alpha-actin, for sequence patterns surrounding the substitution mutations that may suggest a mechanism of increased mutability. The mutations and their 10 flanking nucleotides were evaluated for the frequency of di-, tri- and tetranucleotides containing the mutation, as well as for the presence of certain tri- and tetranucleotide motifs. The most common substitutions were guanine (G) to adenine (A) and cytosine (C) to thymine (T). The CG dinucleotide had a significantly higher relative mutability than any other dinucleotide (p<0.05). The relative mutability of each possible trinucleotide and tetranucleotide sequence containing the mutation was calculated; none occurred at a statistically higher frequency than the others. The large number of G to A and C to T mutations, together with the high relative mutability of CG, may suggest that deamination of methylated CpG is an important mechanism of mutation in at least some of these cardiac genes.
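
    One common way to express relative mutability, assumed here rather than taken from the paper, is the share of mutation-containing dinucleotides that are of a given type divided by that dinucleotide's share of all dinucleotides in the analyzed windows. The input format (window sequence plus index of the mutated base) and the function name are hypothetical.

```python
from collections import Counter

def relative_mutability(windows):
    """Relative mutability of each dinucleotide that contains a mutated base.

    windows: list of (sequence, index_of_mutated_base) pairs, a hypothetical input
    format. Relative mutability = (share of mutation-containing dinucleotides)
    / (share of that dinucleotide among all dinucleotides in the windows).
    """
    observed, background = Counter(), Counter()
    for seq, i in windows:
        for j in range(len(seq) - 1):
            background[seq[j:j + 2]] += 1
        # the mutated base belongs to up to two dinucleotides: with its left and right neighbour
        if i > 0:
            observed[seq[i - 1:i + 1]] += 1
        if i < len(seq) - 1:
            observed[seq[i:i + 2]] += 1
    n_obs, n_bg = sum(observed.values()), sum(background.values())
    return {d: (observed[d] / n_obs) / (background[d] / n_bg) for d in observed}

rm = relative_mutability([("AACGAA", 2), ("TTCGTT", 3), ("AAAAAA", 2)])
print(round(rm["CG"], 3), round(rm["AA"], 3))  # 2.5 0.714
```

    A value above 1 means the dinucleotide is hit more often than its background frequency predicts; significance testing (for example against a binomial or permutation null) would be a separate step.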

  4. Nucleotide sequence and functional analysis of the luxE gene encoding acyl-protein synthetase of the lux operon from Photobacterium leiognathi.

    PubMed

    Lin, J W; Chao, Y F; Weng, S F

    1996-11-21

    The nucleotide sequence of the luxE gene (GenBank Accession No. U66407) from Photobacterium leiognathi PL741 has been determined, and the amino acid sequence of the acyl-protein synthetase encoded by luxE has been deduced. The sequence reveals that luxE encodes acyl-protein synthetase, a component of the fatty acid reductase complex that converts fatty acid to aldehyde, the substrate of the luciferase-catalyzed bioluminescence reaction. The acyl-protein synthetase encoded by luxE has a calculated Mr of 43,128 and comprises 373 amino acid residues. Alignment and comparison of acyl-protein synthetases from P. leiognathi, P. phosphoreum, Vibrio fischeri, V. harveyi and Xenorhabdus luminescens show that they are homologous, sharing 75.5% homology (44.2% identity and 31.3% similarity) among these species. Functional analysis suggests that specific sequence segments lying before or within the luxE gene might form potential loops omega-o, omega-e1 and omega-e2 that act as mRNA stability loops and/or in sub-regulation by alternative modulation in the lux operon. The gene order of luxE in the lux and lum operons is <--ter-lumQ-lumP-R&R-luxC-luxD-luxA-luxB-luxN-luxE--> (R&R: regulatory region; ter: transcriptional terminator), where R&R is the regulatory region for the lum and lux operons and ter is the transcriptional terminator for the lum operon.
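
    The reported homology is identity plus similarity (44.2% + 31.3% = 75.5%). A minimal sketch of that bookkeeping for one pairwise alignment follows; the residue groups that count as "similar", the gap handling and the function name are assumptions, since the paper's exact criteria are not given here.

```python
# Hypothetical groups of physicochemically similar residues; real tools usually
# score similarity with a substitution matrix such as BLOSUM62 instead.
SIMILAR_GROUPS = ("ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG")

def identity_and_similarity(a, b):
    """Percent identical and percent similar (non-identical) columns of a pairwise alignment.

    Columns with a gap ('-') in either sequence are excluded from the denominator.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    identical = similar = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            identical += 1
        elif any(x in g and y in g for g in SIMILAR_GROUPS):
            similar += 1
    return 100.0 * identical / compared, 100.0 * similar / compared

print(identity_and_similarity("MKV-LT", "MRIALS"))  # (40.0, 60.0)
```

    For a multiple alignment of five proteins, as in the abstract, the same counts would be taken per column or averaged over pairs; which convention was used is not stated.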

  5. Full length nucleotide sequences of 30 common SLC44A2 alleles encoding human neutrophil antigen-3 (HNA-3)

    PubMed Central

    Chen, Qing; Srivastava, Kshitij; Ardinski, Stefanie C.; Lam, Kevin; Huvard, Michael J.; Schmid, Pirmin; Flegel, Willy A.

    2015-01-01

    Background HNA-3a alloantibodies can cause severe transfusion-related acute lung injury (TRALI). The frequencies of the single nucleotide polymorphisms (SNPs) indicative of the two clinically relevant HNA-3a/b antigens are known in many populations. In the present study, we determined the full length nucleotide sequences of common SLC44A2 alleles encoding the choline transporter-like protein-2 (CTL2) that harbors the HNA-3a/b antigens. Study design and methods A method was devised to determine the full length coding sequence and adjacent intron sequences from genomic DNA by 8 polymerase chain reaction (PCR) amplifications covering all 22 SLC44A2 exons. Samples from 200 African American, 96 Caucasian, 2 Hispanic and 4 Asian blood donors were analyzed. We developed a decision tree to determine alleles (confirmed haplotypes) from the genotype data. Results A total of 10 SNPs were detected in the SLC44A2 coding sequence. The non-coding sequences harbored an additional 28 SNPs (1 in the 5’-untranslated region (UTR), 23 in the introns, and 4 in the 3’-UTR). No SNP indicative of a non-functional allele was detected. The nucleotide sequences of 30 SLC44A2 alleles (haplotypes) were confirmed. There may be 66 haplotypes among the 604 chromosomes screened. Conclusions We found 38 SNPs, including 1 novel SNP, in 8192 nucleotides covering the coding sequence of the SLC44A2 gene among 302 blood donors. Population frequencies of these SNPs were established for African Americans and Caucasians. Because alleles encoding HNA-3b are more common than non-functional SLC44A2 alleles, we confirmed our previous postulate that African American donors are less likely than Caucasians to form HNA-3a antibodies. PMID:26437811
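
    The authors' decision tree is not reproduced here. The sketch below shows only one rule that any such tree can start from: a genotype with at most one heterozygous SNP fixes both haplotypes unambiguously, while two or more heterozygous SNPs leave the phase open. The function name and input encoding are hypothetical.

```python
def resolve_haplotypes(genotype):
    """Return the two haplotypes implied by an unphased genotype, or None if ambiguous.

    genotype: list of (allele1, allele2) tuples, one per SNP, in gene order.
    With at most one heterozygous SNP the phase is unambiguous; with two or more,
    several haplotype pairs fit the same genotype.
    """
    het_sites = [i for i, (x, y) in enumerate(genotype) if x != y]
    if len(het_sites) > 1:
        return None  # needs extra evidence, e.g. already-confirmed alleles or cloning
    hap1 = "".join(x for x, _ in genotype)
    hap2 = "".join(y for _, y in genotype)
    return tuple(sorted((hap1, hap2)))

print(resolve_haplotypes([("A", "A"), ("C", "T"), ("G", "G")]))  # ('ACG', 'ATG')
```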

  6. Genomic divergences among cattle, dog and human estimated from large-scale alignments of genomic sequences

    PubMed Central

    Liu, George E; Matukumalli, Lakshmi K; Sonstegard, Tad S; Shade, Larry L; Van Tassell, Curtis P

    2006-01-01

    Background Approximately 11 Mb of finished high quality genomic sequences were sampled from cattle, dog and human to estimate genomic divergences and their regional variation among these lineages. Results Optimal three-way multi-species global sequence alignments for 84 cattle clones or loci (each >50 kb of genomic sequence) were constructed using the human and dog genome assemblies as references. Genomic divergences and substitution rates were examined for each clone and for various sequence classes under different functional constraints. Analysis of these alignments revealed that the overall genomic divergences are relatively constant (0.32–0.37 changes/site) for pairwise comparisons among cattle, dog and human; however, substitution rates vary across genomic regions and among different sequence classes. A neutral mutation rate (2.0–2.2 × 10⁻⁹ changes/site/year) was derived from ancestral repetitive sequences, whereas the substitution rate in coding sequences (1.1 × 10⁻⁹ changes/site/year) was approximately half of the overall rate (1.9–2.0 × 10⁻⁹ changes/site/year). Relative rate tests also indicated that cattle have a significantly faster substitution rate than dog, by about 6%. Conclusion This analysis provides a large-scale and unbiased assessment of genomic divergences and regional variation of substitution rates among cattle, dog and human. It is expected that these data will serve as a baseline for future mammalian molecular evolution studies. PMID:16759380
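
    Rates in changes/site/year follow from a corrected distance divided by twice the divergence time, because both lineages accumulate changes after the split. The sketch below assumes the simple Jukes-Cantor (JC69) correction; the study may have used a different substitution model, and the function names are hypothetical.

```python
import math

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences, ignoring gapped columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)

def jukes_cantor(p):
    """JC69 correction for multiple hits; only defined for p < 0.75."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)

def substitution_rate(distance, divergence_time_years):
    """Changes/site/year: the distance accumulates along both lineages since the split."""
    return distance / (2 * divergence_time_years)

print(round(jukes_cantor(p_distance("ACGT", "ACGA")), 4))  # 0.3041
```

    For example, a distance of 0.4 changes/site over a hypothetical 100-million-year split gives 0.4 / (2 × 10⁸) = 2 × 10⁻⁹ changes/site/year.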

  7. A unified statistical model of protein multiple sequence alignment integrating direct coupling and insertions

    PubMed Central

    Kinjo, Akira R.

    2016-01-01

    The multiple sequence alignment (MSA) of a protein family provides a wealth of information in terms of the conservation pattern of amino acid residues not only at each alignment site but also between distant sites. In order to statistically model the MSA incorporating both short-range and long-range correlations as well as insertions, I have derived a lattice gas model of the MSA based on the principle of maximum entropy. The partition function, obtained by the transfer matrix method with a mean-field approximation, accounts for all possible alignments with all possible sequences. The model parameters for short-range and long-range interactions were determined by a self-consistent condition and by a Gaussian approximation, respectively. Using this model with and without long-range interactions, I analyzed the globin and V-set domains by increasing the “temperature” and by “mutating” a site. The correlations between residue conservation and various measures of the system’s stability indicate that the long-range interactions make the conservation pattern more specific to the structure, and increasingly stabilize better conserved residues. PMID:27924257
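
    The transfer-matrix idea can be shown on a much simpler model than the paper's: a one-dimensional chain with only site fields and nearest-neighbour couplings, no insertions, no long-range terms and an exact rather than mean-field sum. The sketch below, with hypothetical names, sums over all sequences in O(L·q²) time and checks the result against brute-force enumeration.

```python
import itertools
import math

def partition_function_transfer(fields, coupling, temperature=1.0):
    """Z for a 1D chain with site fields and nearest-neighbour couplings.

    fields[i][a]: field on state a at site i; coupling[a][b]: interaction between
    neighbouring states a and b. The weight of a sequence s is
    exp((sum_i fields[i][s_i] + sum_i coupling[s_i][s_{i+1}]) / temperature).
    """
    q = len(coupling)
    # forward vector: summed weights of all prefixes ending in each state
    v = [math.exp(fields[0][a] / temperature) for a in range(q)]
    for h in fields[1:]:
        v = [sum(v[a] * math.exp(coupling[a][b] / temperature) for a in range(q))
             * math.exp(h[b] / temperature) for b in range(q)]
    return sum(v)

def partition_function_brute_force(fields, coupling, temperature=1.0):
    """Enumerate every sequence (q**L terms) to check the transfer-matrix result."""
    q = len(coupling)
    z = 0.0
    for s in itertools.product(range(q), repeat=len(fields)):
        log_weight = sum(fields[i][a] for i, a in enumerate(s))
        log_weight += sum(coupling[s[i]][s[i + 1]] for i in range(len(s) - 1))
        z += math.exp(log_weight / temperature)
    return z

fields = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.2, 0.2, 0.0], [-0.3, 0.1, 0.4]]
coupling = [[0.5, -0.1, 0.0], [-0.1, 0.3, 0.2], [0.0, 0.2, -0.4]]
z_tm = partition_function_transfer(fields, coupling)
z_bf = partition_function_brute_force(fields, coupling)
print(abs(z_tm - z_bf) < 1e-9 * z_bf)  # True
```

    Raising `temperature` flattens the weights, which is the kind of "heating" the abstract applies to probe stability.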

  8. A horizontal alignment tool for numerical trend discovery in sequence data: application to protein hydropathy.

    PubMed

    Hadzipasic, Omar; Wrabl, James O; Hilser, Vincent J

    2013-01-01

    An algorithm is presented that returns the optimal pairwise gapped alignment of two sets of signed numerical sequence values. One distinguishing feature of this algorithm is a flexible comparison engine (based on both relative shape and absolute similarity measures) that does not rely on explicit gap penalties. Additionally, an empirical probability model is developed to estimate the significance of the returned alignment with respect to randomized data. The algorithm's utility for biological hypothesis formulation is demonstrated with test cases including database search and pairwise alignment of protein hydropathy. Moreover, the algorithm and probability model could be extended to accommodate other types of protein or nucleic acid data, such as positional thermodynamic stability and mRNA translation efficiency. Because the algorithm requires only numerical values as input, it can readily compare data other than protein hydropathy. The tool is therefore expected to complement, rather than replace, existing sequence- and structure-based tools and may inform medical discovery, as exemplified by a proposed similarity between a chlamydial ORFan protein and the bacterial colicin pore-forming domain. The source code, documentation, and a basic web-server application are available.
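
    As a familiar stand-in for aligning numerical profiles without explicit gap penalties, the sketch below uses dynamic time warping (DTW), plus a shuffle-based empirical p-value. DTW is a swapped-in technique, not the authors' shape-and-similarity comparison engine, and the function names are hypothetical.

```python
import random

def dtw_distance(x, y):
    """Dynamic time warping cost between two numeric sequences (absolute-difference cost)."""
    n, m = len(x), len(y)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # stretching either sequence adds no extra charge: no explicit gap penalty
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def empirical_p_value(x, y, n_shuffles=200, seed=0):
    """Fraction of shuffled copies of y that align to x at least as well as y itself."""
    rng = random.Random(seed)
    observed = dtw_distance(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if dtw_distance(x, shuffled) <= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)

print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

    Shuffling preserves the composition of the profile (here, the multiset of hydropathy values) while destroying its order, so a small p-value suggests the shared shape is unlikely to arise by chance.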

  9. Alignment of Short Reads: A Crucial Step for Application of Next-Generation Sequencing Data in Precision Medicine

    PubMed Central

    Ye, Hao; Meehan, Joe; Tong, Weida; Hong, Huixiao

    2015-01-01

    Precision medicine, or personalized medicine, has been proposed as a modern and promising medical strategy. Genetic variants of patients are the key information for implementing precision medicine. Next-generation sequencing (NGS) is an emerging technology for deciphering genetic variants. Alignment of raw reads to a reference genome is one of the key steps in NGS data analysis. Many algorithms have been developed for alignment of short read sequences since 2008, so users have to decide which alignment algorithm to use in their studies. Selecting the right alignment approach means choosing not only the algorithm but also a suitable set of parameters for it. Understanding these algorithms helps in selecting the appropriate one for different applications in precision medicine. Here, we review currently available algorithms and their major strategies, such as seed-and-extend and q-gram filtering. We also discuss the challenges facing current alignment algorithms, including alignment in repeated regions, long-read alignment and alignment facilitated by known genetic variants. PMID:26610555
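
    The seed-and-extend strategy can be sketched in a few lines: index the reference k-mers, look up exact seeds from the read, then extend each candidate placement and count mismatches. This toy version, with hypothetical names, is ungapped; production aligners add gapped extension, scoring and far more compact indexes.

```python
from collections import defaultdict

def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def seed_and_extend(read, reference, index, k, max_mismatches=1):
    """Report (start, mismatches) for ungapped placements of a read.

    Seeds are non-overlapping k-mers of the read. If the read contains more than
    max_mismatches full seeds, at least one seed must match exactly (pigeonhole),
    so no valid placement is missed.
    """
    hits = set()
    for offset in range(0, len(read) - k + 1, k):
        for pos in index.get(read[offset:offset + k], []):
            start = pos - offset
            if start < 0 or start + len(read) > len(reference):
                continue
            # extend: compare the whole read against the reference window
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.add((start, mismatches))
    return sorted(hits)

ref = "ACGTACGTTAGCCGATAGGC"
index = build_kmer_index(ref, 4)
print(seed_and_extend("TCGCCGAT", ref, index, k=4))  # [(8, 1)]
```

    Here the first seed ("TCGC") carries the mismatch and finds nothing, but the second seed ("CGAT") anchors the read at position 8, where extension finds one mismatch.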

  10. The Bryopsis hypnoides Plastid Genome: Multimeric Forms and Complete Nucleotide Sequence

    PubMed Central

    Tian, Chao; Wang, Guangce; Niu, Jiangfeng; Pan, Guanghua; Hu, Songnian

    2011-01-01

    Background Bryopsis hypnoides Lamouroux is a siphonous green alga, and its extruded protoplasm can aggregate spontaneously in seawater and develop into mature individuals. The chloroplast of B. hypnoides is the biggest organelle in the cell and shows strong autonomy. To better understand this organelle, we sequenced and analyzed the chloroplast genome of this green alga. Principal Findings A total of 111 functional genes, including 69 potential protein-coding genes, 5 ribosomal RNA genes, and 37 tRNA genes, were identified. The genome size (153,429 bp), arrangement, and inverted-repeat (IR)-lacking structure of the B. hypnoides chloroplast DNA (cpDNA) closely resemble those of Chlorella vulgaris. Furthermore, our cytogenomic investigations using pulsed-field gel electrophoresis (PFGE) and Southern blotting showed that the B. hypnoides cpDNA had multimeric forms, including monomer, dimer, trimer, tetramer, and even higher multimers, similar to the higher-order organization observed previously for higher plant cpDNA. The relative amounts of the four multimeric cpDNA forms were estimated to be about 1, 1/2, 1/4, and 1/8 based on molecular hybridization analysis. Phylogenetic analyses based on a concatenated alignment of chloroplast protein sequences suggested that B. hypnoides is sister to all Chlorophyceae, and this placement received moderate support. Conclusion All of the results suggest that the autonomy of the chloroplasts of B. hypnoides has little to do with the size and gene content of the cpDNA, and the IR-lacking structure of the chloroplasts indirectly demonstrated that the multimeric molecules might result from random cleavage and fusion of replication intermediates rather than from recombinational events. PMID:21339817

  11. MSAProbs-MPI: parallel multiple sequence aligner for distributed-memory systems.

    PubMed

    González-Domínguez, Jorge; Liu, Yongchao; Touriño, Juan; Schmidt, Bertil

    2016-12-15

    MSAProbs is a state-of-the-art protein multiple sequence alignment tool based on hidden Markov models. It can achieve high alignment accuracy at the expense of relatively long runtimes for large-scale input datasets. In this work we present MSAProbs-MPI, a distributed-memory parallel version of the multithreaded MSAProbs tool that is able to reduce runtimes by exploiting the compute capabilities of common multicore CPU clusters. Our performance evaluation on a cluster with 32 nodes (each containing two Intel Haswell processors) shows reductions in execution time of over one order of magnitude for typical input datasets. Furthermore, MSAProbs-MPI using eight nodes is faster than the GPU-accelerated QuickProbs running on a Tesla K20. Another strong point is that MSAProbs-MPI can deal with large datasets for which MSAProbs and QuickProbs might fail due to time and memory constraints, respectively.
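
    The expensive all-against-all pairwise stage of MSAProbs-style aligners consists of independent tasks, which is what a distributed-memory version can spread across nodes. The sketch below shows a hypothetical round-robin assignment of sequence pairs to MPI-like ranks in plain Python, without MPI; the actual MSAProbs-MPI distribution scheme may differ.

```python
from itertools import combinations

def assign_pairs(n_sequences, n_ranks):
    """Round-robin assignment of all sequence pairs to worker ranks (hypothetical scheme)."""
    buckets = [[] for _ in range(n_ranks)]
    for task_id, pair in enumerate(combinations(range(n_sequences), 2)):
        # task t goes to rank t mod n_ranks, so bucket sizes differ by at most one
        buckets[task_id % n_ranks].append(pair)
    return buckets

print([len(b) for b in assign_pairs(10, 4)])  # [12, 11, 11, 11]
```

    Equal task counts only balance the work when sequences have similar lengths; with very uneven lengths, weighting each pair by the product of its sequence lengths would be a natural refinement.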

  12. Feature-based multiexposure image-sequence fusion with guided filter and image alignment

    NASA Astrophysics Data System (ADS)

    Xu, Liang; Du, Junping; Zhang, Zhenhong

    2015-01-01

    Multiexposure fusion images have a higher dynamic range and reveal more details than a single captured image of a real-world scene. A clear and intuitive feature-based fusion technique for multiexposure image sequences is proposed. The main idea of the proposed method is to combine three image features [phase congruency (PC), local contrast, and color saturation] to obtain weight maps of the images. The weight maps are then refined using a guided filter, which can improve their accuracy. The final fusion result is constructed as the weighted sum of the source image sequence. In addition, for multiexposure image-sequence fusion involving dynamic scenes containing moving objects, ghost artifacts can easily occur if fusion is performed directly. Therefore, an image-alignment method is first used to register the input images to a reference image, after which fusion is performed. Experimental results demonstrate that the proposed method outperforms existing methods.
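
    The weight-map step can be sketched as follows: multiply the per-pixel feature maps of each exposure into a raw weight, normalize the weights across exposures, and take the weighted sum. The feature maps are assumed to be precomputed, images are small grayscale lists for simplicity, and the guided-filter refinement and alignment steps are omitted; the function name is hypothetical.

```python
def fuse_exposures(images, feature_maps, eps=1e-12):
    """Weighted-sum fusion of K aligned grayscale exposures.

    images[k][y][x]: pixel value of exposure k.
    feature_maps[k]: list of per-pixel feature maps for exposure k (e.g. contrast,
    saturation); their product is the raw weight. Weights are normalised so that,
    at each pixel, they sum to one across exposures.
    """
    height, width = len(images[0]), len(images[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            raw = []
            for k in range(len(images)):
                w = 1.0
                for fmap in feature_maps[k]:
                    w *= fmap[y][x]
                raw.append(w)
            total = sum(raw) + eps  # eps avoids division by zero where all weights vanish
            fused[y][x] = sum(w * images[k][y][x] for k, w in enumerate(raw)) / total
    return fused

images = [[[0.2, 0.8]], [[0.6, 0.4]]]
features = [[[[1.0, 0.0]]], [[[1.0, 1.0]]]]
print([[round(v, 6) for v in row] for row in fuse_exposures(images, features)])  # [[0.4, 0.4]]
```

    In the second pixel the first exposure has zero weight, so the fused value comes entirely from the second exposure.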

  13. RAMICS: trainable, high-speed and biologically relevant alignment of high-throughput sequencing reads to coding DNA

    PubMed Central

    Wright, Imogen A.; Travers, Simon A.

    2014-01-01

    The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. PMID:24861618
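
    The distinction between codon-sized indels and frameshifts comes down to whether the indel length is a multiple of three. The sketch below adds a hypothetical homopolymer rule to mimic the sequencing-error bias the abstract mentions. It is a heuristic illustration with made-up thresholds and names, not RAMICS's profile-HMM model.

```python
def homopolymer_run(reference, pos):
    """Length of the run of identical bases that covers reference[pos]."""
    base = reference[pos]
    left = pos
    while left > 0 and reference[left - 1] == base:
        left -= 1
    right = pos
    while right < len(reference) - 1 and reference[right + 1] == base:
        right += 1
    return right - left + 1

def classify_indel(reference, pos, length, min_homopolymer=4):
    """Label an indel of `length` bases at `pos` (hypothetical rules and threshold)."""
    if length <= 0:
        raise ValueError("indel length must be positive")
    if length % 3 == 0:
        return "codon-sized indel"  # reading frame preserved
    if homopolymer_run(reference, pos) >= min_homopolymer:
        return "probable homopolymer error"
    return "frameshift"

ref = "ATGAAAAGCT"
print(classify_indel(ref, 4, 1), "|", classify_indel(ref, 8, 1), "|", classify_indel(ref, 8, 3))
```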

  14. CUDA ClustalW: An efficient parallel algorithm for progressive multiple sequence alignment on Multi-GPUs.

    PubMed

    Hung, Che-Lun; Lin, Yu-Shiang; Lin, Chun-Yuan; Chung, Yeh-Ching; Chung, Yi-Fang

    2015-10-01

    For biological applications, sequence alignment is an important strategy for analyzing DNA and protein sequences. Multiple sequence alignment is an essential methodology for studying biological data, for example in homology modeling and phylogenetic reconstruction. However, multiple sequence alignment is an NP-hard problem. In past decades, the progressive approach has been proposed to align multiple sequences successfully by adopting iterative pairwise alignments. Owing to the rapid growth of next-generation sequencing technologies, a large number of sequences can be produced in a short period of time. When the problem instance is large, progressive alignment becomes time-consuming. Parallel computing is a suitable solution for such applications, and the GPU is one of the important architectures in contemporary parallel computing research. Therefore, in this work we propose a GPU version of ClustalW v2.0.11, called CUDA ClustalW v1.0. Experimental results show that CUDA ClustalW v1.0 achieves more than 33× speedup in overall execution time compared with ClustalW v2.0.11.
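
    Progressive aligners such as ClustalW first compute a distance for every pair of sequences (ClustalW's fast mode scores shared k-tuples) and use them to build the guide tree. The sketch below uses a simplified k-mer distance to show that stage; its formula and names are assumptions, not ClustalW's scoring or the CUDA implementation.

```python
from itertools import combinations

def kmer_set(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def ktuple_distance_matrix(seqs, k=2):
    """Approximate pairwise distances: 1 - (shared k-mers / k-mers of the smaller set)."""
    n = len(seqs)
    kmers = [kmer_set(s, k) for s in seqs]
    dist = [[0.0] * n for _ in range(n)]
    # every pair is independent of every other, which is what makes this stage easy to parallelise
    for i, j in combinations(range(n), 2):
        denom = min(len(kmers[i]), len(kmers[j])) or 1
        dist[i][j] = dist[j][i] = 1.0 - len(kmers[i] & kmers[j]) / denom
    return dist

print(ktuple_distance_matrix(["ACGT", "ACGT", "TTTT"]))
# [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
```

    With N sequences there are N(N-1)/2 such pairs, which is why this stage grows quickly for large inputs.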

  15. Nucleotide sequence of a cluster of early and late genes in a conserved segment of the vaccinia virus genome.

    PubMed Central

    Plucienniczak, A; Schroeder, E; Zettlmeissl, G; Streeck, R E

    1985-01-01

    The nucleotide sequence of a 7.6 kb vaccinia DNA segment from a genomic region conserved among different orthopoxviruses has been determined. This segment contains a tight cluster of 12 partly overlapping open reading frames, most of which can be correlated with previously identified early and late proteins and mRNAs. Regulatory signals used by vaccinia virus have been studied. Presumptive promoter regions are rich in A and T and carry the consensus sequences TATA and AATAA spaced 20-24 base pairs apart. Tandem repeats of a CTATTC consensus sequence are proposed to be involved in the termination of early transcription. PMID:2987815
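
    A scan for the promoter-like arrangement described above can be sketched as follows. The abstract does not state the motif order or how the spacing is measured, so the sketch assumes start-to-start distance in either order; the function names are hypothetical.

```python
import re

def motif_positions(seq, motif):
    """Start positions of a motif, including overlapping occurrences (zero-width lookahead)."""
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", seq)]

def paired_motifs(seq, first="TATA", second="AATAA", min_gap=20, max_gap=24):
    """Pairs of motif starts whose distance lies in [min_gap, max_gap], in either order."""
    return [(i, j)
            for i in motif_positions(seq, first)
            for j in motif_positions(seq, second)
            if min_gap <= abs(j - i) <= max_gap]

seq = "GG" + "TATA" + "C" * 18 + "AATAA" + "GG"
print(paired_motifs(seq))  # [(2, 24)]
```

    The lookahead pattern matters for A/T-rich promoters, where motifs such as TATA often overlap (TATATA contains two).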

  16. The nucleotide sequence of blue-green algae phenylalanine-tRNA and the evolutionary origin of chloroplasts.

    PubMed Central

    Hecker, L I; Barnett, W E; Lin, F K; Furr, T D; Heckman, J E; RajBhandary, U L; Chang, S H

    1982-01-01

    Phenylalanine tRNA from the blue-green alga Agmenellum quadruplicatum has been purified to homogeneity. The nucleotide sequence of this tRNA was determined (see text). Comparisons of the sequence and the modified nucleosides of this tRNA with those of other tRNAPhes sequenced thus far indicate that this blue-green algal tRNAPhe is typically prokaryotic and closely resembles the chloroplast tRNAPhes of higher plants and Euglena. The significance of this observation for the evolutionary origin of chloroplasts is discussed. PMID:6817301

  17. Nucleotide sequence of a complementary DNA encoding pea cytosolic copper/zinc superoxide dismutase. [Pisum sativum L.]

    SciTech Connect

    White, D.A.; Zilinskas, B.A.

    1991-08-01

    The authors now report the nucleotide sequence of the cytosolic Cu/Zn SOD cloned from a λgt11 cDNA library constructed from mRNA extracted from leaves of 7- to 10-d pea seedlings (Pisum sativum L.). The clone was isolated using a 22-base synthetic oligonucleotide complementary to the amino acid sequence CGIIGLQG. This sequence, found at the protein's carboxy terminus, is highly conserved among plant cytosolic Cu/Zn SODs but not among chloroplastic Cu/Zn SODs. The 738-base pair sequence contains an open reading frame specifying 152 codons and a predicted Mr of 18,024 D. The deduced amino acid sequence is highly homologous (79-82% identity) with the sequences of other known plant cytosolic Cu/Zn SODs but less highly conserved (63-65%) when compared with several chloroplastic Cu/Zn SODs, including pea (10).
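
    Locating the open reading frame in a cDNA like this one can be sketched as a scan of the three forward frames for the longest ATG-to-stop stretch. This is a simplified illustration with a hypothetical name: it ignores the reverse strand, alternative start codons and ORFs that run off the end of the sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf(seq):
    """Longest ATG...stop open reading frame (stop codon included) on the forward strand."""
    seq = seq.upper()
    best_start, best_end = 0, 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best_end - best_start:
                    best_start, best_end = start, i + 3
                start = None
    return seq[best_start:best_end]

orf = longest_orf("CCATGAAATTTTAGCC")
print(orf, len(orf) // 3)  # ATGAAATTTTAG 4
```

    The codon count printed here includes the stop codon; whether a reported count such as "152 codons" does so depends on the authors' convention.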

  18. HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool.

    PubMed

    O'Driscoll, Aisling; Belogrudov, Vladislav; Carroll, John; Kropp, Kai; Walsh, Paul; Ghazal, Peter; Sleator, Roy D

    2015-04-01

    The recent exponential growth of genomic databases has made the common task of sequence alignment one of the major bottlenecks in computational biology. Such large datasets and complex computations typically require cost-prohibitive High Performance Computing (HPC). Parallelised solutions have therefore been proposed, but many exhibit scalability limitations and cannot effectively process "Big Data", the name given to datasets that are extremely large, complex and require rapid processing. The Hadoop framework, composed of distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets, but it is not trivial to redesign and implement bioinformatics algorithms efficiently according to this paradigm. The "divide and conquer" parallelisation strategy for alignment algorithms can be applied to both data sets and input query sequences. However, scalability is still an issue due to memory constraints or large databases, with very large database segmentation leading to additional performance decline. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that proposes a flexible method to partition both databases and input query sequences using "virtual partitioning". HBlast presents improved scalability over existing solutions and a well-balanced computational workload while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap, memory-constrained hardware has significant implications for in-field clinical diagnostic testing, enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples.
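
    One way to picture partitioning both queries and database without physically splitting files is a grid of tasks, each pairing a range of query indices with a range of database records. The sketch below builds such a grid; the names and scheme are hypothetical illustrations of the idea, not HBlast's implementation.

```python
def split_ranges(n_items, n_parts):
    """Split range(n_items) into n_parts contiguous (start, end) ranges of near-equal size."""
    base, extra = divmod(n_items, n_parts)
    ranges, start = [], 0
    for p in range(n_parts):
        end = start + base + (1 if p < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges

def virtual_partition_tasks(n_queries, n_db_records, query_parts, db_parts):
    """One task per (query range, database range) pair.

    Only index ranges are stored, so no database segment is physically copied
    or re-formatted; a worker would read its slices on demand.
    """
    return [(q, d)
            for q in split_ranges(n_queries, query_parts)
            for d in split_ranges(n_db_records, db_parts)]

print(split_ranges(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```

    In a MapReduce setting, each task would be a map input, and a reduce step would merge the hits for each query across database ranges.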

  19. Alignment of high-throughput sequencing data inside in-memory databases.

    PubMed

    Firnkorn, Daniel; Knaup-Gregori, Petra; Lorenzo Bermejo, Justo; Ganzinger, Matthias

    2014-01-01

    In times of high-throughput DNA sequencing techniques, performance-capable analysis of DNA sequences is of high importance. Computer-supported DNA analysis is still an intensive, time-consuming task. In this paper we explore the potential of a new in-memory database technology using SAP's High Performance Analytic Appliance (HANA). We focus on read alignment as one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both HANA and the free database system MySQL to compare execution time and memory management. To ensure that the results are comparable, MySQL was also run in memory, using its integrated memory engine for database table creation. We implemented stored procedures containing exact and inexact searching of DNA reads within the reference genome GRCh37. Due to technical restrictions in SAP HANA concerning recursion, the inexact matching problem could not be implemented on this platform. Hence, performance was compared between HANA and MySQL using the execution time of the exact search procedures. Here, HANA was approximately 27 times faster than MySQL, which means that there is high potential in the new in-memory concepts, leading to further developments of DNA analysis procedures in the future.
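
    BWA's exact matching rests on backward search over a Burrows-Wheeler index. A minimal textbook sketch follows. It builds the suffix array by naive sorting and counts occurrences by scanning, both of which real indexes replace with compact sampled structures. The names are hypothetical, and this is neither BWA's code nor the study's stored procedures.

```python
def bwt_index(text):
    """Suffix array, Burrows-Wheeler transform and C table for text + '$'."""
    text += "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # for i == 0, text[-1] is the '$' sentinel
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total  # number of characters in text smaller than ch
        total += counts[ch]
    return sa, bwt, c_table

def occ(bwt, c, i):
    """Occurrences of c in bwt[:i] (naive scan; real indexes store sampled checkpoints)."""
    return bwt[:i].count(c)

def backward_search(pattern, sa, bwt, c_table):
    """Start positions of exact matches, found by narrowing a suffix-array interval right to left."""
    lo, hi = 0, len(bwt)  # half-open interval of suffix-array rows
    for c in reversed(pattern):
        if c not in c_table:
            return []
        lo = c_table[c] + occ(bwt, c, lo)
        hi = c_table[c] + occ(bwt, c, hi)
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])

sa, bwt, c_table = bwt_index("ACAACG")
print(backward_search("AC", sa, bwt, c_table))  # [0, 3]
```

    Inexact matching in BWA extends this by branching over substitutions and gaps while walking the pattern backward, which is naturally recursive, consistent with the recursion limits reported for HANA.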

  20. Single nucleotide polymorphism discovery from expressed sequence tags in the waterflea Daphnia magna

    PubMed Central

    2011-01-01

    Background Daphnia (Crustacea: Cladocera) plays a central role in standing aquatic ecosystems, has a well-known ecology and is widely used in population studies and environmental risk assessments. Daphnia magna is, especially in Europe, intensively used to study stress responses of natural populations to pollutants, climate change, and antagonistic interactions with predators and parasites, which have all been demonstrated to induce micro-evolutionary and adaptive responses. Although its ecology and evolutionary biology are intensively studied, little is known about the functional genomic underpinnings of phenotypic responses to environmental stressors. The aim of the present study was to find genes expressed in the presence of environmental stressors and to target such genes for single nucleotide polymorphism (SNP) marker development. Results We developed three expressed sequence tag (EST) libraries using clonal lineages of D. magna exposed to ecological stressors, namely fish predation, parasite infection and pesticide exposure. We used these newly developed ESTs and other Daphnia ESTs retrieved from NCBI GenBank to mine for SNP markers targeting synonymous as well as non-synonymous genetic variation. We validated the developed SNPs in six natural populations of D. magna distributed at a regional scale. Conclusions A large proportion (47%) of the produced ESTs are Daphnia lineage-specific genes, which are potentially involved in responses to environmental stress rather than in general cellular functions and metabolic activities, or reflect the arthropod's aquatic lifestyle. The characterization of genes expressed under stress and the validation of their SNPs for population genetic studies are important for identifying ecologically responsive genes in D. magna. PMID:21668940
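
    Mining SNPs from ESTs typically starts by scanning the columns of an alignment of overlapping ESTs for positions where more than one base is well supported. In the sketch below, the minimum-count threshold used to filter likely sequencing errors and the function name are assumptions, not the study's criteria.

```python
from collections import Counter

def candidate_snps(aligned, min_allele_count=2):
    """Columns where at least two alleles are each seen min_allele_count times.

    aligned: equal-length EST sequences from one contig alignment; '-' (gap) and
    'N' (unknown base) are ignored when counting alleles.
    """
    snps = []
    for col in range(len(aligned[0])):
        counts = Counter(s[col] for s in aligned if s[col] not in "-N")
        alleles = [base for base, n in counts.items() if n >= min_allele_count]
        if len(alleles) >= 2:
            snps.append((col, dict(counts)))
    return snps

print(candidate_snps(["ACGTA", "ACGTA", "ATGTA", "ATG-A", "ACGNA"]))  # [(1, {'C': 3, 'T': 2})]
```

    Requiring each allele in at least two ESTs discards singletons, which are often sequencing errors, at the cost of missing genuinely rare variants.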

  1. Nucleotide sequence and characterization of a Bacillus subtilis gene encoding a flagellar switch protein.

    PubMed Central

    Zuberi, A R; Bischoff, D S; Ordal, G W

    1991-01-01

    The nucleotide sequence of the Bacillus subtilis fliM gene has been determined. This gene encodes a 38-kDa protein that is homologous to the FliM flagellar switch proteins of Escherichia coli and Salmonella typhimurium. Expression of this gene in Che+ cells of E. coli and B. subtilis interferes with normal chemotaxis. The nature of the chemotaxis defect depends on the host used. In B. subtilis, overproduction of FliM generates mostly nonmotile cells, and those cells that are motile switch less frequently. Expression of B. subtilis FliM in E. coli also generates nonmotile cells; however, those cells that are motile have a tumble bias. The B. subtilis fliM gene cannot complement an E. coli fliM mutant. A frameshift mutation was constructed in the fliM gene and transferred onto the B. subtilis chromosome. The mutant has a Fla- phenotype. This phenotype is consistent with the hypothesis that FliM is a component of the flagellar switch in B. subtilis. Additional characterization of the fliM mutant suggests that the hag and mot loci are not expressed. These loci are regulated by the SigD form of RNA polymerase. We also did not observe any methyl-accepting chemotaxis proteins in an in vivo methylation experiment; the expression of these proteins is also dependent upon SigD. A functional basal body-hook complex may be required for the expression of SigD-regulated chemotaxis and motility genes. PMID:1898932

  2. Nucleotide Sequence and Genetic Structure of a Novel Carbaryl Hydrolase Gene (cehA) from Rhizobium sp. Strain AC100

    PubMed Central

    Hashimoto, Masayuki; Fukui, Mitsuru; Hayano, Kouichi; Hayatsu, Masahito

    2002-01-01

    Rhizobium sp. strain AC100, which is capable of degrading carbaryl (1-naphthyl-N-methylcarbamate), was isolated from soil treated with carbaryl. This bacterium hydrolyzed carbaryl to 1-naphthol and methylamine. Carbaryl hydrolase from the strain was purified to homogeneity, and its N-terminal sequence, molecular mass (82 kDa), and enzymatic properties were determined. The purified enzyme hydrolyzed 1-naphthyl acetate and 4-nitrophenyl acetate, indicating that the enzyme is an esterase. We then cloned the carbaryl hydrolase gene (cehA) from the plasmid DNA of the strain and determined the nucleotide sequence of the 10-kb region containing cehA. No homologous sequences were found by a database homology search using the nucleotide and deduced amino acid sequences of the cehA gene. Six open reading frames, including the cehA gene, were found in the 10-kb region, and sequence analysis showed that the cehA gene is flanked by two copies of an insertion sequence-like element, suggesting that it forms part of a composite transposon. PMID:11872471

  3. HLA-C locus allelic dropout in Sanger sequence-based typing due to intronic single nucleotide polymorphism.

    PubMed

    Cheng, Christopher; Kashi, Zahra Mehdizadeh; Martin, Russell; Woodruff, Gillian; Dinauer, David; Agostini, Tina

    2014-12-01

    We report a novel HLA-C allele that was identified during routine HLA typing using sequence-based methods. The patient was initially typed as C*06:02, 06:04 with two nucleotide mismatches in exon 3 (C to T and T to G changes), which would have resulted in a non-synonymous substitution replacing a leucine residue with tryptophan. Further resolution of the patient's type using sequence-specific primers (SSP) revealed that the companion allele to C*06:02 was a novel C*17:01. The existence of the new allele was confirmed across multiple platforms: Sanger sequencing, SSP, and Next Generation Sequencing (NGS) on the original sample and on allele-specific clones for the entire HLA-C locus. The investigation revealed a single nucleotide mismatch within the Sanger sequencing primer binding site in intron 3. This mutation caused the initial C*17 dropout in exons 2 and 3. Further analysis of the Sanger and NGS data revealed that the C*17 allele had two additional unique positions in introns 2 and 7. The companion C*06:02 allele also possessed a novel position in intron 3. On August 31, 2013, the WHO nomenclature committee officially named the novel C*17:01 allele sequence C*17:01:01:03 and the novel C*06:02 allele sequence C*06:02:01:03.

  4. Nucleotide sequencing and analysis of 16S rDNA and 16S-23S rDNA internal spacer region (ISR) of Taylorella equigenitalis, as an important pathogen for contagious equine metritis (CEM).

    PubMed

    Kagawa, S; Nagano, Y; Tazumi, A; Murayama, O; Millar, B C; Moore, J E; Matsuda, M

    2006-05-01

    The primer set for 16S rDNA amplified an amplicon of about 1500 bp for three strains of Taylorella equigenitalis (NCTC11184(T), Kentucky188 and EQ59). Sequence differences in the 16S rDNA among the six sequences, including three reference sequences, occurred at only a few nucleotide positions, demonstrating for the first time an extremely high 16S rDNA sequence similarity among the six sequences. In addition, the primer set for the 16S-23S rDNA internal spacer region (ISR) amplified two amplicons of about 1300 bp and 1200 bp for the three strains. The ISRs were estimated to be about 920 bp long for the large ISR-A and about 830 bp for the small ISR-B. Sequence alignment of ISR-A and ISR-B revealed about 10 base differences between NCTC11184(T) and EQ59 and between Kentucky188 and EQ59. However, only minor sequence differences were found between the ISR-A and ISR-B sequences of NCTC11184(T) and Kentucky188. A typical order of intercistronic tRNAs, 5'-16S rDNA-tRNA(Ile)-tRNA(Ala)-23S rDNA-3', with a 29-nucleotide spacer, was found in all the ISRs. The ISRs may be useful for discriminating among isolates of T. equigenitalis if sequencing is employed.
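
    The ISR comparison above comes down to counting differences between aligned sequences. A minimal sketch, assuming the sequences are already aligned with '-' marking gaps; the abstract does not say whether indels were counted as base differences, so substitutions and gap columns are reported separately. The example fragments are hypothetical, not T. equigenitalis data.

```python
def count_differences(seq_a: str, seq_b: str, gap: str = "-") -> dict:
    """Compare two pre-aligned sequences column by column."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    substitutions = 0
    gap_columns = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap and b == gap:
            continue  # column empty in both rows; not a difference
        if a == gap or b == gap:
            gap_columns += 1  # insertion/deletion column
        elif a != b:
            substitutions += 1  # mismatched bases
    return {"substitutions": substitutions, "gaps": gap_columns,
            "total": substitutions + gap_columns}


# Hypothetical toy fragments.
print(count_differences("ACGT-ACGTA", "ACTTAACG-A"))
# {'substitutions': 1, 'gaps': 2, 'total': 3}
```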

  5. Open-Phylo: a customizable crowd-computing platform for multiple sequence alignment

    PubMed Central

    2013-01-01

    Citizen science games such as Galaxy Zoo, Foldit, and Phylo aim to harness the intelligence and processing power generated by crowds of online gamers to solve scientific problems. However, the selection of the data to be analyzed through these games is under the exclusive control of the game designers, and so are the results produced by gamers. Here, we introduce Open-Phylo, a freely accessible crowd-computing platform that enables any scientist to enter our system and use crowds of gamers to assist computer programs in solving one of the most fundamental problems in genomics: the multiple sequence alignment problem. PMID:24148814

  6. Open-Phylo: a customizable crowd-computing platform for multiple sequence alignment.

    PubMed

    Kwak, Daniel; Kam, Alfred; Becerra, David; Zhou, Qikuan; Hops, Adam; Zarour, Eleyine; Kam, Arthur; Sarmenta, Luis; Blanchette, Mathieu; Waldispühl, Jérôme

    2013-01-01

    Citizen science games such as Galaxy Zoo, Foldit, and Phylo aim to harness the intelligence and processing power generated by crowds of online gamers to solve scientific problems. However, the selection of the data to be analyzed through these games is under the exclusive control of the game designers, and so are the results produced by gamers. Here, we introduce Open-Phylo, a freely accessible crowd-computing platform that enables any scientist to enter our system and use crowds of gamers to assist computer programs in solving one of the most fundamental problems in genomics: the multiple sequence alignment problem.

  7. Identification and Evaluation of Single-Nucleotide Polymorphisms in Allotetraploid Peanut (Arachis hypogaea L.) Based on Amplicon Sequencing Combined with High Resolution Melting (HRM) Analysis

    PubMed Central

    Hong, Yanbin; Pandey, Manish K.; Liu, Ying; Chen, Xiaoping; Liu, Hong; Varshney, Rajeev K.; Liang, Xuanqiang; Huang, Shangzhi

    2015-01-01

    The cultivated peanut (Arachis hypogaea L.) is an allotetraploid (AABB) species derived from the A-genome (Arachis duranensis) and B-genome (Arachis ipaensis) progenitors. The presence of two versions of a DNA sequence based on the two progenitor genomes poses a serious technical and analytical problem during single nucleotide polymorphism (SNP) marker identification and analysis. In this context, we have analyzed 200 amplicons derived from expressed sequence tags (ESTs) and genome survey sequences (GSS) to identify SNPs in a panel of genotypes consisting of 12 cultivated peanut varieties and two diploid progenitors representing the ancestral genomes. A total of 18 EST-SNPs and 44 genomic-SNPs were identified in 12 peanut varieties by aligning the sequence of A. hypogaea with diploid progenitors. The average frequency of sequence polymorphism was higher for genomic-SNPs than for EST-SNPs, with one genomic-SNP every 1011 bp as compared to one EST-SNP every 2557 bp. To estimate the potential and further applicability of these identified SNPs, 96 peanut varieties were genotyped using the high resolution melting (HRM) method. Polymorphism information content (PIC) values for EST-SNPs ranged between 0.021 and 0.413 with a mean of 0.172 in the set of peanut varieties, while those for genomic-SNPs ranged between 0.080 and 0.478 with a mean of 0.249. A total of 33 SNPs were used for polymorphism detection among the parents and 10 selected lines from mapping population Y13Zh (Zhenzhuhei × Yueyou13). Of these 33 SNPs, nine showed polymorphism in the mapping population Y13Zh, and seven were successfully mapped into five linkage groups. Our results showed that SNPs can be identified in allotetraploid peanut with high accuracy through amplicon sequencing and HRM assay. The identified SNPs were very informative and can be used for different genetic and breeding applications in peanut. PMID:26697032
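
    The abstract reports PIC values but not the formula used. A minimal sketch of two common definitions, for illustration only: the simple form 1 - Σp_i² and Botstein's form, which subtracts an additional correction term. Because Botstein's PIC cannot exceed 0.375 for a biallelic SNP, the reported maximum of 0.478 fits the simpler form better, although the abstract does not confirm which was used. The allele counts in the example are hypothetical.

```python
from collections import Counter


def allele_frequencies(calls):
    """Allele frequencies from one call per variety (e.g. 'A' or 'G')."""
    counts = Counter(calls)
    total = sum(counts.values())
    return [n / total for n in counts.values()]


def pic_gene_diversity(freqs):
    """Simplified PIC, 1 - sum(p_i^2); equals 0.5 for two equally common alleles."""
    return 1.0 - sum(p * p for p in freqs)


def pic_botstein(freqs):
    """Botstein et al. (1980) PIC; never exceeds 0.375 for a biallelic marker."""
    p = list(freqs)
    correction = sum(
        2 * p[i] ** 2 * p[j] ** 2
        for i in range(len(p))
        for j in range(i + 1, len(p))
    )
    return pic_gene_diversity(p) - correction


# Hypothetical SNP scored in 96 homozygous varieties: 72 carry A, 24 carry G.
freqs = allele_frequencies(["A"] * 72 + ["G"] * 24)
print(pic_gene_diversity(freqs), pic_botstein(freqs))  # 0.375 0.3046875
```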

  8. Identification and Evaluation of Single-Nucleotide Polymorphisms in Allotetraploid Peanut (Arachis hypogaea L.) Based on Amplicon Sequencing Combined with High Resolution Melting (HRM) Analysis.

    PubMed

    Hong, Yanbin; Pandey, Manish K; Liu, Ying; Chen, Xiaoping; Liu, Hong; Varshney, Rajeev K; Liang, Xuanqiang; Huang, Shangzhi

    2015-01-01

    The cultivated peanut (Arachis hypogaea L.) is an allotetraploid (AABB) species derived from the A-genome (Arachis duranensis) and B-genome (Arachis ipaensis) progenitors. The presence of two versions of a DNA sequence based on the two progenitor genomes poses a serious technical and analytical problem during single nucleotide polymorphism (SNP) marker identification and analysis. In this context, we have analyzed 200 amplicons derived from expressed sequence tags (ESTs) and genome survey sequences (GSS) to identify SNPs in a panel of genotypes consisting of 12 cultivated peanut varieties and two diploid progenitors representing the ancestral genomes. A total of 18 EST-SNPs and 44 genomic-SNPs were identified in 12 peanut varieties by aligning the sequence of A. hypogaea with diploid progenitors. The average frequency of sequence polymorphism was higher for genomic-SNPs than for EST-SNPs, with one genomic-SNP every 1011 bp as compared to one EST-SNP every 2557 bp. To estimate the potential and further applicability of these identified SNPs, 96 peanut varieties were genotyped using the high resolution melting (HRM) method. Polymorphism information content (PIC) values for EST-SNPs ranged between 0.021 and 0.413 with a mean of 0.172 in the set of peanut varieties, while those for genomic-SNPs ranged between 0.080 and 0.478 with a mean of 0.249. A total of 33 SNPs were used for polymorphism detection among the parents and 10 selected lines from mapping population Y13Zh (Zhenzhuhei × Yueyou13). Of these 33 SNPs, nine showed polymorphism in the mapping population Y13Zh, and seven were successfully mapped into five linkage groups. Our results showed that SNPs can be identified in allotetraploid peanut with high accuracy through amplicon sequencing and HRM assay. The identified SNPs were very informative and can be used for different genetic and breeding applications in peanut.

  9. Nucleotide sequence and analysis of the 58.3 to 65.5-kb early region of bacteriophage T4.

    PubMed Central

    Valerie, K; Stevens, J; Lynch, M; Henderson, E E; de Riel, J K

    1986-01-01

    The complete 7.2-kb nucleotide sequence from the 58.3 to 65.5-kb early region of bacteriophage T4 has been determined by Maxam and Gilbert sequencing. Computer analysis revealed at least 20 open reading frames (ORFs) within this sequence. All major ORFs are transcribed from the left strand, suggesting that they are expressed early during infection. Among the ORFs, we have identified the ipIII, ipII, denV and tk genes. The ORFs are very tightly spaced, even overlapping in some instances, and when ORF interspacing occurs, promoter-like sequences can be implicated. Several of the sequences preceding the ORFs, in particular those at ipIII, ipII, denV, and orf61.9, can potentially form stable stem-loop structures. PMID:3024113
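
    The abstract does not name the software used for the ORF search. A minimal sketch of a generic ORF scan on both strands, assuming the standard stop codons; the `min_codons` threshold is a hypothetical parameter, not the authors' criterion.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=50):
    """Return (strand, frame, start, end) for ATG...stop ORFs on both strands.

    Coordinates on the '-' strand refer to the reverse-complemented sequence.
    Only the first ATG after each stop is used, so nested starts are not
    reported, and ORFs running off the end without a stop are ignored.
    min_codons counts sense codons (the stop codon is excluded).
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs


print(find_orfs("ATGAAATTTTAA", min_codons=3))  # [('+', 0, 0, 12)]
```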

  10. Complete nucleotide sequence and analysis of the putative polyprotein of maize dwarf mosaic virus genomic RNA (Bulgarian isolate).

    PubMed

    Kong, P; Steinbiss, H H

    1998-01-01

    The complete nucleotide sequence of maize dwarf mosaic virus Bulgarian isolate (MDMV-Bg) was determined. The viral genome was 9515 nt long and contained an open reading frame encoding 3042 amino acids, flanked by 3'- and 5'-UTRs of 139 and 250 nucleotides, respectively. Compared with 15 other potyviruses, MDMV-Bg was more conserved in the coding region (52.9%) than in the UTRs (45.8%). Of the ten putative gene products of MDMV-Bg, P1 was the most variable protein (24.9%), while NIb was the most conserved (67.3%). Several sequence variations were observed between MDMV-Bg and Johnson grass mosaic virus (JGMV), and more between MDMV-Bg and the dicot potyviruses. Phylogenetic analysis suggested that MDMV-Bg was most closely related to JGMV.

  11. The nucleotide sequence of the coat protein genes of satsuma dwarf virus and navel orange infectious mottling virus.

    PubMed

    Iwanami, T; Kondo, Y; Makita, Y; Azeyanagi, C; Ieki, H

    1998-01-01

    The sequences of the 3'-terminal 4320 and 2409 nucleotides of RNA2 were determined for satsuma dwarf virus (SDV) and navel orange infectious mottling virus (NIMV), respectively. Both sequences contained part of a long open reading frame encoding the larger and smaller coat proteins (CPs) at its 3' terminus, followed by a 3' non-coding region upstream of a poly(A) tail. Amino acid sequence identity among SDV, NIMV and the previously sequenced citrus mosaic virus (CiMV) ranged from 81 to 84% for the larger CP and from 68 to 78% for the smaller CP. No significant sequence similarity was found between the CPs of SDV or NIMV and those of the como-, nepo- or other viruses. The nucleotide sequence identity of the 3' non-coding region of RNA2 was 68 to 78% among SDV, CiMV and NIMV. These results suggest that SDV, CiMV and NIMV are distinct, though related, viruses. They may be assigned to a new genus close to the genera Comovirus and Nepovirus.

  12. Cloning and nucleotide sequence of the gene coding for aspartokinase II from a thermophilic methylotrophic Bacillus sp.

    PubMed Central

    Schendel, F J; Flickinger, M C

    1992-01-01

    The structural gene coding for the lysine-sensitive aspartokinase II of the methylotrophic thermotolerant Bacillus sp. strain MGA3 was cloned from a genomic library by complementation of an Escherichia coli auxotrophic mutant lacking all three aspartokinase isozymes. The nucleotide sequence of the entire 2.2-kb PstI fragment was determined, and a single open reading frame coding for the aspartokinase II enzyme was found. Aspartokinase II was shown to be an alpha2beta2 tetramer (M(r) 122,000) with the beta subunit (M(r) 18,000) encoded within the alpha subunit (M(r) 45,000) in the same reading frame. The enzyme was purified, and the N-terminal sequences of the alpha and beta subunits were identical to those predicted from the gene sequence. The predicted amino acid sequence was 76% identical to that of Bacillus subtilis aspartokinase II. The transcription initiation site was located approximately 350 bp upstream of the translation start site, and putative promoter regions at -10 (TATGCT) and -35 (ATGACA) were identified. A 300-nucleotide intervening sequence between the transcription initiation and translation start sites suggests a possible attenuation mechanism for regulating transcription of this enzyme in the presence of lysine. PMID:1444390
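
    The -10 and -35 hexamers named above can be located with a simple scan. A minimal sketch, assuming exact matches and a 15 to 19 bp spacer, a hypothetical window around the roughly 17 bp spacing typical of bacterial sigma70-family promoters; practical promoter prediction usually tolerates mismatches and uses position weight matrices.

```python
import re


def find_promoter_candidates(seq, box35="ATGACA", box10="TATGCT",
                             min_spacer=15, max_spacer=19):
    """Return (start_35, start_10, spacer) for -35/-10 hexamer pairs.

    The spacer is the number of bases between the end of the -35 box and
    the start of the -10 box. Exact matches only; no mismatches tolerated.
    """
    seq = seq.upper()
    hits = []
    # A lookahead pattern lets finditer report overlapping -35 matches.
    for m in re.finditer(f"(?={re.escape(box35)})", seq):
        end35 = m.start() + len(box35)
        for spacer in range(min_spacer, max_spacer + 1):
            start10 = end35 + spacer
            if seq[start10:start10 + len(box10)] == box10:
                hits.append((m.start(), start10, spacer))
    return hits


demo = "GG" + "ATGACA" + "C" * 17 + "TATGCT" + "GG"
print(find_promoter_candidates(demo))  # [(2, 25, 17)]
```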

  13. Complete nucleotide sequence of the haemagglutinin gene from a human influenza virus of the Hong Kong subtype.

    PubMed Central

    Both, G W; Sleigh, M J

    1980-01-01

    The complete nucleotide sequence has been determined for a cloned double-stranded DNA copy of the haemagglutinin gene from the human influenza strain A/NT/60/68/29C, a laboratory-isolated variant of A/NT/60/68, an early strain of the Hong Kong subtype. The gene is 1765 nucleotides long and contains sufficient information to code for a protein of 566 amino acids, comprising a hydrophobic leader peptide (16 residues), HA1 (328), HA2 (221), and an arginine residue that joins the HA subunits. Comparison of the predicted amino acid sequence of 29C haemagglutinin with protein sequence data available for HA from other influenza strains shows that no potential coding information is lost by processing of the mRNA. A comparison of the amino acid sequences predicted from the gene sequences for 29C and fowl plague virus haemagglutinins (1) indicates the extent to which changes can occur in the primary sequence of different regions of the protein while maintaining essential structure and function. PMID:6253883
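
    The protein length stated above follows from its parts. A short arithmetic check, assuming a single stop codon ends the reading frame, in which case the remaining 64 nt of the 1765-nt gene copy would lie outside the open reading frame.

```python
# Segment lengths in amino acids, as given in the abstract above.
segments = {"signal_peptide": 16, "HA1": 328, "HA2": 221, "joining_arginine": 1}

protein_length = sum(segments.values())
coding_nt = 3 * protein_length + 3  # sense codons plus one stop codon (assumed)
noncoding_nt = 1765 - coding_nt     # remainder of the 1765-nt gene copy

print(protein_length, coding_nt, noncoding_nt)  # 566 1701 64
```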

  14. EzEditor: a versatile sequence alignment editor for both rRNA- and protein-coding genes.

    PubMed

    Jeon, Yoon-Seong; Lee, Kihyun; Park, Sang-Cheol; Kim, Bong-Soo; Cho, Yong-Joon; Ha, Sung-Min; Chun, Jongsik

    2014-02-01

    EzEditor is a Java-based molecular sequence editor allowing manipulation of both DNA and protein sequence alignments for phylogenetic analysis. It has multiple features optimized to connect initial computer-generated multiple alignment and subsequent phylogenetic analysis by providing manual editing with reference to biological information specific to the genes under consideration. It provides various functionalities for editing rRNA alignments using secondary structure information. In addition, it supports simultaneous editing of both DNA sequences and their translated protein sequences for protein-coding genes. EzEditor is, to our knowledge, the first sequence editing software designed for both rRNA- and protein-coding genes with the visualization of biologically relevant information and should be useful in molecular phylogenetic studies. EzEditor is based on Java, can be run on all major computer operating systems and is freely available from http://sw.ezbiocloud.net/ezeditor/.
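
    Editing DNA sequences together with their translations depends on keeping every codon tied to its amino acid. A minimal sketch of that bookkeeping, projecting an edited protein alignment row back onto its nucleotides; this illustrates the general codon-alignment idea, not EzEditor's own code, and it assumes each coding sequence lacks its stop codon. It does not check that the codons actually translate to the residues shown.

```python
def codon_align(aligned_protein, cds, gap="-"):
    """Thread an ungapped coding sequence through its gapped protein row.

    Assumes cds has no stop codon and encodes exactly the residues in
    aligned_protein; each protein gap becomes a three-base gap.
    """
    residues = len(aligned_protein) - aligned_protein.count(gap)
    if len(cds) != 3 * residues:
        raise ValueError("coding sequence length does not match protein row")
    codons = iter(cds[i:i + 3] for i in range(0, len(cds), 3))
    return "".join(gap * 3 if aa == gap else next(codons) for aa in aligned_protein)


protein_rows = ["MAKV", "M-KV"]
cds_rows = ["ATGGCTAAAGTT", "ATGAAAGTT"]
for prot, cds in zip(protein_rows, cds_rows):
    print(codon_align(prot, cds))
# ATGGCTAAAGTT
# ATG---AAAGTT
```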

  15. Prediction of protein function improving sequence remote alignment search by a fuzzy logic algorithm.

    PubMed

    Gómez, Antonio; Cedano, Juan; Espadaler, Jordi; Hermoso, Antonio; Piñol, Jaume; Querol, Enrique

    2008-02-01

    The functional annotation of new protein sequences is a major challenge for genomic science. The best way to suggest the function of a protein from its sequence is to find a related protein for which biological information is available. Current alignment algorithms display a list of protein sequence stretches showing significant similarity to different protein targets, ordered by their respective mathematical scores. However, statistical and biological significance do not always coincide; therefore, rearranging the program output according to biological characteristics rather than mathematical scores alone would help functional annotation. A new method is described that predicts the putative function of a protein by integrating the results of the PSI-BLAST program with a fuzzy logic algorithm. Several protein sequence characteristics were tested for their ability to rearrange a PSI-BLAST profile more closely according to biological function. Four of them (amino acid content, matched segment length, and hydropathic and flexibility profiles), when integrated by a fuzzy logic algorithm into a program called BYPASS, contributed positively to the accurate prediction of a protein's function from its sequence.

  16. An alignment-free method to find and visualise rearrangements between pairs of DNA sequences

    PubMed Central

    Pratas, Diogo; Silva, Raquel M.; Pinho, Armando J.; Ferreira, Paulo J.S.G.

    2015-01-01

    The evolution of species is indirectly recorded in their genomic structure. The emergence of, and advances in, sequencing technology have provided a way to access genome information, namely to identify and study evolutionary macro-events, as well as chromosome alterations for clinical purposes. This paper describes a completely alignment-free computational method, based on a blind unsupervised approach, to detect large-scale and small-scale genomic rearrangements between pairs of DNA sequences. To illustrate the power and usefulness of the method, we give complete chromosomal information maps for the human-chimpanzee and human-orangutan pairs. The tool used to obtain these results has been made publicly available and is described in detail. PMID:25984837
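
    The paper's own alignment-free method is not reproduced here. As a much simpler stand-in, and only for illustration, exact k-mer matching in both orientations already separates collinear regions (direct hits) from inverted ones; the default k=12 is an arbitrary choice.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmer_index(seq, k):
    """Map each k-mer to the list of positions where it starts."""
    index = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)
    return index


def shared_kmers(target, reference, k=12):
    """Exact k-mer hits as (target_pos, reference_pos, orientation) triples.

    'inverted' hits match the reverse complement of the target k-mer; runs of
    them along a descending diagonal suggest an inversion.
    """
    target, reference = target.upper(), reference.upper()
    ref_index = kmer_index(reference, k)
    hits = []
    for i in range(len(target) - k + 1):
        word = target[i:i + k]
        for j in ref_index.get(word, []):
            hits.append((i, j, "direct"))
        for j in ref_index.get(word.translate(COMPLEMENT)[::-1], []):
            hits.append((i, j, "inverted"))
    return hits


# Toy example: the target is the reverse complement of the reference.
print(shared_kmers("CCTGTAATC", "GATTACAGG", k=4))
```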

  17. A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure

    PubMed Central

    2002-01-01

    Background: Covariance models (CMs) are probabilistic models of RNA secondary structure, analogous to profile hidden Markov models of linear sequence. The dynamic programming algorithm for aligning a CM to an RNA sequence of length N is O(N^3) in memory, which is only practical for small RNAs. Results: I describe a divide-and-conquer variant of the alignment algorithm that is analogous to memory-efficient Myers/Miller dynamic programming algorithms for linear sequence alignment. The new algorithm has O(N^2 log N) memory complexity, at the expense of a small constant factor in time. Conclusions: Optimal ribosomal RNA structural alignments that previously required up to 150 GB of memory now require less than 270 MB. PMID:12095421
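
    The Myers/Miller idea referred to above is easiest to see in the linear-sequence case. A minimal sketch of that case only (Hirschberg's divide and conquer for global alignment with a linear gap penalty; Myers and Miller extended it to affine gaps, and the CM version is more involved): only two DP rows are kept at a time, and a forward and a reverse pass locate the column where an optimal path crosses the middle row. The scoring values are arbitrary.

```python
def nw_last_row(a, b, match=1, mismatch=-1, gap=-2):
    """Last row of the global-alignment DP matrix, using O(len(b)) memory."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev


def align_single(ch, b, match, mismatch, gap):
    """Optimal alignment of one character against a non-empty string b."""
    best_score, best_j = (len(b) + 1) * gap, None  # ch opposite a gap
    for j, other in enumerate(b):
        score = (match if ch == other else mismatch) + (len(b) - 1) * gap
        if score > best_score:
            best_score, best_j = score, j
    if best_j is None:
        return ch + "-" * len(b), "-" + b
    return "-" * best_j + ch + "-" * (len(b) - best_j - 1), b


def hirschberg(a, b, match=1, mismatch=-1, gap=-2):
    """Divide-and-conquer global alignment in linear space (linear gap cost)."""
    if not a:
        return "-" * len(b), b
    if not b:
        return a, "-" * len(a)
    if len(a) == 1:
        return align_single(a, b, match, mismatch, gap)
    mid = len(a) // 2
    # Forward scores for a[:mid] and reverse scores for a[mid:] against b.
    left = nw_last_row(a[:mid], b, match, mismatch, gap)
    right = nw_last_row(a[mid:][::-1], b[::-1], match, mismatch, gap)
    n = len(b)
    split = max(range(n + 1), key=lambda j: left[j] + right[n - j])
    a1, b1 = hirschberg(a[:mid], b[:split], match, mismatch, gap)
    a2, b2 = hirschberg(a[mid:], b[split:], match, mismatch, gap)
    return a1 + a2, b1 + b2


def alignment_score(x, y, match=1, mismatch=-1, gap=-2):
    """Score an alignment column by column with the same linear scheme."""
    return sum(
        gap if "-" in (p, q) else (match if p == q else mismatch)
        for p, q in zip(x, y)
    )


top, bottom = hirschberg("GATTACA", "GCATGCT")
print(top)
print(bottom)
print(alignment_score(top, bottom) == nw_last_row("GATTACA", "GCATGCT")[-1])  # True
```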

  18. SGP-1: Prediction and Validation of Homologous Genes Based on Sequence Alignments

    PubMed Central

    Wiehe, Thomas; Gebauer-Jung, Steffi; Mitchell-Olds, Thomas; Guigó, Roderic

    2001-01-01

    Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, the coding potential or the comparison of a genomic sequence with a cDNA, EST, or protein database. Reasons for limited accuracy in many circumstances are species-specific training and the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention. Several analysis tools that are based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or the nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful to validate gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors. PMID:11544202

  19. A probabilistic coding based quantum genetic algorithm for multiple sequence alignment.

    PubMed

    Huo, Hongwei; Xie, Qiaoluan; Shen, Xubang; Stojkovic, Vojislav

    2008-01-01

    This paper presents an original Quantum Genetic algorithm for Multiple sequence ALIGNment (QGMALIGN) that combines a genetic algorithm and a quantum algorithm. A quantum probabilistic coding is designed to represent the multiple sequence alignment. A quantum rotation gate is used as a mutation operator to guide the evolution of the quantum state. Six genetic operators are designed on this coding basis to improve the solution during the evolutionary process. The implicit parallelism and state superposition of quantum mechanics and the global search capability of the genetic algorithm are exploited to achieve efficient computation. A set of well-known test cases from BAliBASE2.0 is used as a reference to evaluate the efficiency of the QGMALIGN optimization. The QGMALIGN results were compared with those of the most popular methods (CLUSTALX, SAGA, DIALIGN, SB_PIMA, and QGMALIGN). The results show that QGMALIGN performs well on the biological data presented. The addition of genetic operators to the quantum algorithm lowers the overall running time.
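
    The abstract does not state which fitness function QGMALIGN optimizes. Sum-of-pairs scoring is a common objective for evolutionary MSA methods, so here is a minimal sketch of it under that assumption, with arbitrary scores and no substitution matrix or affine gap costs.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=0, gap=-1):
    """Sum-of-pairs score of an MSA given as equal-length rows with '-' gaps.

    Every pair of rows is scored column by column; gap-gap pairs score 0.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows must have the same length")
    total = 0
    for col in range(length):
        column = [row[col] for row in alignment]
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue
            if x == "-" or y == "-":
                total += gap
            else:
                total += match if x == y else mismatch
    return total


print(sum_of_pairs(["AC-T", "ACGT", "A-GT"]))  # 4
```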

  20. QuickProbs--a fast multiple sequence alignment algorithm designed for graphics processors.

    PubMed

    Gudyś, Adam; Deorowicz, Sebastian

    2014-01-01

    Multiple sequence alignment is a crucial task in a number of biological analyses, such as secondary structure prediction, domain searching, and phylogeny. MSAProbs is currently the most accurate alignment algorithm, but its accuracy comes at the expense of computational time. In this paper we present QuickProbs, a variant of MSAProbs customised for graphics processors. We redesigned the two most time-consuming stages of MSAProbs for GPU execution: the posterior matrices calculation and the consistency transformation. Experiments on three popular benchmarks (BAliBASE, PREFAB, OXBench-X) on a quad-core PC equipped with a high-end graphics card show QuickProbs to be 5.7 to 9.7 times faster than the original CPU-parallel MSAProbs. Additional tests on several protein families from the Pfam database give an overall speed-up of 6.7. Compared with other algorithms such as MAFFT, MUSCLE, or ClustalW, QuickProbs proved to be much more accurate at similar speed. Additionally, we introduce a tuned variant of QuickProbs that is significantly more accurate than MSAProbs on sets of distantly related sequences without exceeding its computation time. The GPU part of QuickProbs was implemented in OpenCL, so the package is suitable for graphics processors produced by all major vendors.

  1. A close relationship between primary nucleotides sequence structure and the composition of functional genes in the genome of prokaryotes.

    PubMed

    Garcia, Juan A L; Fernández-Guerra, Antoni; Casamayor, Emilio O

    2011-12-01

    Comparative genomics is an essential tool to unravel how genomes change over evolutionary time and to gain clues on the links between functional genomics and evolution. In prokaryotes, the large body of good-quality genome sequences available in public databases and the recently developed large-scale computational methods offer an unprecedented view of the ecology and evolution of microorganisms through comparative genomics. In this work, we examined the links between genome structure (i.e., the sequential distribution of nucleotides itself, by detrended fluctuation analysis, DFA) and genomic diversity (i.e., gene functionality, by Clusters of Orthologous Genes, COGs) in 828 fully sequenced prokaryotic genomes from 548 different bacterial and archaeal species. The DFA scaling exponent α indicated persistent long-range correlations (fractality) in each genome analyzed. Higher resolving power was found when considering the sequential succession of purine (AG) vs. pyrimidine (CT) bases than either keto (GT) vs. amino (AC) forms or strongly (GC) vs. weakly (AT) bonded nucleotides. Interestingly, the phyla Aquificae, Fusobacteria, Dictyoglomi, Nitrospirae, and Thermotogae were closer to archaea than to their bacterial counterparts. A strong, significant correlation was found between the scaling exponent α and COG distribution, and we consistently observed that the larger α, the more heterogeneous the gene distribution within each functional category, suggesting a close relationship between primary nucleotide sequence structure and functional gene composition.
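
    The DFA step above can be sketched for the purine/pyrimidine mapping. A minimal DFA-1 implementation, assuming linear detrending within non-overlapping boxes; the box sizes are arbitrary and the abstract does not give the authors' exact settings. For an uncorrelated sequence α should be close to 0.5, while persistent long-range correlations push it above 0.5.

```python
import math
import random


def purine_walk(seq):
    """Cumulative DNA walk: +1 for purines (A, G), -1 for pyrimidines (C, T).

    The usual mean subtraction is omitted because the per-box linear fit
    removes any constant drift of the walk.
    """
    y, profile = 0, []
    for base in seq.upper():
        if base in "AG":
            y += 1
        elif base in "CT":
            y -= 1
        else:
            continue  # skip N and other ambiguity codes
        profile.append(y)
    return profile


def linear_fit(xs, ys):
    """Least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def fluctuation(profile, box):
    """F(n): RMS deviation of the walk from a per-box linear trend (DFA-1)."""
    xs = list(range(box))
    n_boxes = len(profile) // box
    squared = 0.0
    for b in range(n_boxes):
        segment = profile[b * box:(b + 1) * box]
        slope, intercept = linear_fit(xs, segment)
        squared += sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, segment))
    return math.sqrt(squared / (n_boxes * box))


def dfa_alpha(seq, boxes=(16, 32, 64, 128, 256)):
    """Scaling exponent alpha: slope of log F(n) against log n."""
    profile = purine_walk(seq)
    log_n = [math.log(b) for b in boxes]
    log_f = [math.log(fluctuation(profile, b)) for b in boxes]
    return linear_fit(log_n, log_f)[0]


rng = random.Random(1)
random_seq = "".join(rng.choice("ACGT") for _ in range(20000))
print(round(dfa_alpha(random_seq), 2))
```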

  2. [Analysis on the preference of synonymous codon in VP1 nucleotide sequence of the EV71 based on RSCU method].

    PubMed

    Qi, Bin; Zhao, Jing-Jing; Gao, Lei; Zhu, Ping

    2009-11-01

    Using the relative synonymous codon usage (RSCU) method, we analyzed the preference of codon usage in VP1 nucleotide sequences of EV71 isolated in mainland China and the Taiwan region from 1998 to 2008. RSCU values calculated from EV71 VP1 sequences showed clear temporal differentiation in both regions, more markedly in the Taiwan region. According to the diversity of RSCU, the years can be divided into 2 intervals in mainland China and 4 intervals in the Taiwan region, and the number of intervals in a region correlates positively with the activity of EV71 variation in that region. Because changes in the preference of codon usage in VP1 nucleotide sequences significantly reflect the variation of EV71, analysis of codon usage preference in VP1 sequences can be used to predict possible trends in EV71 variation.
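
    RSCU divides each codon's observed count by the count expected if all synonymous codons for that amino acid were used equally. A minimal sketch, assuming in-frame coding sequences and the standard genetic code; the abstract does not say how stop codons, Met and Trp were handled, so excluding them here is an assumption. The example fragment is hypothetical.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, ..., GGG ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def rscu(sequences):
    """RSCU = observed codon count / mean count over its synonymous codons."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for p in range(0, len(seq) - 2, 3):  # in-frame codons only
            codon = seq[p:p + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)
    values = {}
    for aa, codons in families.items():
        if aa in "*MW":  # stops and single-codon amino acids are excluded here
            continue
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


# Hypothetical fragment: GCT x2, GCC x1, GCA x1 (all alanine).
values = rscu(["GCTGCTGCCGCA"])
print({c: values[c] for c in ("GCT", "GCC", "GCA", "GCG")})
# {'GCT': 2.0, 'GCC': 1.0, 'GCA': 1.0, 'GCG': 0.0}
```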

  3. The Nucleotide Capture Region of Alpha Hemolysin: Insights into Nanopore Design for DNA Sequencing from Molecular Dynamics Simulations.

    PubMed

    Manara, Richard M A; Tomasio, Susana; Khalid, Syma

    2015-01-27

    Nanopore technology for DNA sequencing is constantly being refined and improved. In strand sequencing, a single strand of DNA is fed through a nanopore and the resulting fluctuations in the current are measured. A major hurdle is that the DNA is translocated through the pore too fast for the current measurement systems. An alternative approach is "exonuclease sequencing", in which an exonuclease attached to the nanopore processes the strand, cleaving off one base at a time. The bases then flow through the nanopore and the current is measured. This method has the potential advantage of solving the translocation rate problem, as the speed is controlled by the exonuclease. Here we consider the practical details of attaching the exonuclease to the protein alpha hemolysin. We employ molecular dynamics simulations to determine (a) the ideal distance from alpha-hemolysin and (b) the ideal orientation of the monophosphate nucleotides upon release from the exonuclease such that they will enter the protein. Our results indicate an almost linear decrease in the probability of entry into the protein with increasing distance of nucleotide release. Nucleotide orientation is less significant for entry into the protein.

  4. Complete nucleotide sequence and gene rearrangement of the mitochondrial genome of the bell-ring frog, Buergeria buergeri (family Rhacophoridae).

    PubMed

    Sano, Naomi; Kurabayashi, Atsushi; Fujii, Tamotsu; Yonekawa, Hiromichi; Sumida, Masayuki

    2004-06-01

    In this study we determined the complete nucleotide sequence (19,959 bp) of the mitochondrial DNA of the rhacophorid frog Buergeria buergeri. The gene content, nucleotide composition, and codon usage of B. buergeri conformed to those of typical vertebrate patterns. However, due to an accumulation of lengthy repetitive sequences in the D-loop region, this species possesses the largest mitochondrial genome among all the vertebrates examined so far. Comparison of the gene organizations among amphibian species (Rana, Xenopus, salamanders and caecilians) revealed that the positioning of four tRNA genes and the ND5 gene in the mtDNA of B. buergeri diverged from the common vertebrate gene arrangement shared by Xenopus, salamanders and caecilians. The unique positions of the tRNA genes in B. buergeri are shared by ranid frogs, indicating that the rearrangements of the tRNA genes occurred in a common ancestral lineage of ranids and rhacophorids. On the other hand, the novel position of the ND5 gene seems to have arisen in a lineage leading to rhacophorids (and other closely related taxa) after ranid divergence. Phylogenetic analysis based on nucleotide sequence data of all mitochondrial genes also supported the gene rearrangement pathway.

  5. Comparative nucleotide sequences encoding the immunity proteins and the carboxyl-terminal peptides of colicins E2 and E3.

    PubMed Central

    Lau, P C; Rowsome, R W; Zuker, M; Visentin, L P

    1984-01-01

    Using the M13 dideoxy sequencing technique, we have established the DNA sequences of colicins E2 and E3 which encompass the receptor-binding and the catalytic domains of each of the nucleases, and their immunity (imm) genes. The imm gene of plasmid ColE2-P9 is 255 bp long and is separated from the end of the col gene by a dinucleotide. This gene pair is arranged similarly in plasmid ColE3-CA38 except that the intergenic space is 9 bp and the E3 imm gene is one codon shorter than its E2 counterpart. Comparisons of the E2 and E3 imm sequences indicate considerable divergence whereas the receptor-binding domains of both colicins are highly conserved. The two nuclease domains appear to share some sequence homology. A possible evolutionary relationship between colicin E3 and other microbial extracellular ribonucleases is also suggested from the sequence alignment analysis. PMID:6095211

  6. IRE1α nucleotide sequence cleavage specificity in the unfolded protein response.

    PubMed

    Poothong, Juthakorn; Sopha, Pattarawut; Kaufman, Randal J; Tirasophon, Witoon

    2017-01-01

    Inositol-requiring enzyme 1 (IRE1) is a conserved sensor of the unfolded protein response that has protein kinase and endoribonuclease (RNase) enzymatic activities and thereby initiates HAC1/XBP1 splicing. Previous studies demonstrated that human IRE1α (hIRE1α) does not cleave Saccharomyces cerevisiae HAC1 mRNA. Using an in vitro cleavage assay, we show that an adenine-to-cytosine substitution at the +1 position of the 3' splice site of HAC1 RNA is required for specific cleavage by hIRE1α. A similar restricted nucleotide specificity in the RNA substrate was observed for XBP1 splicing in vivo. Together, these findings underscore the essential role of a cytosine at the +1 position of the 3' splice site in determining the cleavage specificity of hIRE1α.

  7. Cloning and nucleotide sequence of the gene coding for enzymatically active fragments of the Bacillus polymyxa beta-amylase.

    PubMed

    Kawazu, T; Nakanishi, Y; Uozumi, N; Sasaki, T; Yamagata, H; Tsukagoshi, N; Udaka, S

    1987-04-01

    The gene encoding beta-amylase was cloned from Bacillus polymyxa 72 into Escherichia coli HB101 by inserting HindIII-generated DNA fragments into the HindIII site of pBR322. The 4.8-kilobase insert was shown to direct the synthesis of beta-amylase. A 1.8-kilobase AccI-AccI fragment of the donor strain DNA was sufficient for beta-amylase synthesis. Southern blot analysis showed that homologous DNA was present only in B. polymyxa 72 and not in other bacteria such as E. coli or B. subtilis. B. polymyxa, as well as E. coli harboring the cloned DNA, was found to produce enzymatically active fragments of beta-amylase (70,000, 56,000 or 58,000, and 42,000 daltons), which were detected in situ by sodium dodecyl sulfate-polyacrylamide gel electrophoresis. Nucleotide sequence analysis of the cloned 3.1-kilobase DNA revealed one open reading frame of 2,808 nucleotides without a translational stop codon. These 2,808 nucleotides encode 936 amino acids of a secretory precursor of the beta-amylase protein, including a signal peptide of 33 or 35 residues at its amino-terminal end. The existence of a beta-amylase larger than 100,000 daltons, predicted from the nucleotide sequence analysis of the gene, was confirmed by examining culture supernatants after various cultivation periods. This large form existed only transiently during cultivation, whereas the multiform beta-amylases described above persisted for a long time. The large beta-amylase (approximately 160,000 daltons) persisted longer in the presence of a protease inhibitor such as chymostatin, suggesting that proteolytic cleavage is the cause of the formation of multiform beta-amylases.

  8. Organization and nucleotide sequence of a densovirus genome imply a host-dependent evolution of the parvoviruses.

    PubMed Central

    Bando, H; Kusuda, J; Gojobori, T; Maruyama, T; Kawase, S

    1987-01-01

    The genome structure of a densovirus from a silkworm was determined by sequencing more than 85% of the complete genome DNA. This is the first report of the genome organization of an insect parvovirus deduced from the DNA sequence. In the viral genome, two large open reading frames designated 1 and 2 and one smaller open reading frame designated 3 were identified. The first two open reading frames shared the same strand, while the third was found in the complementary sequence. Computer analysis suggested that open reading frame 2 may encode all four structural proteins. The genome organization and a part of the nucleotide sequence were conserved among the insect densovirus, rodent parvoviruses, and a human dependovirus. These viruses may have diverged from a common ancestor. PMID:3027382
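
    Locating open reading frames on both strands, as described above, amounts to scanning all three reading frames of the sequence and of its reverse complement. The sketch below is a minimal illustration, not the authors' procedure. It assumes the standard stop codons, defines an ORF as running from the first ATG after a stop to the next in-frame stop, and uses a hypothetical `min_codons` length cutoff. ORFs that run off the end of the sequence without a stop are ignored.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> List[Tuple[str, int, int]]:
    """Find ATG...stop ORFs in all six frames.

    Returns (strand, start, end) tuples; coordinates are 0-based, end-exclusive,
    and refer to that strand's own 5'->3' sequence ("-" = reverse complement).
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG after the previous stop opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:  # count includes the stop
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs


print(find_orfs("CCATGAAATAGCC", min_codons=1))  # [('+', 2, 11)]
```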

  9. Complete Nucleotide Sequence and Genetic Organization of the 210-Kilobase Linear Plasmid of Rhodococcus erythropolis BD2

    PubMed Central

    Stecker, Christiane; Johann, Andre; Herzberg, Christina; Averhoff, Beate; Gottschalk, Gerhard

    2003-01-01

    The complete nucleotide sequence of the linear plasmid pBD2 from Rhodococcus erythropolis BD2 comprises 210,205 bp. Sequence analyses of pBD2 revealed 212 putative open reading frames (ORFs), 97 of which had an annotatable function. These ORFs could be assigned to six functional groups: plasmid replication and maintenance, transport and metalloresistance, catabolism, transposition, regulation, and protein modification. Many of the transposon-related sequences were found to flank the isopropylbenzene pathway genes. This finding together with the significant sequence similarities of the ipb genes to genes of the linear plasmid-encoded biphenyl pathway in other rhodococci suggests that the ipb genes were acquired via transposition events and subsequently distributed among the rhodococci via horizontal transfer. PMID:12923100

  10. Nucleotide sequence of the melA gene, coding for alpha-galactosidase in Escherichia coli K-12.

    PubMed Central

    Liljeström, P L; Liljeström, P

    1987-01-01

    Melibiose uptake and hydrolysis in E. coli are performed by the MelB and MelA proteins, respectively. We report the cloning and sequencing of the melA gene. The nucleotide sequence data showed that melA codes for a 450-amino-acid protein with a molecular weight of 50.6 kDa. The sequence data also supported the assumption that the mel locus forms an operon with melA in the proximal position. A comparison of MelA with alpha-galactosidase proteins of yeast and human origin showed that these proteins share only limited homology, the yeast and human proteins being more closely related to each other. However, regions common to all three proteins were found, indicating sequences that might comprise the active site of alpha-galactosidase. PMID:3031590

  11. Definition of the tempo of sequence diversity across an alignment and automatic identification of sequence motifs: Application to protein homologous families and superfamilies

    PubMed Central

    May, Alex C.W.

    2002-01-01

    From aligned protein sequences, it is often possible to identify sequence motifs that characterize a protein family in terms of its fold and/or function. Such motifs can be used to search for new family members. Partitioning of sequence alignments into regions of similar amino acid variability is usually done by hand. Here, I present a completely automatic method for this purpose: one that is guaranteed to produce globally optimal solutions at all levels of partition granularity. The method is used to compare the tempo of sequence diversity across reliable three-dimensional (3D) structure-based alignments of 209 protein families (HOMSTRAD) with that across 69 superfamilies (CAMPASS). (The mean alignment lengths for HOMSTRAD and CAMPASS are very similar.) Surprisingly, the optimal segmentation distributions for closely related proteins and distantly related ones are found to be very similar. Also, optimal segmentation identifies an unusual protein superfamily. Finally, protein 3D structure clues from the tempo of sequence diversity across alignments are examined. The method is general and could be applied to any area of comparative biological sequence and 3D structure analysis where the inherent linear organization of the data imposes an ordering on the set of objects to be clustered. PMID:12441381
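
    The abstract does not state the partitioning criterion. One standard way to obtain a globally optimal partition of an ordered sequence into k contiguous segments, for any chosen k, is dynamic programming. The sketch below assumes that each alignment column has already been reduced to a single variability score (for example, a per-column entropy). It uses the within-segment sum of squared deviations as the cost. Both choices are assumptions for illustration, not necessarily the method of the paper. The run time is O(k n^2) for n columns.

```python
from typing import List, Tuple


def optimal_segmentation(x: List[float], k: int) -> Tuple[float, List[int]]:
    """Split x into k contiguous segments minimizing the total within-segment
    sum of squared deviations; returns (cost, 0-based segment start indices)."""
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(x)")
    # Prefix sums give each segment's cost in O(1).
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(x):
        s[i + 1] = s[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(i: int, j: int) -> float:
        # Sum of squared deviations from the segment mean for x[i:j].
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(k + 1)]  # best[m][j]: x[:j] in m segments
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):  # the last segment is x[i:j]
                c = best[m - 1][i] + cost(i, j)
                if c < best[m][j]:
                    best[m][j] = c
                    back[m][j] = i
    # Trace back the segment start positions.
    starts, j = [], n
    for m in range(k, 0, -1):
        j = back[m][j]
        starts.append(j)
    return best[k][n], starts[::-1]


cost, starts = optimal_segmentation([0, 0, 0, 5, 5, 5], k=2)
print(cost, starts)  # 0.0 [0, 3]
```

    Running the function for k = 1, 2, 3, ... gives the optimal partition at every level of granularity.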

  12. AREM: Aligning Short Reads from ChIP-Sequencing by Expectation Maximization

    NASA Astrophysics Data System (ADS)

    Newkirk, Daniel; Biesinger, Jacob; Chon, Alvin; Yokomori, Kyoko; Xie, Xiaohui

    High-throughput sequencing coupled to chromatin immunoprecipitation (ChIP-Seq) is widely used in characterizing genome-wide binding patterns of transcription factors, cofactors, chromatin modifiers, and other DNA binding proteins. A key step in ChIP-Seq data analysis is to map short reads from high-throughput sequencing to a reference genome and identify peak regions enriched with short reads. Although several methods have been proposed for ChIP-Seq analysis, most existing methods only consider reads that can be uniquely placed in the reference genome, and therefore have low power for detecting peaks located within repeat sequences. Here we introduce a probabilistic approach for ChIP-Seq data analysis which utilizes all reads, providing a truly genome-wide view of binding patterns. Reads are modeled using a mixture model corresponding to K enriched regions and a null genomic background. We use maximum likelihood to estimate the locations of the enriched regions, and implement an expectation-maximization (E-M) algorithm, called AREM (aligning reads by expectation maximization), to update the alignment probabilities of each read to different genomic locations. We apply the algorithm to identify genome-wide binding events of two proteins: Rad21, a component of cohesin and a key factor involved in chromatid cohesion, and Srebp-1, a transcription factor important for lipid/cholesterol homeostasis. Using AREM, we were able to identify 19,935 Rad21 peaks and 1,748 Srebp-1 peaks in the mouse genome with high confidence, including 1,517 (7.6%) Rad21 peaks and 227 (13%) Srebp-1 peaks that were missed using only uniquely mapped reads. The open source implementation of our algorithm is available at http://sourceforge.net/projects/arem
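
    The E-step/M-step idea behind AREM can be illustrated with a much simpler toy model. In the sketch below (all names hypothetical), each read lists the genomic regions it aligns to equally well, and EM estimates how much each region contributes. Reads that map to several places are then shared out in proportion to those estimates. Unlike AREM, the sketch has no null background component, ignores alignment quality, and does not call peaks.

```python
from typing import List


def em_region_weights(candidates: List[List[int]], n_regions: int,
                      n_iter: int = 100) -> List[float]:
    """Toy EM for multi-mapping reads (hypothetical helper, not AREM itself).

    candidates[r] lists the regions read r aligns to equally well; the result
    is the estimated fraction of reads originating from each region.
    """
    weights = [1.0 / n_regions] * n_regions
    for _ in range(n_iter):
        expected = [0.0] * n_regions
        # E-step: split each read across its candidate regions in proportion
        # to the current weights (a unique read stays entirely in its region).
        for regions in candidates:
            z = sum(weights[j] for j in regions)
            for j in regions:
                expected[j] += weights[j] / z
        # M-step: new weights are the expected read counts, normalized.
        weights = [e / len(candidates) for e in expected]
    return weights


reads = [[0], [0], [0], [1], [0, 1]]  # the last read maps to both regions
print(em_region_weights(reads, n_regions=2))  # converges to about [0.75, 0.25]
```

    Here the ambiguous read ends up assigned mostly to region 0, because three unique reads already support that region.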

  13. Role of base stacking and sequence context in the inhibition of yeast DNA polymerase eta by pyrene nucleotide.

    PubMed

    Hwang, Hanshin; Taylor, John-Stephen

    2004-11-23

    The Y family DNA polymerase yeast pol eta inserts pyrene deoxyribose monophosphate (dPMP) in preference to A opposite an abasic site, the 3'-T of a thymine dimer, and a normal T with almost equal efficiency. In contrast, pol A family polymerases such as Klenow fragment and T7 DNA polymerase only insert dPMP efficiently opposite an abasic site and the 3'-T of a thymine dimer but not opposite undamaged DNA. Pyrene nucleotide is also an efficient chain-terminating inhibitor of DNA synthesis by pol eta but not by Klenow fragment or T7 DNA polymerase. To better understand the origin of the efficiency and sequence specificity of dPMP insertion by pol eta, the kinetics of dPMP insertion opposite various templates have been determined. In one sequence context, the efficiency of dPMP insertion increases 4.6-fold across templating bases in the order G < A < T < C, suggesting that the templating nucleotide modulates dPMP insertion efficiency because it has to destack prior to dPTP binding. The efficiency of insertion of dPMP opposite T in the same sequence context increases 7-fold across primer-terminal bases in the order G < A < C < T and is similar to that observed for nontemplated blunt-end extension, suggesting that stacking interactions between the pyrene and the primer terminus are also important. On heterogeneous templates, the average selectivity for dPMP insertion relative to the complementary dNMP decreases in the order dAMP > dGMP > dTMP > dCMP, from a high of 5.8 when dAMP is to be inserted following a T to a low of 0.5 when dCMP is to be inserted following a C. The relative preference for dPMP insertion at a given site can be largely explained by the energetic cost of destacking the templating base and stacking the pyrene nucleotide relative to that of stacking and base pairing the complementary nucleotide. Thus, pyrene nucleotide represents a novel class of nucleotide-based chain-terminating DNA synthesis inhibitors whose base portion consists of a hydrophobic, non-hydrogen-bonding base-pair mimic.

  14. Unifying bacteria from decaying wood with various ubiquitous Gibbsiella species as G. acetica sp. nov. based on nucleotide sequence similarities and their acetic acid secretion.

    PubMed

    Geider, Klaus; Gernold, Marina; Jock, Susanne; Wensing, Annette; Völksch, Beate; Gross, Jürgen; Spiteller, Dieter

    2015-12-01

    Bacteria were isolated from necrotic apple and pear tree tissue and from dead wood in Germany and Austria, as well as from pear tree exudate in China. They were selected for growth at 37 °C, screened for levan production and then characterized as Gram-negative, facultatively anaerobic rods. Alignments of nucleotide sequences from 16S rRNA genes and the housekeeping genes dnaJ, gyrB, recA and rpoB, BLAST searches, and phenotypic data confirmed by MALDI-TOF analysis showed that these bacteria belong to the genus Gibbsiella and resemble strains isolated from diseased oaks in Britain and Spain. Gibbsiella-specific PCR primers were designed from the proline isomerase and levansucrase genes. Acid secretion was investigated by screening for halo formation on calcium carbonate agar, and the secreted compound was identified by NMR as acetic acid. Its production by Gibbsiella spp. strains was also determined in culture supernatants by GC/MS analysis after derivatization with pentafluorobenzyl bromide. Some strains were differentiated by the PFGE patterns of SpeI digests and by sequence analyses of the lsc and ppiD genes, and the Chinese Gibbsiella strain was the most divergent. The newly investigated bacteria, as well as Gibbsiella quercinecans, Gibbsiella dentisursi and Gibbsiella papilionis, isolated in Britain, Spain, Korea and Japan, are taxonomically related Enterobacteriaceae that tolerate and secrete acetic acid. We therefore propose to unify them in the species Gibbsiella acetica sp. nov.

  15. The Nucleotide Capture Region of Alpha Hemolysin: Insights into Nanopore Design for DNA Sequencing from Molecular Dynamics Simulations

    PubMed Central

    Manara, Richard M. A.; Tomasio, Susana; Khalid, Syma

    2015-01-01

    Nanopore technology for DNA sequencing is constantly being refined and improved. In strand sequencing, a single strand of DNA is fed through a nanopore and the resulting fluctuations in the current are measured. A major hurdle is that the DNA is translocated through the pore too fast for the current measurement systems. An alternative approach is “exonuclease sequencing”, in which an exonuclease capable of processing the strand, cleaving off one base at a time, is attached to the nanopore. The bases then flow through the nanopore and the current is measured. This method could potentially solve the translocation rate problem, as the speed is controlled by the exonuclease. Here we consider the practical details of exonuclease attachment to the protein alpha-hemolysin. We employ molecular dynamics simulations to determine (a) the ideal distance from alpha-hemolysin and (b) the ideal orientation of the monophosphate nucleotides upon release from the exonuclease such that they will enter the protein. Our results indicate an almost linear decrease in the probability of entry into the protein with increasing distance of nucleotide release. The nucleotide orientation is less significant for entry into the protein.

  16. Complete nucleotide sequence of Rose yellow leaf virus, a new member of the family Tombusviridae

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The genome of the Rose yellow leaf virus (RYLV) has been determined to be 3918 nucleotides long and contains seven open reading frames (ORFs). ORF1 encodes a 27 kDa peptide (p27). ORF2 shares a common start codon with ORF1 and continues through the amber stop codon of p27 to encode an 87 kDa (p87) protein t...

  17. Cloning, sequence, and properties of the soluble pyridine nucleotide transhydrogenase of Pseudomonas fluorescens.

    PubMed Central

    French, C E; Boonstra, B; Bufton, K A; Bruce, N C

    1997-01-01

    The gene encoding the soluble pyridine nucleotide transhydrogenase (STH) of Pseudomonas fluorescens was cloned and expressed in Escherichia coli. STH is related to the flavoprotein disulfide oxidoreductases but lacks one of the conserved redox-active cysteine residues. The gene is highly similar to an E. coli gene of unknown function. PMID:9098078

  18. STRUCTFAST: protein sequence remote homology detection and alignment using novel dynamic programming and profile-profile scoring.

    PubMed

    Debe, Derek A; Danzer, Joseph F; Goddard, William A; Poleksic, Aleksandar

    2006-09-01

    STRUCTFAST is a novel profile-profile alignment algorithm capable of detecting weak similarities between protein sequences. The increased sensitivity and accuracy of the STRUCTFAST method are achieved through several unique features. First, the algorithm utilizes a novel dynamic programming engine capable of incorporating important information from a structural family directly into the alignment process. Second, the algorithm employs a rigorous analytical formula for profile-profile scoring to overcome the limitations of ad hoc scoring functions that require adjustable parameter training. Third, the algorithm employs Convergent Island Statistics (CIS) to compute the statistical significance of alignment scores independently for each pair of sequences. STRUCTFAST routinely produces alignments that meet or exceed the quality obtained by an expert human homology modeler, as evidenced by its performance in the latest CAFASP4 and CASP6 blind prediction benchmark experiments.
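
    The abstract does not describe STRUCTFAST's dynamic programming engine or its analytical profile-profile scoring formula. For orientation only, the sketch below shows the textbook idea such methods build on: a Needleman-Wunsch-style global alignment in which each cell compares two profile columns (residue-frequency vectors) instead of two residues. The dot-product column score, its `offset`, and the linear gap penalty are placeholder assumptions. The sketch uses no structural information and computes no significance statistics.

```python
from typing import List, Sequence

Profile = List[Sequence[float]]  # one residue-frequency vector per profile column


def column_score(p: Sequence[float], q: Sequence[float], offset: float = 0.05) -> float:
    # Hypothetical score: dot product of the two frequency vectors minus a
    # constant offset, so that unrelated columns tend to score below zero.
    return sum(a * b for a, b in zip(p, q)) - offset


def global_profile_score(P: Profile, Q: Profile, gap: float = -0.5) -> float:
    """Needleman-Wunsch-style global alignment score of two profiles, linear gaps."""
    n, m = len(P), len(Q)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + column_score(P[i - 1], Q[j - 1]),  # align columns
                F[i - 1][j] + gap,  # gap in Q
                F[i][j - 1] + gap,  # gap in P
            )
    return F[n][m]


P = [[1.0, 0.0], [0.0, 1.0]]  # toy profile over a two-letter alphabet
print(global_profile_score(P, P))  # two aligned columns, each scoring 1.0 - 0.05
```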

  19. Molecular cloning, nucleotide sequence, and expression of a carboxypeptidase-encoding gene from the archaebacterium Sulfolobus solfataricus.

    PubMed Central

    Colombo, S; Toietta, G; Zecca, L; Vanoni, M; Tortora, P

    1995-01-01

    Mammalian metallocarboxypeptidases play key roles in major biological processes, such as digestive-protein degradation and specific proteolytic processing. A Sulfolobus solfataricus gene (cpsA) encoding a recently described zinc carboxypeptidase with an unusually broad substrate specificity was cloned, sequenced, and expressed in Escherichia coli. Despite the lack of overall sequence homology with known carboxypeptidases, seven homology blocks, including the Zn-coordinating and catalytic residues, were identified by multiple alignment with carboxypeptidases A, B, and T. S. solfataricus carboxypeptidase expressed in E. coli was found to be enzymatically active, and both its substrate specificity and thermostability were comparable to those of the purified S. solfataricus enzyme. PMID:7559343

  20. Molecular cloning, nucleotide sequence, and expression of a carboxypeptidase-encoding gene from the archaebacterium Sulfolobus solfataricus.

    PubMed

    Colombo, S; Toietta, G; Zecca, L; Vanoni, M; Tortora, P

    1995-10-01

    Mammalian metallocarboxypeptidases play key roles in major biological processes, such as digestive-protein degradation and specific proteolytic processing. A Sulfolobus solfataricus gene (cpsA) encoding a recently described zinc carboxypeptidase with an unusually broad substrate specificity was cloned, sequenced, and expressed in Escherichia coli. Despite the lack of overall sequence homology with known carboxypeptidases, seven homology blocks, including the Zn-coordinating and catalytic residues, were identified by multiple alignment with carboxypeptidases A, B, and T. S. solfataricus carboxypeptidase expressed in E. coli was found to be enzymatically active, and both its substrate specificity and thermostability were comparable to those of the purified S. solfataricus enzyme.

  1. Identification of essential nucleotides in an upstream repressing sequence of Saccharomyces cerevisiae by selection for increased expression of TRK2.

    PubMed Central

    Vidal, M; Buckley, A M; Yohn, C; Hoeppner, D J; Gaber, R F

    1995-01-01

    The TRK2 gene in Saccharomyces cerevisiae encodes a membrane protein involved in potassium transport and is expressed at extremely low levels. Dominant cis-acting mutations (TRK2D), selected by their ability to confer TRK2-dependent growth on low-potassium medium, identified an upstream repressor element (URS1-TRK2) in the TRK2 promoter. The URS1-TRK2 sequence (5'-AGCCGCACG-3') shares six nucleotides with the ubiquitous URS1 element (5'-AGCCGCCGA-3'), and the protein species binding URS1-CAR1 (URSF) is capable of binding URS1-TRK2 in vitro. Sequence analysis of 17 independent repression-defective TRK2D mutations identified three adjacent nucleotides essential for URS1-mediated repression in vivo. Our results suggest a role for context effects with regard to URS1-related sequences: several mutant alleles of the URS1 element previously reported to have little or no effect when analyzed within the context of a heterologous promoter (CYC1) [Luche, R.M., Sumrada, R. & Cooper, T.G. (1990) Mol. Cell. Biol. 10, 3884-3895] have major effects on repression in the context of their native promoters (TRK2 and CAR1). TRK2D mutations that abolish repression also reveal upstream activating sequence activity either within or adjacent to URS1. Additivity between TRK2D and sin3 delta mutations suggests that SIN3-mediated repression is independent of that mediated by URS1. PMID:7892273
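
    The six shared nucleotides can be checked by comparing the two 9-nt elements position by position. The sketch assumes an ungapped, position-by-position comparison, because the abstract does not say how the shared nucleotides were counted. Under that assumption, the elements agree at the first six positions and differ at positions 7 to 9.

```python
URS1_TRK2 = "AGCCGCACG"       # URS1-TRK2 element from the abstract, 5'->3'
URS1_CONSENSUS = "AGCCGCCGA"  # ubiquitous URS1 element from the abstract, 5'->3'


def shared_positions(a: str, b: str) -> list:
    """1-based positions at which two equal-length sequences carry the same base."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x == y]


shared = shared_positions(URS1_TRK2, URS1_CONSENSUS)
print(len(shared), shared)  # 6 [1, 2, 3, 4, 5, 6]
```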

  2. Nucleotide sequence of the 5' end of araBAD operon messenger RNA in Escherichia coli B/r.

    PubMed

    Lee, N; Carbon, J

    1977-01-01

    The transcription reaction in vitro provides a means of analyzing the nucleotide sequence of the mRNA of the araBAD operon. By controlling the time of synthesis, we obtained araBAD mRNA of varying lengths beginning from the 5' end. These 5' fragments were freed of lambda RNA transcripts by successive hybridizations to the sense strands of a pair of lambda ara transducing phages that carry ara genes in opposite orientations. The purified 5' fragments were ordered by their times of appearance during synchronized RNA elongation and by nearest neighbor analyses. The results, when combined with the knowledge of the NH2-terminal sequence of the product of the first cistron (L-ribulokinase gene araB), establish the nucleotide sequence of the first 69 bases at the 5' end of the araBAD operon mRNA. The AUG starter codon for L-ribulokinase is located at positions 29-31. The sequence is: 5' A-C-C-C-G-U-U-U-U-U-U-U-U-G-G-A-U-G-G-A-G-U-G-A-A-A-C-G-A-U-G-G-C-G-A-U-U-G-C-A-A-U-U-G-G-C-C-U-C-G-A-U-U-U-U-G-C-A-G-U-G-A-U-U-C-U-G-(U)-. . .3'.
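
    The printed sequence can be scanned for AUG triplets in every frame to locate the start codon reported above. The sketch copies the sequence as printed, omitting the uncertain trailing (U). The helper name is hypothetical. Besides the AUG at positions 29-31, the scan finds an AUG at positions 16-18 in a different reading frame. The printed sequence alone therefore does not single out the start codon; the abstract's assignment rests on the NH2-terminal sequence of L-ribulokinase.

```python
# 5' end of araBAD mRNA as printed in the abstract; the uncertain
# trailing "(U)" is left out.
ARABAD_5PRIME = (
    "A-C-C-C-G-U-U-U-U-U-U-U-U-G-G-A-U-G-G-A-G-U-G-A-A-A-C-G-A-U-G-G-C-G-A-U-U-G-C-A-A-U-U-G-G-C-C-U-C-G-A-U-U-U-U-G-C-A-G-U-G-A-U-U-C-U-G"
).replace("-", "")


def codon_positions(rna: str, codon: str = "AUG") -> list:
    """Return 1-based start positions of every occurrence of codon, in any frame."""
    k = len(codon)
    return [i + 1 for i in range(len(rna) - k + 1) if rna[i:i + k] == codon]


positions = codon_positions(ARABAD_5PRIME)
print(positions)  # [16, 29]
```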

  3. Complete nucleotide sequence of the Actinomyces viscosus T14V sialidase gene: presence of a conserved repeating sequence among strains of Actinomyces spp.

    PubMed Central

    Yeung, M K

    1993-01-01

    The nucleotide sequence of the Actinomyces viscosus T14V sialidase gene (nanH) and flanking regions was determined. An open reading frame of 2,703 nucleotides that encodes a predominantly hydrophobic protein of 901 amino acids (Mr 92,871) was identified. The amino acid sequence at the amino terminus of the predicted protein exhibited properties characteristic of a typical leader peptide. Five 12-amino-acid units that shared between 33 and 67% sequence identity were noted within the central domain of the protein. Each unit contained the sequence Ser-X-Asp-X-Gly-X-Thr-Trp, which is conserved among other bacterial and trypanosomal sialidases. Thus, the A. viscosus T14V nanH gene and the other prokaryotic and eukaryotic sialidase genes evolved from a common ancestor. Southern hybridization analyses under conditions of high stringency revealed the existence of DNA sequences homologous to A. viscosus T14V nanH in the genomes of 18 strains of five Actinomyces species that expressed various levels of sialidase activity. The data demonstrate that the sialidase genes from divergent groups of Actinomyces spp. are highly conserved. PMID:8418033

  4. An Interpretation of the Ancestral Codon from Miller’s Amino Acids and Nucleotide Correlations in Modern Coding Sequences

    PubMed Central

    Carels, Nicolas; de Leon, Miguel Ponce

    2015-01-01

    Purine bias, which is usually referred to as an “ancestral codon”, is known to result in short-range correlations between nucleotides in coding sequences, and it is common in all species. We demonstrate that RWY is a more appropriate pattern than the classical RNY, and purine bias (Rrr) is the product of a network of nucleotide compensations induced by functional constraints on the physicochemical properties of proteins. Through deductions from universal correlation properties, we also demonstrate that amino acids from Miller’s spark discharge experiment are compatible with functional primeval proteins at the dawn of living cell radiation on earth. These amino acids match the hydropathy and secondary structures of modern proteins. PMID:25922573

  5. Welding-induced alignment distortion in DIP LD packages: effect of laser welding sequence

    NASA Astrophysics Data System (ADS)

    Liu, Wenning; Lin, Yaomin; Shi, Frank G.

    2002-06-01

    In pigtailing of a single-mode fiber to a semiconductor laser for optical communication applications, the tolerance for displacement of the fiber relative to the laser is extremely tight; a submicron movement can often lead to significant misalignment and thus a reduction in the power coupled into the fiber. Among various fiber pigtailing assembly technologies, pulsed laser welding is the method with submicron accuracy and is most conducive to automation. However, the melting-solidification process during laser welding can often distort the pre-achieved fiber-optic alignment. This welding-induced alignment distortion (WIAD) is a serious concern and significantly affects the yield of single-mode fiber pigtailing to a semiconductor laser. This work presents a method for predicting WIAD as a function of various processing, laser, tooling and materials parameters. More specifically, the degree of WIAD produced by laser welding in a dual-in-line laser diode package is predicted for the first time. An optimal welding sequence is obtained for minimizing WIAD.

  6. Alignment of 3D Building Models and TIR Video Sequences with Line Tracking

    NASA Astrophysics Data System (ADS)

    Iwaszczuk, D.; Stilla, U.

    2014-11-01

    Thermal infrared (TIR) imagery of urban areas has become of interest for urban climate investigations and thermal building inspections. Acquiring the imagery from a flying platform such as a UAV or a helicopter and combining the thermal data with 3D building models via texturing provides a valuable basis for large-area building inspections. However, such thermal textures are useful for further analysis only if they are extracted with correct geometry. This requires good co-registration between the 3D building models and the thermal images, which cannot be achieved by direct georeferencing. Hence, this paper presents a methodology for aligning 3D building models with oblique TIR image sequences taken from a flying platform. In a single image, line correspondences between model edges and image line segments are found using an accumulator approach, and based on these correspondences an optimal camera pose is calculated to ensure the best match between the projected model and the image structures. Along the sequence, the linear features are tracked based on visibility prediction. The results of the proposed methodology are presented using a TIR image sequence taken from a helicopter over a densely built-up urban area. The novelty of this work lies in employing the uncertainty of the 3D building models and in an innovative tracking strategy based on a priori knowledge from the 3D building model and visibility checking.

  7. Ribosomal ITS sequences allow resolution of freshwater sponge phylogeny with alignments guided by secondary structure prediction.

    PubMed

    Itskovich, Valeria; Gontcharov, Andrey; Masuda, Yoshiki; Nohno, Tsutomu; Belikov, Sergey; Efremova, Sofia; Meixner, Martin; Janussen, Dorte

    2008-12-01

    Freshwater sponges include six extant families, which belong to the suborder Spongillina (Porifera). The taxonomy of freshwater sponges is problematic, and their phylogeny and evolution are not well understood. Sequences of the ribosomal internal transcribed spacers (ITS1 and ITS2) of 11 species from the family Lubomirskiidae, 13 species from the family Spongillidae, and 1 species from the family Potamolepidae were obtained to study the phylogenetic relationships between endemic and cosmopolitan freshwater sponges and the evolution of sponges in Lake Baikal. This is the first study in which ITS1 sequences were successfully aligned using verified secondary structure models and, in combination with ITS2, used to infer relationships between freshwater sponges. Phylogenetic trees inferred using maximum likelihood, neighbor-joining, and parsimony methods and Bayesian inference revealed that the endemic family Lubomirskiidae was monophyletic. Our results do not support the monophyly of Spongillidae, because Lubomirskiidae formed a robust clade with Ephydatia muelleri, and Trochospongilla latouchiana formed a robust clade with the outgroup Echinospongilla brichardi (Potamolepidae). Within the cosmopolitan family Spongillidae, the genera Radiospongilla and Eunapius were found to be monophyletic, while E. muelleri was basal to the family Lubomirskiidae. The much lower genetic distances among Lubomirskiidae species than among Spongillidae species indicate their relatively recent radiation from a common ancestor. These results indicate that rDNA spacer sequences can be useful for studying the phylogenetic relationships of freshwater sponges and for identifying their species.

  8. IBBOMSA: An Improved Biogeography-based Approach for Multiple Sequence Alignment

    PubMed Central

    Yadav, Rohit Kumar; Banka, Haider

    2016-01-01

    In bioinformatics, multiple sequence alignment (MSA) is an NP-hard problem; hence, nature-inspired techniques can better approximate the solution. In the current study, a novel biogeography-based optimization (NBBO) is proposed to solve the MSA problem. Biogeography-based optimization (BBO) is a new paradigm for optimization, but it has some deficiencies in solving complicated problems, such as low population diversity and a slow convergence rate. NBBO is an enhanced version of BBO in which a new migration operation is proposed to overcome these limitations. The new migration adopts more information from other habitats, maintains population diversity, and preserves exploitation ability. In the performance analysis, the proposed technique and existing techniques such as VDGA, MOMSA, and GAPAM are tested on publicly available benchmark datasets (i.e., BAliBASE). The proposed method is observed to be superior or competitive with respect to the existing techniques. PMID:27812276
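
    Population-based MSA optimizers such as BBO need a fitness function for candidate alignments, and a sum-of-pairs (SP) score is a common choice; the abstract does not say which objective NBBO uses. The sketch below is a minimal SP scorer with hypothetical match, mismatch, and gap values (linear gaps, gap-gap pairs ignored). It is not the BAliBASE evaluation score, and it does not implement the migration operator itself.

```python
from typing import Sequence


def sum_of_pairs(alignment: Sequence[str], match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> int:
    """Sum-of-pairs score of an MSA given as equal-length rows with '-' for gaps."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for col in range(length):
        for a in range(len(alignment)):
            for b in range(a + 1, len(alignment)):
                x, y = alignment[a][col], alignment[b][col]
                if x == "-" and y == "-":
                    continue  # gap-gap pairs are ignored
                if x == "-" or y == "-":
                    total += gap
                elif x == y:
                    total += match
                else:
                    total += mismatch
    return total


print(sum_of_pairs(["AC-T", "ACGT", "A-GT"]))  # 0
print(sum_of_pairs(["ACGT", "ACGT"]), sum_of_pairs(["ACGT-", "-ACGT"]))  # 4 -7
```

    An optimizer would prefer the higher-scoring of two candidate alignments of the same sequences, as in the second line.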

  9. Nucleotide sequence of the Klebsiella pneumoniae nifD gene and predicted amino acid sequence of the alpha-subunit of nitrogenase MoFe protein.

    PubMed Central

    Ioannidis, I; Buck, M

    1987-01-01

    The nucleotide sequence of the Klebsiella pneumoniae nifD gene is presented and together with the accompanying paper [Holland, Zilberstein, Zamir & Sussman (1987) Biochem. J. 247, 277-285] completes the sequence of the nifHDK genes encoding the nitrogenase polypeptides. The K. pneumoniae nifD gene encodes the 483-amino acid-residue nitrogenase alpha-subunit polypeptide of Mr 54156. The alpha-subunit has five strongly conserved cysteine residues at positions 63, 89, 155, 184 and 275, some occurring in a region showing both primary sequence and potential structural homology to the K. pneumoniae nitrogenase beta-subunit. A comparison with six other alpha-subunit amino acid sequences has been made, which indicates a number of potentially important domains within alpha-subunits. PMID:3322262

  10. Partition enrichment of nucleotide sequences (PINS)--a generally applicable, sequence based method for enrichment of complex DNA samples.

    PubMed

    Kvist, Thomas; Sondt-Marcussen, Line; Mikkelsen, Marie Just

    2014-01-01

    The dwindling cost of DNA sequencing is driving transformative changes in various biological disciplines, including medicine, thus resulting in an increased need for routine sequencing. Preparation of samples suitable for sequencing is the starting point of any practical application, but enrichment of the target sequence over background DNA is often laborious and of limited sensitivity, thereby limiting the usefulness of sequencing. The present paper describes a new method, Probability directed Isolation of Nucleic acid Sequences (PINS), for enrichment of DNA, enabling the sequencing of a large DNA region surrounding a small known sequence. A 275,000-fold enrichment of a target DNA sample containing integrated human papilloma virus is demonstrated. Specifically, a sample containing 0.0028 copies of target sequence per ng of total DNA was enriched to 786 copies per ng. The starting concentration of 0.0028 target copies per ng corresponds to one copy of target in a background of 100,000 complete human genomes. The enriched sample was subsequently amplified using rapid genome walking, and the resulting DNA sequence revealed not only the sequence of the truncated virus but also 1026 base pairs 5' and 50 base pairs 3' to the integration site in chromosome 8. The demonstrated enrichment method is extremely sensitive and selective, requires only minimal knowledge of the sequence to be enriched, and will therefore enable sequencing where the target concentration relative to background is too low to allow the use of other sample preparation methods or where significant parts of the target sequence are unknown.

  11. Differentiation of Erysipelothrix rhusiopathiae strains by nucleotide sequence analysis of a hypervariable region in the spaA gene: discrimination of a live vaccine strain from field isolates.

    PubMed

    Nagai, Shinya; To, Ho; Kanda, Akira

    2008-05-01

    Erysipelothrix rhusiopathiae causes erysipelas in swine, which is considered a reemerging disease contributing substantially to economic losses in the swine industry. Since an attenuated live vaccine was commercialized in 1974 in Japan, outbreaks of acute septicemia or subacute urticaria of erysipelas have decreased dramatically. In contrast, a chronic form of erysipelas found during meat inspections in slaughterhouses has been increasing. In this study, a new strain-typing method was developed based on nucleotide sequencing of a hypervariable region in the surface protective antigen (spaA) gene for discrimination of the live vaccine strain from field isolates. Sixteen strains isolated from arthritic lesions found in slaughtered pigs were segregated into 4 major patterns: 1) identical nucleotide sequence with the vaccine strain: 3 isolates; 2) 1 nucleotide substitution (C to A) at position 555: 5 isolates; 3) 1 nucleotide substitution at various positions: 5 isolates; and 4) 2 nucleotide substitutions: 3 isolates. Isolates with the same nucleotide sequence as the vaccine strain were further characterized by other properties, including the mouse pathogenicity test. One strain isolated from pigs on a farm where the live vaccine had been used was found to be closely related to the vaccine strain. The phylogenetic tree constructed based on the spaA sequence suggests that the evolutionary distance of the isolates is related to the pathogenicity in mice. The new strain-typing system based on nucleotide sequencing of the spaA region is useful to discriminate the vaccine strain from field isolates.

  12. Nucleotide sequence polymorphism at the apical membrane antigen-1 locus reveals population history of Plasmodium vivax in Thailand

    PubMed Central

    Putaporntip, Chaturong; Jongwutiwes, Somchai; Grynberg, Priscila; Cui, Liwang; Hughes, Austin L.

    2009-01-01

    Apical membrane antigen-1 is a candidate for inclusion in a vaccine for the human malaria parasite Plasmodium vivax. We collected 231 complete sequences of the gene encoding this antigen (pvama-1) from three regions of Thailand, the most extensive collection of sequences at this locus to date. The domain II loop (previously proposed as a potential vaccine component) was almost completely conserved, with a single amino acid variant (I313R) observed in a single sequence. The 3′ portion of the gene (domain II through the stop codon) showed significantly lower nucleotide diversity than the 5′ portion (start codon through domain I), and a given domain I sequence could be found in haplotypes with more than one domain II sequence. These results imply a recombination hotspot between domains I and II. We found significant geographic subdivision among the three regions of Thailand (NW, East, and South) in which collections were made in 2007. The number of P. vivax infections has declined overall since 1990 in all three regions, but the decline has been most recent in the NW, and infections have rebounded in the South since 2000. Consistent with this population history, amino acid sequence diversity was greatest in the NW. The South, which had by far the lowest sequence diversity of the three regions, showed signs of a population that has expanded from a small number of founders after a bottleneck. PMID:19643205
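
    Nucleotide diversity (π) is commonly estimated as the average number of pairwise differences per site among the sampled sequences. The sketch below computes that estimator for a hypothetical toy alignment and compares a 5′ segment with a 3′ segment, mirroring the comparison in the abstract; the sequences and the split point are invented, and the abstract does not state which software or corrections the authors used.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site (pi) for aligned, gap-free, equal-length sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


# Hypothetical haplotypes; the split between the "5' portion" and "3' portion" is arbitrary.
haplotypes = ["ATGCATGCAA", "ATGAATGCAA", "TTGCATGCAA", "ATGAATGCAA"]
split = 5
pi_5prime = nucleotide_diversity([h[:split] for h in haplotypes])
pi_3prime = nucleotide_diversity([h[split:] for h in haplotypes])
print(f"pi 5' = {pi_5prime:.4f}, pi 3' = {pi_3prime:.4f}")
```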

  13. Determination of the minimal essential nucleotide sequence for diphtheria tox repressor binding by in vitro affinity selection.

    PubMed

    Tao, X; Murphy, J R

    1994-09-27

    The expression of diphtheria toxin in lysogenic toxigenic strains of Corynebacterium diphtheriae is controlled by the heavy metal ion-activated regulatory protein DtxR. In the presence of divalent heavy metal ions, DtxR specifically binds to the diphtheria tox operator and protects a 27-bp interrupted palindromic sequence from DNase I digestion. To determine the consensus DNA sequence for DtxR binding, we have used gel electrophoresis mobility-shift assay and polymerase chain reaction (PCR) amplification for in vitro affinity selection of DNA binding sequences from a universe of 6.9 x 10^10 variants. After 10 rounds of in vitro affinity selection, each round coupled with 30 cycles of PCR amplification, we isolated and characterized a family of DNA sequences that function as DtxR-responsive genetic elements both in vitro and in vivo. Moreover, these DNA sequences were found to bind activated DtxR with an affinity similar to that of the wild-type tox operator. Sequence analysis of 21 unique in vitro affinity-selected binding sites has revealed the minimal essential nucleotide sequence for DtxR binding to be a 9-bp palindrome separated by a single base pair.
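
    Reading the reported minimal site as two 9-bp arms that are reverse complements of each other, flanking a 1-bp spacer, the sketch below derives a majority-rule consensus from aligned binding sites and tests whether a 19-mer has that interrupted-palindrome layout. The example sites are illustrative only, not the sequences selected in the study, and majority rule is one simple way to form a consensus, not necessarily the authors' procedure.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def consensus(sites):
    """Majority-rule base at each column of equal-length aligned sites."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*sites))


def is_interrupted_palindrome(site, arm=9, spacer=1):
    """True if site = arm + spacer + reverse complement of that arm."""
    if len(site) != 2 * arm + spacer:
        return False
    left, right = site[:arm], site[arm + spacer:]
    return right == revcomp(left)


# Illustrative aligned 19-bp sites (hypothetical, not the study's sequences).
sites = [
    "TTAGGTTAGCCTAACCTAA",
    "TTAGGTTAGACTAACCTAA",
    "TAAGGTTAGCCTAACCTTA",
]
core = consensus(sites)
print(core, is_interrupted_palindrome(core))
```

    Each individual site above deviates from the consensus at one position, yet the majority-rule consensus recovers a perfect interrupted palindrome, which is why a consensus is usually reported from many selected sites rather than from any single one.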

  14. Nucleotide sequence of Zygosaccharomyces bailii virus Z: Evidence for +1 programmed ribosomal frameshifting and for assignment to family Amalgaviridae.

    PubMed

    Depierreux, Delphine; Vong, Minh; Nibert, Max L

    2016-06-02

    Zygosaccharomyces bailii virus Z (ZbV-Z) is a monosegmented dsRNA virus that infects the yeast Zygosaccharomyces bailii and remains unclassified despite its discovery more than 20 years ago. The previously reported nucleotide sequence of ZbV-Z (GenBank AF224490) encompasses two nonoverlapping long ORFs: upstream ORF1 encoding the putative coat protein and downstream ORF2 encoding the RNA-dependent RNA polymerase (RdRp). The lack of overlap between these ORFs raises the question of how the downstream ORF is translated. After examining the previous sequence of ZbV-Z, we predicted that it contains at least one sequencing error that accounts for the nonoverlapping ORFs; to test this prediction, we redetermined the nucleotide sequence of ZbV-Z from the same isolate of Z. bailii as previously studied. The key finding from our new sequence, which includes several insertions, deletions, and substitutions relative to the previous one, is that ORF2 in fact overlaps ORF1 in the +1 frame. Moreover, a proposed sequence motif for +1 programmed ribosomal frameshifting, previously noted in influenza A viruses, plant amalgaviruses, and others, is also present in the newly identified ORF1-ORF2 overlap region of ZbV-Z. Phylogenetic analyses provided evidence that ZbV-Z represents a distinct taxon most closely related to plant amalgaviruses (genus Amalgavirus, family Amalgaviridae). We conclude that ZbV-Z is the prototype of a new species, which we propose as the type species of a new genus of monosegmented dsRNA mycoviruses in family Amalgaviridae. Comparisons involving other unclassified mycoviruses with RdRps apparently related to those of plant amalgaviruses, and having either mono- or bisegmented dsRNA genomes, are also discussed.
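
    The frameshifting argument rests on two checks: whether ORF2 lies in the +1 frame relative to ORF1, and whether the two ORFs overlap so that a ribosome translating ORF1 can shift into ORF2. A minimal sketch using 1-based, inclusive coordinates; the coordinates are hypothetical and are not the ZbV-Z annotations from either sequence.

```python
def relative_frame(orf1_start, orf2_start):
    """Frame of ORF2 relative to ORF1: 0 = same frame, 1 = +1 frame, 2 = +2 (i.e. -1) frame."""
    return (orf2_start - orf1_start) % 3


def overlap_length(a, b):
    """Length in nt of the overlap between two 1-based inclusive intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def compatible_with_plus1_prf(orf1, orf2):
    """ORF2 must be in the +1 frame and overlap ORF1 for a +1 shift to reach it."""
    return relative_frame(orf1[0], orf2[0]) == 1 and overlap_length(orf1, orf2) > 0


# Hypothetical (start, end) coordinates, not taken from GenBank AF224490 or the new sequence.
orf1 = (100, 2000)
overlapping_orf2 = (1901, 4000)  # +1 frame, 100-nt overlap
separate_orf2 = (2102, 4000)     # +1 frame, but no overlap

print(compatible_with_plus1_prf(orf1, overlapping_orf2))
print(compatible_with_plus1_prf(orf1, separate_orf2))
```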

  15. SP-Designer: a user-friendly program for designing species-specific primer pairs from DNA sequence alignments.

    PubMed

    Villard, Pierre; Malausa, Thibaut

    2013-07-01

    SP-Designer is an open-source, user-friendly program for designing specific PCR primer pairs from a DNA sequence alignment containing sequences from various taxa. SP-Designer selects PCR primer pairs for amplifying DNA from a target species on the basis of several criteria: (i) primer specificity, as assessed by interspecific sequence polymorphism in the annealing regions, (ii) the biochemical characteristics of the primers and (iii) the intended PCR conditions. SP-Designer generates tables detailing the primer pair and PCR characteristics, and a FASTA file locating the primer sequences in the original sequence alignment. SP-Designer is Windows-compatible and freely available from http://www2.sophia.inra.fr/urih/sophia_mart/sp_designer/info_sp_designer.php.
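
    The first two selection criteria can be illustrated with a few per-primer measurements. The sketch below is not SP-Designer's code: it counts the fewest mismatches between a candidate primer and the same alignment window in any non-target sequence (a rough proxy for interspecific polymorphism in the annealing region), and reports GC content and a melting temperature from the Wallace rule, 2(A+T) + 4(G+C) °C, a crude approximation suitable only for short oligonucleotides. The alignment is hypothetical.

```python
def gc_content(primer):
    """Fraction of G and C bases in an uppercase primer."""
    return sum(base in "GC" for base in primer) / len(primer)


def wallace_tm(primer):
    """Approximate Tm in deg C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def min_mismatches(primer, start, non_targets):
    """Fewest mismatches between the primer and the aligned window at `start`
    in any non-target sequence; alignment gaps ('-') count as mismatches."""
    end = start + len(primer)
    return min(sum(p != s for p, s in zip(primer, seq[start:end])) for seq in non_targets)


# Hypothetical gap-free alignment: one target species and two non-targets.
target = "TTACGTACGTACTT"
non_targets = ["TTACGAACGTTCTT", "TTACGTACGAACTT"]
start = 2
primer = target[start:start + 10]

print(primer, gc_content(primer), wallace_tm(primer), min_mismatches(primer, start, non_targets))
```

    Here the closest non-target differs from the primer at only one position, so this candidate would be a weak species-specific primer; real tools also weight mismatches near the primer's 3′ end more heavily, since those most strongly hinder extension.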

  16. Nucleotide sequences and operon structure of plasmid-borne genes mediating uptake and utilization of raffinose in Escherichia coli.

    PubMed Central

    Aslanidis, C; Schmid, K; Schmitt, R

    1989-01-01

    The plasmid-borne raf operon encodes functions required for inducible uptake and utilization of raffinose by Escherichia coli. Raf functions include active transport (Raf permease), alpha-galactosidase, and sucrose hydrolase, which are negatively controlled by the Raf repressor. We have defined the order and extent of the three structural genes, rafA, rafB, and rafD; these are contained in a 5,284-base-pair nucleotide sequence. By comparisons of derived primary structures with known subunit molecular weights and an N-terminal peptide sequence, rafA was assigned to alpha-galactosidase (708 amino acids), rafB was assigned to Raf permease (425 amino acids), and rafD was assigned to sucrose hydrolase (476 amino acids). Transcription was shown to initiate 13 nucleotides upstream of rafA; a putative promoter, a ribosome-binding site, and a transcription termination signal were identified. Striking similarities between Raf permease and lacY-encoded lactose permease, revealed by high sequence conservation (76%), overlapping substrate specificities, and similar transport kinetics, suggest a common origin of these transport systems. alpha-Galactosidase and sucrose hydrolase are not related to host enzymes but have their counterparts in other species. We propose a modular origin of the raf operon and discuss selective forces that favored this gene organization, which is also found in the E. coli lac operon. PMID:2556373
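
    The gene assignments can be checked against the size of the sequenced region. Assuming each quoted protein length excludes the stop codon (the abstract does not say), each gene needs 3 x (length + 1) nt: 4,836 nt for the three genes together, leaving 448 of the 5,284 bp for non-coding sequence such as the promoter, ribosome-binding site, intergenic spacers and terminator.

```python
# Coding-capacity check for the raf genes (protein lengths and region size from the abstract).
region_bp = 5284
protein_aa = {"rafA": 708, "rafB": 425, "rafD": 476}

# Assumption: each length excludes the stop codon, so add one codon per gene.
coding_nt = {gene: 3 * (aa + 1) for gene, aa in protein_aa.items()}
total_coding = sum(coding_nt.values())
noncoding = region_bp - total_coding

print(coding_nt, total_coding, noncoding)
```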

  17. Nucleotide sequence and infectious cDNA clone of the L1 isolate of Pea seed-borne mosaic potyvirus.

    PubMed

    Olsen, B S; Johansen, I E

    2001-01-01

    The complete nucleotide sequence of Pea seed-borne mosaic potyvirus (PSbMV) isolate L1 has been determined from cloned virus cDNA. The PSbMV L1 genome is 9895 nucleotides in length, excluding the poly(A) tail. Computer analysis of the sequence revealed a single long open reading frame (ORF) of 9594 nucleotides. The ORF potentially encodes a polyprotein of 3198 amino acids with a deduced Mr of 363,537. Nine putative proteolytic cleavage sites were identified by analogy to consensus sequences and genome arrangement in other potyviruses. Two full-length cDNA clones, p35S-L1-4 and p35S-L1-5, were assembled under control of an enhanced 35S promoter and nopaline synthase terminator. Clone p35S-L1-4 was constructed with four introns and p35S-L1-5 with five introns inserted in the cDNA. Clone p35S-L1-4 was unstable in Escherichia coli, often resulting in amplification of plasmids with deletions. Clone p35S-L1-5 was stable and apparently less toxic to Escherichia coli, resulting in larger bacterial colonies and higher plasmid yield. Both clones were infectious upon mechanical inoculation of plasmid DNA onto susceptible pea cultivars Fjord, Scout, and Brutus. Eight pea genotypes resistant to L1 virus were also resistant to the cDNA-derived L1 virus. Both native PSbMV L1 and the cDNA-derived virus infected Chenopodium quinoa systemically, giving rise to characteristic necrotic lesions on uninoculated leaves.

  18. Open reading frame sequencing and structure-based alignment of polypeptides encoded by RT1-Bb, RT1-Ba, RT1-Db, and RT1-Da alleles.

    PubMed

    Ettinger, Ruth A; Moustakas, Antonis K; Lobaton, Suzanne D

    2004-11-01

    MHC class II genes are major genetic components in rats developing autoimmunity. The majority of rat MHC class II sequencing has focused on exon 2, which encodes the first external domain. Complete open reading frame sequences for rat MHC class II haplotypes, and a structure-based alignment of them, are lacking. Herein, the complete open reading frame for RT1-Bbeta, RT1-Balpha, RT1-Dbeta, and RT1-Dalpha was sequenced from ten different rat strains, covering eight serological haplotypes, namely a, b, c, d, k, l, n, and u. Each serological haplotype was unique at the nucleotide level of the sequenced RT1-B/D region. For RT1-Bbeta, RT1-Balpha, RT1-Dbeta, and RT1-Dalpha, respectively, seven, seven, six, and three alleles were identified, and the degree of amino-acid polymorphism between allotypes was 22%, 16%, 19%, and 0.4%. The extent and distribution of amino-acid polymorphism were comparable with mouse and human MHC class II. Structure-based alignment identified the beta65-66 deletion, the beta84a insertion, the alpha9a insertion, and the alpha1a-1c insertion in RT1-B previously described for H2-A. Rat allele-specific deletions were found at RT1-Balpha76 and RT1-Dbeta90-92. The mature RT1-Dbeta polypeptide was one amino acid longer than HLA-DRB1 because of the position of the predicted signal peptide cleavage site. These data are important to a comprehensive understanding of MHC class II structure-function and for mechanistic studies of rat models of autoimmunity.

  19. DNA sequencing by a single molecule detection of labeled nucleotides sequentially cleaved from a single strand of DNA

    SciTech Connect

    Goodwin, P.M.; Schecker, J.A.; Wilkerson, C.W.; Hammond, M.L.; Ambrose, W.P.; Jett, J.H.; Martin, J.C.; Marrone, B.L.; Keller, R.A.; Haces, A.; Shih, P.J.; Harding, J.D.

    1993-01-01

    We are developing a laser-based technique for the rapid sequencing of large DNA fragments (several kb in size) at a rate of 100 to 1000 bases per second. Our approach relies on fluorescent labeling of the bases in a single fragment of DNA, attachment of this labeled DNA fragment to a support, movement of the supported DNA into a flowing sample stream, sequential cleavage of the end nucleotide from the DNA fragment with an exonuclease, and detection of the individual fluorescently labeled bases by laser-induced fluorescence.
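
    The read-out principle can be illustrated with a toy simulation. It assumes one distinct fluorescent label per base and an exonuclease that removes nucleotides from the 3′ end, so bases are detected in the reverse of their 5′ to 3′ order; the abstract specifies neither the dye scheme nor the enzyme's polarity, and the dye names are hypothetical. It also converts the quoted 100 to 1000 bases per second into read times for a hypothetical 5 kb fragment.

```python
# Hypothetical one-dye-per-base labeling scheme.
DYE_FOR_BASE = {"A": "dye_A", "C": "dye_C", "G": "dye_G", "T": "dye_T"}
BASE_FOR_DYE = {dye: base for base, dye in DYE_FOR_BASE.items()}


def detected_dyes(strand_5_to_3):
    """Dyes in detection order, assuming 3'->5' exonuclease cleavage (last base first)."""
    return [DYE_FOR_BASE[base] for base in reversed(strand_5_to_3)]


def call_sequence(dyes):
    """Map each detected dye back to its base and restore 5'->3' order."""
    return "".join(BASE_FOR_DYE[dye] for dye in reversed(dyes))


def read_time_seconds(n_bases, bases_per_second):
    """Time to sequence a fragment at a constant detection rate."""
    return n_bases / bases_per_second


fragment = "ATGCGTTAAC"
assert call_sequence(detected_dyes(fragment)) == fragment
for rate in (100, 1000):
    print(f"5 kb at {rate} bases/s: {read_time_seconds(5000, rate):.0f} s")
```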

  20. DNA sequencing by a single molecul