Science.gov

Sample records for embl nucleotide sequence

  1. The EMBL Nucleotide Sequence Database.

    PubMed

    Stoesser, Guenter; Baker, Wendy; van den Broek, Alexandra; Camon, Evelyn; Garcia-Pastor, Maria; Kanz, Carola; Kulikova, Tamara; Leinonen, Rasko; Lin, Quan; Lombard, Vincent; Lopez, Rodrigo; Redaschi, Nicole; Stoehr, Peter; Tuli, Mary Ann; Tzouvara, Katerina; Vaughan, Robert

    2002-01-01

    The EMBL Nucleotide Sequence Database (aka EMBL-Bank; http://www.ebi.ac.uk/embl/) incorporates, organises and distributes nucleotide sequences from all available public sources. EMBL-Bank is located and maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK. In an international collaboration with DDBJ (Japan) and GenBank (USA), data are exchanged amongst the collaborating databases on a daily basis. Major contributors to the EMBL database are individual scientists and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via FTP, email and World Wide Web interfaces. EBI's Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many other specialized databases. For sequence similarity searching, a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. All resources can be accessed via the EBI home page at http://www.ebi.ac.uk.

  2. Seqalert--a daily sequence alertness server for the EMBL and SWISSPROT databases.

    PubMed

    Shomer, B

    1997-10-01

    The aims were to: enable users to deposit complex search profiles against the sequence databases; interface to an independent Sequence Retrieval System (SRS) server through the network to perform these searches on a daily basis through the last day's updates of these databases; mail users the reformatted search results, enabling local usage when loaded by a WWW browser. The deposition of one to many search profiles by the user leads to a daily search of the EMBL and SWISSPROT databases. The search profile is restricted to entries that were deposited during the last 24 h by using the SRS query manager to combine search sets. If the search is successful, the resulting html page is modified from relative URLs to absolute ones, enabling local usage by loading from disk. The results are sent to the user by e-mail.
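
The link-rewriting step described above (relative URLs made absolute so that a mailed results page still works when loaded from disk) can be sketched as follows. This is a minimal illustration, not the Seqalert implementation: the function name `absolutize`, the regular expression, and the example host `srs.example.org` are all assumptions, and a production tool would more likely use a real HTML parser.

```python
import re
from urllib.parse import urljoin

# Matches href="..." or src="..." attributes (single or double quoted).
# Hypothetical helper pattern; real HTML may need a proper parser.
_LINK_ATTR = re.compile(
    r'(?P<attr>\b(?:href|src)\s*=\s*)(?P<q>["\'])(?P<url>.*?)(?P=q)',
    re.IGNORECASE,
)


def absolutize(html: str, base_url: str) -> str:
    """Rewrite relative href/src values against base_url.

    URLs that are already absolute are returned unchanged by urljoin.
    """
    def fix(match: re.Match) -> str:
        quote = match.group("q")
        absolute = urljoin(base_url, match.group("url"))
        return f"{match.group('attr')}{quote}{absolute}{quote}"

    return _LINK_ATTR.sub(fix, html)


if __name__ == "__main__":
    # The base URL below is a placeholder, not a real SRS server.
    page = '<a href="/srs/entry?id=1">hit</a> <img src="img/logo.gif">'
    print(absolutize(page, "http://srs.example.org/results/page.html"))
```

Resolving against the URL of the page that produced the results, rather than against the site root, keeps document-relative links such as `img/logo.gif` pointing at the right directory.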

  3. [Computer programs for the analysis of nucleotide sequences (MALK)].

    PubMed

    Mironov, A A; Aleksandrov, N N; Liunovskaia-Gurova, L V; Kister, A E

    1987-01-01

    A system for the computer analysis of nucleic acid and protein sequences ("Helix") is described. The DNA sequence format is EMBL-compatible, and sequences can easily be annotated with the help of convenient menus. "Helix" also offers the following capabilities: efficient alignment of gel-reading data and assembly of the final sequence; simple construction of recombinant molecules "in calcular"; calculation of nucleotide and dinucleotide distributions along the sequence; searching for coding frames; calculation of the percentage of codons and amino acids in coding frames; searching for direct and inverted repeats; sequence alignment; protein secondary structure prediction; restriction mapping; and DNA-to-protein translation. "Helix" also contains programs for RNA structure prediction, searching for homologies throughout the EMBL bank, choosing optimal sequences for probes, and searching for promoters. All the programs are written in FORTRAN-77 and automatically translated into FORTRAN-4. "Helix" requires only 64 kbytes.

  4. Automated Identification of Nucleotide Sequences

    NASA Technical Reports Server (NTRS)

    Osman, Shariff; Venkateswaran, Kasthuri; Fox, George; Zhu, Dian-Hui

    2007-01-01

    STITCH is a computer program that processes raw nucleotide-sequence data to automatically remove unwanted vector information, perform reverse-complement comparison, stitch shorter sequences together to make longer ones to which the shorter ones presumably belong, and search against the user's choice of private and Internet-accessible public 16S rRNA databases. ["16S rRNA" denotes a ribosomal ribonucleic acid (rRNA) sequence that is common to all organisms.] In STITCH, a template 16S rRNA sequence is used to position forward and reverse reads. STITCH then automatically searches known 16S rRNA sequences in the user's chosen database(s) to find the sequence most similar to (the sequence that lies at the smallest edit distance from) each spliced sequence. The result of processing by STITCH is the identification of the most similar well-described bacterium. Whereas previously commercially available software for analyzing genetic sequences operates on one sequence at a time, STITCH can manipulate multiple sequences simultaneously to perform the aforementioned operations. A typical analysis of several dozen sequences (length of the order of 10^3 base pairs) by use of STITCH is completed in a few minutes, whereas such an analysis performed by use of prior software takes hours or days.
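
The "smallest edit distance" criterion mentioned above can be illustrated with a short sketch. The abstract does not say which edit-distance variant STITCH uses, so the code below assumes plain Levenshtein distance (unit cost for substitution, insertion and deletion); the function names and the toy reference set are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with O(len(b)) memory."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion from a
                curr[j - 1] + 1,           # insertion into a
                prev[j - 1] + (ca != cb),  # match (0) or substitution (1)
            ))
        prev = curr
    return prev[-1]


def closest_reference(query: str, references: dict[str, str]) -> str:
    """Return the name of the reference sequence nearest to the query."""
    return min(references, key=lambda name: edit_distance(query, references[name]))


if __name__ == "__main__":
    # Toy sequences for illustration only; real 16S rRNA genes are ~1500 nt.
    refs = {"organism_a": "ACGTA", "organism_b": "TTTTT"}
    print(closest_reference("ACGTT", refs))
```

The dynamic program is quadratic in sequence length, which is workable for sequences of about 10^3 bases but would normally be replaced by an indexed or heuristic search against a large database.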

  5. Submitting MIGS, MIMS, MIENS Information to EMBL and Standards and the Sequencing Pipelines of the Gordon and Betty Moore Foundation (GSC8 Meeting)

    ScienceCinema

    Vaughan, Bob [EMBL]; Kaye, Jon [Gordon and Betty Moore Foundation]

    2016-07-12

    The Genomic Standards Consortium was formed in September 2005. It is an international, open-membership working body which promotes standardization in the description of genomes and the exchange and integration of genomic data. The 2009 meeting was an activity of a five-year "Research Coordination Network" funded by the National Science Foundation and was held at the DOE Joint Genome Institute with organizational support provided by the JGI and by the University of California - San Diego. Bob Vaughan of EMBL speaks on submitting MIGS/MIMS/MIENS information to EMBL-EBI's system, followed by a brief talk from Jon Kaye of the Gordon and Betty Moore Foundation on standards and the foundation's sequencing pipelines, at the Genomic Standards Consortium's 8th meeting at the DOE JGI in Walnut Creek, Calif., on Sept. 9, 2009.

  6. Nucleotide sequences encoding a thermostable alkaline protease

    DOEpatents

    Wilson, D.B.; Lao, G.

    1998-01-06

    Nucleotide sequences, derived from a thermophilic actinomycete microorganism, which encode a thermostable alkaline protease are disclosed. Also disclosed are variants of the nucleotide sequences which encode a polypeptide having thermostable alkaline proteolytic activity. Recombinant thermostable alkaline protease or recombinant polypeptide may be obtained by culturing in a medium a host cell genetically engineered to contain and express a nucleotide sequence according to the present invention, and recovering the recombinant thermostable alkaline protease or recombinant polypeptide from the culture medium. 3 figs.

  7. Nucleotide sequences encoding a thermostable alkaline protease

    DOEpatents

    Wilson, David B.; Lao, Guifang

    1998-01-01

    Nucleotide sequences, derived from a thermophilic actinomycete microorganism, which encode a thermostable alkaline protease are disclosed. Also disclosed are variants of the nucleotide sequences which encode a polypeptide having thermostable alkaline proteolytic activity. Recombinant thermostable alkaline protease or recombinant polypeptide may be obtained by culturing in a medium a host cell genetically engineered to contain and express a nucleotide sequence according to the present invention, and recovering the recombinant thermostable alkaline protease or recombinant polypeptide from the culture medium.

  8. Long-range correlations in nucleotide sequences

    NASA Astrophysics Data System (ADS)

    Peng, C.-K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Sciortino, F.; Simons, M.; Stanley, H. E.

    1992-03-01

    DNA sequences have been analysed using models, such as an n-step Markov chain, that incorporate the possibility of short-range nucleotide correlations. We propose here a method for studying the stochastic properties of nucleotide sequences by constructing a 1:1 map of the nucleotide sequence onto a walk, which we term a 'DNA walk'. We then use the mapping to provide a quantitative measure of the correlation between nucleotides over long distances along the DNA chain. Thus we uncover in the nucleotide sequence a remarkably long-range power law correlation that implies a new scale-invariant property of DNA. We find such long-range correlations in intron-containing genes and in nontranscribed regulatory DNA sequences, but not in complementary DNA sequences or intron-less genes.
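
A 'DNA walk' of the kind described above can be sketched in a few lines. The abstract does not spell out the mapping rule, so this sketch assumes the purine/pyrimidine rule (a step of +1 for C or T, -1 for A or G) and measures correlation through the root-mean-square fluctuation of the walk displacement over a window of length l. Function names are illustrative.

```python
from math import sqrt

# Assumed mapping: pyrimidines step up, purines step down.
STEP = {"C": 1, "T": 1, "A": -1, "G": -1}


def dna_walk(seq: str) -> list[int]:
    """Cumulative displacement y(0..n); unknown symbols contribute a zero step."""
    y, walk = 0, [0]
    for base in seq.upper():
        y += STEP.get(base, 0)
        walk.append(y)
    return walk


def fluctuation(walk: list[int], l: int) -> float:
    """RMS fluctuation F(l) of the displacement y(i + l) - y(i) over all i."""
    deltas = [walk[i + l] - walk[i] for i in range(len(walk) - l)]
    mean = sum(deltas) / len(deltas)
    mean_sq = sum(d * d for d in deltas) / len(deltas)
    return sqrt(max(mean_sq - mean * mean, 0.0))  # clamp rounding error


if __name__ == "__main__":
    walk = dna_walk("CACAGTTTCAGGA")
    for l in (1, 2, 4):
        print(l, fluctuation(walk, l))
```

For an uncorrelated sequence F(l) grows as l^0.5; a power law F(l) ~ l^a with a exponent different from 0.5, estimated from the slope of log F(l) against log l, is the signature of long-range correlation.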

  9. Long-range correlations in nucleotide sequences

    NASA Technical Reports Server (NTRS)

    Peng, C. K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Sciortino, F.; Simons, M.; Stanley, H. E.

    1992-01-01

    DNA sequences have been analysed using models, such as an n-step Markov chain, that incorporate the possibility of short-range nucleotide correlations. We propose here a method for studying the stochastic properties of nucleotide sequences by constructing a 1:1 map of the nucleotide sequence onto a walk, which we term a 'DNA walk'. We then use the mapping to provide a quantitative measure of the correlation between nucleotides over long distances along the DNA chain. Thus we uncover in the nucleotide sequence a remarkably long-range power law correlation that implies a new scale-invariant property of DNA. We find such long-range correlations in intron-containing genes and in nontranscribed regulatory DNA sequences, but not in complementary DNA sequences or intron-less genes.

  10. Long-range correlations in nucleotide sequences

    NASA Technical Reports Server (NTRS)

    Peng, C. K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Sciortino, F.; Simons, M.; Stanley, H. E.

    1992-01-01

    DNA sequences have been analysed using models, such as an n-step Markov chain, that incorporate the possibility of short-range nucleotide correlations. We propose here a method for studying the stochastic properties of nucleotide sequences by constructing a 1:1 map of the nucleotide sequence onto a walk, which we term a 'DNA walk'. We then use the mapping to provide a quantitative measure of the correlation between nucleotides over long distances along the DNA chain. Thus we uncover in the nucleotide sequence a remarkably long-range power law correlation that implies a new scale-invariant property of DNA. We find such long-range correlations in intron-containing genes and in nontranscribed regulatory DNA sequences, but not in complementary DNA sequences or intron-less genes.

  11. Nucleotide capacitance calculation for DNA sequencing

    SciTech Connect

    Lu, Jun-Qiang; Zhang, Xiaoguang

    2008-01-01

    Using a first-principles linear response theory, the capacitances of the DNA nucleotides adenine, cytosine, guanine and thymine are calculated. The difference in capacitance between the nucleotides is studied with respect to conformational distortion. The result suggests that although an alternating-current capacitance measurement of a single-stranded DNA chain threaded through nano-gap electrodes may not be sufficient as a stand-alone method for rapid DNA sequencing, the capacitance of the nucleotides should be taken into consideration in any GHz-frequency electric measurements and may also serve as an additional criterion for identifying the DNA sequence.

  12. The International Nucleotide Sequence Database Collaboration.

    PubMed

    Nakamura, Yasukazu; Cochrane, Guy; Karsch-Mizrachi, Ilene

    2013-01-01

    The International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org), one of the longest-standing global alliances of biological data archives, captures, preserves and provides comprehensive public domain nucleotide sequence information. Three partners of the INSDC work in cooperation to establish formats for data and metadata and protocols that facilitate reliable data submission to their databases and support continual data exchange around the world. In this article, the current status of the INSDC and its updates for 2012 are presented. Among the items discussed at the 2012 international collaboration meeting, the BioSample database and changes in submission are described.

  13. The International Nucleotide Sequence Database Collaboration

    PubMed Central

    Cochrane, Guy; Karsch-Mizrachi, Ilene; Takagi, Toshihisa; International Nucleotide Sequence Database Collaboration

    2016-01-01

    The International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org) comprises three global partners committed to capturing, preserving and providing comprehensive public-domain nucleotide sequence information. The INSDC establishes standards, formats and protocols for data and metadata to make it easier for individuals and organisations to submit their nucleotide data reliably to public archives. This work enables the continuous, global exchange of information about living things. Here we present an update of the INSDC in 2015, including data growth and diversification, new standards and requirements by publishers for authors to submit their data to the public archives. The INSDC serves as a model for data sharing in the life sciences. PMID:26657633

  14. Accessing and distributing EMBL data using CORBA (common object request broker architecture)

    PubMed Central

    Wang, Lichun; Rodriguez-Tomé, Patricia; Redaschi, Nicole; McNeil, Phil; Robinson, Alan; Lijnzaad, Philip

    2000-01-01

    Background: The EMBL Nucleotide Sequence Database is a comprehensive database of DNA and RNA sequences and related information traditionally made available in flat-file format. Queries through tools such as SRS (Sequence Retrieval System) also return data in flat-file format. Flat files have a number of shortcomings, however, and the resources therefore currently lack a flexible environment to meet individual researchers' needs. The Object Management Group's common object request broker architecture (CORBA) is an industry standard that provides platform-independent programming interfaces and models for portable distributed object-oriented computing applications. Its independence from programming languages, computing platforms and network protocols makes it attractive for developing new applications for querying and distributing biological data. Results: A CORBA infrastructure developed by EMBL-EBI provides an efficient means of accessing and distributing EMBL data. The EMBL object model is defined such that it provides a basis for specifying interfaces in interface definition language (IDL) and thus for developing the CORBA servers. The mapping from the object model to the relational schema in the underlying Oracle database uses the facilities provided by Persistence™, an object/relational tool. The techniques of developing loaders and 'live object caching' with persistent objects achieve a smart live object cache where objects are created on demand. The objects are managed by an evictor pattern mechanism. Conclusions: The CORBA interfaces to the EMBL database address some of the problems of traditional flat-file formats and provide an efficient means for accessing and distributing EMBL data. CORBA also provides a flexible environment for users to develop their applications by building clients to our CORBA servers, which can be integrated into existing systems. PMID:11178259

  15. Nucleotide sequence of mouse satellite DNA.

    PubMed Central

    Hörz, W; Altenburger, W

    1981-01-01

    The nucleotide sequence of uncloned mouse satellite DNA has been determined by analyzing Sau96I restriction fragments that correspond to the repeat unit of the satellite DNA. An unambiguous sequence of 234 bp has been obtained. The sequence of the first 250 bases from dimeric satellite fragments present in Sau96I limit digests corresponds almost exactly to two tandemly arranged monomer sequences including a complete Sau96I site in the center. This is in agreement with the hypothesis that a low level of divergence which cannot be detected in sequence analyses of uncloned DNA is responsible for the appearance of dimeric fragments. Most of the sequence of the 5% fraction of Sau96 monomers that are susceptible to TaqI has also been determined and has been found to agree completely with the prototype sequence. The monomer sequence is internally repetitious being composed of eight diverged subrepeats. The divergence pattern has interesting implications for theories on the evolution of mouse satellite DNA. PMID:6261227

  16. Estimation of evolutionary distances between nucleotide sequences.

    PubMed

    Zharkikh, A

    1994-09-01

    A formal mathematical analysis of the substitution process in nucleotide sequence evolution was done in terms of the Markov process. By using matrix algebra theory, the theoretical foundation of Barry and Hartigan's (Stat. Sci. 2:191-210, 1987) and Lanave et al.'s (J. Mol. Evol. 20:86-93, 1984) methods was provided. Extensive computer simulation was used to compare the accuracy and effectiveness of various methods for estimating the evolutionary distance between two nucleotide sequences. It was shown that the multiparameter methods of Lanave et al.'s (J. Mol. Evol. 20:86-93, 1984), Gojobori et al.'s (J. Mol. Evol. 18:414-422, 1982), and Barry and Hartigan's (Stat. Sci. 2:191-210, 1987) are preferable to others for the purpose of phylogenetic analysis when the sequences are long. However, when sequences are short and the evolutionary distance is large, Tajima and Nei's (Mol. Biol. Evol. 1:269-285, 1984) method is superior to others.

  17. Remote access to ACNUC nucleotide and protein sequence databases at PBIL.

    PubMed

    Gouy, Manolo; Delmotte, Stéphane

    2008-04-01

    The ACNUC biological sequence database system provides powerful and fast query and extraction capabilities to a variety of nucleotide and protein sequence databases. The collection of ACNUC databases served by the Pôle Bio-Informatique Lyonnais includes the EMBL, GenBank, RefSeq and UniProt nucleotide and protein sequence databases and a series of other sequence databases that support comparative genomics analyses: HOVERGEN and HOGENOM containing families of homologous protein-coding genes from vertebrate and prokaryotic genomes, respectively; Ensembl and Genome Reviews for analyses of prokaryotic and of selected eukaryotic genomes. This report describes the main features of the ACNUC system and the access to ACNUC databases from any internet-connected computer. Such access was made possible by the definition of a remote ACNUC access protocol and the implementation of Application Programming Interfaces between the C, Python and R languages and this communication protocol. Two retrieval programs for ACNUC databases, Query_win, with a graphical user interface and raa_query, with a command line interface, are also described. Altogether, these bioinformatics tools provide users with either ready-to-use means of querying remote sequence databases through a variety of selection criteria, or a simple way to endow application programs with an extensive access to these databases. Remote access to ACNUC databases is open to all and fully documented (http://pbil.univ-lyon1.fr/databases/acnuc/acnuc.html).

  18. Nucleotide sequence alignment using sparse coding and belief propagation.

    PubMed

    Roozgard, Aminmohammad; Barzigar, Nafise; Wang, Shuang; Jiang, Xiaoqian; Ohno-Machado, Lucila; Cheng, Samuel

    2013-01-01

    Advances in DNA information extraction techniques have led to huge sequenced genomes from organisms spanning the tree of life. This increasing amount of genomic information requires tools for comparison of the nucleotide sequences. In this paper, we propose a novel nucleotide sequence alignment method based on sparse coding and belief propagation to compare the similarity of the nucleotide sequences. We used the neighbors of each nucleotide as features, and then we employed sparse coding to find a set of candidate nucleotides. To select optimum matches, belief propagation was subsequently applied to these candidate nucleotides. Experimental results show that the proposed approach is able to robustly align nucleotide sequences and is competitive with SOAPaligner [1] and BWA [2].

  19. Nucleotide Sequence of the Akv env Gene

    PubMed Central

    Lenz, Jack; Crowther, Robert; Straceski, Anthony; Haseltine, William

    1982-01-01

    The sequence of 2,191 nucleotides encoding the env gene of murine retrovirus Akv was determined by using a molecular clone of the Akv provirus. Deduction of the encoded amino acid sequence showed that a single open reading frame encodes a 638-amino acid precursor to gp70 and p15E. In addition, there is a typical leader sequence preceding the amino terminus of gp70. The locations of potential glycosylation sites and other structural features indicate that the entire gp70 molecule and most of p15E are located on the outer side of the membrane. Internal cleavage of the env precursor to generate gp70 and p15E occurs immediately adjacent to several basic amino acids at the carboxyl terminus of gp70. This cleavage generates a region of 42 uncharged, relatively hydrophobic amino acids at the amino terminus of p15E, which is located in a position analogous to the hydrophobic membrane fusion sequence of influenza virus hemagglutinin. The mature polypeptides are predicted to associate with the membrane via a region of 30 uncharged, mostly hydrophobic amino acids located near the carboxyl terminus of p15E. Distal to this membrane association region is a sequence of 35 amino acids at the carboxyl terminus of the env precursor, which is predicted to be located on the inner side of the membrane. By analogy to Moloney murine leukemia virus, a proteolytic cleavage in this region removes the terminal 19 amino acids, thus generating the carboxyl terminus of p15E. This leaves 15 amino acids at the carboxyl terminus of p15E on the inner side of the membrane in a position to interact with virion cores during budding. The precise location and order of the large RNase T1-resistant oligonucleotides in the env region were determined and compared with those from several leukemogenic viruses of AKR origin. This permitted a determination of how the differences in the leukemogenic viruses affect the primary structure of the env gene products. PMID:6283170

  20. Nucleotide sequence of the pyruvate decarboxylase gene from Zymomonas mobilis.

    PubMed

    Neale, A D; Scopes, R K; Wettenhall, R E; Hoogenraad, N J

    1987-02-25

    Pyruvate decarboxylase (EC 4.1.1.1), the penultimate enzyme in the alcoholic fermentation pathway of Zymomonas mobilis, converts pyruvate to acetaldehyde and carbon dioxide. The complete nucleotide sequence of the structural gene encoding pyruvate decarboxylase from Zymomonas mobilis has been determined. The coding region is 1704 nucleotides long and encodes a polypeptide of 567 amino acids with a calculated subunit mass of 60,790 daltons. The amino acid sequence was confirmed by comparison with the amino acid sequence of a selection of tryptic fragments of the enzyme. The amino acid composition obtained from the nucleotide sequence is in good agreement with that obtained experimentally.
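
The figures quoted above are mutually consistent if the 1704-nucleotide coding region is taken to include the stop codon, since (567 + 1) x 3 = 1704. The check below makes that arithmetic explicit; counting the stop codon is an assumption, as the abstract does not say so, and the function names are illustrative.

```python
def coding_region_length(n_amino_acids: int, include_stop: bool = True) -> int:
    """Nucleotides needed to encode a polypeptide, three per codon.

    include_stop=True adds one codon for the terminator (an assumption here).
    """
    codons = n_amino_acids + (1 if include_stop else 0)
    return 3 * codons


def mean_residue_mass(subunit_mass_da: float, n_amino_acids: int) -> float:
    """Average mass per residue in daltons."""
    return subunit_mass_da / n_amino_acids


if __name__ == "__main__":
    print(coding_region_length(567))       # compare with the reported 1704
    print(mean_residue_mass(60790, 567))   # average daltons per residue
```

The mean residue mass that falls out of the reported subunit mass, roughly 107 Da, is in the usual range for proteins, which is a further sanity check on the numbers.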

  1. 77 FR 65537 - Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-29

    ... Amino Acid Sequence Disclosures ACTION: Proposed collection; comment request. SUMMARY: The United States....'' SUPPLEMENTARY INFORMATION: I. Abstract Patent applications that contain nucleotide and/or amino acid sequence...

  2. Nucleotide sequence of the luxC gene encoding fatty acid reductase of the lux operon from Photobacterium leiognathi.

    PubMed

    Lin, J W; Chao, Y F; Weng, S F

    1993-02-26

    The nucleotide sequence of the luxC gene (EMBL Accession No. 65156) encoding fatty acid reductase (FAR) of the lux operon from Photobacterium leiognathi PL741 was determined and the encoded amino acid sequence deduced. The fatty acid reductase is a component of the fatty acid reductase complex. The complex is responsible for converting fatty acid to aldehyde, which serves as the substrate in the luciferase-catalyzed bioluminescent reaction. The protein comprises 478 amino acid residues and has a calculated M(r) of 53,858. Alignment and comparison of the fatty acid reductase of P. leiognathi with those of Vibrio harveyi B392 and Vibrio fischeri ATCC 7744 shows 70% and 59% amino acid residue identity, respectively.

  3. Nucleotide sequence of papaya mosaic virus RNA.

    PubMed

    Sit, T L; Abouhaidar, M G; Holy, S

    1989-09-01

    The RNA genome of papaya mosaic virus is 6656 nucleotides long [excluding the poly(A) tail] with six open reading frames (ORFs) more than 200 nucleotides long. The four nearest the 5' end each overlap with adjacent ORFs and could code for proteins with Mr 176307, 26248, 11949 and 7224 (ORFs 1 to 4). The fifth ORF produces the capsid protein of Mr 23043 and the sixth ORF, located completely within ORF1, could code for a protein with Mr 14113. The translation products of ORFs 1 to 3 show strong similarity with those of other potexviruses but the ORF 4 protein has only limited similarity with the other potexvirus ORF 4 proteins of 7K to 11K.

  4. Reading biological processes from nucleotide sequences

    NASA Astrophysics Data System (ADS)

    Murugan, Anand

    Cellular processes have traditionally been investigated by techniques of imaging and biochemical analysis of the molecules involved. The recent rapid progress in our ability to manipulate and read nucleic acid sequences gives us direct access to the genetic information that directs and constrains biological processes. While sequence data is being used widely to investigate genotype-phenotype relationships and population structure, here we use sequencing to understand biophysical mechanisms. We present work on two different systems. First, in chapter 2, we characterize the stochastic genetic editing mechanism that produces diverse T-cell receptors in the human immune system. We do this by inferring statistical distributions of the underlying biochemical events that generate T-cell receptor coding sequences from the statistics of the observed sequences. This inferred model quantitatively describes the potential repertoire of T-cell receptors that can be produced by an individual, providing insight into its potential diversity and the probability of generation of any specific T-cell receptor. Then in chapter 3, we present work on understanding the functioning of regulatory DNA sequences in both prokaryotes and eukaryotes. Here we use experiments that measure the transcriptional activity of large libraries of mutagenized promoters and enhancers and infer models of the sequence-function relationship from this data. For the bacterial promoter, we infer a physically motivated 'thermodynamic' model of the interaction of DNA-binding proteins and RNA polymerase determining the transcription rate of the downstream gene. For the eukaryotic enhancers, we infer heuristic models of the sequence-function relationship and use these models to find synthetic enhancer sequences that optimize inducibility of expression. 
    Both projects demonstrate the utility of sequence information in conjunction with sophisticated statistical inference techniques for dissecting underlying biophysical

  5. Nucleotide sequence of SHV-2 beta-lactamase gene

    SciTech Connect

    Garbarg-Chenon, A.; Godard, V.; Labia, R.; Nicolas, J.C.

    1990-07-01

    The nucleotide sequence of plasmid-mediated beta-lactamase SHV-2 from Salmonella typhimurium (SHV-2pHT1) was determined. The gene was very similar to chromosomally encoded beta-lactamase LEN-1 of Klebsiella pneumoniae. Compared with the sequence of the Escherichia coli SHV-2 enzyme (SHV-2E.coli) obtained by protein sequencing, the deduced amino acid sequence of SHV-2pHT1 differed by three amino acid substitutions.

  6. Nucleotide sequences important for translation initiation of enterovirus RNA.

    PubMed Central

    Iizuka, N; Yonekawa, H; Nomoto, A

    1991-01-01

    An infectious cDNA clone was constructed from the genome of coxsackievirus B1 strain. A number of RNA transcripts that have mutations in the 5' noncoding region were synthesized in vitro from the modified cDNA clones and examined for their abilities to act as mRNAs in a cell-free translation system prepared from HeLa S3 cells. RNAs that lack nucleotide sequences at positions 568 to 726 and 565 to 726 were found to be less efficient and inactive mRNAs, respectively. To understand the biological significance of this region of RNA, small deletions and point mutations were introduced in the nucleotide sequence between positions 538 and 601. Except for a nucleotide substitution at 592 (U→C) within the 7-base conserved sequence, mutations introduced in the sequence downstream of position 568 had little, if any, effect on the ability of RNA to act as mRNA. Except for a point mutation at 558 (C→U), mutations upstream of position 567 appeared to inactivate the mRNA. In the upstream region, a sequence consisting of 21 nucleotides at positions 546 to 566 is perfectly conserved in the 5' noncoding regions of enterovirus and rhinovirus genomes. These results suggest that the 7-base conserved sequence functions to maintain the efficiency of translation initiation and that the nucleotide sequence upstream of position 567, including the 21-base conserved sequence, plays essential roles in translation initiation. A deletion mutant whose genome lacks the nucleotide sequence at positions 568 to 726 showed a small-plaque phenotype and less virulence against suckling mice than the wild-type virus. Thus, reduction of the efficiency of translation initiation may result in the construction of enteroviruses with the lower-virulence phenotype. PMID:1651409

  7. The Nucleotide Sequence of the lac Operator

    PubMed Central

    Gilbert, Walter; Maxam, Allan

    1973-01-01

    The lac repressor protects the lac operator against digestion with deoxyribonuclease. The protected fragment is double-stranded and about 27 base-pairs long. We determined the sequence of RNA transcription copies of this fragment and present a sequence for 24 base pairs. It is:

    5′--T G G A A T T G T G A G C G G A T A A C A A T T 3′
    3′--A C C T T A A C A C T C G C C T A T T G T T A A 5′

    The sequence has 2-fold symmetry regions; the two longest are separated by one turn of the DNA double helix. PMID:4587255
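
The 2-fold symmetry mentioned above can be checked directly from the 24-base sequence: a region is symmetric about an axis when each base on one side is the Watson-Crick complement of the base at the mirrored position on the other side. The sketch below counts such pairs around a chosen axis. The function names and the choice of axis (0-based index 13, the fourteenth base) are assumptions made for illustration, not taken from the abstract.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Top strand of the operator as quoted above, written 5' to 3'.
LAC_OPERATOR = "TGGAATTGTGAGCGGATAACAATT"


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def dyad_matches(seq: str, center: int) -> int:
    """Count mirrored positions around seq[center] that are complementary."""
    matches = 0
    offset = 1
    while center - offset >= 0 and center + offset < len(seq):
        left, right = seq[center - offset], seq[center + offset]
        if COMPLEMENT[left] == right:
            matches += 1
        offset += 1
    return matches


if __name__ == "__main__":
    print(reverse_complement(LAC_OPERATOR))
    # Scan every possible axis and report the one with the most symmetric pairs.
    best = max(range(len(LAC_OPERATOR)), key=lambda c: dyad_matches(LAC_OPERATOR, c))
    print(best, dyad_matches(LAC_OPERATOR, best))
```

With the axis at index 13, the block AATTGT on the left mirrors ACAATT on the right, which is the kind of symmetric pair the abstract refers to.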

  8. Nucleotide sequence of the coat protein gene of canine parvovirus.

    PubMed Central

    Rhode, S L

    1985-01-01

    The nucleotide sequence of the canine parvovirus (CPV2) from map units 33 to 95 has been determined. This includes the entire coat protein gene and noncoding sequences at the 3' end of the gene, exclusive of the terminal inverted repeat. The predicted capsid protein structures are discussed and compared with those of the rodent parvoviruses H-1 and MVM. PMID:3989914

  9. [Tabular excel editor for analysis of aligned nucleotide sequences].

    PubMed

    Demkin, V V

    2010-01-01

    The Excel platform was used to convert the results of multiple nucleotide sequence alignments obtained with the BLAST network service into a form suitable for visual analysis and editing. Two macros for MS Excel 2007 were written. The array of aligned sequences, transformed into an Excel table and processed with the macros, is more suitable for analysis than the original HTML data.

  10. Nucleotide sequence of the tobacco (Nicotiana tabacum) anionic peroxidase gene

    SciTech Connect

    Diaz-De-Leon, F.; Klotz, K.L.; Lagrimini, L.M.

    1993-03-01

    Peroxidases have been implicated in numerous physiological processes including lignification (Grisebach, 1981), wound-healing (Espelie et al., 1986), phenol oxidation (Lagrimini, 1991), pathogen defense (Ye et al., 1990), and the regulation of cell elongation through the formation of interchain covalent bonds between various cell wall polymers (Fry, 1986; Goldberg et al., 1986; Bradley et al., 1992). However, a complete description of peroxidase action in vivo is not available because of the vast number of potential substrates and the existence of multiple isoenzymes. The tobacco anionic peroxidase is one of the better-characterized isoenzymes. This enzyme has been shown to oxidize a number of significant plant secondary compounds in vitro including cinnamyl alcohols, phenolic acids, and indole-3-acetic acid (Maeder, 1980; Lagrimini, 1991). A cDNA encoding the enzyme has been obtained, and this enzyme was shown to be expressed at the highest levels in lignifying tissues (xylem and tracheary elements) and also in epidermal tissue (Lagrimini et al., 1987). It was shown at this time that there were four distinct copies of the anionic peroxidase gene in tobacco (Nicotiana tabacum). A tobacco genomic DNA library was constructed in the λ phage EMBL3, from which two unique peroxidase genes were sequenced. One of these clones, λPOD1, was designated as a pseudogene when the exonic sequences were found to differ from the cDNA sequences by 1%, and several frame shifts in the coding sequences indicated a dysfunctional gene (the authors' unpublished results). The other clone, λPOD3, described in this manuscript, was designated as the functional tobacco anionic peroxidase gene because of 100% homology with the cDNA. Significant structural elements include an AS-2 box implicated in shoot-specific expression (Lam and Chua, 1989), a TATA box, and two intervening sequences. 10 refs., 1 tab.

  11. Cloning and characterization of a highly repetitive fish nucleotide sequence.

    PubMed

    Datta, U; Dutta, P; Mandal, R K

    1988-01-01

    We have cloned and sequenced a highly repetitive HindIII fragment of DNA from the common carp Cyprinus carpio. It represents a tandemly repeated sequence with a monomeric unit of 245 bp and comprises 8% of the fish genome. Higher units of this monomer appear as a ladder in Southern blots. The monomeric unit has been sequenced; it is A + T-rich with some direct and some inverse-repeat nucleotide clusters.

  12. Nucleotide sequence composition and method for detection of Neisseria gonorrhoeae

    SciTech Connect

    Lo, A.; Yang, H.L.

    1990-02-13

    This patent describes a composition of matter that is specific for Neisseria gonorrhoeae. It comprises at least one nucleotide sequence for which the ratio of the amount of the sequence that hybridizes to chromosomal DNA of Neisseria gonorrhoeae to the amount of the sequence that hybridizes to chromosomal DNA of Neisseria meningitidis is greater than about five, the ratio being obtained by a described method.

  13. Complete nucleotide sequence of tobacco streak virus RNA 3.

    PubMed Central

    Cornelissen, B J; Janssen, H; Zuidema, D; Bol, J F

    1984-01-01

    Double-stranded cDNA of in vitro polyadenylated tobacco streak virus (TSV) RNA 3 has been cloned and sequenced. The complete primary structure of 2,205 nucleotides reveals two open reading frames flanked by a leader sequence of 210 bases, an intercistronic region of 123 nucleotides and a 3'-extracistronic sequence of 288 nucleotides. The 5'-terminal open reading frame codes for a Mr 31,742 protein, which probably corresponds to the only in vitro translation product of TSV RNA 3. The 3'-terminal coding region predicts a Mr 26,346 protein, probably the viral coat protein, which is the translation product of the subgenomic messenger, RNA 4. Although the coat proteins of alfalfa mosaic virus (AlMV) and TSV are functionally equivalent in activating their own and each other's genomes, no homology between the primary structures of those two proteins is detectable. PMID:6546793

  14. Nucleotide correlations and electronic transport of DNA sequences

    NASA Astrophysics Data System (ADS)

    Albuquerque, E. L.; Vasconcelos, M. S.; Lyra, M. L.; de Moura, F. A. B. F.

    2005-02-01

    We use a tight-binding formulation to investigate the transmissivity and wave-packet dynamics of sequences of single-strand DNA molecules made up from the nucleotides guanine G, adenine A, cytosine C, and thymine T. In order to reveal the relevance of the underlying correlations in the nucleotide distribution, we compare the results for the genomic DNA sequence with those of two artificial sequences: (i) the Rudin-Shapiro one, which has long-range correlations; (ii) a random sequence, which is a kind of prototype of a short-range correlated system, presented here with the same first-neighbor pair correlations of the human DNA sequence. We found that the long-range character of the correlations is important to the persistence of resonances of finite segments. On the other hand, the wave-packet dynamics seems to be mostly influenced by the short-range correlations.
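
    A four-letter Rudin-Shapiro chain of the kind mentioned above can be generated by repeated substitution. The abstract does not state the substitution rule, so the rule used below (G→GC, C→GA, A→TC, T→TA, starting from G) is an assumption taken from the form commonly quoted for DNA-like Rudin-Shapiro sequences; the function name is hypothetical.

```python
# Assumed inflation rule for a four-letter Rudin-Shapiro sequence.
# This mapping is NOT given in the abstract; treat it as an illustration.
RULE = {"G": "GC", "C": "GA", "A": "TC", "T": "TA"}


def rudin_shapiro(generations: int, seed: str = "G") -> str:
    """Apply the substitution rule `generations` times to the seed string."""
    seq = seed
    for _ in range(generations):
        # Every letter is replaced by two letters, so the length doubles.
        seq = "".join(RULE[base] for base in seq)
    return seq


if __name__ == "__main__":
    for n in range(4):
        print(n, rudin_shapiro(n))
```

    Each generation doubles the chain length, so `rudin_shapiro(n)` has 2**n letters when started from a single seed letter.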

  15. The complete nucleotide sequence of bean yellow mosaic potyvirus RNA.

    PubMed

    Guyatt, K J; Proll, D F; Menssen, A; Davidson, A D

    1996-01-01

    The complete nucleotide sequence of an Australian strain of bean yellow mosaic virus (BYMV-S) has been determined from cloned viral cDNAs. The BYMV-S genome is 9547 nucleotides in length excluding a poly(A) tail. Computer analysis of the sequence revealed a single long open reading frame (ORF) of 9168 nucleotides, commencing at position 206 and terminating with UAG at positions 9374-9376. The ORF potentially encodes a polyprotein of 3056 amino acids with a deduced Mr of 347,409. The 5' and 3' untranslated regions are 205 and 174 nucleotides in length respectively. Alignment of the amino acid sequence of the BYMV-S polyprotein with those of other potyviruses identified nine putative proteolytic cleavage sites. The predicted consensus cleavage site of the BYMV NIa protease was found to differ from that described for other potyviruses. Processing of the BYMV polyprotein at the designated proteolytic cleavage sites would result in a typical potyviral genome arrangement. The amino acid sequences of the putative BYMV encoded proteins were compared to the homologous gene products of twelve individual potyviruses to identify overall and specific regions of amino acid sequence homology.
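
    The coordinates quoted in this record are mutually consistent, which a short calculation can confirm. The sketch below uses only numbers from the abstract; the function and key names are hypothetical. One inference to note: the quoted 3' figure of 174 nucleotides matches the distance from the last sense codon to the genome end, so it appears to include the UAG stop codon. That reading comes from the arithmetic, not from a statement in the source.

```python
def orf_summary(genome_length: int, orf_start: int, orf_length: int) -> dict:
    """Derive ORF end, codon count and flanking lengths from 1-based coordinates."""
    orf_end = orf_start + orf_length - 1          # last nucleotide of the last sense codon
    return {
        "orf_end": orf_end,
        "codons": orf_length // 3,                # sense codons = amino acids encoded
        "five_prime_utr": orf_start - 1,          # nucleotides before the ORF
        "downstream_of_orf": genome_length - orf_end,  # stop codon plus 3' sequence
    }


# Values quoted in the abstract for BYMV-S.
summary = orf_summary(genome_length=9547, orf_start=206, orf_length=9168)

if __name__ == "__main__":
    for key, value in summary.items():
        print(f"{key}: {value}")
```

    With these inputs the last sense codon ends at position 9373, immediately before the stop codon reported at 9374 to 9376.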

  16. The nucleotide sequence and genome organization of Plasmopara halstedii virus

    PubMed Central

    2011-01-01

    Background Only very few viruses of Oomycetes have been studied in detail. Isometric virions were found in different isolates of the oomycete Plasmopara halstedii, the downy mildew pathogen of sunflower. However, complete nucleotide sequences and data on the genome organization were lacking. Methods Viral RNA of different P. halstedii isolates was subjected to nucleotide sequencing and analysis of the viral genome. The N-terminal sequence of the viral coat protein was determined using Top-Down MALDI-TOF analysis. Results The complete nucleotide sequences of both single-stranded RNA segments (RNA1 and RNA2) were established. RNA1 consisted of 2793 nucleotides (nt) exclusive of its 3' poly(A) tract and a single open-reading frame (ORF1) of 2745 nt. ORF1 was framed by a 5' untranslated region (5' UTR) of 18 nt and a 3' untranslated region (3' UTR) of 30 nt. ORF1 contained motifs of RNA-dependent RNA polymerases (RdRp) and showed similarities to RdRp of Sclerophthora macrospora virus A (SmV A) and viruses within the Nodaviridae family. RNA2 consisted of 1526 nt exclusive of its 3' poly(A) tract and a second ORF (ORF2) of 1128 nt. ORF2 coded for the single viral coat protein (CP) and was framed by a 5' UTR of 164 nt and a 3' UTR of 234 nt. The deduced amino acid sequence of ORF2 was verified by nano-LC-ESI-MS/MS experiments. Top-Down MALDI-TOF analysis revealed the N-terminal sequence of the CP. The N-terminal sequence represented a region within ORF2 suggesting a proteolytic processing of the CP in vivo. The CP showed similarities to CP of SmV A and viruses within the Tombusviridae family. Fragments of RNA1 (ca. 1.9 kb) and RNA2 (ca. 1.4 kb) were used to analyze the nucleotide sequence variation of virions in different P. halstedii isolates. Viral sequence variation was 0.3% or less regardless of their host's pathotypes, the geographical origin and the sensitivity towards the fungicide metalaxyl. Conclusions The results showed the presence of a single and new virus type in
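
    The segment layouts reported in the Results can be cross-checked by summing each segment's parts. The following is a small sketch with a hypothetical `Segment` record; the numbers are those given in the abstract, and the coding fraction is an extra derived quantity, not one reported by the authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Lengths in nucleotides, excluding the 3' poly(A) tract."""
    name: str
    total_nt: int
    utr5_nt: int
    orf_nt: int
    utr3_nt: int

    def is_consistent(self) -> bool:
        """True when 5' UTR + ORF + 3' UTR add up to the reported total."""
        return self.utr5_nt + self.orf_nt + self.utr3_nt == self.total_nt

    def coding_fraction(self) -> float:
        """Share of the segment occupied by the ORF."""
        return self.orf_nt / self.total_nt


# Values quoted in the abstract.
RNA1 = Segment("RNA1", total_nt=2793, utr5_nt=18, orf_nt=2745, utr3_nt=30)
RNA2 = Segment("RNA2", total_nt=1526, utr5_nt=164, orf_nt=1128, utr3_nt=234)

if __name__ == "__main__":
    for seg in (RNA1, RNA2):
        print(seg.name, seg.is_consistent(), f"{seg.coding_fraction():.1%}")
```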

  17. Nucleotide sequencing and identification of some wild mushrooms.

    PubMed

    Das, Sudip Kumar; Mandal, Aninda; Datta, Animesh K; Gupta, Sudha; Paul, Rita; Saha, Aditi; Sengupta, Sonali; Dubey, Priyanka Kumari

    2013-01-01

    The rDNA-ITS (ribosomal DNA internal transcribed spacer) fragment of the genomic DNA of 8 wild edible mushrooms (collected from the Eastern Chota Nagpur Plateau of West Bengal, India) was amplified using ITS1 (internal transcribed spacer 1) and ITS2 primers and sequenced to identify the mushrooms. The sequences were aligned using the ClustalW software program. The aligned sequences revealed identity (homology percentage from the GenBank database) of Amanita hemibapha [CN (Chota Nagpur) 1, % identity 99 (JX844716.1)], Amanita sp. [CN 2, % identity 98 (JX844763.1)], Astraeus hygrometricus [CN 3, % identity 87 (FJ536664.1)], Termitomyces sp. [CN 4, % identity 90 (JF746992.1)], Termitomyces sp. [CN 5, % identity 99 (GU001667.1)], T. microcarpus [CN 6, % identity 82 (EF421077.1)], Termitomyces sp. [CN 7, % identity 76 (JF746993.1)], and Volvariella volvacea [CN 8, % identity 100 (JN086680.1)]. Although only 4 of the 8 mushrooms could be identified to species level, the nucleotide sequences of the rest may be relevant to further characterization. A phylogenetic tree was constructed using the Neighbor-Joining method to show the interrelationships among the mushrooms. The determined nucleotide sequences may add information to the GenBank database, aiding molecular taxonomy and facilitating the domestication and characterization of these mushrooms for human benefit.

  18. Nucleotide Sequencing and Identification of Some Wild Mushrooms

    PubMed Central

    Das, Sudip Kumar; Mandal, Aninda; Datta, Animesh K.; Gupta, Sudha; Paul, Rita; Saha, Aditi; Sengupta, Sonali; Dubey, Priyanka Kumari

    2013-01-01

    The rDNA-ITS (ribosomal DNA internal transcribed spacer) fragment of the genomic DNA of 8 wild edible mushrooms (collected from the Eastern Chota Nagpur Plateau of West Bengal, India) was amplified using ITS1 (internal transcribed spacer 1) and ITS2 primers and sequenced to identify the mushrooms. The sequences were aligned using the ClustalW software program. The aligned sequences revealed identity (homology percentage from the GenBank database) of Amanita hemibapha [CN (Chota Nagpur) 1, % identity 99 (JX844716.1)], Amanita sp. [CN 2, % identity 98 (JX844763.1)], Astraeus hygrometricus [CN 3, % identity 87 (FJ536664.1)], Termitomyces sp. [CN 4, % identity 90 (JF746992.1)], Termitomyces sp. [CN 5, % identity 99 (GU001667.1)], T. microcarpus [CN 6, % identity 82 (EF421077.1)], Termitomyces sp. [CN 7, % identity 76 (JF746993.1)], and Volvariella volvacea [CN 8, % identity 100 (JN086680.1)]. Although only 4 of the 8 mushrooms could be identified to species level, the nucleotide sequences of the rest may be relevant to further characterization. A phylogenetic tree was constructed using the Neighbor-Joining method to show the interrelationships among the mushrooms. The determined nucleotide sequences may add information to the GenBank database, aiding molecular taxonomy and facilitating the domestication and characterization of these mushrooms for human benefit. PMID:24489501

  19. Method for the detection of specific nucleic acid sequences by polymerase nucleotide incorporation

    DOEpatents

    Castro, Alonso

    2004-06-01

    A method for rapid and efficient detection of a target DNA or RNA sequence is provided. A primer having a 3'-hydroxyl group at one end and having a sequence of nucleotides sufficiently homologous with an identifying sequence of nucleotides in the target DNA is selected. The primer is hybridized to the identifying sequence of nucleotides on the DNA or RNA sequence and a reporter molecule is synthesized on the target sequence by progressively binding complementary nucleotides to the primer, where the complementary nucleotides include nucleotides labeled with a fluorophore. Fluorescence emitted by fluorophores on single reporter molecules is detected to identify the target DNA or RNA sequence.

  20. Complete nucleotide sequences of Nipah virus isolates from Malaysia.

    PubMed

    Chan, Y P; Chua, K B; Koh, C L; Lim, M E; Lam, S K

    2001-09-01

    We have completely sequenced the genomes of two Nipah virus (NiV) isolates, one from the throat secretion and the other from the cerebrospinal fluid (CSF) of the sole surviving encephalitic patient with positive CSF virus isolation in Malaysia. The two genomes have 18246 nucleotides each and differ by only 4 nucleotides. The NiV genome is 12 nucleotides longer than the Hendra virus (HeV) genome and both genomes have identical leader and trailer sequence lengths and hexamer-phasing positions for all their genes. Both NiV and HeV are also very closely related with respect to their genomic end sequences, gene start and stop signals, P gene-editing signals and deduced amino acid sequences of nucleocapsid protein, phosphoprotein, matrix protein, fusion protein, glycoprotein and RNA polymerase. The existing evidence demonstrates a clear need for the creation of a new genus within the subfamily Paramyxovirinae to accommodate the close similarities between NiV and HeV and their significant differences from other members of the subfamily.

  1. Nucleotide sequence and structure of the human apolipoprotein E gene.

    PubMed Central

    Paik, Y K; Chang, D J; Reardon, C A; Davies, G E; Mahley, R W; Taylor, J M

    1985-01-01

    The gene for human apolipoprotein E (apo-E) was selected from a library of cloned genomic DNA by screening with a specific cDNA hybridization probe, and its structure was characterized. The complete nucleotide sequence of the gene as well as 856 nucleotides of the 5' flanking region and 629 nucleotides of the 3' flanking region were determined. Analysis of the sequence showed that the mRNA-encoding region of the apo-E gene consists of four exons separated by three introns. In comparison to the structure of the mRNA, the introns are located in the 5' noncoding region, in the codon for glycine at position -4 of the signal peptide region, and in the codon for arginine at position +61 of the mature protein. The overall lengths of the apo-E gene and its corresponding mRNA are 3597 and 1163 nucleotides, respectively; a mature plasma protein of 299 amino acids is produced by this gene. Examination of the 5' terminus of the gene by S1 nuclease mapping shows apparent multiple transcription initiation sites. The proximal 5' flanking region contains a "TATA box" element as well as two nearby inverted repeat elements. In addition, there are four Alu family sequences associated with the apo-E gene: an Alu sequence located near each end of the gene and two Alu sequences located in the second intron. This knowledge of the structure permits a molecular approach to characterizing the regulation of the apo-E gene. PMID:2987927

  2. Complete nucleotide sequence and genome organization of bovine parvovirus.

    PubMed Central

    Chen, K C; Shull, B C; Moses, E A; Lederman, M; Stout, E R; Bates, R C

    1986-01-01

    We determined the complete nucleotide sequence of bovine parvovirus (BPV), an autonomous parvovirus. The sequence is 5,491 nucleotides long. The terminal regions contain nonidentical imperfect palindromic sequences of 150 and 121 nucleotides. In the plus strand, there are three large open reading frames (left ORF, mid ORF, and right ORF) with coding capacities of 729, 255, and 685 amino acids, respectively. As with all parvoviruses studied to date, the left ORF of BPV codes for the nonstructural protein NS-1 and the right ORF codes for the major parts of the three capsid proteins. The mid ORF probably encodes the major part of the nonstructural protein NP-1. There are promoterlike sequences at map units 4.5, 12.8, and 38.7 and polyadenylation signals at map units 61.6, 64.6, and 98.5. BPV has little DNA homology with the defective parvovirus AAV, with the human autonomous parvovirus B19, or with the other autonomous parvoviruses sequenced (canine parvovirus, feline panleukopenia virus, H-1, and minute virus of mice). Even though the overall DNA homology of BPV with other parvoviruses is low, several small regions of high homology are observed when the amino acid sequences encoded by the left and right ORFs are compared. From these comparisons, it can be shown that the evolutionary relationship among the parvoviruses is B19 ↔ AAV ↔ BPV ↔ MVM. The highly conserved amino acid sequences observed among all parvoviruses may be useful in the identification and detection of parvoviruses and in the design of a general parvovirus vaccine. PMID:3783814
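
    Map units in this record can be turned into approximate nucleotide positions. The sketch below assumes the usual convention that one map unit equals 1% of the genome length measured from the left end; the abstract does not spell this out, so the resulting positions are rough estimates and the function name is hypothetical.

```python
GENOME_LENGTH_NT = 5491  # BPV genome length quoted in the abstract


def map_unit_to_nt(map_unit: float, genome_length: int = GENOME_LENGTH_NT) -> int:
    """Convert a map unit (assumed to be percent of genome length) to a nucleotide position."""
    if not 0 <= map_unit <= 100:
        raise ValueError("map unit must lie between 0 and 100")
    return round(map_unit * genome_length / 100)


# Map units quoted in the abstract.
PROMOTER_LIKE = [4.5, 12.8, 38.7]
POLYADENYLATION = [61.6, 64.6, 98.5]

if __name__ == "__main__":
    for label, units in (("promoter-like", PROMOTER_LIKE), ("poly(A) signal", POLYADENYLATION)):
        for mu in units:
            print(f"{label}: map unit {mu} ~ nt {map_unit_to_nt(mu)}")
```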

  3. Nucleotide-Specific Contrast for DNA Sequencing by Electron Spectroscopy

    PubMed Central

    Schmid, Andreas K.; Davis, Ronald W.

    2016-01-01

    DNA sequencing by imaging in an electron microscope is an approach that holds promise to deliver long reads with low error rates and without the need for amplification. Earlier work using transmission electron microscopes, which use high electron energies on the order of 100 keV, has shown that low contrast and radiation damage necessitate the use of heavy atom labeling of individual nucleotides, which increases the read error rates. Other prior work has shown that scattering electrons with much lower energy suppresses beam damage on DNA. Here we explore possibilities to increase contrast by employing two methods, X-ray photoelectron and Auger electron spectroscopy. Using bulk DNA samples with monomers of each base, both methods are shown to provide contrast mechanisms that can distinguish individual nucleotides without labels. Both spectroscopic techniques can be readily implemented in a low energy electron microscope, which may enable label-free DNA sequencing by direct imaging. PMID:27149617

  4. Nucleotide-Specific Contrast for DNA Sequencing by Electron Spectroscopy.

    PubMed

    Mankos, Marian; Persson, Henrik H J; N'Diaye, Alpha T; Shadman, Khashayar; Schmid, Andreas K; Davis, Ronald W

    2016-01-01

    DNA sequencing by imaging in an electron microscope is an approach that holds promise to deliver long reads with low error rates and without the need for amplification. Earlier work using transmission electron microscopes, which use high electron energies on the order of 100 keV, has shown that low contrast and radiation damage necessitate the use of heavy atom labeling of individual nucleotides, which increases the read error rates. Other prior work has shown that scattering electrons with much lower energy suppresses beam damage on DNA. Here we explore possibilities to increase contrast by employing two methods, X-ray photoelectron and Auger electron spectroscopy. Using bulk DNA samples with monomers of each base, both methods are shown to provide contrast mechanisms that can distinguish individual nucleotides without labels. Both spectroscopic techniques can be readily implemented in a low energy electron microscope, which may enable label-free DNA sequencing by direct imaging.

  5. The nucleotide sequence of the human beta-globin gene.

    PubMed

    Lawn, R M; Efstratiadis, A; O'Connell, C; Maniatis, T

    1980-10-01

    We report the complete nucleotide sequence of the human beta-globin gene. The purpose of this study is to obtain information necessary to study the evolutionary relationships between members of the human beta-like globin gene family and to provide the basis for comparing normal beta-globin genes with those obtained from the DNA of individuals with genetic defects in hemoglobin expression.

  6. The complete nucleotide sequence of pelargonium leaf curl virus.

    PubMed

    McGavin, Wendy J; MacFarlane, Stuart A

    2016-05-01

    Investigation of a tombusvirus isolated from tulip plants in Scotland revealed that it was pelargonium leaf curl virus (PLCV) rather than the originally suggested tomato bushy stunt virus. The complete sequence of the PLCV genome was determined for the first time, revealing it to be 4789 nucleotides in size and to have an organization similar to that of the other, previously described tombusviruses. Primers derived from the sequence were used to construct a full-length infectious clone of PLCV that recapitulates the disease symptoms of leaf curling in systemically infected pelargonium plants.

  7. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2011-07-01 2011-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide and...

  8. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2010-07-01 2010-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide and...

  9. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2013-07-01 2013-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide and...

  10. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2012-07-01 2012-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide and...

  11. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2014-07-01 2014-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide and...

  12. Identification of repeats in DNA sequences using nucleotide distribution uniformity.

    PubMed

    Yin, Changchuan

    2017-01-07

    Repetitive elements are important in genomic structures, functions and regulations, yet effective methods in precisely identifying repetitive elements in DNA sequences are not fully accessible, and the relationship between repetitive elements and periodicities of genomes is not clearly understood. We present an ab initio method to quantitatively detect repetitive elements and infer the consensus repeat pattern in repetitive elements. The method uses the measure of the distribution uniformity of nucleotides at periodic positions in DNA sequences or genomes. It can identify periodicities, consensus repeat patterns, copy numbers and perfect levels of repetitive elements. The results of using the method on different DNA sequences and genomes demonstrate efficacy and accuracy in identifying repeat patterns and periodicities. The complexity of the method is linear with respect to the lengths of the analyzed sequences. The Python programs in this study are freely available to the public upon request or at https://github.com/cyinbox/DNADU.
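
    The idea of looking at nucleotides at periodic positions can be illustrated with a simplified stand-in. The sketch below is not the distribution uniformity measure from the paper and does not reproduce the DNADU programs: it simply tallies bases at positions i mod p and reports the most common base per position together with an averaged agreement score. All names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def periodic_profile(seq: str, period: int) -> List[Counter]:
    """Tally which bases occur at each offset (i mod period). One pass, linear in len(seq)."""
    if period < 1 or period > len(seq):
        raise ValueError("period must be between 1 and the sequence length")
    columns = [Counter() for _ in range(period)]
    for i, base in enumerate(seq):
        columns[i % period][base] += 1
    return columns


def consensus_and_score(seq: str, period: int) -> Tuple[str, float]:
    """Return the per-offset majority bases and the mean majority fraction (1.0 = perfect repeat)."""
    consensus = []
    fractions = []
    for column in periodic_profile(seq, period):
        base, count = column.most_common(1)[0]
        consensus.append(base)
        fractions.append(count / sum(column.values()))
    return "".join(consensus), sum(fractions) / period


if __name__ == "__main__":
    example = "ACG" * 10  # toy sequence with an exact period of 3
    for p in (2, 3, 4):
        print(p, consensus_and_score(example, p))
```

    Scanning a range of candidate periods and keeping those with a high score is one crude way to spot a periodicity; the published method defines its own statistic for this.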

  13. Complete nucleotide sequence of primitive vertebrate immunoglobulin light chain genes.

    PubMed Central

    Shamblott, M J; Litman, G W

    1989-01-01

    Antibody to Heterodontus francisci (horned shark) immunoglobulin light chain was used to screen a spleen cDNA expression library, and recombinant clones encoding light chain genes were isolated. The complete sequences of the mature coding regions of two light chain genes in this phylogenetically distant vertebrate have been determined and are reported here. Comparisons of the sequences are consistent with the presence of mammalian-like framework and complementarity-determining regions. The predicted amino acid sequences of the genes are more related to mammalian lambda than to kappa light chains. The nucleotide sequences of the genes are most related to mammalian T-cell antigen receptor beta chain. Heterodontus light chain genes may reflect characteristics of the common ancestor of immunoglobulin and T-cell antigen receptors before its evolutionary diversification. PMID:2499889

  14. Complete nucleotide sequence of primitive vertebrate immunoglobulin light chain genes.

    PubMed

    Shamblott, M J; Litman, G W

    1989-06-01

    Antibody to Heterodontus francisci (horned shark) immunoglobulin light chain was used to screen a spleen cDNA expression library, and recombinant clones encoding light chain genes were isolated. The complete sequences of the mature coding regions of two light chain genes in this phylogenetically distant vertebrate have been determined and are reported here. Comparisons of the sequences are consistent with the presence of mammalian-like framework and complementarity-determining regions. The predicted amino acid sequences of the genes are more related to mammalian lambda than to kappa light chains. The nucleotide sequences of the genes are most related to mammalian T-cell antigen receptor beta chain. Heterodontus light chain genes may reflect characteristics of the common ancestor of immunoglobulin and T-cell antigen receptors before its evolutionary diversification.

  15. Differential direct coding: a compression algorithm for nucleotide sequence data

    PubMed Central

    Vey, Gregory

    2009-01-01

    While modern hardware can provide vast amounts of inexpensive storage for biological databases, the compression of nucleotide sequence data is still of paramount importance in order to facilitate fast search and retrieval operations through a reduction in disk traffic. This issue becomes even more important in light of the recent increase of very large data sets, such as metagenomes. In this article, I propose the Differential Direct Coding algorithm, a general-purpose nucleotide compression protocol that can differentiate between sequence data and auxiliary data by supporting the inclusion of supplementary symbols that are not members of the set of expected nucleotide bases, thereby offering reconciliation between sequence-specific and general-purpose compression strategies. This algorithm permits a sequence to contain a rich lexicon of auxiliary symbols that can represent wildcards, annotation data and special subsequences, such as functional domains or special repeats. In particular, the representation of special subsequences can be incorporated to provide structure-based coding that increases the overall degree of compression. Moreover, supporting a robust set of symbols removes the requirement of wildcard elimination and restoration phases, resulting in a complexity of O(n) for execution time, making this algorithm suitable for very large data sets. Because this algorithm compresses data on the basis of triplets, it is highly amenable to interpretation as a polypeptide at decompression time. Also, an encoded sequence may be further compressed using other existing algorithms, like gzip, thereby maximizing the final degree of compression. Overall, the Differential Direct Coding algorithm can offer a beneficial impact on disk traffic for database queries and other disk-intensive operations. PMID:20157486
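
    The triplet-per-byte idea with room for auxiliary symbols can be sketched as follows. This is a toy encoding written for illustration only: the byte layout (values 0 to 63 for ACGT triplets, high bit set for any other single character) is an assumption of this sketch and is not the published Differential Direct Coding format.

```python
BASES = "ACGT"


def encode(seq: str) -> bytes:
    """Pack each pure ACGT triplet into one byte (0..63); emit other characters singly."""
    out = bytearray()
    i = 0
    while i < len(seq):
        triplet = seq[i:i + 3]
        if len(triplet) == 3 and all(base in BASES for base in triplet):
            code = 0
            for base in triplet:
                code = code * 4 + BASES.index(base)   # base-4 number, 6 bits
            out.append(code)
            i += 3
        else:
            char = seq[i]
            if ord(char) > 0x7F:
                raise ValueError("auxiliary symbols must be 7-bit ASCII in this sketch")
            out.append(0x80 | ord(char))              # high bit marks an auxiliary symbol
            i += 1
    return bytes(out)


def decode(data: bytes) -> str:
    """Invert encode(): expand triplet codes and strip the marker bit from auxiliary bytes."""
    parts = []
    for byte in data:
        if byte < 64:
            parts.append(BASES[byte // 16] + BASES[(byte // 4) % 4] + BASES[byte % 4])
        else:
            parts.append(chr(byte & 0x7F))
    return "".join(parts)


if __name__ == "__main__":
    sample = "ACGTNACGTAC"
    packed = encode(sample)
    print(len(sample), "characters ->", len(packed), "bytes; round trip ok:", decode(packed) == sample)
```

    Because wildcards such as N are carried through as ordinary symbols, no separate elimination and restoration pass is needed, and both functions run in a single pass over the input.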

  16. Bioinformatics comparison of sulfate-reducing metabolism nucleotide sequences

    NASA Astrophysics Data System (ADS)

    Tremberger, G.; Dehipawala, Sunil; Nguyen, A.; Cheung, E.; Sullivan, R.; Holden, T.; Lieberman, D.; Cheung, T.

    2015-09-01

    The sulfate-reducing bacteria can be traced back to 3.5 billion years ago. The thermodynamic details of the sulfur cycle have been well documented. A recent sulfate-reducing bacteria report (Robator, Jungbluth, et al., 2015 Jan, Front. Microbiol.) with GenBank nucleotide data has been analyzed in terms of the sulfite reductase (dsrAB) via fractal dimension and entropy values. Comparison to oil field sulfate-reducing sequences was included. The AUCG translational mass fractal dimension versus ATCG transcriptional mass fractal dimension for the low temperature dsrB and dsrA sequences reported in Reference Thirteen shows a correlation of R-sq ~ 0.79, with a probability of about 3% in simulation. A recent report of using a Cystathionine gamma-lyase sequence to produce CdS quantum dots by a biological method, where the sulfur is reduced just as in the H2S production process, was included for comparison. The AUCG mass fractal dimension versus ATCG mass fractal dimension for the Cystathionine gamma-lyase sequences was found to have R-sq of 0.72, similar to the low temperature dissimilatory sulfite reductase dsr group with 3% probability, in contrast to the oil field group, which has R-sq ~ 0.94, a highly probable outcome in the simulation. The other two simulation histograms, namely, fractal dimension versus entropy R-sq outcome values, and di-nucleotide entropy versus mono-nucleotide entropy R-sq outcome values, are also discussed in the data analysis focusing on low probability outcomes.
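
    Mono-nucleotide and di-nucleotide entropies of the kind compared in this record can be computed with the standard Shannon formula. The sketch below is a generic illustration; the use of overlapping dinucleotides and base-2 logarithms is an assumption, since the abstract does not say how the authors counted pairs, and the fractal dimension analysis is not reproduced here.

```python
import math
from collections import Counter
from typing import Iterable


def shannon_entropy(symbols: Iterable[str]) -> float:
    """Shannon entropy in bits of the empirical distribution of the given symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


def mono_entropy(seq: str) -> float:
    """Entropy of single-nucleotide frequencies (at most 2 bits for four bases)."""
    return shannon_entropy(seq)


def di_entropy(seq: str) -> float:
    """Entropy of overlapping dinucleotide frequencies (at most 4 bits for 16 pairs)."""
    return shannon_entropy(seq[i:i + 2] for i in range(len(seq) - 1))


if __name__ == "__main__":
    toy = "ATGCGCGATATCGCGT"  # arbitrary toy sequence, not from the study
    print(f"mono: {mono_entropy(toy):.3f} bits, di: {di_entropy(toy):.3f} bits")
```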

  17. Nucleotide sequence and genome organization of canine parvovirus.

    PubMed Central

    Reed, A P; Jones, E V; Miller, T J

    1988-01-01

    The genome of a canine parvovirus isolate strain (CPV-N) was cloned, and the DNA sequence was determined. The entire genome, including ends, was 5,323 nucleotides in length. The terminal repeat at the 3' end of the genome shared similar structural characteristics but limited homology with the rodent parvoviruses. The 5' terminal repeat was not detected in any of the clones. Instead, a region of DNA starting near the capsid gene stop codon and extending 248 base pairs into the coding region had been duplicated and inserted 75 base pairs downstream from the poly(A) addition site. Consensus sequences for the 5' donor and 3' acceptor sites as well as promoters and poly(A) addition sites were identified and compared with the available information on related parvoviruses. The genomic organization of CPV-N is similar to that of feline parvovirus (FPV) in that there are two major open reading frames (668 and 722 amino acids) in the plus strand (mRNA polarity). Both coding domains are in the same frame, and no significant open reading frames were apparent in any of the other frames of both minus and plus DNA strands. The nucleotide and amino acid homologies of the capsid genes between CPV-N and FPV were 98 and 99%, respectively. In contrast, the nucleotide and amino acid homologies of the capsid genes for CPV-N and CPV-b (S. Rhode III, J. Virol. 54:630-633, 1985) were 95 and 98%, respectively. These results indicate that very few nucleotide or amino acid changes differentiate the antigenic and host range specificity of FPV and CPV. PMID:2824850

  18. Nucleotide sequence of the Rhodospirillum rubrum atp operon.

    PubMed Central

    Falk, G; Hampe, A; Walker, J E

    1985-01-01

    The nucleotide sequence was determined of an 8775-base-pair region of DNA cloned from the photosynthetic non-sulphur bacterium Rhodospirillum rubrum. It contains a cluster of five genes encoding F1-ATPase subunits. The genes are arranged in the same order as F1 genes in the Escherichia coli unc operon. However, as in the related organism Rhodopseudomonas blastica, neither genes for components of F0, the membrane sector of ATP synthase, nor a homologue of the E. coli uncI gene are associated with this locus, as they are in E. coli. PMID:2861810

  19. A new bioinformatics analysis tools framework at EMBL-EBI.

    PubMed

    Goujon, Mickael; McWilliam, Hamish; Li, Weizhong; Valentin, Franck; Squizzato, Silvano; Paern, Juri; Lopez, Rodrigo

    2010-07-01

    The EMBL-EBI provides access to various mainstream sequence analysis applications. These include sequence similarity search services such as BLAST, FASTA, InterProScan and multiple sequence alignment tools such as ClustalW, T-Coffee and MUSCLE. Through the sequence similarity search services, the users can search mainstream sequence databases such as EMBL-Bank and UniProt, and more than 2000 completed genomes and proteomes. We present here a new framework aimed at both novice as well as expert users that exposes novel methods of obtaining annotations and visualizing sequence analysis results through one uniform and consistent interface. These services are available over the web and via Web Services interfaces for users who require systematic access or want to interface with customized pipe-lines and workflows using common programming languages. The framework features novel result visualizations and integration of domain and functional predictions for protein database searches. It is available at http://www.ebi.ac.uk/Tools/sss for sequence similarity searches and at http://www.ebi.ac.uk/Tools/msa for multiple sequence alignments.

  20. Deblur Rapidly Resolves Single-Nucleotide Community Sequence Patterns

    PubMed Central

    Amir, Amnon; McDonald, Daniel; Navas-Molina, Jose A.; Kopylova, Evguenia; Morton, James T.; Zech Xu, Zhenjiang; Kightley, Eric P.; Thompson, Luke R.; Hyde, Embriette R.; Gonzalez, Antonio

    2017-01-01

    ABSTRACT High-throughput sequencing of 16S ribosomal RNA gene amplicons has facilitated understanding of complex microbial communities, but the inherent noise in PCR and DNA sequencing limits differentiation of closely related bacteria. Although many scientific questions can be addressed with broad taxonomic profiles, clinical, food safety, and some ecological applications require higher specificity. Here we introduce a novel sub-operational-taxonomic-unit (sOTU) approach, Deblur, that uses error profiles to obtain putative error-free sequences from Illumina MiSeq and HiSeq sequencing platforms. Deblur substantially reduces computational demands relative to similar sOTU methods and does so with similar or better sensitivity and specificity. Using simulations, mock mixtures, and real data sets, we detected closely related bacterial sequences with single nucleotide differences while removing false positives and maintaining stability in detection, suggesting that Deblur is limited only by read length and diversity within the amplicon sequences. Because Deblur operates on a per-sample level, it scales to modern data sets and meta-analyses. To highlight Deblur’s ability to integrate data sets, we include an interactive exploration of its application to multiple distinct sequencing rounds of the American Gut Project. Deblur is open source under the Berkeley Software Distribution (BSD) license, easily installable, and downloadable from https://github.com/biocore/deblur. IMPORTANCE Deblur provides a rapid and sensitive means to assess ecological patterns driven by differentiation of closely related taxa. This algorithm provides a solution to the problem of identifying real ecological differences between taxa whose amplicons differ by a single base pair, is applicable in an automated fashion to large-scale sequencing data sets, and can integrate sequencing runs collected over time. PMID:28289731

  1. The nucleotide sequence of a nematode vitellogenin gene.

    PubMed Central

    Spieth, J; Denison, K; Zucker, E; Blumenthal, T

    1985-01-01

    The nematode, Caenorhabditis elegans, contains a family of six genes that code for vitellogenins. Here we report the complete nucleotide sequence of one of these genes, vit-5. The gene specifies an mRNA of 4869 nucleotides, including untranslated regions of 9 bases at the 5' end and 51 bases at the 3' end. Vit-5 contains four short introns totalling 218 bp. The predicted vitellogenin, yp170A, has a molecular weight of 186,430. At its N terminus it is clearly related to the vitellogenins of vertebrates. However, the vit-5-encoded protein does not contain a serine-rich sequence related to the vertebrate vitellin, phosvitin. In fact, the amino acid composition of the nematode protein is very similar to that of the vertebrate protein without phosvitin. Vit-5 has a highly asymmetric codon choice dictionary. The favored codons are different from those favored in other organisms, but are characteristic of highly expressed C. elegans genes. The strong selection against rare codons is not as great near the 5' end of the gene; rare codons are 15 times more frequent within the first 54 bp than in the next 4.8 kb. PMID:3855245

  2. Nucleotide-Specific Contrast for DNA Sequencing by Electron Spectroscopy

    DOE PAGES

    Mankos, Marian; Persson, Henrik H. J.; N’Diaye, Alpha T.; ...

    2016-05-05

    DNA sequencing by imaging in an electron microscope is an approach that holds promise to deliver long reads with low error rates and without the need for amplification. Earlier work using transmission electron microscopes, which use high electron energies on the order of 100 keV, has shown that low contrast and radiation damage necessitate the use of heavy atom labeling of individual nucleotides, which increases the read error rates. Other prior work using scattering electrons with much lower energy has been shown to suppress beam damage on DNA. Here we explore possibilities to increase contrast by employing two methods, X-ray photoelectron and Auger electron spectroscopy. Using bulk DNA samples with monomers of each base, both methods are shown to provide contrast mechanisms that can distinguish individual nucleotides without labels. In conclusion, both spectroscopic techniques can be readily implemented in a low energy electron microscope, which may enable label-free DNA sequencing by direct imaging.

  3. The complete nucleotide sequence of chrysanthemum stem necrosis virus.

    PubMed

    Dullemans, A M; Verhoeven, J Th J; Kormelink, R; van der Vlugt, R A A

    2015-02-01

    The complete genome sequence of chrysanthemum stem necrosis virus (CSNV) was determined using Roche 454 next-generation sequencing. CSNV is a tentative member of the genus Tospovirus within the family Bunyaviridae, whose members are arthropod-borne. This is the first report of the entire RNA genome sequence of a CSNV isolate. The large RNA of CSNV is 8955 nucleotides (nt) in size and contains a single open reading frame of 8625 nt in the antisense arrangement, coding for the putative RNA-dependent RNA polymerase (L protein) of 2874 aa with a predicted Mr of 331 kDa. Two untranslated regions of 397 and 33 nt are present at the 5' and 3' termini, respectively. The medium (M) and small (S) RNAs are 4830 and 2947 nt in size, respectively, and show 99 % identity to the corresponding genomic segments of previously partially characterized CSNV genomes. Protein sequences for the precursor of the Gn/Gc proteins, N and NSs, are identical in length in all of the analysed CSNV isolates.

  4. Generalized Levy-walk model for DNA nucleotide sequences

    NASA Technical Reports Server (NTRS)

    Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Simons, M.; Stanley, H. E.

    1993-01-01

    We propose a generalized Levy walk to model fractal landscapes observed in noncoding DNA sequences. We find that this model provides a very close approximation to the empirical data and explains a number of statistical properties of genomic DNA sequences such as the distribution of strand-biased regions (those with an excess of one type of nucleotide) as well as local changes in the slope of the correlation exponent alpha. The generalized Levy-walk model simultaneously accounts for the long-range correlations in noncoding DNA sequences and for the apparently paradoxical finding of long subregions of biased random walks (length l_j) within these correlated sequences. In the generalized Levy-walk model, the l_j are chosen from a power-law distribution, P(l_j) varies as l_j^(-mu). The correlation exponent alpha is related to mu through alpha = 2 - mu/2 if 2 < mu < 3. The model is consistent with the finding of "repetitive elements" of variable length interspersed within noncoding DNA.
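
    The two quantitative statements in this abstract, the power-law choice of subregion lengths and the relation between the exponents, can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' procedure: it treats the power law as a continuous density with a lower cutoff `l_min` (a hypothetical parameter) and draws lengths by inverse-transform sampling; the function names are also hypothetical.

```python
import random

def correlation_exponent(mu):
    """alpha = 2 - mu/2, the relation quoted in the abstract for 2 < mu < 3."""
    if not 2 < mu < 3:
        raise ValueError("the relation is quoted only for 2 < mu < 3")
    return 2 - mu / 2

def sample_segment_lengths(mu, n, l_min=1.0, seed=0):
    """Draw n lengths from a density proportional to l**(-mu) for l >= l_min.

    Assumption: continuous power law with lower cutoff l_min, sampled by
    inverting the cumulative distribution; the abstract gives no cutoff.
    """
    rng = random.Random(seed)
    # rng.random() lies in [0, 1), so (1 - u) lies in (0, 1] and is never zero.
    return [l_min * (1.0 - rng.random()) ** (-1.0 / (mu - 1.0)) for _ in range(n)]

alpha = correlation_exponent(2.5)
lengths = sample_segment_lengths(2.5, 1000)
```

    For mu between 2 and 3 the relation places alpha between 0.5 and 1, so smaller mu (heavier tails, longer biased subregions) corresponds to stronger long-range correlation.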

  5. Nucleotide sequences specific to Yersinia pestis and methods for the detection of Yersinia pestis

    DOEpatents

    McCready, Paula M [Tracy, CA; Radnedge, Lyndsay [San Mateo, CA; Andersen, Gary L [Berkeley, CA; Ott, Linda L [Livermore, CA; Slezak, Thomas R [Livermore, CA; Kuczmarski, Thomas A [Livermore, CA; Motin, Vladinir L [League City, TX

    2009-02-24

    Nucleotide sequences specific to Yersinia pestis that serve as markers or signatures for identification of this bacterium were identified. In addition, forward and reverse primers and hybridization probes derived from these nucleotide sequences that are used in nucleotide detection methods to detect the presence of the bacterium are disclosed.

  6. Nucleotide sequences specific to Francisella tularensis and methods for the detection of Francisella tularensis

    DOEpatents

    McCready, Paula M.; Radnedge, Lyndsay; Andersen, Gary L.; Ott, Linda L.; Slezak, Thomas R.; Kuczmarski, Thomas A.; Vitalis, Elizabeth A

    2009-02-24

    Described herein is the identification of nucleotide sequences specific to Francisella tularensis that serve as markers or signatures for identification of this bacterium. In addition, forward and reverse primers and hybridization probes derived from these nucleotide sequences that are used in nucleotide detection methods to detect the presence of the bacterium are disclosed.

  7. Nucleotide sequences specific to Francisella tularensis and methods for the detection of Francisella tularensis

    DOEpatents

    McCready, Paula M.; Radnedge, Lyndsay; Andersen, Gary L.; Ott, Linda L.; Slezak, Thomas R.; Kuczmarski, Thomas A.; Vitalis, Elizabeth A

    2007-02-06

    Described herein is the identification of nucleotide sequences specific to Francisella tularensis that serve as markers or signatures for identification of this bacterium. In addition, forward and reverse primers and hybridization probes derived from these nucleotide sequences that are used in nucleotide detection methods to detect the presence of the bacterium are disclosed.

  8. Nucleotide sequences specific to Brucella and methods for the detection of Brucella

    DOEpatents

    McCready, Paula M.; Radnedge, Lyndsay; Andersen, Gary L.; Ott, Linda L.; Slezak, Thomas R.; Kuczmarski, Thomas A.

    2009-02-24

    Nucleotide sequences specific to Brucella that serve as markers or signatures for identification of this bacterium were identified. In addition, forward and reverse primers and hybridization probes derived from these nucleotide sequences that are used in nucleotide detection methods to detect the presence of the bacterium are disclosed.

  9. Nucleotide sequence of the chicken 5-aminolevulinate synthase gene.

    PubMed Central

    Maguire, D J; Day, A R; Borthwick, I A; Srivastava, G; Wigley, P L; May, B K; Elliott, W H

    1986-01-01

    5-Aminolevulinate synthase, the first and rate-controlling enzyme of heme biosynthesis, is regulated in the liver by the end-product heme. To study this negative control mechanism, we have isolated the chicken gene for ALA-synthase and determined the nucleotide sequence. The structural gene is 6.9 kb long and contains 10 exons. The transcriptional start site for ALA-synthase was determined by primer extension analysis. A fragment of 291 bp from the 5' flanking region including 34 bp of the first exon shows promoter activity when introduced upstream of a chicken histone H2B gene and injected into the nuclei of Xenopus laevis oocytes. PMID:3005973

  10. Complete nucleotide sequence of a native plasmid from Brevibacterium linens.

    PubMed

    Moore, Mathew; Svenson, Charles; Bowling, David; Glenn, Dianne

    2003-03-01

    Brevibacterium linens has commercial significance in the dairy industry and potential application in the production of bacteriocins and carotenoids. Strain development of these industrially significant organisms would be facilitated by the use of vectors, yet few are available. In this study we report the isolation of four novel plasmids from the Gram-positive coryneform B. linens, and determine the first complete nucleotide sequence of a native plasmid of B. linens. The cryptic plasmid pLIM is 7610 bp in length, and belongs to a subfamily of theta replicating ColE2-related plasmids. Initial investigation suggests that replication in pLIM requires two replicases, a primase (RepA) and a DNA binding protein (RepB), encoded by a single operon repAB. The origin of replication is located upstream of repAB transcription.

  11. Base sequence context effects on nucleotide excision repair.

    PubMed

    Cai, Yuqin; Patel, Dinshaw J; Broyde, Suse; Geacintov, Nicholas E

    2010-08-23

    Nucleotide excision repair (NER) plays a critical role in maintaining the integrity of the genome when damaged by bulky DNA lesions, since inefficient repair can cause mutations and human diseases, notably cancer. The structural properties of DNA lesions that determine their relative susceptibilities to NER are therefore of great interest. As a model system, we have investigated the major mutagenic lesion derived from the environmental carcinogen benzo[a]pyrene (B[a]P), 10S (+)-trans-anti-B[a]P-N(2)-dG in six different sequence contexts that differ in how the lesion is positioned in relation to nearby guanine amino groups. We have obtained molecular structural data by NMR and MD simulations, bending properties from gel electrophoresis studies, and NER data obtained from human HeLa cell extracts for our six investigated sequence contexts. This model system suggests that disturbed Watson-Crick base pairing is a better recognition signal than a flexible bend, and that these can act in concert to provide an enhanced signal. Steric hindrance between the minor groove-aligned lesion and nearby guanine amino groups determines the exact nature of the disturbances. Both nearest neighbor and more distant neighbor sequence contexts have an impact. Regardless of the exact distortions, we hypothesize that they provide a local thermodynamic destabilization signal for repair.

  12. Nucleotide sequence of the hemolysin I gene from Actinobacillus pleuropneumoniae.

    PubMed Central

    Frey, J; Meier, R; Gygi, D; Nicolet, J

    1991-01-01

    The DNA sequence of the gene encoding the structural protein of hemolysin I (HlyI) of Actinobacillus pleuropneumoniae serotype 1 strain 4074 was analyzed. The nucleotide sequence shows a 3,072-bp reading frame encoding a protein of 1,023 amino acids with a calculated molecular size of 110.1 kDa. This corresponds to the HlyI protein, which has an apparent molecular size on sodium dodecyl sulfate gels of 105 kDa. The structure of the protein derived from the DNA sequence shows three hydrophobic regions in the N-terminal part of the protein, 13 glycine-rich domains in the second half of the protein, and a hydrophilic C-terminal area, all of which are typical of the cytotoxins of the RTX (repeats in the structural toxin) toxin family. The derived amino acid sequence of HlyI shows 42% homology with the hemolysin of A. pleuropneumoniae serotype 5, 41% homology with the leukotoxin of Pasteurella haemolytica, and 56% homology with the Escherichia coli alpha-hemolysin. The 13 glycine-rich repeats and three hydrophobic areas of the HlyI sequence show more similarity to the E. coli alpha-hemolysin than to either the A. pleuropneumoniae serotype 5 hemolysin or the leukotoxin (while the last two are more similar to each other). Two types of RTX hemolysins therefore seem to be present in A. pleuropneumoniae, one (HlyI) resembling the alpha-hemolysin and a second more closely related to the leukotoxin. Ca(2+)-binding experiments using HlyI and recombinant A. pleuropneumoniae prohemolysin (HlyIA) that was produced in E. coli show that HlyI binds 45Ca2+, probably because of the 13 glycine-rich repeated domains. Activation of the prohemolysin is not required for Ca2+ binding. PMID:1879928
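
    The reading-frame and protein lengths quoted in this abstract are consistent with each other if the 3,072-bp frame is counted with its stop codon. The short Python sketch below works through that arithmetic; the stop-codon convention is an assumption, since the abstract does not say how the frame was counted, and the function name is hypothetical.

```python
def protein_length_from_orf(orf_bp, includes_stop=True):
    """Number of amino acids encoded by a reading frame of orf_bp base pairs.

    Assumption: when includes_stop is True, the final codon is a stop codon
    and contributes no residue.
    """
    if orf_bp % 3 != 0:
        raise ValueError("reading-frame length must be a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons

residues = protein_length_from_orf(3072)   # 1,024 codons, one of them a stop
mean_residue_mass = 110.1e3 / residues     # Da per residue, from the quoted 110.1 kDa
```

    The resulting mean of roughly 108 Da per residue is in the range usually expected for proteins, which is a quick sanity check on the quoted molecular size.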

  13. [Nucleotide sequence of genes for alpha- and beta-subunits of luciferase from Photobacterium leiognathi].

    PubMed

    Illarionov, B A; Protopopova, M V; Karginov, V A; Mertvetsov, N P; Gitel'zon, I I

    1988-03-01

    Nucleotide sequence of the Photobacterium leiognathi DNA containing genes of alpha and beta subunits of luciferase has been determined. We also deduced amino acid sequence and molecular mass of luciferase and localized luciferase genes in the sequenced DNA fragment.

  14. Recognizing nucleotides by cross-tunneling currents for DNA sequencing

    NASA Astrophysics Data System (ADS)

    Bagci, V. M. K.; Kaun, Chao-Cheng

    2011-07-01

    Using first-principles calculations, we study electron transport through nucleotides inside a rectangular nanogap formed by two pairs of gold electrodes which are perpendicular and parallel to the nucleobase plane. We propose that this setup will enhance the nucleotide selectivity of tunneling signals to a great extent. Information from three electrical probing processes offers full nucleotide recognition, which survives the noise from neighboring nucleotides and configuration fluctuations.

  15. Cloning and nucleotide sequence of the gene encoding the 136-kilodalton surface protein (muramidase-released protein) of Streptococcus suis type 2.

    PubMed Central

    Smith, H E; Vecht, U; Gielkens, A L; Smits, M A

    1992-01-01

    We cloned and sequenced the gene encoding the muramidase-released protein (MRP) of a pathogenic Streptococcus suis type 2 strain to determine whether its amino acid sequence resembles that of proteins with known functions and to determine its function in virulence. The complete nucleotide sequence composing the gene and the regions flanking it was determined. The deduced amino acid sequence revealed the presence of a signal peptide at the N terminus and a cell envelope anchor at the C terminus, both of which resembled similar regions in several other surface proteins from gram-positive bacteria. The processed form of MRP has a length of 1,209 amino acids and a calculated molecular weight of 131,094. A highly repetitive region preceded the envelope anchor. The repeated units were preceded by a proline-rich stretch of amino acids (26 of 86). No overall homologies were observed between the amino acid sequence of MRP and protein sequences in the EMBL data bank. A particular region within the amino acid sequence, however, showed some similarity with the fibronectin-binding protein of Staphylococcus aureus. Binding of MRP to human fibronectin, however, could not be confirmed. PMID:1587602

  16. Nucleotide sequence of the human N-myc gene

    SciTech Connect

    Stanton, L.W.; Schwab, M.; Bishop, J.M.

    1986-03-01

    Human neuroblastomas frequently display amplification and augmented expression of a gene known as N-myc because of its similarity to the protooncogene c-myc. It has therefore been proposed that N-myc is itself a protooncogene, and subsequent tests have shown that N-myc and c-myc have similar biological activities in cell culture. The authors have now detailed the kinship between N-myc and c-myc by determining the nucleotide sequence of human N-myc and deducing the amino acid sequence of the protein encoded by the gene. The topography of N-myc is strikingly similar to that of c-myc: both genes contain three exons of similar lengths; the coding elements of both genes are located in the second and third exons; and both genes have unusually long 5' untranslated regions in their mRNAs, with features that raise the possibility that expression of the genes may be subject to similar controls of translation. The resemblance between the proteins encoded by N-myc and c-myc sustains previous suspicions that the genes encode related functions.

  17. Nucleotide sequence from the coding region of rabbit β-globin messenger RNA

    PubMed Central

    Proudfoot, N.J.

    1976-01-01

    A sequence of 89 nucleotides from rabbit β-globin mRNA has been determined and is shown to code for residues 107 to 137 of the β-globin protein. In addition, a sequence heterogeneity has been identified within this 89 nucleotide long sequence which corresponds to a known polymorphic variant of rabbit β-globin. PMID:61580

  18. The ChEMBL database in 2017.

    PubMed

    Gaulton, Anna; Hersey, Anne; Nowotka, Michał; Bento, A Patrícia; Chambers, Jon; Mendez, David; Mutowo, Prudence; Atkinson, Francis; Bellis, Louisa J; Cibrián-Uhalte, Elena; Davies, Mark; Dedman, Nathan; Karlsson, Anneli; Magariños, María Paula; Overington, John P; Papadatos, George; Smit, Ines; Leach, Andrew R

    2017-01-04

    ChEMBL is an open large-scale bioactivity database (https://www.ebi.ac.uk/chembl), previously described in the 2012 and 2014 Nucleic Acids Research Database Issues. Since then, alongside the continued extraction of data from the medicinal chemistry literature, new sources of bioactivity data have also been added to the database. These include: deposited data sets from neglected disease screening; crop protection data; drug metabolism and disposition data and bioactivity data from patents. A number of improvements and new features have also been incorporated. These include the annotation of assays and targets using ontologies, the inclusion of targets and indications for clinical candidates, addition of metabolic pathways for drugs and calculation of structural alerts. The ChEMBL data can be accessed via a web-interface, RDF distribution, data downloads and RESTful web-services.

  19. The ChEMBL database in 2017

    PubMed Central

    Gaulton, Anna; Hersey, Anne; Nowotka, Michał; Bento, A. Patrícia; Chambers, Jon; Mendez, David; Mutowo, Prudence; Atkinson, Francis; Bellis, Louisa J.; Cibrián-Uhalte, Elena; Davies, Mark; Dedman, Nathan; Karlsson, Anneli; Magariños, María Paula; Overington, John P.; Papadatos, George; Smit, Ines; Leach, Andrew R.

    2017-01-01

    ChEMBL is an open large-scale bioactivity database (https://www.ebi.ac.uk/chembl), previously described in the 2012 and 2014 Nucleic Acids Research Database Issues. Since then, alongside the continued extraction of data from the medicinal chemistry literature, new sources of bioactivity data have also been added to the database. These include: deposited data sets from neglected disease screening; crop protection data; drug metabolism and disposition data and bioactivity data from patents. A number of improvements and new features have also been incorporated. These include the annotation of assays and targets using ontologies, the inclusion of targets and indications for clinical candidates, addition of metabolic pathways for drugs and calculation of structural alerts. The ChEMBL data can be accessed via a web-interface, RDF distribution, data downloads and RESTful web-services. PMID:27899562

  20. Spatially localized generation of nucleotide sequence-specific DNA damage

    PubMed Central

    Oh, Dennis H.; King, Brett A.; Boxer, Steven G.; Hanawalt, Philip C.

    2001-01-01

    Psoralens linked to triplex-forming oligonucleotides (psoTFOs) have been used in conjunction with laser-induced two-photon excitation (TPE) to damage a specific DNA target sequence. To demonstrate that TPE can initiate photochemistry resulting in psoralen–DNA photoadducts, target DNA sequences were incubated with psoTFOs to form triple-helical complexes and then irradiated in liquid solution with pulsed 765-nm laser light, which is half the quantum energy required for conventional one-photon excitation, as used in psoralen + UV A radiation (320–400 nm) therapy. Target DNA acquired strand-specific psoralen monoadducts in a light dose-dependent fashion. To localize DNA damage in a model tissue-like medium, a DNA–psoTFO mixture was prepared in a polyacrylamide gel and then irradiated with a converging laser beam targeting the rear of the gel. The highest number of photoadducts formed at the rear while relatively sparing DNA at the front of the gel, demonstrating spatial localization of sequence-specific DNA damage by TPE. To assess whether TPE treatment could be extended to cells without significant toxicity, cultured monolayers of normal human dermal fibroblasts were incubated with tritium-labeled psoralen without TFO to maximize detectable damage and irradiated by TPE. DNA from irradiated cells treated with psoralen exhibited a 4- to 7-fold increase in tritium activity relative to untreated controls. Functional survival assays indicated that the psoralen–TPE treatment was not toxic to cells. These results demonstrate that DNA damage can be simultaneously manipulated at the nucleotide level and in three dimensions. This approach for targeting photochemical DNA damage may have photochemotherapeutic applications in skin and other optically accessible tissues. PMID:11572980

  1. Nucleotide sequence of a cloned woodchuck hepatitis virus genome: comparison with the hepatitis B virus sequence.

    PubMed Central

    Galibert, F; Chen, T N; Mandart, E

    1982-01-01

    The complete nucleotide sequence of a woodchuck hepatitis virus genome cloned in Escherichia coli was determined by the method of Maxam and Gilbert. This sequence was found to be 3,308 nucleotides long. Potential ATG initiator triplets and nonsense codons were identified and used to locate regions with a substantial coding capacity. A striking similarity was observed between the organization of human hepatitis B virus and woodchuck hepatitis virus. Nucleotide sequences of these open regions in the woodchuck virus were compared with corresponding regions present in hepatitis B virus. This allowed the location of four viral genes on the L strand and indicated the absence of protein coded by the S strand. Evolution rates of the various parts of the genome as well as of the four different proteins coded by hepatitis B virus and woodchuck hepatitis virus were compared. These results indicated that: (i) the core protein has evolved slightly less rapidly than the other proteins; and (ii) when a region of DNA codes for two different proteins, there is less freedom for the DNA to evolve and, moreover, one of the proteins can evolve more rapidly than the other. A hairpin structure, very well conserved in the two genomes, was located in the only region devoid of coding function, suggesting the location of the origin of replication of the viral DNA. PMID:7086958
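
    The step described in this abstract, locating coding regions from ATG initiator triplets and nonsense (stop) codons, can be sketched in Python as a simple three-frame scan. This is a generic illustration, not the authors' 1982 procedure: it scans only the strand it is given, uses the standard stop codons TAA, TAG and TGA, and the `min_codons` threshold and toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return sorted (start, end) pairs for ATG...stop reading frames.

    Indices are 0-based, end is exclusive and includes the stop codon.
    Only the given strand is scanned, in all three frames; min_codons
    counts the codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first initiator in this open stretch
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # look for the next initiator
    return sorted(orfs)

toy = "CCATGAAATTTTAGGG"   # hypothetical sequence, not woodchuck hepatitis virus data
orfs = find_orfs(toy)
```

    To examine the complementary strand as well, the same function could be applied to the reverse complement of the sequence.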

  2. Complete nucleotide sequence of a monopartite Begomovirus and associated satellites infecting Carica papaya in Nepal.

    PubMed

    Shahid, M S; Yoshida, S; Khatri-Chhetri, G B; Briddon, R W; Natsuaki, K T

    2013-06-01

    Carica papaya (papaya) is a fruit crop that is cultivated mostly in kitchen gardens throughout Nepal. Leaf samples of C. papaya plants with leaf curling, vein darkening, vein thickening, and a reduction in leaf size were collected from a garden in Darai village, Rampur, Nepal in 2010. Full-length clones of a monopartite Begomovirus, a betasatellite and an alphasatellite were isolated. The complete nucleotide sequence of the Begomovirus showed the arrangement of genes typical of Old World begomoviruses with the highest nucleotide sequence identity (>99 %) to an isolate of Ageratum yellow vein virus (AYVV), confirming it as an isolate of AYVV. The complete nucleotide sequence of the betasatellite showed greater than 89 % nucleotide sequence identity to an isolate of Tomato leaf curl Java betasatellite originating from Indonesia. The sequence of the alphasatellite displayed 92 % nucleotide sequence identity to Sida yellow vein China alphasatellite. This is the first identification of these components in Nepal and the first time they have been identified in papaya.

  3. The nucleotide sequences of 5S ribosomal RNAs from four Bryophyta-species.

    PubMed Central

    Katoh, K; Hori, H; Osawa, S

    1983-01-01

    The nucleotide sequences of cytoplasmic 5S rRNA from four bryophytes, Marchantia polymorpha, Lophocolea heterophylla, Plagiomnium trichomanes and Anthoceros punctatus have been determined. These RNAs are 119 nucleotides long except for the Anthoceros RNA that has 118 nucleotides. Their sequences are highly similar to each other (91-99% identity) and are more related to those from seed plants (78-83% identity) than to those from green algae (61-73% identity). PMID:6571698
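
    Percent-identity figures such as those quoted in this abstract are computed from aligned sequences. The Python sketch below shows one common convention, stated here as an assumption because the abstract does not give the authors' exact rule: both inputs are already aligned to equal length with "-" as the gap symbol, columns where both sequences have a gap are ignored, and a gap opposite a base counts as a mismatch. The function name and the toy sequences are hypothetical.

```python
def percent_identity(a, b, gap="-"):
    """Percent of compared alignment columns in which a and b are identical.

    Assumption: a and b are pre-aligned, equal-length strings; columns that
    are gaps in both are skipped, and a gap against a base is a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap and y == gap:
            continue                 # column carries no information
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared

same = percent_identity("GGAUGCGAUC", "GGAUGCGAUC")   # identical toy sequences
one_gap = percent_identity("ACG-", "ACGU")            # one gap against a base
```

    A gap-aware convention matters for comparisons like the one above, where one of the RNAs is a nucleotide shorter than the others and so cannot be compared position by position without an alignment.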

  4. Nucleotide sequences of the cylindrical inclusion protein genes of two Japanese zucchini yellow mosaic virus isolates.

    PubMed

    Kundu, A K; Ohshima, K; Sako, N; Yaegashi, H

    1999-02-01

    The nucleotide sequences of the cylindrical inclusion protein (CIP) genes of two Japanese zucchini yellow mosaic virus (ZYMV) isolates (ZYMV-169 and ZYMV-M) were determined. The CIP genes of both isolates comprised 1902 nucleotides and encoded 634 amino acids containing a consensus nucleotide-binding motif. The sequence similarities between the two isolates at the nucleotide and amino acid levels were 91% and 98%, respectively. When the CIP gene sequences of the Japanese ZYMV isolates were compared with those of previously reported ZYMV isolates, the nucleotide and amino acid sequence similarities ranged between 81% and 97%, and between 95% and 97%, respectively. Phylogenetic analysis of the deduced amino acid sequences of the CIP genes indicated that the Japanese ZYMV isolates were closely related to those of other ZYMV isolates.

  5. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data shall...

  6. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data shall...

  7. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data shall...

  8. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data shall...

  9. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data shall...

  10. Complete nucleotide sequence of a potyvirus causing maize dwarf mosaic disease in central China.

    PubMed

    Liu, X; Wang, X; Zhao, Y; Zheng, C; Zhou, G

    2003-01-01

    The full-length nucleotide sequence of a potyvirus causing maize dwarf mosaic (MDM) disease in Henan province, central China, was obtained by reverse transcription-polymerase chain reaction (RT-PCR) and rapid amplification of the cDNA 5'-end (5'-RACE). The viral genome comprised 9596 nucleotides, excluding the poly(A) tail, and encoded a putative polyprotein of 3603 amino acids. The entire genomic sequence of this isolate shared identities of 94.2% and 98.3% with the Sugarcane mosaic virus (SCMV) HZ isolate at the nucleotide and deduced amino acid levels, respectively, but only 69.1% identity with the MDM virus (MDMV) Bulgarian isolate (MDMV-Bg) at the nucleotide level. Phylogenetic tree analysis of the complete nucleotide sequences indicated that the Henan isolate of the potyvirus causing MDM disease is in fact a Henan strain of SCMV (SCMV-HN).
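
    Identity figures like those quoted in the abstract above are typically computed over an alignment. The following is a minimal sketch, assuming two sequences that have already been aligned to equal length with "-" as the gap character and that gapped columns are skipped. The function name and the toy fragments are hypothetical and are not taken from any of the cited studies, which may have used a different convention for gaps.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: gapped columns are excluded from the count
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, for illustration only).
seq_a = "ATGGCGTACG-TTA"
seq_b = "ATGGCATACGATTA"
print(round(percent_identity(seq_a, seq_b), 1))
```

    Whether gapped columns count as mismatches or are ignored changes the result, so reported identities are only comparable when the same convention is used.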

  11. Nucleotide sequence of the Lactococcus lactis NCDO 763 (ML3) rpoD gene.

    PubMed

    Gansel, X; Hartke, A; Boutibonnes, P; Auffray, Y

    1993-10-19

    The complete nucleotide sequence of the rpoD gene from Lactococcus lactis has been determined. The nucleotide data indicate the presence of an open reading frame of 1020 base pairs encoding a polypeptide that shares the framework structure of the principal sigma factors of eubacterial strains.

  12. Nucleotide sequence of a lysine transfer ribonucleic Acid from bakers' yeast.

    PubMed

    Madison, J T; Boguslawski, S J; Teetor, G H

    1972-05-12

    The nucleotide sequence of one of the two major lysine transfer RNA's from bakers' yeast has been determined. Its structure is compared to that of a lysine tRNA from a haploid yeast. A total of 21 nucleotides differ in the two molecules. Only the T-psi-C-G (thymidine-pseudouridine-cytidine-guanosine) loop and its supporting stem are identical.

  13. Variation in the nucleotide sequence of a prolamin gene family in wild rice.

    PubMed

    Barbier, P; Ishihama, A

    1990-07-01

    Variation in the DNA sequence of the 10 kDa prolamin gene family within the wild rice species Oryza rufipogon was probed using the direct sequencing of PCR-amplified genes. A comparison of the nucleotide and deduced amino-acid sequences of eight Asian strains of O. rufipogon and one strain of the related African species O. longistaminata is presented.

  14. The EMBL-EBI bioinformatics web and programmatic tools framework.

    PubMed

    Li, Weizhong; Cowley, Andrew; Uludag, Mahmut; Gur, Tamer; McWilliam, Hamish; Squizzato, Silvano; Park, Young Mi; Buso, Nicola; Lopez, Rodrigo

    2015-07-01

    Since 2009 the EMBL-EBI Job Dispatcher framework has provided free access to a range of mainstream sequence analysis applications. These include sequence similarity search services (https://www.ebi.ac.uk/Tools/sss/) such as BLAST, FASTA and PSI-Search, multiple sequence alignment tools (https://www.ebi.ac.uk/Tools/msa/) such as Clustal Omega, MAFFT and T-Coffee, and other sequence analysis tools (https://www.ebi.ac.uk/Tools/pfa/) such as InterProScan. Through these services users can search mainstream sequence databases such as ENA, UniProt and Ensembl Genomes, utilising a uniform web interface or systematically through Web Services interfaces (https://www.ebi.ac.uk/Tools/webservices/) using common programming languages, and obtain enriched results with novel visualisations. Integration with EBI Search (https://www.ebi.ac.uk/ebisearch/) and the dbfetch retrieval service (https://www.ebi.ac.uk/Tools/dbfetch/) further expands the usefulness of the framework. New tools and updates such as NCBI BLAST+, InterProScan 5 and PfamScan, new categories such as RNA analysis tools (https://www.ebi.ac.uk/Tools/rna/), new databases such as ENA non-coding, WormBase ParaSite, Pfam and Rfam, and new workflow methods, together with the retirement of deprecated services, ensure that the framework remains relevant to today's biological community. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  15. Complete nucleotide sequence of the 23S rRNA gene of the Cyanobacterium, Anacystis nidulans.

    PubMed Central

    Douglas, S E; Doolittle, W F

    1984-01-01

    The nucleotide sequence of the Anacystis nidulans 23S rRNA gene, including the 5'- and 3'-flanking regions has been determined. The gene is 2876 nucleotides long and shows higher primary sequence homology to the 23S rRNAs of plastids (84.5%) than to that of E. coli (79%). The predicted rRNA transcript also shares many secondary structural features with those of plastids, reinforcing the endosymbiont hypothesis for the origin of these organelles. PMID:6326060

  16. Statistical analysis of nucleotide sequences of the hemagglutinin gene of human influenza A viruses.

    PubMed Central

    Ina, Y; Gojobori, T

    1994-01-01

    To examine whether positive selection operates on the hemagglutinin 1 (HA1) gene of human influenza A viruses (H1 subtype), 21 nucleotide sequences of the HA1 gene were statistically analyzed. The nucleotide sequences were divided into antigenic and nonantigenic sites. The nucleotide diversities for antigenic and nonantigenic sites of the HA1 gene were computed at synonymous and nonsynonymous sites separately. For nonantigenic sites, the nucleotide diversities were larger at synonymous sites than at nonsynonymous sites. This is consistent with the neutral theory of molecular evolution. For antigenic sites, however, the nucleotide diversities at nonsynonymous sites were larger than those at synonymous sites. These results suggest that positive selection operates on antigenic sites of the HA1 gene of human influenza A viruses (H1 subtype). PMID:8078892
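
    The comparison in the abstract above rests on nucleotide diversity, the average number of differences per site across all pairs of sequences. The following is a minimal sketch of that quantity for an ungapped alignment, optionally restricted to a chosen set of columns. The function name, the toy alignment, and the column sets are hypothetical. Separating synonymous from nonsynonymous sites, as the study does, requires a codon-aware counting method that this sketch omits.

```python
from itertools import combinations


def nucleotide_diversity(seqs, sites=None):
    """Average pairwise differences per site over the chosen alignment columns."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    cols = range(length) if sites is None else list(sites)
    if not cols:
        raise ValueError("no sites selected")
    pairs = list(combinations(seqs, 2))
    # Count mismatching columns for every pair of sequences.
    total = sum(sum(a[i] != b[i] for i in cols) for a, b in pairs)
    return total / (len(pairs) * len(cols))


# Toy alignment (hypothetical); columns 2 and 3 stand in for one site class.
alignment = ["ACGTACGT", "ACGTACGA", "ACCTACGA"]
print(nucleotide_diversity(alignment))                              # all columns
print(nucleotide_diversity(alignment, sites=[2, 3]))                # one class
print(nucleotide_diversity(alignment, sites=[0, 1, 4, 5, 6, 7]))    # the rest
```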

  17. FASH: A web application for nucleotides sequence search

    PubMed Central

    Veksler-Lublinksy, Isana; Barash, Danny; Avisar, Chai; Troim, Einav; Chew, Paul; Kedem, Klara

    2008-01-01

    FASH (Fourier Alignment Sequence Heuristics) is a web application, based on the Fast Fourier Transform, for finding remote homologs within a long nucleic acid sequence. Given a query sequence and a long text sequence (e.g., the human genome), FASH detects subsequences within the text that are remotely similar to the query. FASH offers an alternative approach to Blast/Fasta for querying long RNA/DNA sequences. FASH differs from these other approaches in that it does not depend on the existence of contiguous seed sequences in its initial detection phase. The FASH web server is user friendly and very easy to operate. FASH can be accessed at (secured website) PMID:18505581

  18. Nucleotide sequence of Neurospora crassa cytoplasmic initiator tRNA.

    PubMed Central

    Gillum, A M; Hecker, L I; Silberklang, M; Schwartzbach, S D; RajBhandary, U L; Barnett, W E

    1977-01-01

    Initiator methionine tRNA from the cytoplasm of Neurospora crassa has been purified and sequenced. The sequence is: pAGCUGCAUm1GGCGCAGCGGAAGCGCM22GCY*GGGCUCAUt6AACCCGGAGm7GU (or D) - CACUCGAUCGm1AAACGAG*UUGCAGCUACCAOH. Similar to initiator tRNAs from the cytoplasm of other eukaryotes, this tRNA also contains the sequence -AUCG- instead of the usual -TphiCG (or A)- found in loop IV of other tRNAs. The sequence of the N. crassa cytoplasmic initiator tRNA is quite different from that of the corresponding mitochondrial initiator tRNA. Comparison of the sequence of N. crassa cytoplasmic initiator tRNA to those of yeast, wheat germ and vertebrate cytoplasmic initiator tRNA indicates that the sequences of the two fungal tRNAs are no more similar to each other than they are to those of other initiator tRNAs. PMID:146192

  19. Complete nucleotide sequence of a new isolate of passion fruit woodiness virus from Western Australia.

    PubMed

    Fukumoto, Tomohiro; Nakamura, Masayuki; Wylie, Stephen J; Chiaki, Yuya; Iwai, Hisashi

    2013-08-01

    We determined the complete genome sequence of the passion fruit woodiness virus Gld-1 isolate (PWV-Gld-1) from Australia and compared it with that of PWV-MU-2, another Australian isolate of PWV. The genomes shared high sequence identity in both the complete nucleotide sequence and the ORF amino acid sequence. All of the cleavage sites of each protein were identical to those of MU-2, and the sequence identity for the individual proteins ranged from 97.2 % to 100.0 %. However, the 5' untranslated region (5'UTR) of the Gld-1 isolate shared only 46.8 % sequence identity with that of PWV-MU-2 and was 177 nucleotides shorter. Re-sequencing of the 5'UTR of MU-2 revealed that the 5' end of the original sequence includes an artifact generated by deep sequencing.

  20. RNA Secondary Structures Having a Compatible Sequence of Certain Nucleotide Ratios.

    PubMed

    Barrett, Christopher L; Li, Thomas J X; Reidys, Christian M

    2016-11-01

    Given a random RNA secondary structure, S, we study RNA sequences having fixed ratios of nucleotides that are compatible with S. We perform this analysis for RNA secondary structures subject to various base-pairing rules and minimum arc- and stack-length restrictions. Our main result reads as follows: in the simplex of nucleotide ratios, there exists a convex region, in which, in the limit of long sequences, a random structure asymptotically almost surely (a.a.s.) has compatible sequence with these ratios and outside of which a.a.s. a random structure has no such compatible sequence. We localize this region for RNA secondary structures subject to various base-pairing rules and minimum arc- and stack-length restrictions. In particular, for GC-sequences (GC denoting the nucleotides guanine and cytosine, respectively) having a ratio of G nucleotides smaller than 1/3, a random RNA secondary structure without any minimum arc- and stack-length restrictions has a.a.s. no such compatible sequence. For sequences having a ratio of G nucleotides larger than 1/3, a random RNA secondary structure has a.a.s. such compatible sequences. We discuss our results in the context of various families of RNA structures.

  1. Cloning and nucleotide sequence of the aroA gene of Bordetella pertussis.

    PubMed Central

    Maskell, D J; Morrissey, P; Dougan, G

    1988-01-01

    The aroA locus of Bordetella pertussis, encoding 5-enolpyruvylshikimate 3-phosphate synthase, has been cloned into Escherichia coli by using a cosmid vector. The gene is expressed in E. coli and complemented an E. coli aroA mutant. The nucleotide sequence of the B. pertussis aroA gene was determined and contains an open reading frame encoding 442 amino acids, with a calculated molecular weight for 5-enolpyruvylshikimate 3-phosphate synthase of 46,688. The amino acid sequence derived from the nucleotide sequence shows homology with the published amino acid sequences of aroA gene products of other microorganisms. PMID:2897356

  2. Isolation and complete nucleotide sequence of the measles virus IMB-1 strain in China.

    PubMed

    Ma, Shao-hui; Wang, Li-chun; Liu, Jian-sheng; Shi, Hai-jing; Liu, Long-ding; Li, Qi-han

    2010-12-01

    The complete nucleotide sequence of the measles virus strain IMB-1, which was isolated in China, was determined. As in other measles viruses, its genome is 15,894 nucleotides in length and encodes six proteins. The full-length nucleotide sequence of the IMB-1 isolate differed from vaccine strains (including wild-type Edmonston strain) by 4%-5% at the nucleotide sequence level. This isolate has amino acid variations over the full genome, including in the hemagglutinin and fusion genes. This report is the first to describe the full-length genome of a genotype H1 strain and provide an overview of the diversity of genetic characteristics of a circulating measles virus.

  3. Nucleotide sequence and genetic organization of Hungarian grapevine chrome mosaic nepovirus RNA2.

    PubMed Central

    Brault, V; Hibrand, L; Candresse, T; Le Gall, O; Dunez, J

    1989-01-01

    The complete nucleotide sequence of Hungarian grapevine chrome mosaic nepovirus (GCMV) RNA2 has been determined. The RNA sequence is 4441 nucleotides in length, excluding the poly(A) tail. A polyprotein of 1324 amino acids with a calculated molecular weight of 146 kDa is encoded in a single long open reading frame extending from nucleotides 218 to 4190. This polyprotein is homologous with the protein encoded by the S strain of tomato black ring virus (TBRV) RNA2, the only other nepovirus sequenced so far. Direct sequencing of the viral coat protein and in vitro translation of transcripts derived from cDNA sequences demonstrate that, as for comoviruses, the coat protein is located at the carboxy terminus of the polyprotein. A model for the expression of GCMV RNA2 is presented. PMID:2798129

  4. Insertion sites and the terminal nucleotide sequences of the Tn4 transposon.

    PubMed

    Hyde, D R; Tu, C P

    1982-07-10

    The nucleotide sequences at the ends of the Tn4 transposon (mercury, spectinomycin, and sulfonamide resistance) have been determined. They are inverted repeated sequences of 38 nucleotides with three mismatched base pairs. These sequences are strongly homologous with the terminal sequences of Tn501 (mercury resistance) but less so with those of Tn3 (ampicillin resistance). The Tn4 transposon generates pentanucleotide members (Tn3, Tn1000, Tn501, Tn551, IS2) with the exception of Tn1721 and bacteriophage Mu. Among the three Tn4 insertion sites examined here, two occurred near a nonanucleotide sequence in perfect homology with part of the terminal inverted-repeat sequence of Tn4, and the third occurred near a sequence of partial homology to one end of Tn4. All three insertions were in the same orientation, such that IRb is proximal to its homologous sequence on the recipient DNA.
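
    An imperfect terminal inverted repeat, such as the 38-nucleotide ends with three mismatches described above, can be checked by comparing one end of an element with the reverse complement of the other end. The following is a minimal sketch under that reading. The function names and the toy element are hypothetical, and the sequence shown is not the Tn4 terminus.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def inverted_repeat_mismatches(element: str, repeat_len: int) -> int:
    """Mismatches between the left end and the reverse complement of the right end."""
    if repeat_len <= 0 or 2 * repeat_len > len(element):
        raise ValueError("repeat_len must fit twice within the element")
    left = element[:repeat_len].upper()
    right_rc = reverse_complement(element[-repeat_len:])
    return sum(a != b for a, b in zip(left, right_rc))


# Toy element (hypothetical): 10-nt ends flanking a short interior.
toy = "GGGGTCTGAC" + "ATATAT" + "GTCAGACCCA"
print(inverted_repeat_mismatches(toy, 10))
```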

  5. Quantum Point Contact Single-Nucleotide Conductance for DNA and RNA Sequence Identification.

    PubMed

    Afsari, Sepideh; Korshoj, Lee E; Abel, Gary R; Khan, Sajida; Chatterjee, Anushree; Nagpal, Prashant

    2017-10-06

    Several nanoscale electronic methods have been proposed for high-throughput single-molecule nucleic acid sequence identification. While many studies display a large ensemble of measurements as "electronic fingerprints" with some promise for distinguishing the DNA and RNA nucleobases (adenine, guanine, cytosine, thymine, and uracil), important metrics such as accuracy and confidence of base calling fall well below the current genomic methods. Issues such as unreliable metal-molecule junction formation, variation of nucleotide conformations, insufficient differences between the molecular orbitals responsible for single-nucleotide conduction, and lack of rigorous base calling algorithms lead to overlapping nanoelectronic measurements and poor nucleotide discrimination, especially at low coverage on single molecules. Here, we demonstrate a technique for reproducible conductance measurements on conformation-constrained single nucleotides and an advanced algorithmic approach for distinguishing the nucleobases. Our quantum point contact single-nucleotide conductance sequencing (QPICS) method uses combed and electrostatically bound single DNA and RNA nucleotides on a self-assembled monolayer of cysteamine molecules. We demonstrate that by varying the applied bias and pH conditions, molecular conductance can be switched ON and OFF, leading to reversible nucleotide perturbation for electronic recognition (NPER). We utilize NPER as a method to achieve >99.7% accuracy for DNA and RNA base calling at low molecular coverage (∼12×) using unbiased single measurements on DNA/RNA nucleotides, which represents a significant advance compared to existing sequencing methods. These results demonstrate the potential for utilizing simple surface modifications and existing biochemical moieties in individual nucleobases for a reliable, direct, single-molecule, nanoelectronic DNA and RNA nucleotide identification method for sequencing.

  6. Diverse nucleotide compositions and sequence fluctuation in Rubisco protein genes

    NASA Astrophysics Data System (ADS)

    Holden, Todd; Dehipawala, S.; Cheung, E.; Bienaime, R.; Ye, J.; Tremberger, G., Jr.; Schneider, P.; Lieberman, D.; Cheung, T.

    2011-10-01

    The Rubisco protein-enzyme is arguably the most abundant protein on Earth. The biology dogma of transcription and translation necessitates the study of the Rubisco genes and Rubisco-like genes in various species. Stronger correlation of fractal dimension of the atomic number fluctuation along a DNA sequence with Shannon entropy has been observed in the studied Rubisco-like gene sequences, suggesting a more diverse evolutionary pressure and constraints in the Rubisco sequences. The strategy of using metal for structural stabilization appears to be an ancient mechanism, with data from the porphobilinogen deaminase gene in Capsaspora owczarzaki and Monosiga brevicollis. Using the chi-square distance probability, our analysis supports the conjecture that the more ancient Rubisco-like sequence in Microcystis aeruginosa would have experienced very different evolutionary pressure and bio-chemical constraint as compared to Bordetella bronchiseptica, the two microbes occupying either end of the correlation graph. Our exploratory study would indicate that high fractal dimension Rubisco sequence would support high carbon dioxide rate via the Michaelis-Menten coefficient; with implication for the control of the whooping cough pathogen Bordetella bronchiseptica, a microbe containing a high fractal dimension Rubisco-like sequence (2.07). Using the internal comparison of chi-square distance probability for 16S rRNA (~ E-22) versus radiation repair Rec-A gene (~ E-05) in high GC content Deinococcus radiodurans, our analysis supports the conjecture that high GC content microbes containing Rubisco-like sequence are likely to include an extra-terrestrial origin, relative to Deinococcus radiodurans. Similar photosynthesis process that could utilize host star radiation would not compete with radiation resistant process from the biology dogma perspective in environments such as Mars and exoplanets.

  7. Complete nucleotide sequences of a distinct bipartite begomovirus, bitter gourd yellow vein virus, infecting Momordica charantia.

    PubMed

    Tahir, Muhammad; Haider, Muhammad Saleem; Briddon, Rob W

    2010-11-01

    Momordica charantia (Cucurbitaceae), a vegetable crop commonly cultivated throughout Pakistan, and begomoviruses, a serious threat to crop plants, are natives of tropical and subtropical regions of the world. Leaf samples of M. charantia with yellow vein symptoms typical of begomovirus infections and samples from apparently healthy plants were collected from areas around Lahore in 2004. Full-length clones of a bipartite begomovirus were isolated from symptomatic samples. The complete nucleotide sequences of the components of one isolate were determined, and these showed the arrangement of genes typical of Old World begomoviruses. The complete nucleotide sequence of DNA A showed the highest nucleotide sequence identity (86.9%) to an isolate of Tomato leaf curl New Delhi virus (ToLCNDV), confirming it to belong to a distinct species of begomovirus, for which the name Bitter gourd yellow vein virus (BGYVV) is proposed. Sequence comparisons showed that BGYVV likely emerged as a result of inter-specific recombination between ToLCNDV and tomato leaf curl Bangladesh virus (ToLCBDV). The complete nucleotide sequence of DNA B showed 97.2% nucleotide sequence identity to that of an Indian strain of Squash leaf curl China virus.

  8. Classification of nucleotide sequences using support vector machines.

    PubMed

    Seo, Tae-Kun

    2010-10-01

    Species identification is one of the most important issues in biological studies. Due to recent increases in the amount of genomic information available and the development of DNA sequencing technologies, the applicability of using DNA sequences to identify species (commonly referred to as "DNA barcoding") is being tested in many areas. Several methods have been suggested to identify species using DNA sequences, including similarity scores, analysis of phylogenetic and population genetic information, and detection of species-specific sequence patterns. Although these methods have demonstrated good performance under a range of circumstances, they also have limitations, as they are subject to loss of information, require intensive computation and are sensitive to model mis-specification, and can be difficult to evaluate in terms of the significance of identification. Here, we suggest a new DNA barcoding method in which support vector machine (SVM) procedures are adopted. Our new method is nonparametric and thus is expected to be robust for a wide range of evolutionary scenarios as well as multilocus analyses. Furthermore, we describe bootstrap procedures that can be used to test the significances of species identifications. We implemented a novel conversion technique for transforming sequence data to real-valued vectors, and therefore, bootstrap procedures can be easily combined with our SVM approach. In this study, we present the results of simulation studies and empirical data analyses to demonstrate the performance of our method and discuss its properties.

  9. Nature and distribution of feline sarcoma virus nucleotide sequences.

    PubMed Central

    Frankel, A E; Gilbert, J H; Porzig, K J; Scolnick, E M; Aaronson, S A

    1979-01-01

    The genomes of three independent isolates of feline sarcoma virus (FeSV) were compared by molecular hybridization techniques. Using complementary DNAs prepared from two strains, SM- and ST-FeSV, common complementary DNAs were selected by sequential hybridization to FeSV and feline leukemia virus RNAs. These DNAs were shown to be highly related among the three independent sarcoma virus isolates. FeSV-specific complementary DNAs were prepared by selection for hybridization by the homologous FeSV RNA and against hybridization by feline leukemia virus RNA. Sarcoma virus-specific sequences of SM-FeSV were shown to differ from those of either ST- or GA-FeSV strains, whereas ST-FeSV-specific DNA shared extensive sequence homology with GA-FeSV. By molecular hybridization, each set of FeSV-specific sequences was demonstrated to be present in normal cat cellular DNA in approximately one copy per haploid genome and was conserved throughout Felidae. In contrast, FeSV-common sequences were present in multiple DNA copies and were found only in Mediterranean cats. The present results are consistent with the concept that each FeSV strain has arisen by a mechanism involving recombination between feline leukemia virus and cat cellular DNA sequences, the latter represented within the cat genome in a manner analogous to that of a cellular gene. PMID:225544

  10. Nucleotide sequence of the Agrobacterium tumefaciens octopine Ti plasmid-encoded tmr gene.

    PubMed Central

    Heidekamp, F; Dirkse, W G; Hille, J; van Ormondt, H

    1983-01-01

    The nucleotide sequence of the tmr gene, encoded by the octopine Ti plasmid from Agrobacterium tumefaciens (pTiAch5), was determined. The T-DNA, which encompasses this gene, is involved in tumor formation and maintenance, and probably mediates the cytokinin-independent growth of transformed plant cells. The nucleotide sequence of the tmr gene displays a continuous open reading frame specifying a polypeptide chain of 240 amino acids. The 5'-terminus of the polyadenylated tmr mRNA isolated from octopine tobacco tumor cell lines was determined by nuclease S1 mapping. The nucleotide sequence 5'-TATAAAA-3', which is identical to the canonical "TATA" box, was found 29 nucleotides upstream from the major initiation site for RNA synthesis. Two potential polyadenylation signals 5'-AATAAA-3' were found at 207 and 275 nucleotides downstream from the TAG stop codon of the tmr gene. A comparison was made of nucleotide stretches involved in transcription control of T-DNA genes. PMID:6312414
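
    Locating short signals such as the TATAAAA box and the AATAAA polyadenylation signal mentioned above amounts to an exact motif scan. The following is a minimal sketch that reports every match, including overlapping ones, as 0-based start positions. The function name and the toy sequence are hypothetical. The sequence is not the tmr gene, so the positions do not correspond to those in the abstract.

```python
def find_motif(seq: str, motif: str) -> list:
    """Return 0-based start positions of every (possibly overlapping) exact match."""
    seq, motif = seq.upper(), motif.upper()
    if not motif:
        raise ValueError("motif must not be empty")
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        # Resume one base further along so overlapping matches are kept.
        start = seq.find(motif, start + 1)
    return hits


# Toy sequence (hypothetical) containing one copy of each signal.
toy = "GGTATAAAACCGATGAAATAGCCAATAAAGG"
print(find_motif(toy, "TATAAAA"))
print(find_motif(toy, "AATAAA"))
```

    Reported distances such as "29 nucleotides upstream" depend on which base is counted as position zero, so a convention has to be fixed before comparing scan output with published coordinates.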

  11. The nucleotide sequence of tomato mottle virus, a new geminivirus isolated from tomatoes in Florida.

    PubMed

    Abouzid, A M; Polston, J E; Hiebert, E

    1992-12-01

    A new geminivirus, tomato mottle virus (TMoV), affecting tomato production in Florida has been cloned and sequenced. Sequence analysis of the cloned replicative forms of TMoV revealed four potential coding regions for the A component [2601 nucleotides (nt)] and two for the B component (2541 nt). Comparisons of the nucleotide sequence of the TMoV genome with those of other whitefly-transmitted geminiviruses indicate that TMoV is a typical bipartite geminivirus of the New World and is closely related to but distinct from abutilon mosaic virus.

  12. Nucleotide sequences of 5S rRNAs from four jellyfishes.

    PubMed

    Hori, H; Ohama, T; Kumazaki, T; Osawa, S

    1982-11-25

    The nucleotide sequences of 5S rRNAs from four jellyfishes, Spirocodon saltatrix, Nemopsis dofleini, Aurelia aurita and Chrysaora quinquecirrha have been determined. The sequences are highly similar to each other. A fairly high similarity was also found between these jellyfishes and a sea anemone, Anthopleura japonica.

  13. Should nucleotide sequence analyzing computer algorithms always extend homologies by extending homologies?

    PubMed

    Burnett, L; Basten, A; Hensley, W J

    1986-01-10

    Most computer algorithms used for comparing or aligning nucleotide sequences rely on the premise that the best way to extend a homology between the two sequences is to select a match rather than a mismatch. We have tested this assumption and found that it is not always valid.

  14. Nucleotide sequences of 5S rRNAs from four jellyfishes.

    PubMed Central

    Hori, H; Ohama, T; Kumazaki, T; Osawa, S

    1982-01-01

    The nucleotide sequences of 5S rRNAs from four jellyfishes, Spirocodon saltatrix, Nemopsis dofleini, Aurelia aurita and Chrysaora quinquecirrha have been determined. The sequences are highly similar to each other. A fairly high similarity was also found between these jellyfishes and a sea anemone, Anthopleura japonica. PMID:6130512

  15. Nucleotide sequence conservation in paramyxoviruses; the concept of codon constellation.

    PubMed

    Rima, Bert K

    2015-05-01

    The stability and conservation of the sequences of RNA viruses in the field and the high error rates measured in vitro are paradoxical. The field stability indicates that there are very strong selective constraints on sequence diversity. The nature of these constraints is discussed. Apart from constraints on variation in cis-acting RNA and the amino acid sequences of viral proteins, there are other ones relating to the presence of specific dinucleotides such as CpG and UpA as well as the importance of RNA secondary structures and RNA degradation rates. Recent other constraints identified in other RNA viruses, such as effects of secondary RNA structure on protein folding or modification of cellular tRNA complements, are also discussed. Using the family Paramyxoviridae, I show that the codon usage pattern (CUP) is (i) specific for each virus species and (ii) that it is markedly different from the host - it does not vary even in vaccine viruses that have been derived by passage in a number of inappropriate host cells. The CUP might thus be an additional constraint on variation, and I propose the concept of codon constellation to indicate the informational content of the sequences of RNA molecules relating not only to stability and structure but also to the efficiency of translation of a viral mRNA resulting from the CUP and the numbers and position of rare codons.
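
    A codon usage pattern starts from a tally of the codons in a coding sequence. The following is a minimal sketch, assuming an in-frame coding sequence whose length is a multiple of three, with RNA input normalised to the DNA alphabet. The function names and the toy sequence are hypothetical, and the abstract does not say how the CUP was quantified, so raw frequencies are shown here only as one plausible starting point.

```python
from collections import Counter


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons; U is normalised to T so RNA and DNA inputs agree."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return Counter(cds[i:i + 3] for i in range(0, len(cds), 3))


def codon_frequencies(cds: str) -> dict:
    """Relative frequency of each codon observed in the sequence."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


# Toy mRNA fragment (hypothetical): Met-Ala-Ala-Ala-Lys-stop.
toy = "AUGGCUGCUGCCAAAUAA"
print(codon_counts(toy).most_common())
```

    Comparing such tallies between a virus and its host, or across passages, would be one way to examine the kind of stability the abstract describes.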

  16. Nucleotide sequence of a human tRNA gene heterocluster

    SciTech Connect

    Chang, Y.N.; Pirtle, I.L.; Pirtle, R.M.

    1986-05-01

    Leucine tRNA from bovine liver was used as a hybridization probe to screen a human gene library harbored in Charon-4A of bacteriophage lambda. The human DNA inserts from plaque-pure clones were characterized by restriction endonuclease mapping and Southern hybridization techniques, using both (3'-³²P)-labeled bovine liver leucine tRNA and total tRNA as hybridization probes. An 8-kb Hind III fragment of one of these γ-clones was subcloned into the Hind III site of pBR322. Subsequent fine restriction mapping and DNA sequence analysis of this plasmid DNA indicated the presence of four tRNA genes within the 8-kb DNA fragment. A leucine tRNA gene with an anticodon of AAG and a proline tRNA gene with an anticodon of AGG are in a 1.6-kb subfragment. A threonine tRNA gene with an anticodon of UGU and an as yet unidentified tRNA gene are located in a 1.1-kb subfragment. These two different subfragments are separated by 2.8 kb. The coding regions of the three sequenced genes contain characteristic internal split promoter sequences and do not have intervening sequences. The 3'-flanking regions of these three genes have typical RNA polymerase III termination sites of at least four consecutive T residues.

  17. Methods for making nucleotide probes for sequencing and synthesis

    DOEpatents

    Church, George M; Zhang, Kun; Chou, Joseph

    2014-07-08

    Compositions and methods for making a plurality of probes for analyzing a plurality of nucleic acid samples are provided. Compositions and methods for analyzing a plurality of nucleic acid samples to obtain sequence information in each nucleic acid sample are also provided.

  18. A new hybrid fractal algorithm for predicting thermophilic nucleotide sequences.

    PubMed

    Lu, Jin-Long; Hu, Xue-Hai; Hu, Dong-Gang

    2012-01-21

    Knowledge of the thermophilic mechanisms of organisms whose optimum growth temperature (OGT) ranges from 50 to 80 °C plays a major role in helping design stable proteins. How to predict whether a DNA sequence is thermophilic is a long-standing problem that has not been fully resolved. Chaos game representation (CGR) can expose patterns hidden in DNA sequences and can visually reveal previously unknown structure. Fractal dimensions are good tools for measuring the sizes of complex, highly irregular geometric objects. In this paper, we convert every DNA sequence into a high-dimensional vector using the CGR algorithm and fractal dimension, and then predict the thermostability of the DNA sequence from these fractal features with a support vector machine (SVM). We conducted experiments on three groups: 17-dimensional, 65-dimensional, and 257-dimensional vectors. Each group was evaluated by 10-fold cross-validation. The 257-dimensional group gave the best results: an average accuracy of 0.9456 and an average MCC of 0.8878. The results are also compared with previous work using single CGR features. The comparison shows the high effectiveness of the new hybrid fractal algorithm.
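
    The chaos game representation mentioned above maps a sequence to points in the unit square by repeatedly moving halfway toward the corner assigned to each nucleotide. The following is a minimal sketch, assuming the common corner layout A(0,0), C(0,1), G(1,1), T(1,0) and a start at the centre. Binning the points on a 2^k by 2^k grid gives one count per k-mer. The function names are hypothetical, and the abstract does not describe the authors' feature construction. The dimensions 17, 65, and 257 equal 4^k + 1 for k = 2, 3, 4, which would be consistent with 4^k grid counts plus one fractal-dimension value, but that reading is an assumption.

```python
# Assumed corner layout; other layouts are also used in the literature.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq: str) -> list:
    """CGR coordinates: each point is the midpoint of the previous point and a corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]  # raises KeyError on ambiguous bases such as N
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def cgr_grid(seq: str, k: int) -> list:
    """Bin CGR points on a 2^k x 2^k grid; each cell counts one specific k-mer."""
    if k < 1:
        raise ValueError("k must be at least 1")
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    # Skip the first k - 1 points: they are not yet determined by a full k-mer.
    for x, y in cgr_points(seq)[k - 1:]:
        grid[int(y * size)][int(x * size)] += 1
    return grid


grid = cgr_grid("ACGTACGTAC", 2)
features = [count for row in grid for count in row]  # flattened 16-value vector
print(features)
```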

  19. Nucleotide sequences and phylogeny of the nucleocapsid gene of Oropouche virus.

    PubMed

    Saeed, M F; Wang, H; Nunes, M; Vasconcelos, P F; Weaver, S C; Shope, R E; Watts, D M; Tesh, R B; Barrett, A D

    2000-03-01

    The nucleotide sequence of the S RNA segment of the Oropouche (ORO) virus prototype strain TRVL 9760 was determined and found to be 754 nucleotides in length. In the virion-complementary orientation, the RNA contained two overlapping open reading frames of 693 and 273 nucleotides that were predicted to encode proteins of 231 and 91 amino acids, respectively. Subsequently, the nucleotide sequences of the nucleocapsid genes of 27 additional ORO virus strains, representing a 42 year interval and a wide geographical range in South America, were determined. Phylogenetic analyses revealed that all the ORO virus strains formed a monophyletic group that comprised three distinct lineages. Lineage I contained the prototype strain from Trinidad and most of the Brazilian strains, lineage II contained six Peruvian strains isolated between 1992 and 1998, and two strains from western Brazil isolated in 1991, while lineage III comprised four strains isolated in Panama during 1989.

  20. Mayaro virus: complete nucleotide sequence and phylogenetic relationships with other alphaviruses.

    PubMed

    Lavergne, Anne; de Thoisy, Benoît; Lacoste, Vincent; Pascalis, Hervé; Pouliquen, Jean-François; Mercier, Véronique; Tolou, Hugues; Dussart, Philippe; Morvan, Jacques; Talarmin, Antoine; Kazanji, Mirdad

    2006-05-01

    Mayaro (MAY) virus is a member of the genus Alphavirus in the family Togaviridae. Alphaviruses are distributed throughout the world and cause a wide range of diseases in humans and animals. Here, we determined the complete nucleotide sequence of MAY from a viral strain isolated from a French Guianese patient. The deduced MAY genome was 11,429 nucleotides in length, excluding the 5' cap nucleotide and 3' poly(A) tail. Nucleotide and amino acid homologies, as well as phylogenetic analyses of the obtained sequence, confirmed that MAY is not a recombinant virus and belongs to the Semliki Forest complex according to the antigenic complex classification. Furthermore, analyses based on the E1 region revealed that MAY is closely related to Una virus, the only other South American virus clustering with the Old World viruses. On the basis of our results and of the diversity and pathogenicity of alphaviruses, we suggest that alphaviruses may have an Old World origin.

  1. TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations.

    PubMed

    Abascal, Federico; Zardoya, Rafael; Telford, Maximilian J

    2010-07-01

    We present TranslatorX, a web server designed to align protein-coding nucleotide sequences based on their corresponding amino acid translations. Many comparisons between biological sequences (nucleic acids and proteins) involve the construction of multiple alignments. Alignments represent a statement regarding the homology between individual nucleotides or amino acids within homologous genes. As protein-coding DNA sequences evolve as triplets of nucleotides (codons) and it is known that sequence similarity degrades more rapidly at the DNA than at the amino acid level, alignments are generally more accurate when based on amino acids than on their corresponding nucleotides. TranslatorX novelties include: (i) use of all documented genetic codes and the possibility of assigning different genetic codes for each sequence; (ii) a battery of different multiple alignment programs; (iii) translation of ambiguous codons when possible; (iv) an innovative criterion to clean nucleotide alignments with GBlocks based on protein information; and (v) a rich output, including Jalview-powered graphical visualization of the alignments, codon-based alignments coloured according to the corresponding amino acids, measures of compositional bias and first, second and third codon position specific alignments. The TranslatorX server is freely available at http://translatorx.co.uk.
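
    The central idea of this abstract, aligning at the amino acid level and then threading the original codons back through the protein alignment, can be sketched in Python as follows. This is not TranslatorX's code: the function name `backtranslate_alignment` is hypothetical, and the sketch assumes the unaligned coding sequence is in frame, has a length that is a multiple of three, and contains exactly one codon per aligned residue.

```python
def backtranslate_alignment(aligned_protein, nucleotides, gap="-"):
    """Build a codon alignment from an aligned protein and its coding sequence.

    Each residue in the aligned protein is replaced by the next codon of
    the original nucleotide sequence; each gap becomes three gap characters.
    """
    if len(nucleotides) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    codons = [nucleotides[i:i + 3] for i in range(0, len(nucleotides), 3)]
    residues = sum(1 for ch in aligned_protein if ch != gap)
    if residues != len(codons):
        raise ValueError("number of residues does not match number of codons")

    codon_iter = iter(codons)
    out = []
    for ch in aligned_protein:
        out.append(gap * 3 if ch == gap else next(codon_iter))
    return "".join(out)
```

    For example, `backtranslate_alignment("M-K", "ATGAAA")` yields `ATG---AAA`, which keeps gaps on codon boundaries.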

  2. Nucleotide sequence and taxonomical distribution of the bacteriocin gene lin cloned from Brevibacterium linens M18.

    PubMed

    Valdes-Stauber, N; Scherer, S

    1996-04-01

    Linocin M18 is an antilisterial bacteriocin produced by the red smear cheese bacterium Brevibacterium linens M18. Oligonucleotide probes based on the N-terminal amino acid sequence were used to locate its single copy gene, lin, on the chromosomal DNA. The amino acid composition, N-terminal sequence, and molecular mass derived from the nucleotide sequence of an open reading frame of 798 nucleotides coding for 266 amino acids found on a 3-kb BamHI restriction fragment correspond closely to those obtained from the purified protein (N. Valdés-Stauber and S. Scherer, Appl. Environ. Microbiol. 60:3809-3814, 1994). No sequence homology to any protein or nucleotide sequences deposited in databases was found. Comparison of the nucleotide sequence and the N-terminal amino acid sequence derived from the protein suggests that B. linens M18 produces an N-formyl-methionyl-CAC tRNA. A wide taxonomical distribution of the gene within coryneform bacteria has been demonstrated by PCR amplification. The structural gene from linocin M18 is present at least in three Brevibacterium species, five Arthrobacter species, and five Corynebacterium species.

  3. Clusters of nucleotide substitutions and insertion/deletion mutations are associated with repeat sequences.

    PubMed

    McDonald, Michael J; Wang, Wei-Chi; Huang, Hsien-Da; Leu, Jun-Yi

    2011-06-01

    The genome-sequencing gold rush has facilitated the use of comparative genomics to uncover patterns of genome evolution, although their causal mechanisms remain elusive. One such trend, ubiquitous to prokarya and eukarya, is the association of insertion/deletion mutations (indels) with increases in the nucleotide substitution rate extending over hundreds of base pairs. The prevailing hypothesis is that indels are themselves mutagenic agents. Here, we employ population genomics data from Escherichia coli, Saccharomyces paradoxus, and Drosophila to provide evidence suggesting that it is not the indels per se but the sequence in which indels occur that causes the accumulation of nucleotide substitutions. We found that about two-thirds of indels are closely associated with repeat sequences and that repeat sequence abundance could be used to identify regions of elevated sequence diversity, independently of indels. Moreover, the mutational signature of indel-proximal nucleotide substitutions matches that of error-prone DNA polymerases. We propose that repeat sequences promote an increased probability of replication fork arrest, causing the persistent recruitment of error-prone DNA polymerases to specific sequence regions over evolutionary time scales. Experimental measures of the mutation rates of engineered DNA sequences and analyses of experimentally obtained collections of spontaneous mutations provide molecular evidence supporting our hypothesis. This study uncovers a new role for repeat sequences in genome evolution and provides an explanation of how fine-scale sequence contextual effects influence mutation rates and thereby evolution.

  4. Complete nucleotide sequence of Alfalfa mosaic virus isolated from alfalfa (Medicago sativa L.) in Argentina.

    PubMed

    Trucco, Verónica; de Breuil, Soledad; Bejerman, Nicolás; Lenardon, Sergio; Giolitti, Fabián

    2014-06-01

    The complete nucleotide sequence of an Alfalfa mosaic virus (AMV) isolate infecting alfalfa (Medicago sativa L.) in Argentina, AMV-Arg, was determined. The virus genome has the typical organization described for AMV, and comprises 3,643, 2,593, and 2,038 nucleotides for RNA1, 2 and 3, respectively. The whole genome sequence and each encoding region were compared with those of four other isolates that have been completely sequenced from China, Italy, Spain and USA. The nucleotide identity percentages ranged from 95.9 to 99.1 % for the three RNAs and from 93.7 to 99 % for the protein 1 (P1), protein 2 (P2), movement protein and coat protein (CP) encoding regions, whereas the amino acid identity percentages of these proteins ranged from 93.4 to 99.5 %, the lowest value corresponding to P2. CP sequences of AMV-Arg were compared with those of 25 other available isolates, and a phylogenetic analysis based on the CP gene was carried out. The highest percentage of nucleotide sequence identity of the CP gene was 98.3 % with a Chinese isolate and 98.6 % at the amino acid level with four isolates, two from Italy, one from Brazil and the remaining one from China. The phylogenetic analysis showed that AMV-Arg is closely related to subgroup I of AMV isolates. To our knowledge, this is the first report of a complete nucleotide sequence of AMV from South America and the first worldwide report of a complete nucleotide sequence of AMV isolated from alfalfa as natural host.
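
    Identity percentages like those quoted in this abstract are computed from pairwise alignments. A minimal Python sketch is given below, assuming the two sequences are already aligned to equal length. The function name `percent_identity` and the gap convention (columns where both sequences have a gap are ignored, columns with a single gap count as mismatches) are assumptions; published tools differ in how they treat gaps, and the abstract does not say which convention was used.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap and y == gap:
            continue  # column carries no information for this pair
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared
```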

  5. Cloning and nucleotide sequence of the Lactobacillus casei lactate dehydrogenase gene.

    PubMed Central

    Kim, S F; Baek, S J; Pack, M Y

    1991-01-01

    An allosteric L-(+)-lactate dehydrogenase gene of Lactobacillus casei ATCC 393 was cloned in Escherichia coli, and the nucleotide sequence of the gene was determined. The gene was composed of an open reading frame of 981 bp, starting with a GTG codon and ending with a TAA codon. The sequences for the promoter and ribosome binding site were identified, and a sequence for a structure resembling a rho-independent transcription terminator was also found. PMID:1768113

  6. Nucleotide sequence of an Escherichia coli chromosomal hemolysin.

    PubMed Central

    Felmlee, T; Pellett, S; Welch, R A

    1985-01-01

    We determined the DNA sequence of an 8,211-base-pair region encompassing the chromosomal hemolysin, molecularly cloned from an O4 serotype strain of Escherichia coli. All four hemolysin cistrons (transcriptional order, C, A, B, and D) were encoded on the same DNA strand, and their predicted molecular masses were, respectively, 19.7, 109.8, 79.9, and 54.6 kilodaltons. The identification of pSF4000-encoded polypeptides in E. coli minicells corroborated the assignment of the predicted polypeptides for hlyC, hlyA, and hlyD. However, based on the minicell results, two polypeptides appeared to be encoded on the hlyB region, one similar in size to the predicted molecular mass of 79.9 kilodaltons, and the other a smaller 46-kilodalton polypeptide. The four hemolysin genes displayed similar codon usage, which is atypical for E. coli. This reflects the low guanine-plus-cytosine content (40.2%) of the hemolysin DNA sequence and suggests the non-E. coli origin of the hemolysin determinant. In vitro-derived deletions of the hemolysin recombinant plasmid pSF4000 indicated that a region between 433 and 301 base pairs upstream of the putative start of hlyC is necessary for hemolysin synthesis. Based on the DNA sequence, a stem-loop transcription terminator-like structure (a 16-base-pair stem followed by seven uridylates) in the mRNA was predicted distal to the C-terminal end of hlyA. A model for the general transcriptional organization of the E. coli hemolysin determinant is presented. PMID:3891743
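
    The two sequence statistics this abstract relies on, guanine-plus-cytosine content and codon usage, are simple to compute. The Python sketch below shows one way to do so; the function names `gc_content` and `codon_usage` are hypothetical, and the choice to ignore non-ACGT characters and any trailing partial codon is an assumption of the example.

```python
from collections import Counter


def gc_content(seq):
    """Percent G+C among the unambiguous bases (A, C, G, T) of a sequence."""
    s = seq.upper()
    total = sum(s.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (s.count("G") + s.count("C")) / total


def codon_usage(cds):
    """Count codons in an in-frame coding sequence, ignoring a partial last codon."""
    s = cds.upper()
    usable = len(s) - len(s) % 3
    return Counter(s[i:i + 3] for i in range(0, usable, 3))
```

    Comparing the codon counts of a gene against those of the host's typical genes is one way a difference in codon usage, such as the one reported here, could be made visible.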

  7. Nucleotide Sequence of the Protective Antigen Gene of Bacillus Anthracis

    DTIC Science & Technology

    1988-02-02

    ...which appear to encode a signal peptide having characteristics in common with those of other secreted proteins. A consensus TATAAT sequence was located ... was located seven bp upstream of the ATG initiation codon. The codon usage ... TATAAT sequence was located at the putative -10 promoter site. A Shine-Dalgarno site similar to that found in genes of other Bacillus sp. was located seven

  8. Complete nucleotide sequence and transcriptional analysis of snakehead fish retrovirus.

    PubMed

    Hart, D; Frerichs, G N; Rambaut, A; Onions, D E

    1996-06-01

    The complete genome of the snakehead fish retrovirus has been cloned and sequenced, and its transcriptional profile in cell culture has been determined. The 11.2-kb provirus displays a complex expression pattern capable of encoding accessory proteins and is unique in the predicted location of the env initiation codon and signal peptide upstream of gag and the common splice donor site. The virus is distinguishable from all known retrovirus groups by the presence of an arginine tRNA primer binding site. The coding regions are highly divergent and show a number of unusual characteristics, including a large Gag coiled-coil region, a Pol domain of unknown function, and a long, lentiviral-like, Env cytoplasmic domain. Phylogenetic analysis of the Pol sequence emphasizes the divergent nature of the virus from the avian and mammalian retroviruses. The snakehead virus is also distinct from a previously characterized complex fish retrovirus, suggesting that discrete groups of these viruses have yet to be identified in the lower vertebrates.

  9. Nucleotide sequence of the capsid protein gene of papaya leaf-distortion mosaic potyvirus.

    PubMed

    Maoka, T; Kashiwazaki, S; Tsuda, S; Usugi, T; Hibino, H

    1996-01-01

    The DNA complementary to the 3'-terminal 1,404 nucleotides [excluding the poly(A) tail] of papaya leaf-distortion mosaic potyvirus (PLDMV) RNA was cloned and sequenced. The sequence starts within a long open reading frame (ORF) of 1,195 nucleotides and is followed by a 3' non-coding region of 209 nucleotides. Capsid protein (CP) is encoded at the 3' terminus of the ORF. The CP contains 293 residues and has a Mr of 33,277. The CP of PLDMV exhibits 49 to 59% sequence similarity at the amino acid level to the CPs of papaya ringspot potyvirus (PRSV) and other potyviruses. This result is consistent with the absence of a serological relationship between PLDMV and PRSV or other potyviruses. The results support the assignment of PLDMV as a distinct member of the genus Potyvirus.

  10. Complete nucleotide sequence of the polymerase 3 gene of human influenza virus A/WSN/33.

    PubMed Central

    Kaptein, J S; Nayak, D P

    1982-01-01

    The complete nucleotide sequence of polymerase 3 (P3) gene of a human influenza virus (A/WSN/33) has been determined using cDNA clones except for the last 11 nucleotides which were obtained by direct RNA sequencing. The WSN P3 gene contains 2,341 nucleotides and codes for a protein of 759 amino acids (molecular weight 85,800). The WSN P3 protein, as deduced from the plus-strand DNA sequence, is basic and enriched in positively charged amino acids. In addition, it contains clusters of basic amino acids which may provide sites for the interaction of P3 protein with the capped primer, template, and/or other polymerase proteins during the transcriptive and replicative processes of influenza viral RNA. PMID:7045393

  11. Statistical analysis of nucleotide runs in coding and noncoding DNA sequences.

    PubMed

    Sprizhitsky YuA; Nechipurenko YuD; Alexandrov, A A; Volkenstein, M V

    1988-10-01

    A statistical analysis of the occurrence of particular nucleotide runs in DNA sequences of different species has been carried out. There are considerable differences in run distributions between the DNA sequences of procaryotes, invertebrates and vertebrates. There is an abundance of short runs (1-2 nucleotides long) in the coding sequences and there is a deficiency of such runs in the noncoding regions. However, some interesting exceptions to this rule exist for the run distribution of adenine in procaryotes and for the arrangement of purine-pyrimidine runs in eucaryotes. The similarity in the distributions of such runs in the coding and noncoding regions may be due to some structural features of the DNA molecule as a whole. Runs of guanine (or cytosine) of three to six nucleotides occur predominantly in noncoding DNA regions in eucaryotes, especially in vertebrates.
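
    A run-length tally of the kind analysed in this abstract can be sketched in a few lines of Python. The helper names `run_length_counts` and `to_purine_pyrimidine` are hypothetical, and the sketch only counts runs; it does not reproduce whatever statistical comparison the authors applied.

```python
from collections import Counter
from itertools import groupby

# Map bases to purine (R) or pyrimidine (Y) classes.
_RY_TABLE = str.maketrans("AGCTU", "RRYYY")


def run_length_counts(seq):
    """Return {symbol: Counter({run_length: number_of_runs})} for a sequence."""
    counts = {}
    for symbol, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        counts.setdefault(symbol, Counter())[length] += 1
    return counts


def to_purine_pyrimidine(seq):
    """Recode a nucleotide sequence as purines (R) and pyrimidines (Y)."""
    return seq.upper().translate(_RY_TABLE)
```

    Calling `run_length_counts(to_purine_pyrimidine(seq))` gives the purine and pyrimidine run distributions, and running the same tally separately on coding and noncoding segments allows the two to be compared.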

  12. Intraspecific nucleotide sequence differences in the major noncoding region of human mitochondrial DNA.

    PubMed Central

    Horai, S; Hayasaka, K

    1990-01-01

    Nucleotide sequences of the major noncoding region of human mitochondrial DNA (mtDNA) from 95 human placentas have been determined. These sequences include at least a 482-bp-long region encompassing most of the D-loop-forming region. Comparisons of these sequences with those previously determined have revealed remarkable features of nucleotide substitutions and insertion/deletion events. The nucleotide diversity among the sequences is estimated as 1.45%, which is three- to fourfold higher than the corresponding value estimated from restriction-enzyme analysis of whole mtDNA genome. A hypervariable region has also been defined. In this 14-bp region, 17 different sequences were detected. More than 97% of the base changes are transitions. A significantly nonrandom distribution of nucleotide substitutions and sequence length variations were also noted. The phylogenetic analysis indicates that diversity among the negroids is much larger than that among the caucasoids or the mongoloids. In fact, part of the negroids first diverged from other humans in the phylogenetic tree. A striking finding in the phylogenetic analysis is that the mongoloids can be separated into two distinct groups. Divergence of part of the mongoloids follows the earliest divergence of part of the negroids. The remainder of the mongoloids subsequently diverged together with the caucasoids. This observation confirmed our earlier study, which clearly demonstrated, by the restriction-enzyme analysis, existence of two distinct groups in the Japanese. PMID:2316527

  13. Free molecular biological software available from the EMBL file server.

    PubMed

    Fuchs, R

    1990-04-01

    A new service provided by EMBL (EMBL Software File Server) is described that will make free molecular biology software available to anyone with computer network access. MS-DOS, Apple Macintosh and VAX/VMS are supported at the moment. The programs will be delivered by normal electronic mail; conversion mechanisms will transform binary files to ASCII to allow mail transfer. This service will also help authors to distribute their software conveniently.

  14. Detecting selection in noncoding regions of nucleotide sequences.

    PubMed Central

    Wong, Wendy S W; Nielsen, Rasmus

    2004-01-01

    We present a maximum-likelihood method for examining the selection pressure and detecting positive selection in noncoding regions using multiple aligned DNA sequences. The rate of substitution in noncoding regions relative to the rate of synonymous substitution in coding regions is modeled by a parameter zeta. When a site in a noncoding region is evolving neutrally zeta = 1, while zeta > 1 indicates the action of positive selection, and zeta < 1 suggests negative selection. Using a combined model for the evolution of noncoding and coding regions, we develop two likelihood-ratio tests for the detection of selection in noncoding regions. Data analysis of both simulated and real viral data is presented. Using the new method we show that positive selection in viruses is acting primarily in protein-coding regions and is rare or absent in noncoding regions. PMID:15238543
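
    The parameter described in this abstract can be written compactly. In the LaTeX sketch below, the symbols r_nc (substitution rate in the noncoding region) and r_s (synonymous substitution rate in the coding region) are notation assumed for illustration, and the test statistic is the generic likelihood-ratio form; the abstract does not specify the null and alternative models of the two tests.

```latex
% Relative rate parameter (notation assumed for illustration)
\zeta = \frac{r_{\mathrm{nc}}}{r_{\mathrm{s}}}, \qquad
\begin{cases}
\zeta = 1 & \text{neutral evolution} \\
\zeta > 1 & \text{positive selection} \\
\zeta < 1 & \text{negative selection}
\end{cases}

% Generic likelihood-ratio statistic comparing a null model (0)
% with a nested alternative model (1)
2\,\Delta\ell = 2\left[\ell(\hat{\theta}_1) - \ell(\hat{\theta}_0)\right]
```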

  15. An Integrated System for DNA Sequencing by Synthesis Using Novel Nucleotide Analogues

    PubMed Central

    Guo, Jia; Yu, Lin; Turro, Nicholas J.; Ju, Jingyue

    2010-01-01

    Conspectus The Human Genome Project has concluded, but its successful completion has increased, rather than decreased, the need for high-throughput DNA sequencing technologies. The possibility of clinically screening a full genome for an individual's mutations offers tremendous benefits, both for pursuing personalized medicine as well as uncovering the genomic contributions to diseases. The Sanger sequencing method—although enormously productive for more than 30 years—requires an electrophoretic separation step that, unfortunately, remains a key technical obstacle for achieving economically acceptable full-genome results. Alternative sequencing approaches thus focus on innovations that can reduce costs. The DNA sequencing by synthesis (SBS) approach has shown great promise as a new sequencing platform, with particular progress reported recently. The general fluorescent SBS approach involves (i) incorporation of nucleotide analogs bearing fluorescent reporters, (ii) identification of the incorporated nucleotide by its fluorescent emissions, and (iii) cleavage of the fluorophore, along with the reinitiation of the polymerase reaction for continuing sequence determination. In this Account, we review the construction of a DNA-immobilized chip and the development of novel nucleotide reporters for the SBS sequencing platform. Click chemistry, with its high selectivity and coupling efficiency, was explored for surface immobilization of DNA. The first generation (G-1) modified nucleotides for SBS feature a small chemical moiety capping the 3′-OH and a fluorophore tethered to the base through a chemically cleavable linker; the design ensures that the nucleotide reporters are good substrates for the polymerase. The 3′-capping moiety and the fluorophore on the DNA extension products, generated by the incorporation of the G-1 modified nucleotides, are cleaved simultaneously to reinitiate the polymerase reaction. The sequence of a DNA template immobilized on a surface

  16. The nucleotide sequence of 5S ribosomal RNA from slime mold Physarum polycephalum.

    PubMed

    Komiya, H; Takemura, S

    1981-12-01

    The nucleotide sequence of 5S ribosomal RNA from plasmodia of the slime mold Physarum polycephalum was determined as pppGGAUGCGGC CAUACUAAGG 20 AGAAAGCACC 30 UCAUCCCGUC 40 CGAUCUGAGA 50 AGUUAAGCUC 60 CUUCAGGCGU 70 GGUUAGUACU 80 GGGGUGGGGG 90 ACCACCUGGG 100 AAUCCCACGU 110 GCUGCAUUCU 120 Uoh by chemical and enzymatic gel sequencing techniques using 3' and 5' end-labeled RNA. This RNA is very different from 5S rRNA of the cellular slime mold Dictyostelium discoideum (36 nucleotides are different), and shows greater similarity to 5S rRNAs from Protozoa and Metazoa than to those from fungi.

  17. Long-range macromolecule interaction and “speed reading” long nucleotide sequences in DNA

    NASA Astrophysics Data System (ADS)

    Namiot, V. A.; Anashkina, A. A.; Filatov, I. V.; Tumanyan, V. G.; Esipova, N. G.

    2013-01-01

    Methods based on the phenomenon of the specific long-range interaction between long macromolecules are proposed for “speed reading” nucleotide sequences in single DNA molecules. One way is to measure the electric field potential along a double DNA strand that has been stretched beforehand. Another way of “reading” the information is to measure the deformation of strand elements caused by an electric field that is generated by the “straightening” electrode when an alternating voltage is applied to it. On the basis of the information obtained, the sequence of nucleotides in the strand could in principle be determined.

  18. The complete nucleotide sequence and genomic characterization of tropical soda apple mosaic virus.

    PubMed

    Fillmer, Kornelia; Adkins, Scott; Pongam, Patchara; D'Elia, Tom

    2016-08-01

    We report the first complete genome sequence of tropical soda apple mosaic virus (TSAMV), a tobamovirus originally isolated from tropical soda apple (Solanum viarum) collected in Okeechobee, Florida. The complete genome of TSAMV is 6,350 nucleotides long and contains four open reading frames encoding the following proteins: i) 126-kDa methyltransferase/helicase (3354 nt), ii) 183-kDa polymerase (4839 nt), iii) movement protein (771 nt) and iv) coat protein (483 nt). The complete genome sequence of TSAMV shares 80.4 % nucleotide sequence identity with pepper mild mottle virus (PMMoV) and 71.2-74.2 % identity with other tobamoviruses naturally infecting members of the Solanaceae plant family. Phylogenetic analysis of the deduced amino acid sequences of the 126-kDa and 183-kDa proteins and the complete genome sequence place TSAMV in a subcluster with PMMoV within the Solanaceae-infecting subgroup of tobamoviruses.

  19. Cloning and nucleotide sequence of wild type and a mutant histidine decarboxylase from Lactobacillus 30a.

    PubMed

    Vanderslice, P; Copeland, W C; Robertus, J D

    1986-11-15

    Prohistidine decarboxylase from Lactobacillus 30a is a protein that autoactivates to histidine decarboxylase by cleaving its peptide chain between serines 81 and 82 and converting Ser-82 to a pyruvoyl moiety. The pyruvoyl group serves as the prosthetic group for the decarboxylation reaction. We have cloned and determined the nucleotide sequence of the gene for this enzyme from a wild type strain and from a mutant with altered autoactivation properties. The nucleotide sequence modifies the previously determined amino acid sequence of the protein. A tripeptide missed in the chemical sequence is inserted, and three other amino acids show conservative changes. The activation mutant shows a single change of Gly-58 to an Asp. Sequence analysis up- and downstream from the gene suggests that histidine decarboxylase is part of a polycistronic message, and that the transcriptional promoter region is strongly homologous to those of other Gram-positive organisms.

  20. Population genetics and phylogenetic analysis of the vrs1 nucleotide sequence in wild and cultivated barley.

    PubMed

    Ren, Xifeng; Wang, Yonggang; Yan, Songxian; Sun, Dongfa; Sun, Genlou

    2014-04-01

    Spike morphology is a key characteristic in the study of barley genetics, breeding, and domestication. Variation at the six-rowed spike 1 (vrs1) locus is sufficient to control the development and fertility of the lateral spikelet of barley. To study the genetic variation of vrs1 in wild barley (Hordeum vulgare subsp. spontaneum) and cultivated barley (Hordeum vulgare subsp. vulgare), nucleotide sequences of vrs1 were examined in 84 wild barleys (including 10 six-rowed) and 20 cultivated barleys (including 10 six-rowed) from four populations. The length of the vrs1 sequence amplified was 1536 bp. A total of 40 haplotypes were identified in the four populations. The highest nucleotide diversity, haplotype diversity, and per-site nucleotide diversity were observed in the Southwest Asian wild barley population. The nucleotide diversity, number of haplotypes, haplotype diversity, and per-site nucleotide diversity in two-rowed barley were higher than those in six-rowed barley. The phylogenetic analysis of the vrs1 sequences partially separated the six-rowed and the two-rowed barley. The six-rowed barleys were divided into four groups.
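
    The diversity statistics named in this abstract have standard textbook forms, sketched below in Python for a set of aligned, equal-length sequences. The sketch uses Nei's haplotype diversity and the average number of pairwise differences per site as nucleotide diversity; whether the authors used exactly these estimators is an assumption, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    freqs = Counter(seqs)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in freqs.values()))


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site among aligned sequences."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    total_diffs = sum(
        sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) / 2
    return total_diffs / n_pairs / length
```

    The number of haplotypes is simply `len(set(seqs))` for the same input.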

  1. Nucleotide composition of CO1 sequences in Chelicerata (Arthropoda): detecting new mitogenomic rearrangements.

    PubMed

    Arabi, Juliette; Judson, Mark L I; Deharveng, Louis; Lourenço, Wilson R; Cruaud, Corinne; Hassanin, Alexandre

    2012-02-01

    Here we study the evolution of nucleotide composition in third codon-positions of CO1 sequences of Chelicerata, using a phylogenetic framework, based on 180 taxa and three markers (CO1, 18S, and 28S rRNA; 5,218 nt). The analyses of nucleotide composition were also extended to all CO1 sequences of Chelicerata found in GenBank (1,701 taxa). The results show that most species of Chelicerata have a positive strand bias in CO1, i.e., in favor of C nucleotides, including all Amblypygi, Palpigradi, Ricinulei, Solifugae, Uropygi, and Xiphosura. However, several taxa show a negative strand bias, i.e., in favor of G nucleotides: all Scorpiones, Opisthothelae spiders and several taxa within Acari, Opiliones, Pseudoscorpiones, and Pycnogonida. Several reversals of strand-specific bias can be attributed to either a rearrangement of the control region or an inversion of a fragment containing the CO1 gene. Key taxa for which sequencing of complete mitochondrial genomes will be necessary to determine the origin and nature of mtDNA rearrangements involved in the reversals are identified. Acari, Opiliones, Pseudoscorpiones, and Pycnogonida were found to show a strong variability in nucleotide composition. In addition, both mitochondrial and nuclear genomes have been affected by higher substitution rates in Acari and Pseudoscorpiones. The results therefore indicate that these two orders are more liable to fix mutations of all types, including base substitutions, indels, and genomic rearrangements.
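
    A strand-bias measure at third codon positions, in the spirit of the analysis this abstract describes, can be sketched in Python as follows. The skew formula (C - G) / (C + G), chosen so that positive values favor C as in the abstract's sign convention, is an assumption, as is the function name `third_position_cg_skew`; the input is assumed to be an in-frame coding sequence.

```python
def third_position_cg_skew(cds):
    """Return (C - G) / (C + G) computed over third codon positions only.

    Positive values indicate an excess of C, negative values an excess of G.
    Returns 0.0 when no C or G occurs at third positions.
    """
    third = cds.upper()[2::3]  # every third base, starting at codon position 3
    c = third.count("C")
    g = third.count("G")
    if c + g == 0:
        return 0.0
    return (c - g) / (c + g)
```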

  2. Nucleotide sequence and genome organization of a new proposed crinivirus, tetterwort vein chlorosis virus.

    PubMed

    Zhao, Fumei; Yoo, Ran Hee; Lim, Seungmo; Igori, Davaajargal; Lee, Su-Heon; Moon, Jae Sun

    2015-11-01

    The genome of tetterwort vein chlorosis virus (TVCV) from South Korea has been completely sequenced. Its genomic organization resembles those of other criniviruses, with several new features, indicating that TVCV is a member of a new species in the genus Crinivirus, family Closteroviridae. RNA1 contains 8467 nucleotides, with at least four open reading frames (ORFs). ORF1a encodes a protein with predicted papain-like protease, methyltransferase, and helicase activities. ORF1b encodes a putative RNA-dependent RNA polymerase that is apparently expressed through a +1 ribosomal frameshift. RNA2 contains 8113 nucleotides encoding at least nine proteins, similar to most crinivirus RNA2s. The 3' untranslated regions of the bipartite RNA genome share 82.1% nucleotide sequence identity.

  3. Complete nucleotide sequence of the new potexvirus "Alstroemeria virus X". Brief report.

    PubMed

    Fuji, S; Shinoda, K; Ikeda, M; Furuya, H; Naito, H; Fukumoto, F

    2005-11-01

    A flexuous virus was isolated in Japan from an alstroemeria plant showing mosaic symptoms. The virus had a broad host range but had systemically latent infectivity in alstroemeria. The virus was assigned to the genus Potexvirus based on morphology and physical properties and on an analysis of the complete nucleotide sequence. The genomic RNA of the virus was 7,009 nucleotides in length, excluding the 3'-terminal poly(A) tail. It contained five open reading frames (ORFs), which was consistent with other members of the genus Potexvirus. Although the nucleotide sequences of the ORFs differ from those of previously reported potexviruses, a phylogenetic analysis placed it close to Narcissus mosaic virus and Scallion virus X. Therefore, we propose that this virus should be designated as Alstroemeria virus X (AlsVX).

  4. Transport properties of nucleotides in a graphene nanogap for DNA sequencing

    NASA Astrophysics Data System (ADS)

    Prasongkit, J.; Grigoriev, A.; Scheicher, R. H.; Ahuja, R.

    2011-03-01

    The application of graphene nanogaps for DNA sequencing has been proposed [H. W. Ch. Postma, Nano Lett. 10, 420 (2010)]. We used density functional theory and the non-equilibrium Green's function method to study the electron transport properties of nucleotides located inside a graphene nanogap. Our setup considered different positions and orientations of the bases with respect to the graphene electrodes, and we analyzed how the transmission spectra depend on such shifts and rotations. Even when taking into account current changes due to base fluctuations, we find that each nucleotide possesses a different characteristic current magnitude, owing to its distinctive electronic properties. Based on our results, it thus seems that the electrical readout from a graphene nanogap could in principle be sufficiently sensitive to distinguish between the four nucleotides, and thus achieve the goal of rapid and economical whole-genome sequencing. Swedish Research Council (VR, grant no. 621-2009-3628).

  5. Complete nucleotide sequence of a begomovirus and associated betasatellite infecting croton (Croton bonplandianus) in Pakistan.

    PubMed

    Hussain, Khadim; Hussain, Mazhar; Mansoor, Shahid; Briddon, Rob W

    2011-06-01

    The complete sequences of a begomovirus and an associated betasatellite isolated from Croton bonplandianus originating from Pakistan were determined. The sequence of the begomovirus showed the highest level of nucleotide sequence identity (88.9%) to an isolate of papaya leaf curl virus and thus represents a new species, for which we propose the name Croton yellow vein virus (CYVV). The sequence of the betasatellite showed the highest levels of sequence identity (82 to 98.4%) to six sequences in the databases that have yet to be reported, followed by isolates of tomato leaf curl Joydebpur betasatellite (48.7 to 52.5%). This indicates that the betasatellite identified here (and the six sequences in the databases) is an isolate of a newly identified species for which the name Croton yellow vein mosaic betasatellite (CroYVMB) is proposed. For the begomovirus, an analysis of the sequence indicates that it has a recombinant origin.

  6. Complete nucleotide sequence of a novel strain of fig fleck-associated virus from China.

    PubMed

    He, Zhen; Mijit, Mahmut; Li, Shifang; Zhang, Zhixiang

    2017-04-01

    The complete nucleotide sequence of fig fleck-associated virus from Xinjiang Uygur Autonomous Region of China (FFkaV-CN) was determined. The 6,723-nucleotide-long viral genome, excluding a terminal poly(A) tail, contains three open reading frames (ORFs). Pairwise comparisons showed that FFkaV-CN shares 83% and 92% sequence identity with FFkaV-Italy based on the complete genomic sequence and CP aa sequence, respectively, slightly higher than the species demarcation criterion for the genus Maculavirus. Phylogenetic analysis showed that FFkaV-CN and FFkaV-Italy clustered into one group. These results indicate that FFkaV-CN is a novel strain of FFkaV with a genome organization somewhat different from what was reported for FFkaV-Italy.

  7. Nucleotide sequence determination of bacteriophage T4 glycine transfer ribonucleic acid

    PubMed Central

    Stahl, Stephen; Paddock, Gary V.; Abelson, John

    1974-01-01

    The nucleotide sequence of a T4 tRNA with an anticodon for glycine has been determined using 32P-labeled material from T4-infected cultures of Escherichia coli. The sequence is: pGCGGAUAUCGUAUAAUGmGDAUUACCUCAGACUUCCAAψCUGAUGAUGUGAGTψCGAUUCUCAUUAUCCGCUCCA-OH. The 74 nucleotide sequence can be arranged in the classic cloverleaf pattern for tRNAs. The anticodon of T4 tRNAGly is UCC with a possible modification of the U. The tRNA molecule would thus be expected to recognize the glycine codons GGG and GGA. Comparative analysis of tRNAsGly from T2 and T6 indicates that their sequences are identical with that from T4. PMID:10793690
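
    The codon assignment stated above follows from antiparallel base pairing with wobble at the first anticodon position. The following is a minimal sketch, assuming the standard wobble rules in which an unmodified U pairs with A or G (the abstract notes that the U may be modified, which could change the result). The function name and pairing tables are illustrative and are not taken from the paper.

```python
# Pairing options for the first (5') anticodon base, which reads the
# third codon base. Standard wobble rules; a modified U may behave differently.
WOBBLE = {"U": "AG", "G": "CU", "C": "G", "A": "U"}
# Strict Watson-Crick pairing for the other two positions.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codons_for_anticodon(anticodon):
    """Return the codons (5'->3') readable by an anticodon written 5'->3'."""
    first, second, third = anticodon
    # Codon base 1 pairs with anticodon base 3, codon base 2 with base 2.
    prefix = WATSON_CRICK[third] + WATSON_CRICK[second]
    return sorted(prefix + base for base in WOBBLE[first])


print(codons_for_anticodon("UCC"))
```

    Under these assumptions the anticodon UCC yields the two glycine codons named in the abstract.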

  8. Dependence of the E. coli promoter strength and physical parameters upon the nucleotide sequence

    PubMed Central

    Berezhnoy, Andrey Y.; Shckorbatov, Yuriy G.

    2005-01-01

    The energy of interaction between complementary nucleotides in promoter sequences of E. coli was calculated and visualized, and a graphic method for presenting the energy properties of promoter sequences was developed. The data indicate that the energy distribution along the promoter sequence has minima at the −35, −8 and +7 regions, corresponding to areas with elevated AT (adenine-thymine) content. The most important difference from random sequences is at the −8 region. Four promoter groups and their energy properties were identified. Promoters with minimal and maximal energy of interaction between complementary nucleotides have low strengths; the strongest promoters correspond to promoter clusters characterized by intermediate energy values. PMID:16252339

  9. On the feasibility of using the intrinsic fluorescence of nucleotides for DNA sequencing.

    SciTech Connect

    Chowdhury, M. H.; Ray, K.; Johnson, R. L.; Gray, S. K.; Pond, J.; Lakowicz, J. R.; Univ. of Maryland; Univ. of Virginia; Lumerical Solutions, Inc.

    2010-04-29

    There is presently a worldwide effort to increase the speed and decrease the cost of DNA sequencing as exemplified by the goal of the National Human Genome Research Institute (NHGRI) to sequence a human genome for under $1000. Several high throughput technologies are under development. Among these, single strand sequencing using exonuclease appears very promising. However, this approach requires complete labeling of at least two bases at a time, with extrinsic high quantum yield probes. This is necessary because nucleotides absorb in the deep ultraviolet (UV) and emit with extremely low quantum yields. Hence intrinsic emission from DNA and nucleotides is not being exploited for DNA sequencing. In the present paper we consider the possibility of identifying single nucleotides using their intrinsic emission. We used the finite-difference time-domain (FDTD) method to calculate the effects of aluminum nanoparticles on nearby fluorophores that emit in the UV. We find that the radiated power of UV fluorophores is significantly increased when they are in close proximity to aluminum nanostructures. We show that there will be increased localized excitation near aluminum particles at wavelengths used to excite intrinsic nucleotide emission. Using FDTD simulation we show that a typical DNA base when coupled to appropriate aluminum nanostructures leads to highly directional emission. Additionally we present experimental results showing that a thin film of nucleotides shows enhanced emission when in close proximity to aluminum nanostructures. Finally we provide Monte Carlo simulations that predict high levels of base calling accuracy for an assumed number of photons that is derived from the emission spectra of the intrinsic fluorescence of the bases. Our results suggest that single nucleotides can be detected and identified using aluminum nanostructures that enhance their intrinsic emission. This capability would be valuable for the ongoing efforts toward the $1000 genome.

  10. On the Feasibility of Using the Intrinsic Fluorescence of Nucleotides for DNA Sequencing

    PubMed Central

    Chowdhury, Mustafa H.; Ray, Krishanu; Johnson, Michael L.; Gray, Stephen K.; Pond, James; Lakowicz, Joseph R.

    2010-01-01

    There is presently a worldwide effort to increase the speed and decrease the cost of DNA sequencing as exemplified by the goal of the National Human Genome Research Institute (NHGRI) to sequence a human genome for under $1000. Several high throughput technologies are under development. Among these, single strand sequencing using exonuclease appears very promising. However, this approach requires complete labeling of at least two bases at a time, with extrinsic high quantum yield probes. This is necessary because nucleotides absorb in the deep ultra-violet (UV) and emit with extremely low quantum yields. Hence intrinsic emission from DNA and nucleotides is not being exploited for DNA sequencing. In the present paper we consider the possibility of identifying single nucleotides using their intrinsic emission. We used the finite-difference time-domain (FDTD) method to calculate the effects of aluminum nanoparticles on nearby fluorophores that emit in the UV. We find that the radiated power of UV fluorophores is significantly increased when they are in close proximity to aluminum nanostructures. We show that there will be increased localized excitation near aluminum particles at wavelengths used to excite intrinsic nucleotide emission. Using FDTD simulation we show that a typical DNA base when coupled to appropriate aluminum nanostructures leads to highly directional emission. Additionally we present experimental results showing that a thin film of nucleotides shows enhanced emission when in close proximity to aluminum nanostructures. Finally we provide Monte Carlo simulations that predict high levels of base calling accuracy for an assumed number of photons that is derived from the emission spectra of the intrinsic fluorescence of the bases. Our results suggest that single nucleotides can be detected and identified using aluminum nanostructures that enhance their intrinsic emission. This capability would be valuable for the ongoing efforts towards the $1000 genome.

  11. Genomic organization and nucleotide sequences of two corn histone H4 genes.

    PubMed

    Philipps, G; Chaubet, N; Chaboute, M E; Ehling, M; Gigot, C

    1986-01-01

    The sea urchin histone H4 gene has been used as a probe to clone two corn histone H4 genes from a lambda gtWES X lambda B corn genomic library. The nucleotide (nt) sequences of both genes showed that the encoded amino acid sequences were identical to that of the H4 of pea and one variant of wheat. The nt sequences of the coding regions showed 92% homology. 5'- and 3'-flanking regions do not show extensive nt sequence analogies. Southern blotting of the EcoRI digested genomic DNA suggests the existence of multiple H4 genes dispersed throughout the genome.

  12. Iterative Correction of Reference Nucleotides (iCORN) using second generation sequencing technology.

    PubMed

    Otto, Thomas D; Sanders, Mandy; Berriman, Matthew; Newbold, Chris

    2010-07-15

    The accuracy of reference genomes is important for downstream analysis but a low error rate requires expensive manual interrogation of the sequence. Here, we describe a novel algorithm (Iterative Correction of Reference Nucleotides) that iteratively aligns deep coverage of short sequencing reads to correct errors in reference genome sequences and evaluate their accuracy. Using Plasmodium falciparum (81% A + T content) as an extreme example, we show that the algorithm is highly accurate and corrects over 2000 errors in the reference sequence. We give examples of its application to numerous other eukaryotic and prokaryotic genomes and suggest additional applications. The software is available at http://icorn.sourceforge.net

  13. Molecular cloning and nucleotide sequencing of human immunoglobulin epsilon chain cDNA.

    PubMed Central

    Seno, M; Kurokawa, T; Ono, Y; Onda, H; Sasada, R; Igarashi, K; Kikuchi, M; Sugino, Y; Nishida, Y; Honjo, T

    1983-01-01

    DNA complementary to mRNA of human immunoglobulin E heavy chain (epsilon chain) isolated and purified from U266 cells has been synthesized and inserted into the PstI site of pBR322 by G-C tailing. This recombinant plasmid was used to transform E. coli chi 1776 to screen 1445 tetracycline resistant colonies. Nine clones (pGET1-9) containing cDNA coding for the human epsilon chain were recognized by colony hybridization and Southern blotting analysis with a nick-translated human IgE genome fragment. The nucleotide sequence of the longest cDNA contained in pGET2 was determined. The results indicate that the sequence of 1657 nucleotides codes for 494 amino acids covering a part of the variable region and all of the constant region of the human epsilon chain. Most of the amino acid sequence deduced from the nucleotide sequence is in substantial agreement with that reported. Furthermore a termination codon after the -COOH terminal amino acid marks the beginning of a 3' untranslated region of 125 nucleotides with a poly(A) tail. Taking this into account, the structure of the human epsilon chain mRNA, except a part of the 5' end, is conserved fairly well in the cDNA insert in pGET2. PMID:6300763

  14. Complete Nucleotide Sequence of a Citrobacter freundii Plasmid Carrying KPC-2 in a Unique Genetic Environment

    PubMed Central

    Yao, Yancheng; Imirzalioglu, Can; Hain, Torsten; Kaase, Martin; Gatermann, Soeren; Exner, Martin; Mielke, Martin; Hauri, Anja; Dragneva, Yolanta; Bill, Rita; Wendt, Constanze; Wirtz, Angela; Chakraborty, Trinad

    2014-01-01

    The complete and annotated nucleotide sequence of a 54,036-bp plasmid harboring a blaKPC-2 gene that is clonally present in Citrobacter isolates from different species is presented. The plasmid belongs to incompatibility group N (IncN) and harbors the class A carbapenemase KPC-2 in a unique genetic environment. PMID:25395635

  15. Nucleotide sequence of the 3'-terminal region of potato virus YN RNA.

    PubMed

    van der Vlugt, R; Allefs, S; de Haan, P; Goldbach, R

    1989-01-01

    The sequence of the 3'-terminal 1611 nucleotides of the genome of the tobacco veinal necrosis strain of potato virus Y (PVYN) was determined. The sequence revealed an open reading frame of 1285 nucleotides, of which the start was not identified, and an untranslated region of 316 nucleotides upstream of a poly(A) tract. Comparison of the open reading frame with the amino-terminal sequence of the viral coat protein enabled mapping of the start of the coat protein at amino acid -267, and indicated that maturation of this protein requires proteolytic processing from a larger polyprotein precursor at a glutamine/glycine dipeptide sequence. The coat protein of PVYN displayed significant (51 to 63%) sequence homology to the coat proteins of four other potyviruses, tobacco etch virus, tobacco vein mottling virus, plum pox virus and sugarcane mosaic virus. Even higher sequence homology (91%) was detected with the coat protein of a fifth potyvirus, pepper mottle virus (PeMV). This homology was of the same level as found between the coat proteins of PVYN and a second strain of this virus, PVYD. Since, moreover, PVYN and PeMV were the only potyviruses displaying homology in the 3'-terminal, non-translated regions of their genomes, we conclude that PeMV should be regarded as a strain of PVY.

  16. BSviewer: a genotype-preserving, nucleotide-level visualizer for bisulfite sequencing data.

    PubMed

    Sun, Kun; Lun, Fiona F M; Jiang, Peiyong; Sun, Hao

    2017-08-08

    The bisulfite sequencing technology has been widely used to study the DNA methylation profile in many species. However, most of the current visualization tools for bisulfite sequencing data only provide high-level views (i.e., overall methylation densities) while missing the methylation dynamics at the nucleotide level. Meanwhile, they also focus on CpG sites while omitting other information (such as genotypes at SNP sites) which could be helpful for interpreting the methylation pattern of the sequencing data. A bioinformatics tool that visualizes the methylation statuses at the nucleotide level and preserves the most essential information of the sequencing data is thus valuable and needed. We have developed BSviewer, a lightweight nucleotide-level visualization tool for bisulfite sequencing data. Using an imprinting gene as an example, we show that BSviewer could be specifically helpful for interpreting the bisulfite sequencing data with allele-specific DNA methylation pattern. BSviewer is implemented in Perl and runs on most GNU/Linux platforms. Source code and testing dataset are freely available at http://sunlab.cpy.cuhk.edu.hk/BSviewer/. Contact: haosun@cuhk.edu.hk.
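
    Nucleotide-level methylation status rests on the basic chemistry of bisulfite conversion: an unmethylated cytosine is read as T, whereas a methylated cytosine remains C. The following is a minimal sketch of that classification, assuming an ungapped read already aligned to the forward strand of the reference. It is not BSviewer's implementation (which is written in Perl), and the function name is hypothetical.

```python
def methylation_calls(reference, read):
    """Classify each reference cytosine covered by an ungapped, forward-strand read.

    Returns a list of (position, status) tuples with 0-based positions.
    """
    if len(reference) != len(read):
        raise ValueError("sketch assumes an ungapped alignment of equal length")
    calls = []
    for pos, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper())):
        if ref_base != "C":
            continue  # only cytosines carry methylation information here
        if read_base == "C":
            calls.append((pos, "methylated"))    # protected from conversion
        elif read_base == "T":
            calls.append((pos, "unmethylated"))  # converted by bisulfite
        else:
            calls.append((pos, "other"))         # possible SNP or sequencing error
    return calls


print(methylation_calls("ACGTCGA", "ACGTTGA"))
```

    The "other" category is where genotype information of the kind the abstract mentions would surface; a real tool would also need to handle reverse-strand reads and gapped alignments.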

  17. The nucleotide sequence and genome structure of mung bean yellow mosaic geminivirus.

    PubMed

    Morinaga, T; Ikegami, M; Miura, K

    1993-01-01

    Complete nucleotide sequences of the infectious cloned DNA components (DNA 1 and DNA 2) of mung bean yellow mosaic virus (MYMV) were determined. MYMV DNA 1 and DNA 2 consist of 2,723 and 2,675 nucleotides respectively. DNA 1 and DNA 2 have little sequence similarity except for a region of approximately 200 bases which is almost identical in the two molecules. Analysis of open reading frames revealed nine potential coding regions for proteins of mol. wt. > 10,000, six in DNA 1 and three in DNA 2. The nucleotide sequence of MYMV DNA was compared with that of bean golden mosaic virus (BGMV), tomato golden mosaic virus (TGMV) and African cassava mosaic virus (ACMV). The 200-base region common to the two DNAs of each virus had little sequence similarity, except for a highly conserved 33-36 base sequence potentially capable of forming a stable hairpin structure. The potential coding regions in the MYMV DNAs had counterparts in BGMV, TGMV and ACMV, suggesting an overall similarity in genome organization, except for the absence of 1L3 in MYMV DNA 1. The most highly conserved ORFs, MYMV 1R1, BGMV 1R1, TGMV 1R1 and ACMV 1R1, are the putative genes for the coat proteins of MYMV, BGMV, TGMV and ACMV, respectively. MYMV 1L1 also has a high degree of sequence similarity with BGMV 1L1, TGMV 1L1 and ACMV 1L1.

  18. The discrepancy among single nucleotide variants detected by DNA and RNA high throughput sequencing data.

    PubMed

    Guo, Yan; Zhao, Shilin; Sheng, Quanhu; Samuels, David C; Shyr, Yu

    2017-10-03

    High throughput sequencing technology enables both the human genome and transcriptome to be screened at single nucleotide resolution. Tools have been developed to infer single nucleotide variants (SNVs) from both DNA and RNA sequencing data. To evaluate how much difference can be expected between DNA and RNA sequencing data, and among tissue sources, we designed a study to examine the single nucleotide difference among five sources of high throughput sequencing data generated from the same individual, including exome sequencing from blood, tumor and adjacent normal tissue, and RNAseq from tumor and adjacent normal tissue. Through careful quality control and analysis of the SNVs, we found little difference between DNA-DNA pairs (1%-2%). However, between DNA-RNA pairs, SNV differences ranged anywhere from 10% to 20%. Only a small portion of these differences can be explained by RNA editing. Instead, the majority of the DNA-RNA differences should be attributed to technical errors from sequencing and post-processing of RNAseq data. Our analysis results suggest that SNV detection using RNAseq is subject to high false positive rates.
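
    One way to quantify a pairwise SNV difference of this kind is sketched below. The abstract does not state how the percentage was computed, so the metric used here (the fraction of SNVs in the union of two call sets that are not shared) is an assumption, as are the function name and the (chromosome, position, ref, alt) tuple representation. The example calls are invented for illustration.

```python
def snv_discordance(calls_a, calls_b):
    """Fraction of SNVs in the union of two call sets that are not shared."""
    a, b = set(calls_a), set(calls_b)
    union = a | b
    if not union:
        return 0.0  # no calls at all: treat as fully concordant
    return len(a ^ b) / len(union)


# Invented example calls: (chromosome, position, reference base, alternate base).
dna_calls = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
             ("chr2", 40, "G", "A"), ("chr3", 7, "T", "C")}
rna_calls = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
             ("chr2", 40, "G", "A"), ("chr2", 99, "A", "C")}

print(snv_discordance(dna_calls, rna_calls))
```

    In practice the comparison would need to be restricted to positions with adequate coverage in both data sets, since RNAseq only covers expressed regions.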

  19. Template sequence near the initiation nucleotide can modulate brome mosaic virus RNA accumulation in plant protoplasts.

    PubMed

    Hema, M; Kao, C Cheng

    2004-02-01

    Bromoviral templates for plus-strand RNA synthesis are rich in A or U nucleotides in comparison to templates for minus-strand RNA synthesis. Previous studies demonstrated that plus-strand RNA synthesis by the brome mosaic virus (BMV) RNA replicase is more efficient if the template contains an A/U-rich template sequence near the initiation site (K. Sivakumaran and C. C. Kao, J. Virol. 73:6415-6423, 1999). These observations led us to examine the effects of nucleotide changes near the template's initiation site on the accumulation of BMV RNA3 genomic minus-strand, genomic plus-strand, and subgenomic RNAs in barley protoplasts transfected with wild-type and mutant BMV transcripts. Mutations in the template for minus-strand synthesis had only modest effects on BMV replication in barley protoplasts. Mutants with changes to the +3, +5, and +7 template nucleotides accumulated minus-strand RNA at levels similar to the wild-type level. However, mutations at positions adjacent to the initiation cytidylate in the templates for genomic and subgenomic plus-strand RNA synthesis significantly decreased RNA accumulation. For example, changes at the third template nucleotide for plus-strand RNA3 synthesis resulted in RNA accumulation at between 18 and 24% of the wild-type level, and mutations in the third template nucleotide for subgenomic RNA4 resulted in accumulations at between 7 and 14% of the wild-type level. The effects of the mutations generally decreased as the mutations occurred further from the initiation nucleotide. These findings demonstrate that there are different requirements of the template sequence near the initiation nucleotide for BMV RNA accumulation in plant cells.

  20. Prediction of human rotavirus serotype by nucleotide sequence analysis of the VP7 protein gene.

    PubMed Central

    Green, K Y; Sears, J F; Taniguchi, K; Midthun, K; Hoshino, Y; Gorziglia, M; Nishikawa, K; Urasawa, S; Kapikian, A Z; Chanock, R M

    1988-01-01

    Human rotavirus field isolates were characterized by direct sequence analysis of the gene encoding the serotype-specific major neutralization protein (VP7). Single-stranded RNA transcripts were prepared from virus particles obtained directly from stool specimens or after two or three passages in MA-104 cells. Two regions of the gene (nucleotides 307 through 351 and 670 through 711) which had previously been shown to contain regions of sequence divergence among rotavirus serotypes were sequenced by the dideoxynucleotide method with two different synthetic oligonucleotide primers. The resulting nucleotide sequences were compared with the corresponding sequences from rotaviruses of known serotype (serotype 1, 2, 3, or 4). A total of 25 field isolates and 10 laboratory strains examined by this method exhibited marked sequence identity in both areas of the gene with the corresponding regions of 1 of the 4 reference strains. In addition, the predicted serotype from the sequence analysis correlated in each case with the serotype determined when the rotaviruses were examined by plaque reduction neutralization or reactivity with serotype-specific monoclonal antibodies. These data suggest that as a result of the high degree of sequence conservation observed among rotaviruses of the same serotype, it is possible to predict the serotype of a rotavirus isolate by direct sequence analysis of its VP7 gene. PMID:2833626
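
    The comparison step described above can be sketched as follows: score an isolate against each reference strain over the two divergent regions (nucleotides 307 through 351 and 670 through 711) and report the best match. This is a minimal sketch, assuming all sequences share the same coordinate system without gaps. The function names are hypothetical and the sequences are toy placeholders, not real VP7 data.

```python
# 1-based inclusive coordinates of the two divergent VP7 regions (from the abstract).
REGIONS = [(307, 351), (670, 711)]


def region_identity(query, reference, regions=REGIONS):
    """Fraction of matching bases over the given regions of two aligned sequences."""
    matches = total = 0
    for start, end in regions:
        q, r = query[start - 1:end], reference[start - 1:end]
        matches += sum(x == y for x, y in zip(q, r))
        total += min(len(q), len(r))
    return matches / total if total else 0.0


def predict_serotype(query, references):
    """Return (serotype, identity) for the best-matching reference strain."""
    scores = {sero: region_identity(query, ref) for sero, ref in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy placeholder sequences standing in for the four reference strains.
references = {1: "ACGT" * 200, 2: "AACC" * 200, 3: "GGTT" * 200, 4: "TGCA" * 200}
query = "AACC" * 200

print(predict_serotype(query, references))
```

    A real analysis would also want a minimum identity threshold, so that an isolate from a serotype outside the reference set is not forced onto the nearest known serotype.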

  1. Complete nucleotide sequence and affinities of the genomic RNA of Narcissus common latent virus (genus Carlavirus).

    PubMed

    Zheng, H-Y; Chen, J; Adams, M J; Chen, J-P

    2006-08-01

    The complete sequence of an isolate of Narcissus common latent virus (NCLV) from Zhangzhou city, Fujian, China was determined from amplified fragments of purified viral RNA. Excluding the poly(A) tail, the genomic RNA of NCLV was 8539 nucleotides (nt) long and had the typical organization for a member of the genus Carlavirus. The most closely related species were Potato virus M, Hop latent virus and Aconitum latent virus, which had 58-59% nt identity to NCLV in their entire genomes. These relationships were confirmed by a phylogenetic analysis using a composite nucleotide alignment of all the open reading frames.

  2. Nucleotide sequence of miRNA precursor contributes to cleavage site selection by Dicer.

    PubMed

    Starega-Roslan, Julia; Galka-Marciniak, Paulina; Krzyzosiak, Wlodzimierz J

    2015-12-15

    The ribonuclease Dicer excises mature miRNAs from a diverse group of precursors (pre-miRNAs), most of which contain various secondary structure motifs in their hairpin stem. In this study, we analyzed Dicer cleavage in hairpin substrates deprived of such motifs. We searched for the factors other than the secondary structure, which may influence the length diversity and heterogeneity of miRNAs. We found that the nucleotide sequence at the Dicer cleavage site influences both of these miRNA characteristics. With regard to cleavage mechanism, we demonstrate that the Dicer RNase IIIA domain that cleaves within the 3' arm of the pre-miRNA is more sensitive to the nucleotide sequence of its substrate than is the RNase IIIB domain. The RNase IIIA domain avoids releasing miRNAs with G nucleotide and prefers to generate miRNAs with a U nucleotide at the 5' end. We also propose that the sequence restrictions at the Dicer cleavage site might be the factor that contributes to the generation of miRNA duplexes with 3' overhangs of atypical lengths. This finding implies that the two RNase III domains forming the single processing center of Dicer may exhibit some degree of flexibility, which allows for the formation of these non-standard 3' overhangs.

  3. Nucleotide sequence of miRNA precursor contributes to cleavage site selection by Dicer

    PubMed Central

    Starega-Roslan, Julia; Galka-Marciniak, Paulina; Krzyzosiak, Wlodzimierz J.

    2015-01-01

    The ribonuclease Dicer excises mature miRNAs from a diverse group of precursors (pre-miRNAs), most of which contain various secondary structure motifs in their hairpin stem. In this study, we analyzed Dicer cleavage in hairpin substrates deprived of such motifs. We searched for the factors other than the secondary structure, which may influence the length diversity and heterogeneity of miRNAs. We found that the nucleotide sequence at the Dicer cleavage site influences both of these miRNA characteristics. With regard to cleavage mechanism, we demonstrate that the Dicer RNase IIIA domain that cleaves within the 3′ arm of the pre-miRNA is more sensitive to the nucleotide sequence of its substrate than is the RNase IIIB domain. The RNase IIIA domain avoids releasing miRNAs with G nucleotide and prefers to generate miRNAs with a U nucleotide at the 5′ end. We also propose that the sequence restrictions at the Dicer cleavage site might be the factor that contributes to the generation of miRNA duplexes with 3′ overhangs of atypical lengths. This finding implies that the two RNase III domains forming the single processing center of Dicer may exhibit some degree of flexibility, which allows for the formation of these non-standard 3′ overhangs. PMID:26424848

  4. The complete nucleotide sequence of RNA beta from the type strain of barley stripe mosaic virus.

    PubMed Central

    Gustafson, G; Armour, S L

    1986-01-01

    The complete nucleotide sequence of RNA beta from the type strain of barley stripe mosaic virus (BSMV) has been determined. The sequence is 3289 nucleotides in length and contains four open reading frames (ORFs) which code for proteins of Mr 22,147 (ORF1), Mr 58,098 (ORF2), Mr 17,378 (ORF3), and Mr 14,119 (ORF4). The predicted N-terminal amino acid sequence of the polypeptide encoded by the ORF nearest the 5'-end of the RNA (ORF1) is identical (after the initiator methionine) to the published N-terminal amino acid sequence of BSMV coat protein for 29 of the first 30 amino acids. ORF2 occupies the central portion of the coding region of RNA beta and ORF3 is located at the 3'-end. The ORF4 sequence overlaps the 3'-region of ORF2 and the 5'-region of ORF3 and differs in codon usage from the other three RNA beta ORFs. The coding region of RNA beta is followed by a poly(A) tract and a 238 nucleotide tRNA-like structure which are common to all three BSMV genomic RNAs. PMID:3754962

  5. Nucleotide sequence and genome organization of atractylodes mottle virus, a new member of the genus Carlavirus.

    PubMed

    Zhao, Fumei; Igori, Davaajargal; Lim, Seungmo; Yoo, Ran Hee; Lee, Su-Heon; Moon, Jae Sun

    2015-11-01

    The complete genome sequence of a member of a distinct species of the genus Carlavirus in the family Betaflexiviridae, tentatively named atractylodes mottle virus (AtrMoV), has been determined. Analysis of its genomic organization indicates that it has a single-stranded, positive-sense genomic RNA of 8866 nucleotides, excluding the poly(A) tail, and consists of six open reading frames typical of members of the genus Carlavirus. The individual open reading frames of AtrMoV show moderately low sequence similarity to those of other carlaviruses at the nucleotide and amino acid sequence levels. Pairwise comparison and phylogenetic analysis suggest that AtrMoV is most closely related to chrysanthemum virus B.

  6. A novel HLA-B*51 allele (B*5116) identified by nucleotide sequencing.

    PubMed

    Tamouza, R; Carbonnelle, E; Schaeffer, V; Sadki, K; Abed, Y; Marzais, F; Poirier, J C; Fortier, C; Toubert, A; Raffoux, C; Charron, D

    2000-02-01

    We report here an additional HLA-B*51 variant designated HLA-B*5116. Detected by an abnormal serological reactivity pattern, this variant was identified as a B*51 allele by polymerase chain reaction using sequence-specific primers (PCR-SSP) and characterized by nucleotide sequencing. The new variant sequence closely matches that of the classical HLA-B*5101 except for two adjacent nucleotide substitutions at positions 216 and 217 of the third exon and the resulting leucine to glutamic acid change at codon 163 of the alpha2 domain (CTG-->GAG). This new variant was not detected in three different ethnic groups (French, Algerian and Lebanese), suggesting a very low frequency.
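
    The reported codon change can be checked directly against the standard genetic code. The following is a small sketch; the function name is hypothetical and the lookup table contains only the two codons needed here.

```python
# Only the codons needed for this example, taken from the standard genetic code.
CODON_TO_AA = {"CTG": "Leu", "GAG": "Glu"}


def describe_codon_change(old, new):
    """Return the differing positions within the codon (1-based) and both amino acids."""
    positions = [i + 1 for i, (a, b) in enumerate(zip(old, new)) if a != b]
    return positions, CODON_TO_AA[old], CODON_TO_AA[new]


print(describe_codon_change("CTG", "GAG"))
```

    The two substitutions fall on the first and second codon positions, consistent with the adjacent exon positions 216 and 217 given in the abstract.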

  7. Isolation and nucleotide sequence of a cDNA clone encoding rat mitochondrial malate dehydrogenase.

    PubMed Central

    Grant, P M; Tellam, J; May, V L; Strauss, A W

    1986-01-01

    We have determined the complete sequence of the rat mitochondrial malate dehydrogenase (mMDH) precursor derived from the nucleotide sequence of the cDNA. A single synthetic oligodeoxynucleotide probe was used to screen a rat atrial cDNA library constructed in lambda gt10. A 1.2 kb full-length cDNA clone provided the first complete amino acid sequence of pre-mMDH. The 1014 nucleotide-long open reading frame encodes the 314 residue long mature mMDH protein and a 24 amino acid NH2-terminal extension which directs mitochondrial import and is cleaved from the precursor after import to generate mature mMDH. The amino acid composition of the transit peptide is polar and basic. The pre-mMDH transit peptide shows marked homology with those of two other enzymes targeted to the rat mitochondrial matrix. PMID:3755817
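
    The lengths quoted above are mutually consistent if the 1014-nucleotide figure excludes the stop codon, which is an assumption here since the abstract does not say so. A quick check, with a hypothetical helper name:

```python
def orf_length_nt(mature_residues, presequence_residues, include_stop=False):
    """Nucleotide length of an ORF encoding a precursor protein.

    Each residue corresponds to one 3-nt codon; the stop codon is optional.
    """
    codons = mature_residues + presequence_residues
    return 3 * codons + (3 if include_stop else 0)


print(orf_length_nt(314, 24))
```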

  8. Nucleotide sequence of complementary DNA encoding for quaking protein of cow, horse and pig.

    PubMed

    Murata, Tomoaki; Yamashiro, Yasuhiro; Kondo, Tatsuya; Nakaichi, Munekazu; Une, Satoshi; Taura, Yasuho

    2005-08-01

    Complementary DNA (cDNA) for bovine quaking gene (Bqk), equine quaking gene (Eqk) and porcine quaking gene (Pqk), which are homologous to mouse quaking gene (qkI), were isolated, and their nucleotide sequences were determined. cDNA sequences of Bqk, Eqk and Pqk showed very high homology to that of qkI at the nucleotide level: 94.2, 95.7 and 95.6%, respectively. Deduced amino acid sequences for Bqk, Eqk and Pqk perfectly matched that of qkI. These findings suggest that the quaking gene family has been highly conserved during mammalian evolution, and that Bqk, Eqk and Pqk are likely to have important biological functions also in cow, horse and pig.

  9. TUIT, a BLAST-Based tool for taxonomic classification of nucleotide sequences

    PubMed Central

    Tuzhikov, Alexander; Panchin, Alexander; Shestopalov, Valery I.

    2014-01-01

    Pyrosequencing of 16S ribosomal RNA (rRNA) genes has become the gold standard in human microbiome studies. The routine task of taxonomic classification using 16S rRNA reads is commonly performed by the Ribosomal Database Project (RDP) II Classifier, a robust tool that relies on a set of well-characterized reference sequences. However, the RDP II Classifier may be unable to classify a significant part of the dataset due to the absence of proper reference sequences. The taxonomic classification for some of the unclassified sequences might still be performed using BLAST searches against large and frequently updated nucleotide databases. Here we introduce TUIT (Taxonomic Unit Identification Tool) – an efficient open source and platform-independent application that can perform taxonomic classification on its own or can be used in combination with RDP II Classifier to maximize the taxonomic identification rate. Using a set of simulated DNA sequences we demonstrate that the algorithm performs taxonomic classification with high specificity for sequences as short as 125 base pairs. TUIT is applicable for 16S rRNA gene sequence classification; however, it is not restricted to 16S rRNA sequences. In addition, TUIT may be used as a complementary tool for effective taxonomic classification of nucleotide sequences generated by many current platforms, such as Roche 454 and Illumina. Standalone TUIT is available online at http://sourceforge.net/projects/tuit/. PMID:24502797

  10. Sequence selective naked-eye detection of DNA harnessing extension of oligonucleotide-modified nucleotides.

    PubMed

    Verga, Daniela; Welter, Moritz; Marx, Andreas

    2016-02-01

    DNA polymerases can efficiently and sequence selectively incorporate oligonucleotide (ODN)-modified nucleotides, and the incorporated oligonucleotide strand can be employed as a primer in rolling circle amplification (RCA). The effective amplification of the DNA primer by Φ29 DNA polymerase allows the sequence-selective hybridisation of the amplified strand with a G-quadruplex DNA sequence that has horseradish peroxidase-like activity. Based on these findings we develop a system that allows DNA detection with single-base resolution by the naked eye.

  11. Complete nucleotide sequence analysis of a Dengue-1 virus isolated on Easter Island, Chile.

    PubMed

    Cáceres, C; Yung, V; Araya, P; Tognarelli, J; Villagra, E; Vera, L; Fernández, J

    2008-01-01

    Dengue-1 viruses responsible for the dengue fever outbreak in Easter Island in 2002 were isolated from acute-phase sera of dengue fever patients. In order to analyze the complete genome sequence, we designed primers to amplify contiguous segments across the entire sequence of the viral genome. RT-PCR products obtained were cloned, and complete nucleotide and deduced amino acid sequences were determined. This report constitutes the first complete genetic characterization of a DENV-1 isolate from Chile. Phylogenetic analysis shows that an Easter Island isolate is most closely related to Pacific DENV-1 genotype IV viruses.

  12. Complete nucleotide sequence of a subviral DNA molecule of porcine circovirus type 2.

    PubMed

    Wen, Han

    2016-07-01

    Porcine circovirus type 2 (PCV2) is a member of the genus Circovirus in the family Circoviridae. Most subgenomic molecules of PCV2 have been mapped. Here, the first full-length sequence of a subviral molecule of PCV2 (CH-IVT12) containing a reverse complement sequence of the PCV2 genome was determined by sequencing DNA extracted from PK15 cells infected with PCV2. The circular CH-IVT12 DNA consists of 1136 nucleotides and contains one major open reading frame.

  13. Nucleotide sequences of the coat protein genes of two Japanese zucchini yellow mosaic virus isolates.

    PubMed

    Kundu, A K; Ohshima, K; Sako, N

    1997-10-01

    The nucleotide (nt) sequences of the coat protein (CP) genes of two Japanese zucchini yellow mosaic virus (ZYMV) isolates (ZYMV-169 and ZYMV-M) were determined. The CP genes of both isolates were 837 nt long and encoded 279 amino acids (aa). The nt and deduced aa sequence similarities between the two isolates were 92% and 94.6%, respectively. The deduced aa sequences of CPs of the Japanese isolates were compared with those of previously reported ZYMV isolates by phylogenetic analysis. This comparison led us to divide all ZYMV isolates into three groups, in which ZYMV-169 formed its own distinct group.

  14. Nucleotide sequence of a new isolate of ribgrass mosaic tobamovirus infecting Impatiens New Guinea.

    PubMed

    Wetzel, T; Njapo Ngangom, H O; Chotewutmontri, S; Krczal, G

    2006-04-01

    The complete nucleotide sequence of a tobamovirus isolated from Impatiens New Guinea was determined. The genome was 6302 nt long, and its genomic organisation was similar to those of other crucifer tobamoviruses. Sequence comparisons with the corresponding sequences of other crucifer tobamoviruses revealed highest levels of identity with the ribgrass mosaic virus (Shanghai isolate). A small open reading frame putatively encoding a 4.5-kDa protein with a low degree of similarity to the ORF6 of tobacco mosaic virus was found nested in the movement protein gene.

  15. The nucleotide sequence at the termini of adenovirus type 5 DNA.

    PubMed Central

    Steenbergh, P H; Maat, J; van Ormondt, H; Sussenbach, J S

    1977-01-01

    The sequences of the first 194 base pairs at both termini of adenovirus type 5 (Ad5) DNA have been determined, using the chemical degradation technique developed by Maxam and Gilbert (Proc. Nat. Acad. Sci. USA 74 (1977), pp. 560-564). The nucleotide sequences 1-75 were confirmed by analysis of labeled RNA transcribed from the terminal HhaI fragments in vitro. The sequence data show that Ad5 DNA has a perfect inverted terminal repetition 103 base pairs long. PMID:600799
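
An inverted terminal repetition means that the sequence at one end of a strand reappears, as its reverse complement, at the other end. The check can be sketched as follows. This is a minimal illustration, not the authors' method (they used Maxam-Gilbert sequencing); the function names and the toy sequence are hypothetical.

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def inverted_terminal_repeat_length(seq):
    """Length of the perfect inverted terminal repeat of a linear DNA.

    The repeat is perfect for as long as the 5' end of the given strand
    matches the 5' end of the complementary strand, base for base.
    """
    seq = seq.upper()
    other_strand = reverse_complement(seq)
    length = 0
    limit = len(seq) // 2  # the two arms of the repeat cannot overlap
    while length < limit and seq[length] == other_strand[length]:
        length += 1
    return length


# Toy genome: a 10-base terminus, a unique middle, and the inverted copy.
left_end = "CATCATCAAT"
toy_genome = left_end + "GGAAAA" + reverse_complement(left_end)
itr_length = inverted_terminal_repeat_length(toy_genome)
```

On the toy sequence the function reports a repeat of 10 bases, the length of the terminus that was deliberately duplicated.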

  16. Nucleotide sequence and genome organization of Dweet mottle virus and its relationship to members of the family Betaflexiviridae

    USDA-ARS's Scientific Manuscript database

    The nucleotide sequence of Dweet mottle virus (DMV) was determined and compared to sequences of members of the family Alpha- and Beta-flexiviridae. The DMV genome has 8747 nucleotides (nt) excluding the poly-(A) tail at the 3’ end of the genome. The overall G+C content of DMV genomic RNA is 40%. D...

  17. Nucleotide sequence characterization of Ty 1-17, a class II transposon from yeast.

    PubMed Central

    Warmington, J R; Waring, R B; Newlon, C S; Indge, K J; Oliver, S G

    1985-01-01

    We have determined the nucleotide sequence of a class II yeast transposon (Ty 1-17) which is found just centromere-distal to the LEU2 structural gene on chromosome III of Saccharomyces cerevisiae. The complete element is 5961 bp long and is bounded by two identical, directly repeated, delta sequences of 332 bp each. The sequence organization indicates that Ty 1-17 is a retrotransposon, like the class I elements characterized previously. It contains two long open reading-frames, TyA (439 amino acids) and TyB (1349 amino acids). In this paper, the sequences of the two classes of yeast transposon are compared with one another and with analogous elements, such as retroviral proviruses, cauliflower mosaic virus and copia sequences. Features of the Ty 1-17 sequence which may be important to its mechanism of transposition and its genetic action are discussed. PMID:2997719

  18. PatMatch: a program for finding patterns in peptide and nucleotide sequences

    PubMed Central

    Yan, Thomas; Yoo, Danny; Berardini, Tanya Z.; Mueller, Lukas A.; Weems, Dan C.; Weng, Shuai; Cherry, J. Michael; Rhee, Seung Y.

    2005-01-01

    Here, we present PatMatch, an efficient, web-based pattern-matching program that enables searches for short nucleotide or peptide sequences such as cis-elements in nucleotide sequences or small domains and motifs in protein sequences. The program can be used to find matches to a user-specified sequence pattern that can be described using ambiguous sequence codes and a powerful and flexible pattern syntax based on regular expressions. A recent upgrade has improved performance and now supports both mismatches and wildcards in a single pattern. This enhancement has been achieved by replacing the previous searching algorithm, scan_for_matches [D'Souza et al. (1997), Trends in Genetics, 13, 497–498], with nondeterministic-reverse grep (NR-grep), a general pattern matching tool that allows for approximate string matching [Navarro (2001), Software Practice and Experience, 31, 1265–1312]. We have tailored NR-grep to be used for DNA and protein searches with PatMatch. The stand-alone version of the software can be adapted for use with any sequence dataset and is available for download from The Arabidopsis Information Resource (TAIR). The PatMatch server is available on the web for searching Arabidopsis thaliana sequences. PMID:15980466
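
The idea of describing a pattern with ambiguous sequence codes and matching it through regular expressions can be sketched in a few lines. This is a minimal illustration using Python's standard `re` module and the IUPAC nucleotide codes; it is not PatMatch's own implementation, does not use NR-grep, and does not support mismatches. The function names are hypothetical.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(pattern):
    """Translate a pattern written in IUPAC codes into a regex string."""
    return "".join(IUPAC[code] for code in pattern.upper())


def find_matches(pattern, sequence):
    """Return (start, matched_text) for every match, overlaps included."""
    # The lookahead lets the scan report overlapping occurrences.
    regex = re.compile("(?=(" + iupac_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]


# Search for the E-box consensus CANNTG in a short toy sequence.
hits = find_matches("CANNTG", "ttCACGTGaaCATATGcc")
```

Here `hits` holds two matches, `CACGTG` at offset 2 and `CATATG` at offset 10 (0-based).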

  19. Pseudo nucleotide composition or PseKNC: an effective formulation for analyzing genomic sequences.

    PubMed

    Chen, Wei; Lin, Hao; Chou, Kuo-Chen

    2015-10-01

    With the avalanche of DNA/RNA sequences generated in the post-genomic age, it is urgent to develop automated methods for analyzing the relationship between the sequences and their functions. Towards this goal, a series of sequence-based methods have been proposed and applied to analyze various character-unknown DNA/RNA sequences in order to gain an in-depth understanding of their action mechanisms and processes. Compared with the classical sequence-based methods, the pseudo nucleotide composition or PseKNC approach developed very recently has the following advantages: (1) it can convert length-different DNA/RNA sequences into dimension-fixed digital vectors that can be directly handled by all the existing machine-learning algorithms or operation engines; (2) it can contain the desired features and properties according to the selection or definition of users; (3) it can cover considerable sequence pattern information, both local and global. This minireview is focused on the concept of pseudo nucleotide composition, its development and applications.
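
Advantage (1), turning sequences of different lengths into vectors of one fixed dimension, can be illustrated with plain k-mer composition. This sketch covers only the ordinary k-tuple frequency part; the additional "pseudo" components that PseKNC derives from physicochemical properties are not reproduced here, and the function name is hypothetical.

```python
from itertools import product


def kmer_composition(seq, k=2):
    """Return the normalized k-mer frequency vector of a DNA sequence.

    The vector always has 4**k entries, ordered alphabetically by k-mer,
    regardless of the length of the input sequence.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing ambiguous bases
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]


short_vector = kmer_composition("ACGTACGT")
long_vector = kmer_composition("ACGGTCA" * 20)
```

Both vectors have 16 entries even though the inputs differ in length, which is what lets a standard machine-learning algorithm consume them directly.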

  20. Nucleotide binding database NBDB – a collection of sequence motifs with specific protein-ligand interactions

    PubMed Central

    Zheng, Zejun; Goncearenco, Alexander; Berezovsky, Igor N.

    2016-01-01

    NBDB database describes protein motifs, elementary functional loops (EFLs) that are involved in binding of nucleotide-containing ligands and other biologically relevant cofactors/coenzymes, including ATP, AMP, ADP, GMP, GDP, GTP, CTP, PAP, PPS, FMN, FAD(H), NAD(H), NADP, cAMP, cGMP, c-di-AMP and c-di-GMP, ThPP, THD, F-420, ACO, CoA, PLP and SAM. The database is freely available online at http://nbdb.bii.a-star.edu.sg. In total, NBDB contains data on 249 motifs that work in interactions with 24 ligands. Sequence profiles of EFL motifs were derived de novo from nonredundant Uniprot proteome sequences. Conserved amino acid residues in the profiles interact specifically with distinct chemical parts of nucleotide-containing ligands, such as nitrogenous bases, phosphate groups, ribose, nicotinamide, and flavin moieties. Each EFL profile in the database is characterized by a pattern of corresponding ligand–protein interactions found in crystallized ligand–protein complexes. NBDB database helps to explore the determinants of nucleotide and cofactor binding in different protein folds and families. NBDB can also detect fragments that match to profiles of particular EFLs in the protein sequence provided by the user. Comprehensive information on sequence, structures, and interactions of EFLs with ligands provides a foundation for experimental and computational efforts on design of required protein functions. PMID:26507856

  1. PCV: An Alignment Free Method for Finding Homologous Nucleotide Sequences and its Application in Phylogenetic Study.

    PubMed

    Kumar, Rajnish; Mishra, Bharat Kumar; Lahiri, Tapobrata; Kumar, Gautam; Kumar, Nilesh; Gupta, Rahul; Pal, Manoj Kumar

    2017-06-01

    Online retrieval of the homologous nucleotide sequences through existing alignment techniques is a common practice against the given database of sequences. The salient point of these techniques is their dependence on local alignment techniques and scoring matrices the reliability of which is limited by computational complexity and accuracy. Toward this direction, this work offers a novel way for numerical representation of genes which can further help in dividing the data space into smaller partitions helping formation of a search tree. In this context, this paper introduces a 36-dimensional Periodicity Count Value (PCV) which is representative of a particular nucleotide sequence and created through adaptation from the concept of stochastic model of Kolekar et al. (American Institute of Physics 1298:307-312, 2010. doi: 10.1063/1.3516320 ). The PCV construct uses information on physicochemical properties of nucleotides and their positional distribution pattern within a gene. It is observed that PCV representation of gene reduces computational cost in the calculation of distances between a pair of genes while being consistent with the existing methods. The validity of PCV-based method was further tested through their use in molecular phylogeny constructs in comparison with that using existing sequence alignment methods.

  2. Nucleotide sequence and expression of the 14-3-3 from the halotolerant alga Dunaliella salina.

    PubMed

    Wang, Tian-yun; Jing, Chang-Qin; Dong, Wei-Hua; Zhang, Jun-He; Zhang, Yu

    2010-02-01

    Previously we reported the nucleotide sequence of a 14-3-3 cDNA cloned from the unicellular green alga Dunaliella salina; however, the nucleotide sequence of this gene has not been reported so far. In the present study, the cloning and characterization of the nucleotide sequence, the gene copy and expression were undertaken. The coding sequence of the gene was found to be interrupted by five introns of 132, 266, 153, 152 and 625 bp, respectively. Introns 3-5 were found in conserved positions as compared to the Chlamydomonas reinhardtii 14-3-3 gene. D. salina 14-3-3 cDNA was inserted into the prokaryotic expression plasmid pET-28 and transformed into E. coli BL21, and the recombinant expressed 14-3-3 protein was purified from E. coli and used to immunize a rabbit. Indirect ELISA coated with 14-3-3 illustrated that the rabbit antisera titration was 1:1.00E + 06. Western blotting assays confirmed that the prepared rabbit antibodies could recognize the recombinant 14-3-3 protein. Southern blotting results showed that there was only one copy of the 14-3-3 present in the genome of D. salina and 14-3-3 expression did not change throughout the Dunaliella cell cycle.

  3. Linking the human cytogenetic map with nucleotide sequence: the CCAP clone set.

    PubMed

    Jang, Wonhee; Yonescu, Raluca; Knutsen, Turid; Brown, Theresa; Reppert, Tricia; Sirotkin, Karl; Schuler, Gregory D; Ried, Thomas; Kirsch, Ilan R

    2006-07-15

    We present the completed dataset and clone repository of the Cancer Chromosome Aberration Project (CCAP), an initiative developed and funded through the intramural program of the U.S. National Cancer Institute, to provide seamless linkage of human cytogenetic markers with the primary nucleotide sequence of the human genome. Spaced at 1-2 Mb intervals across the human genome, 1,339 bacterial artificial chromosome (BAC) clones have been localized to chromosomal bands through high-resolution fluorescence in situ hybridization (FISH) mapping. Of these clones, 99.8% can be positioned on the primary human genome sequence and 95% are placed at or close to their precise nucleotide starts and stops. This dataset can be studied and manipulated within generally available public Web sites. The clones are available from a commercial repository. The CCAP BAC clone set provides anchors for the interrogation of gene and sequence involvement in oncogenic and developmental disorders when the starting point is the recognition of a structural, numerical, or interstitial chromosomal aberration. This dataset also provides a current view of the quality and coherence of the available genome sequence and insight into the nucleotide and three-dimensional structures that manifest as Giemsa light and dark chromosomal banding patterns.

  4. Quadfinder: server for identification and analysis of quadruplex-forming motifs in nucleotide sequences

    PubMed Central

    Scaria, Vinod; Hariharan, Manoj; Arora, Amit; Maiti, Souvik

    2006-01-01

    G-quadruplex secondary structures, which play a structural role in repetitive DNA such as telomeres, may also play a functional role at other genomic locations as targetable regulatory elements which control gene expression. The recent interest in application of quadruplexes in biological systems prompted us to develop a tool for the identification and analysis of quadruplex-forming nucleotide sequences, especially in RNA. Here we present Quadfinder, an online server for prediction and bioinformatics of uni-molecular quadruplex-forming nucleotide sequences. The server is designed to be user-friendly and needs minimal intervention by the user, while providing flexibility of defining the variants of the motif. The server is freely available online. PMID:16845097
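
A search for uni-molecular quadruplex-forming motifs with user-defined variants can be sketched with a regular expression. The pattern below (four runs of at least three G separated by loops of 1 to 7 bases) is a commonly used consensus and is an assumption here; the abstract does not state Quadfinder's actual defaults or algorithm, and the function and parameter names are hypothetical.

```python
import re


def quadruplex_regex(min_g=3, min_loop=1, max_loop=7):
    """Build a regex for four G-runs separated by three loops."""
    g_run = "G{%d,}" % min_g
    loop = "[ACGTU]{%d,%d}" % (min_loop, max_loop)
    return re.compile(g_run + (loop + g_run) * 3)


def find_quadruplex_motifs(seq, **motif_options):
    """Return (start, matched_text) for non-overlapping motif matches.

    Works on DNA or RNA because the loop class accepts both T and U.
    """
    regex = quadruplex_regex(**motif_options)
    return [(m.start(), m.group()) for m in regex.finditer(seq.upper())]


# Four human telomeric repeats flanked by two bases on either side.
motifs = find_quadruplex_motifs("TTGGGTTAGGGTTAGGGTTAGGGAA")
```

The telomeric example yields one motif starting at offset 2 (0-based). Passing `min_g=2` or a different loop range changes the motif variant being searched.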

  5. Nucleotide sequence variability of the Adh gene of the coastal plant Calystegia soldanella (Convolvulaceae) in Japan.

    PubMed

    Ohsako, Takanori; Matsuoka, Gakuto

    2008-02-01

    Calystegia soldanella (Convolvulaceae) is a self-incompatible perennial herb distributed on sandy seashores throughout the temperate zone of the world. In Japan, the species occasionally grows on the sandy shores of Lake Biwa. To clarify the genetic differentiation among local populations, we investigated the nucleotide sequence variability of the Adh gene. In a 1625-bp sequence between exon 2 and the 3' noncoding region of the Adh gene, a total of 44 polymorphic sites were found among 91 individuals from 19 populations. The nucleotide diversity for the entire sample was 0.00212. Similar values were determined for geographical groups of populations. No genetic differentiation among the groups of populations was found. The complete lack of genetic differentiation between the sea coastal populations and the inland populations could not be attributed to gene flow. Although the inland populations are geographically isolated from the sea coastal populations, the time since separation might be insufficient to establish significant genetic differentiation.
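
The two summary statistics quoted above, the number of polymorphic sites and the nucleotide diversity, can be computed from aligned sequences as sketched below. Nucleotide diversity is taken here as the average number of pairwise differences per site; the abstract does not say which estimator or software the authors used, and the toy alignment is hypothetical.

```python
from itertools import combinations


def polymorphic_sites(seqs):
    """Count alignment columns that contain more than one base."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def nucleotide_diversity(seqs):
    """Average pairwise differences per site over all sequence pairs."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    differences = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return differences / (len(pairs) * length)


# Hypothetical alignment of three 10-bp sequences.
alignment = ["ACGTACGTAC", "ACGTACGTAT", "ACGTACGAAC"]
sites = polymorphic_sites(alignment)
pi = nucleotide_diversity(alignment)
```

For the toy alignment there are 2 polymorphic sites and 4 pairwise differences across 3 pairs of 10 sites, so `pi` is 4/30.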

  6. Nucleotide sequence and replication properties of the Bacillus borstelensis cryptic plasmid pHT926.

    PubMed Central

    Ebisu, S; Murahashi, Y; Takagi, H; Kadowaki, K; Yamaguchi, K; Yamagata, H; Udaka, S

    1995-01-01

    The nucleotide sequence of pHT926, a cryptic plasmid found in Bacillus borstelensis HP926, was determined. pHT926 replicates by a rolling-circle mechanism and belongs to the pC194 plasmid family. The copy number of pHT926 was fourfold higher than that of pUB110 and very stably maintained in Bacillus choshinensis. PMID:7487045

  7. The complete nucleotide sequence of the mitochondrial genome of Phthonandria atrilineata (Lepidoptera: Geometridae).

    PubMed

    Yang, Ling; Wei, Zhao-Jun; Hong, Gui-Yun; Jiang, Shao-Tong; Wen, Long-Ping

    2009-07-01

    Using long-polymerase chain reaction (Long-PCR) method, we determined the complete nucleotide sequence of the mitochondrial genome (mitogenome) of Phthonandria atrilineata. The complete mtDNA from P. atrilineata was 15,499 base pairs in length and contained 13 protein-coding genes (PCGs), 2 rRNA genes, 22 tRNA genes, and a control region. The P. atrilineata genes were in the same order and orientation as the completely sequenced mitogenomes of other lepidopteran species. The nucleotide composition of P. atrilineata mitogenome was biased toward A + T nucleotides (81.02%), and the 13 PCGs show different A + T contents that range from 73.25% (cox1) to 92.12% (atp8). Phthonandria had the canonical set of 22 tRNA genes, that fold in the typical cloverleaf structure described for metazoan mt tRNAs, with the unique exception of trnS(AGN). The phylogenetic relationships were reconstructed with the concatenated sequences of the 13 PCGs of the mitochondrial genome, which confirmed that P. atrilineata is most closely related to the superfamily Bombycoidea.
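
The A + T bias reported for the whole mitogenome and for individual genes is a simple base-composition ratio. A minimal sketch follows; the function name is hypothetical and the input is a toy string, not the P. atrilineata sequence.

```python
def at_content(seq):
    """Fraction of A and T among the unambiguous bases of a sequence."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(base in "AT" for base in counted) / len(counted)


# Toy example: 4 of the 5 unambiguous bases are A or T.
toy_fraction = at_content("AATTNG")
```

Multiplying the result by 100 gives a percentage comparable to the values quoted above, once the function is applied to a real gene or genome sequence.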

  8. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... nucleotide and/or amino acid sequence submissions in computer readable form. 1.824 Section 1.824 Patents... And/or Amino Acid Sequences § 1.824 Form and format for nucleotide and/or amino acid sequence... readable form may be created by any means, such as word processors, nucleotide/amino acid sequence editors...

  9. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... nucleotide and/or amino acid sequence submissions in computer readable form. 1.824 Section 1.824 Patents... And/or Amino Acid Sequences § 1.824 Form and format for nucleotide and/or amino acid sequence... readable form may be created by any means, such as word processors, nucleotide/amino acid sequence editors...

  10. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... nucleotide and/or amino acid sequence submissions in computer readable form. 1.824 Section 1.824 Patents... And/or Amino Acid Sequences § 1.824 Form and format for nucleotide and/or amino acid sequence... readable form may be created by any means, such as word processors, nucleotide/amino acid sequence editors...

  11. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... nucleotide and/or amino acid sequence submissions in computer readable form. 1.824 Section 1.824 Patents... And/or Amino Acid Sequences § 1.824 Form and format for nucleotide and/or amino acid sequence... readable form may be created by any means, such as word processors, nucleotide/amino acid sequence editors...

  12. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... nucleotide and/or amino acid sequence submissions in computer readable form. 1.824 Section 1.824 Patents... And/or Amino Acid Sequences § 1.824 Form and format for nucleotide and/or amino acid sequence... readable form may be created by any means, such as word processors, nucleotide/amino acid sequence editors...

  13. Plastid: nucleotide-resolution analysis of next-generation sequencing and genomics data.

    PubMed

    Dunn, Joshua G; Weissman, Jonathan S

    2016-11-22

    Next-generation sequencing (NGS) informs many biological questions with unprecedented depth and nucleotide resolution. These assays have created a need for analytical tools that enable users to manipulate data nucleotide-by-nucleotide robustly and easily. Furthermore, because many NGS assays encode information jointly within multiple properties of read alignments - for example, in ribosome profiling, the locations of ribosomes are jointly encoded in alignment coordinates and length - analytical tools are often required to extract the biological meaning from the alignments before analysis. Many assay-specific pipelines exist for this purpose, but there remains a need for user-friendly, generalized, nucleotide-resolution tools that are not limited to specific experimental regimes or analytical workflows. Plastid is a Python library designed specifically for nucleotide-resolution analysis of genomics and NGS data. As such, Plastid is designed to extract assay-specific information from read alignments while retaining generality and extensibility to novel NGS assays. Plastid represents NGS and other biological data as arrays of values associated with genomic or transcriptomic positions, and contains configurable tools to convert data from a variety of sources to such arrays. Plastid also includes numerous tools to manipulate even discontinuous genomic features, such as spliced transcripts, with nucleotide precision. Plastid automatically handles conversion between genomic and feature-centric coordinates, accounting for splicing and strand, freeing users of burdensome accounting. Finally, Plastid's data models use consistent and familiar biological idioms, enabling even beginners to develop sophisticated analytical workflows with minimal effort. Plastid is a versatile toolkit that has been used to analyze data from multiple NGS assays, including RNA-seq, ribosome profiling, and DMS-seq. It forms the genomic engine of our ORF annotation tool, ORF-RATER, and is readily
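
The conversion between genomic and feature-centric coordinates, accounting for splicing and strand, can be sketched in plain Python. This illustration deliberately does not use or imitate Plastid's API; the function name, the 0-based half-open exon intervals and the example transcript are all assumptions.

```python
def genomic_to_transcript(pos, exons, strand="+"):
    """Map a genomic position to a position along a spliced transcript.

    exons: list of (start, end) genomic intervals, 0-based, half-open,
    sorted by ascending start. Introns are skipped; on the minus strand
    the transcript is read from the highest genomic coordinate downward.
    """
    ordered = exons if strand == "+" else list(reversed(exons))
    offset = 0  # transcript length covered by exons already passed
    for start, end in ordered:
        if start <= pos < end:
            within = pos - start if strand == "+" else end - 1 - pos
            return offset + within
        offset += end - start
    raise ValueError("position is not inside any exon")


# Hypothetical two-exon transcript: 10 nt, an intron, then 20 nt.
exons = [(100, 110), (200, 220)]
plus_pos = genomic_to_transcript(200, exons, "+")
minus_pos = genomic_to_transcript(109, exons, "-")
```

On the plus strand the first base of the second exon is transcript position 10; on the minus strand genomic position 109 is position 20, because the 20-nt exon is transcribed first.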

  14. Nucleotide sequence variation of the envelope protein gene identifies two distinct genotypes of yellow fever virus.

    PubMed

    Chang, G J; Cropp, B C; Kinney, R M; Trent, D W; Gubler, D J

    1995-09-01

    The evolution of yellow fever virus over 67 years was investigated by comparing the nucleotide sequences of the envelope (E) protein genes of 20 viruses isolated in Africa, the Caribbean, and South America. Uniformly weighted parsimony algorithm analysis defined two major evolutionary yellow fever virus lineages designated E genotypes I and II. E genotype I contained viruses isolated from East and Central Africa. E genotype II viruses were divided into two sublineages: IIA viruses from West Africa and IIB viruses from America, except for a 1979 virus isolated from Trinidad (TRINID79A). Unique signature patterns were identified at 111 nucleotide and 12 amino acid positions within the yellow fever virus E gene by signature pattern analysis. Yellow fever viruses from East and Central Africa contained unique signatures at 60 nucleotide and five amino acid positions, those from West Africa contained unique signatures at 25 nucleotide and two amino acid positions, and viruses from America contained such signatures at 30 nucleotide and five amino acid positions in the E gene. The dissemination of yellow fever viruses from Africa to the Americas is supported by the close genetic relatedness of genotype IIA and IIB viruses and genetic evidence of a possible second introduction of yellow fever virus from West Africa, as illustrated by the TRINID79A virus isolate. The E protein genes of American IIB yellow fever viruses had higher frequencies of amino acid substitutions than did genes of yellow fever viruses of genotypes I and IIA on the basis of comparisons with a consensus amino acid sequence for the yellow fever E gene. The great variation in the E proteins of American yellow fever virus probably results from positive selection imposed by virus interaction with different species of mosquitoes or nonhuman primates in the Americas.

  15. Nucleotide sequence variation of the envelope protein gene identifies two distinct genotypes of yellow fever virus.

    PubMed Central

    Chang, G J; Cropp, B C; Kinney, R M; Trent, D W; Gubler, D J

    1995-01-01

    The evolution of yellow fever virus over 67 years was investigated by comparing the nucleotide sequences of the envelope (E) protein genes of 20 viruses isolated in Africa, the Caribbean, and South America. Uniformly weighted parsimony algorithm analysis defined two major evolutionary yellow fever virus lineages designated E genotypes I and II. E genotype I contained viruses isolated from East and Central Africa. E genotype II viruses were divided into two sublineages: IIA viruses from West Africa and IIB viruses from America, except for a 1979 virus isolated from Trinidad (TRINID79A). Unique signature patterns were identified at 111 nucleotide and 12 amino acid positions within the yellow fever virus E gene by signature pattern analysis. Yellow fever viruses from East and Central Africa contained unique signatures at 60 nucleotide and five amino acid positions, those from West Africa contained unique signatures at 25 nucleotide and two amino acid positions, and viruses from America contained such signatures at 30 nucleotide and five amino acid positions in the E gene. The dissemination of yellow fever viruses from Africa to the Americas is supported by the close genetic relatedness of genotype IIA and IIB viruses and genetic evidence of a possible second introduction of yellow fever virus from West Africa, as illustrated by the TRINID79A virus isolate. The E protein genes of American IIB yellow fever viruses had higher frequencies of amino acid substitutions than did genes of yellow fever viruses of genotypes I and IIA on the basis of comparisons with a consensus amino acid sequence for the yellow fever E gene. The great variation in the E proteins of American yellow fever virus probably results from positive selection imposed by virus interaction with different species of mosquitoes or nonhuman primates in the Americas. PMID:7637022

  16. Nucleotide sequencing and characterization of the genes encoding benzene oxidation enzymes of Pseudomonas putida.

    PubMed Central

    Irie, S; Doi, S; Yorifuji, T; Takagi, M; Yano, K

    1987-01-01

    The nucleotide sequence of the genes from Pseudomonas putida encoding oxidation of benzene to catechol was determined. Five open reading frames were found in the sequence. Four corresponding protein molecules were detected by a DNA-directed in vitro translation system. Escherichia coli cells containing the fragment with the four open reading frames transformed benzene to cis-benzene glycol, which is an intermediate of the oxidation of benzene to catechol. The relation between the product of each cistron and the components of the benzene oxidation enzyme system is discussed. PMID:3667527

  17. Nucleotide Sequence Analysis of RNA Synthesized from Rabbit Globin Complementary DNA

    PubMed Central

    Poon, Raymond; Paddock, Gary V.; Heindell, Howard; Whitcome, Philip; Salser, Winston; Kacian, Dan; Bank, Arthur; Gambino, Roberto; Ramirez, Francesco

    1974-01-01

    Rabbit globin complementary DNA made with RNA-dependent DNA polymerase (reverse transcriptase) was used as template for in vitro synthesis of 32P-labeled RNA. The sequences of the nucleotides in most of the fragments resulting from combined ribonuclease T1 and alkaline phosphatase digestion have been determined. Several fragments were long enough to fit uniquely with the α or β globin amino-acid sequences. These data demonstrate that the cDNA was copied from globin mRNA and contained no detectable contaminants. PMID:4139714

  18. SinicView: a visualization environment for comparisons of multiple nucleotide sequence alignment tools.

    PubMed

    Shih, Arthur Chun-Chieh; Lee, D T; Lin, Laurent; Peng, Chin-Lin; Chen, Shiang-Heng; Wu, Yu-Wei; Wong, Chun-Yi; Chou, Meng-Yuan; Shiao, Tze-Chang; Hsieh, Mu-Fen

    2006-03-02

    Deluged by the rate and complexity of completed genomic sequences, the need to align longer sequences becomes more urgent, and many more tools have thus been developed. In the initial stage of genomic sequence analysis, a biologist is usually faced with the questions of how to choose the best tool to align sequences of interest and how to analyze and visualize the alignment results, and then with the question of whether poorly aligned regions produced by the tool are indeed not homologous or are just results due to inappropriate alignment tools or scoring systems used. Although several systematic evaluations of multiple sequence alignment (MSA) programs have been proposed, they may not provide a standard-bearer for most biologists because those poorly aligned regions in these evaluations are never discussed. Thus, a tool that allows cross comparison of the alignment results obtained by different tools simultaneously could help a biologist evaluate their correctness and accuracy. In this paper, we present a versatile alignment visualization system, called SinicView, (for Sequence-aligning INnovative and Interactive Comparison VIEWer), which allows the user to efficiently compare and evaluate assorted nucleotide alignment results obtained by different tools. SinicView calculates similarity of the alignment outputs under a fixed window using the sum-of-pairs method and provides scoring profiles of each set of aligned sequences. The user can visually compare alignment results either in graphic scoring profiles or in plain text format of the aligned nucleotides along with the annotations information. We illustrate the capabilities of our visualization system by comparing alignment results obtained by MLAGAN, MAVID, and MULTIZ, respectively. With SinicView, users can use their own data sequences to compare various alignment tools or scoring systems and select the most suitable one to perform alignment in the initial stage of sequence analysis.
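
Scoring an alignment under a fixed window with the sum-of-pairs method can be sketched as follows. The scoring scheme is an assumption (each pair of identical non-gap bases in a column scores 1, everything else 0, normalized by the number of pairs); the abstract does not give SinicView's actual scoring parameters, and the function names are hypothetical.

```python
from itertools import combinations


def column_sum_of_pairs(column):
    """Fraction of sequence pairs in a column that share the same base."""
    pairs = list(combinations(column, 2))
    identical = sum(1 for a, b in pairs if a == b and a != "-")
    return identical / len(pairs)


def windowed_profile(alignment, window=3):
    """Mean sum-of-pairs score for every window position in an alignment.

    alignment: list of at least two equal-length aligned strings.
    """
    n_columns = len(alignment[0])
    scores = [
        column_sum_of_pairs([row[i] for row in alignment])
        for i in range(n_columns)
    ]
    return [
        sum(scores[i:i + window]) / window
        for i in range(n_columns - window + 1)
    ]


# Hypothetical alignment of three sequences, six columns wide.
profile = windowed_profile(["ACGT-A", "ACGTTA", "ACCT-A"], window=3)
```

Computing such a profile for the output of each alignment tool and plotting the profiles side by side gives the kind of comparison the abstract describes; low-scoring windows mark regions where the tools may disagree or the sequences may not be homologous.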

  19. Nucleotide sequences of immunoglobulin eta genes of chimpanzee and orangutan: DNA molecular clock and hominoid evolution

    SciTech Connect

    Sakoyama, Y.; Hong, K.J.; Byun, S.M.; Hisajima, H.; Ueda, S.; Yaoita, Y.; Hayashida, H.; Miyata, T.; Honjo, T.

    1987-02-01

    To determine the phylogenetic relationships among hominoids and the dates of their divergence, the complete nucleotide sequences of the constant region of the immunoglobulin eta-chain (C_eta1) genes from chimpanzee and orangutan have been determined. These sequences were compared with the human eta-chain constant-region sequence. A molecular clock (silent molecular clock), measured by the degree of sequence divergence at the synonymous (silent) positions of protein-encoding regions, was introduced for the present study. From the comparison of nucleotide sequences of α1-antitrypsin and β- and delta-globulin genes between humans and Old World monkeys, the silent molecular clock was calibrated: the mean evolutionary rate of silent substitution was determined to be 1.56 × 10^-9 substitutions per site per year. Using the silent molecular clock, the mean divergence dates of chimpanzee and orangutan from the human lineage were estimated as 6.4 ± 2.6 million years and 17.3 ± 4.5 million years, respectively. It was also shown that the evolutionary rate of primate genes is considerably slower than those of other mammalian genes.
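
How a calibrated silent substitution rate turns sequence divergence into a date can be sketched with the standard relation T = K / (2r), where K is the silent divergence per site between two lineages and r is the rate per site per year along each lineage. The abstract does not spell out the exact formula or corrections the authors applied, so the relation is an assumption, and the divergence value in the example is purely illustrative, not a figure from the paper.

```python
def divergence_time_years(silent_divergence, rate_per_site_per_year=1.56e-9):
    """Estimate time since two lineages split from silent-site divergence.

    The factor of 2 accounts for substitutions accumulating independently
    along both lineages since the split.
    """
    return silent_divergence / (2.0 * rate_per_site_per_year)


# Illustrative input only: a silent divergence of 0.02 per site.
example_years = divergence_time_years(0.02)
```

With the calibrated rate quoted above, an illustrative divergence of 0.02 substitutions per silent site corresponds to roughly 6.4 million years.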

  20. Analysis of a cloned colicin Ib gene: complete nucleotide sequence and implications for regulation of expression.

    PubMed Central

    Varley, J M; Boulnois, G J

    1984-01-01

    The complete nucleotide sequence of a 2,971 base pair EcoRI fragment carrying the structural gene for colicin Ib has been determined. The length of the gene is 1,881 nucleotides which is predicted to produce a protein of 626 amino acids and of molecular weight 71,364. The structural gene is flanked by likely promoter and terminator signals and in between the promoter and the ribosome binding site is an inverted repeat sequence which resembles other sequences known to bind the LexA protein. Further analysis of the 5' flanking sequences revealed a second region which may act either as a second LexA binding site and/or in the binding of cyclic AMP receptor protein. Comparison of the predicted amino acid sequence of colicin Ib with that of colicins A and E1 reveals localised homology. The implications of these similarities in the proteins and of regulation of the colicin Ib structural gene are discussed. PMID:6091036
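
    The gene and protein lengths quoted above are consistent if the 1,881 nucleotides include the stop codon, which is an assumption here rather than something the abstract states. A small sketch of that arithmetic (the function name is illustrative):

```python
def protein_length(orf_nucleotides, includes_stop=True):
    """Number of amino acids encoded by an open reading frame.

    The ORF length must be a multiple of three; if the stop codon is
    counted in the length, it does not contribute an amino acid.
    """
    if orf_nucleotides <= 0 or orf_nucleotides % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_nucleotides // 3
    return codons - 1 if includes_stop else codons


# 1,881 nt = 627 codons; minus the assumed stop codon gives 626 residues.
colicin_ib_residues = protein_length(1881)
```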

  1. Nucleotide sequence of an exceptionally long 5.8S ribosomal RNA from Crithidia fasciculata.

    PubMed Central

    Schnare, M N; Gray, M W

    1982-01-01

    In Crithidia fasciculata, a trypanosomatid protozoan, the large ribosomal subunit contains five small RNA species (e, f, g, i, j) in addition to 5S rRNA [Gray, M.W. (1981) Mol. Cell. Biol. 1, 347-357]. The complete primary sequence of species i is shown here to be pAACGUGUmCGCGAUGGAUGACUUGGCUUCCUAUCUCGUUGA ... AGAmACGCAGUAAAGUGCGAUAAGUGGUApsiCAAUUGmCAGAAUCAUUCAAUUACCGAAUCUUUGAACGAAACGG ... CGCAUGGGAGAAGCUCUUUUGAGUCAUCCCCGUGCAUGCCAUAUUCUCCAmGUGUCGAA(C)OH. This sequence establishes that species i is a 5.8S rRNA, despite its exceptional length (171-172 nucleotides). The extra nucleotides in C. fasciculata 5.8S rRNA are located in a region whose primary sequence and length are highly variable among 5.8S rRNAs, but which is capable of forming a stable hairpin loop structure (the "G+C-rich hairpin"). The sequence of C. fasciculata 5.8S rRNA is no more closely related to that of another protozoan, Acanthamoeba castellanii, than it is to representative 5.8S rRNA sequences from the other eukaryotic kingdoms, emphasizing the deep phylogenetic divisions that seem to exist within the Kingdom Protista. PMID:7079176

  2. Cloning, expression, and nucleotide sequence of the Lactobacillus helveticus 481 gene encoding the bacteriocin helveticin J.

    PubMed Central

    Joerger, M C; Klaenhammer, T R

    1990-01-01

    Lactobacillus helveticus 481 produces a 37-kDa bacteriocin called helveticin J. Libraries of chromosomal DNA from L. helveticus were prepared in lambda gt11 and probed for phage-producing fusion proteins that could react with polyclonal helveticin J antibody. Two recombinant phage, HJ1 and HJ4, containing homologous inserts of 350 and 600 bp, respectively, produced proteins that reacted with antibody. These two phage clones specifically hybridized to L. helveticus 481 total genomic DNA but not to DNA from strains that did not produce helveticin J or strains producing unrelated bacteriocins. HJ1 and HJ4 lysogens produced beta-galactosidase fusion proteins that shared similar epitopes with each other and helveticin J. The intact helveticin J gene (hlv) was isolated by screening a library of L. helveticus chromosomal DNA in lambda EMBL3 with the insert DNA from phage HJ4 as a probe. The DNA sequence of a contiguous 3,364-bp region was determined. Two complete open reading frames (ORF), designated ORF2 and ORF3, were identified within the sequenced fragment. The 3' end of another open reading frame, ORF1, was located upstream of ORF2. A noncoding region and a putative promoter were located between ORF1 and ORF2. ORF2 could encode an 11,808-Da protein. The L. helveticus DNA inserts of the HJ1 and HJ4 clones reside within ORF3, which begins 30 bp downstream from the termination codon of ORF2. ORF3 could encode a 37,511-Da protein. Downstream from ORF3, the 5' end of another ORF (ORF4) was found. A BglII fragment containing ORF2 and ORF3 was cloned into pGK12, and the recombinant plasmid, pTRK135, was transformed into Lactobacillus acidophilus via electroporation. Transformants carrying pTRK135 produced a bacteriocin that was heat labile and exhibited an activity spectrum that was the same as that of helveticin J. PMID:2228964

  3. Large-scale detection and application of expressed sequence tag single nucleotide polymorphisms in Nicotiana.

    PubMed

    Wang, Y; Zhou, D; Wang, S; Yang, L

    2015-07-14

    Single nucleotide polymorphisms (SNPs) are widespread in the Nicotiana genome. Using an alignment and variation detection method, we developed 20,607,973 SNPs, based on the expressed sequence tag sequences of 10 Nicotiana species. The replacement rate was much higher than the transversion rate in the SNPs, and SNPs were found throughout Nicotiana. In vitro verification indicated that all of the SNPs were high quality and accurate. Evolutionary relationships between 15 varieties were investigated by polymerase chain reaction with a special primer; the specific 302 locus of these sequence results clearly indicated the origin of Zhongyan 100. A database of Nicotiana SNPs (NSNP) was developed to store and search for SNPs in Nicotiana. NSNP is a tool for researchers to develop SNP markers from sequence data.
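
    Substitution classes of the kind compared above are usually tallied by sorting each SNP into transitions (purine to purine, or pyrimidine to pyrimidine) and transversions (purine to pyrimidine or the reverse). The sketch below shows that standard classification; it is not the authors' pipeline, and the example SNP list is invented for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref, alt):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError("expected two different bases from A, C, G, T")
    same_class = ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"


def ts_tv_ratio(snps):
    """Transition/transversion ratio for an iterable of (ref, alt) pairs."""
    kinds = [classify_snp(ref, alt) for ref, alt in snps]
    transversions = kinds.count("transversion")
    if transversions == 0:
        raise ValueError("no transversions; ratio is undefined")
    return kinds.count("transition") / transversions


# Hypothetical SNP calls: two transitions and one transversion.
ratio = ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "T")])
```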

  4. HPC-CLUST: distributed hierarchical clustering for large sets of nucleotide sequences.

    PubMed

    Matias Rodrigues, João F; von Mering, Christian

    2014-01-15

    Nucleotide sequence data are being produced at an ever increasing rate. Clustering such sequences by similarity is often an essential first step in their analysis, intended to reduce redundancy, define gene families or suggest taxonomic units. Exact clustering algorithms, such as hierarchical clustering, scale relatively poorly in terms of run time and memory usage, yet they are desirable because heuristic shortcuts taken during clustering might have unintended consequences in later analysis steps. Here we present HPC-CLUST, a highly optimized software pipeline that can cluster large numbers of pre-aligned DNA sequences by running on distributed computing hardware. It allocates both memory and computing resources efficiently, and can process more than a million sequences in a few hours on a small cluster. Source code and binaries are freely available at http://meringlab.org/software/hpc-clust/; the pipeline is implemented in C++ and uses the Message Passing Interface (MPI) standard for distributed computing.
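
    To make the idea of exact clustering on pre-aligned sequences concrete, here is a small single-machine sketch. It is not HPC-CLUST's code or algorithm: it cuts a single-linkage hierarchy at one identity threshold, which is equivalent to finding connected components, and the identity definition (gap-containing columns skipped) and all names are assumptions for the example.

```python
def identity(a, b):
    """Fraction of identical bases over columns where neither sequence has a gap."""
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return sum(x == y for x, y in compared) / len(compared)


def single_linkage_clusters(seqs, threshold):
    """Group pre-aligned sequences: any pair at or above threshold is linked."""
    n = len(seqs)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All-against-all comparison: exact, but quadratic in the number of sequences.
    for i in range(n):
        for j in range(i + 1, n):
            if identity(seqs[i], seqs[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical pre-aligned sequences; indices 0-1 and 2-3 are near-identical.
toy = ["ACGTACGT", "ACGTACGA", "TTTTCCCC", "TTTTCCCG"]
groups = single_linkage_clusters(toy, threshold=0.85)
```

    The quadratic pairwise step is the part that motivates distributing the work across nodes for inputs of a million sequences.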

  5. Molecular cloning and nucleotide sequence of rat lingual lipase cDNA.

    PubMed Central

    Docherty, A J; Bodmer, M W; Angal, S; Verger, R; Riviere, C; Lowe, P A; Lyons, A; Emtage, J S; Harris, T J

    1985-01-01

    Purified rat lingual lipase (EC 3.1.1.3), a glycoprotein of approximate molecular weight 52,000, was used to generate polyclonal antibodies which were able to recognise the denatured and deglycosylated enzyme. These immunoglobulins were used to screen a cDNA library prepared from mRNA isolated from the serous glands of rat tongue cloned in E. coli expression vectors. An almost full length cDNA clone was isolated and the nucleotide and predicted amino acid sequence obtained. Comparison with the N-terminal amino acid sequence of the purified enzyme confirmed the identity of the cDNA and indicated that there was a hydrophobic signal sequence of 18 residues. The amino acid sequence of mature rat lingual lipase consists of 377 residues and shares little homology with porcine pancreatic lipase apart from a short region containing a serine residue at an analogous position to the ser 152 of the porcine enzyme. PMID:3839077

  6. Complete nucleotide sequence of a circular plasmid from the Lyme disease spirochete, Borrelia burgdorferi.

    PubMed Central

    Dunn, J J; Buchstein, S R; Butler, L L; Fisenne, S; Polin, D S; Lade, B N; Luft, B J

    1994-01-01

    We have determined the complete nucleotide sequence of a small circular plasmid from the spirochete Borrelia burgdorferi Ip21, the agent of Lyme disease. The plasmid (cp8.3/Ip21) is 8,303 bp long, has a 76.6% A+T content, and is unstable upon passage of cells in vitro. An analysis of the sequence revealed the presence of two nearly perfect copies of a 184-bp inverted repeat sequence separated by 2,675 bp containing three closely spaced, but nonoverlapping, open reading frames (ORFs). Each inverted repeat ends in sequences that may function as signals for the initiation of transcription and translation of flanking plasmid sequences. A unique oligonucleotide probe based on the repeated sequence showed that the DNA between the repeats is present predominantly in a single orientation. Additional copies of the repeat were not detected elsewhere in the Ip21 genome. An analysis for potential ORFs indicates that the plasmid has nine highly probable protein-coding ORFs and one that is less probable; together, they occupy almost 71% of the nucleotide sequence. Analysis of the deduced amino acid sequences of the ORFs revealed one (ORF-9) with features in common with Borrelia lipoproteins and another (ORF-2) having limited homology with a replication protein, RepC, from a gram-positive plasmid that replicates by a rolling circle (RC) mechanism. Known collectively as RC plasmids, such plasmids require a double-stranded origin at which the Rep protein nicks the DNA to generate a single-stranded replication intermediate. cp8.3/Ip21 has three copies of the heptameric motif characteristically found at a nick site of most RC plasmids. These observations suggest that cp8.3/Ip21 may replicate by an RC mechanism. PMID:8169221
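
    An inverted repeat of the kind described above means that one copy reads, on the same strand, as the reverse complement of the other. A minimal check is sketched below; the mismatch allowance stands in for "nearly perfect" and, like the function names and toy sequences, is an assumption for illustration.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_inverted_repeat(left, right, max_mismatches=0):
    """True if `right` is the reverse complement of `left`, within a mismatch budget."""
    candidate = reverse_complement(right).upper()
    left = left.upper()
    if len(left) != len(candidate):
        return False
    mismatches = sum(a != b for a, b in zip(left, candidate))
    return mismatches <= max_mismatches


# Toy 7-bp arms; the real repeat in the record is 184 bp long.
perfect = is_inverted_repeat("GATTACA", "TGTAATC")
near_perfect = is_inverted_repeat("GATTACA", "TGTAATG", max_mismatches=1)
```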

  7. PAPNC, a novel method to calculate nucleotide diversity from large scale next generation sequencing data

    PubMed Central

    Shao, Wei; Kearney, Mary F.; Boltz, Valerie F.; Spindler, Jonathan E.; Mellors, John W.; Maldarelli, Frank; Coffin, John M.

    2014-01-01

    Estimating viral diversity in infected patients can provide insight into pathogen evolution and emergence of drug resistance. With the widespread adoption of deep sequencing, it is important to develop tools to accurately calculate population diversity from very large datasets. Current methods for estimating diversity that are based on multiple alignments are not practical to apply to such data. In this study, the authors report a novel method (Pairwise Alignment Positional Nucleotide Counting, PAPNC) for estimating population diversity from 454 sequence data. The diversity measurements determined using this method were comparable to those calculated by average pairwise difference (APD) of multiply aligned sequences using MEGA5. Diversities were estimated for 9 patient plasma HIV samples sequenced with Titanium 454 technology and by single-genome sequencing (SGS). Diversities calculated from deep sequencing using PAPNC ranged from 0.002 to 0.021 while APD measurements calculated from SGS data ranged approximately from 0.001 to 0.018, with the difference being attributable to PCR error (contributing background diversity of 0.0016 in a control sample). Comparison of APDs estimated from 100 sets of sequences drawn at random from 454 generated data and from corresponding SGS data showed very close correlation between the two methods with R^2 of 0.96, and differing on average by about 1% (after correction for PCR error). The authors have developed a novel method that is good for calculating genetic diversities for large scale datasets from next generation sequencing. It can be implemented easily as a function in available variation calling programs like SAMtools or haplotype reconstruction software for nucleotide genetic diversity calculation. A Perl script implementing this method is available upon request. PMID:24681054
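
    For reference, the average pairwise difference (APD) used as the baseline above can be sketched as the mean p-distance over all pairs of aligned sequences. This is a generic illustration, not MEGA5's procedure; the handling of gaps (columns with a gap in either sequence are skipped) and the toy input are assumptions.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences, gaps skipped."""
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no comparable sites between the two sequences")
    return sum(x != y for x, y in compared) / len(compared)


def average_pairwise_difference(seqs):
    """Mean p-distance over every unordered pair of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical aligned reads: one of three differs at a single site.
apd = average_pairwise_difference(["ACGT", "ACGA", "ACGT"])
```

    The number of pairs grows quadratically with the number of sequences, which is why this direct approach becomes impractical for deep-sequencing datasets.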

  8. PAPNC, a novel method to calculate nucleotide diversity from large scale next generation sequencing data.

    PubMed

    Shao, Wei; Kearney, Mary F; Boltz, Valerie F; Spindler, Jonathan E; Mellors, John W; Maldarelli, Frank; Coffin, John M

    2014-07-01

    Estimating viral diversity in infected patients can provide insight into pathogen evolution and emergence of drug resistance. With the widespread adoption of deep sequencing, it is important to develop tools to accurately calculate population diversity from very large datasets. Current methods for estimating diversity that are based on multiple alignments are not practical to apply to such data. In this study, the authors report a novel method (Pairwise Alignment Positional Nucleotide Counting, PAPNC) for estimating population diversity from 454 sequence data. The diversity measurements determined using this method were comparable to those calculated by average pairwise difference (APD) of multiply aligned sequences using MEGA5. Diversities were estimated for 9 patient plasma HIV samples sequenced with Titanium 454 technology and by single-genome sequencing (SGS). Diversities calculated from deep sequencing using PAPNC ranged from 0.002 to 0.021 while APD measurements calculated from SGS data ranged approximately from 0.001 to 0.018, with the difference being attributable to PCR error (contributing background diversity of 0.0016 in a control sample). Comparison of APDs estimated from 100 sets of sequences drawn at random from 454 generated data and from corresponding SGS data showed very close correlation between the two methods with R(2) of 0.96, and differing on average by about 1% (after correction for PCR error). The authors have developed a novel method that is good for calculating genetic diversities for large scale datasets from next generation sequencing. It can be implemented easily as a function in available variation calling programs like SAMtools or haplotype reconstruction software for nucleotide genetic diversity calculation. A Perl script implementing this method is available upon request.
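
    The abstract does not spell out the PAPNC formula, so the following is only a sketch of one way per-position nucleotide counts can yield a diversity estimate without comparing reads pair by pair. It assumes reads have already been placed in a common reference coordinate system, with '-' marking positions a read does not cover; it is not the authors' Perl script.

```python
from collections import Counter


def positional_diversity(reads):
    """Diversity from per-column base counts.

    For a column with n covering reads and base counts c_b, the number of
    read pairs that differ is (n^2 - sum(c_b^2)) / 2 out of n(n-1)/2 pairs.
    Summing both quantities over columns gives the overall proportion.
    """
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("all reads must share the same coordinate length")
    differing_pairs = 0
    total_pairs = 0
    for i in range(length):
        counts = Counter(r[i] for r in reads if r[i] != "-")
        n = sum(counts.values())
        if n < 2:
            continue  # a column needs at least two reads to form a pair
        differing_pairs += (n * n - sum(c * c for c in counts.values())) // 2
        total_pairs += n * (n - 1) // 2
    if total_pairs == 0:
        raise ValueError("no column is covered by two or more reads")
    return differing_pairs / total_pairs


# Hypothetical reads: one of three differs at the last position.
diversity = positional_diversity(["ACGT", "ACGA", "ACGT"])
```

    Each column is visited once, so the cost grows linearly with the number of reads rather than quadratically, which is the practical appeal of counting by position.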

  9. The mouse collagen X gene: complete nucleotide sequence, exon structure and expression pattern.

    PubMed Central

    Elima, K; Eerola, I; Rosati, R; Metsäranta, M; Garofalo, S; Perälä, M; De Crombrugghe, B; Vuorio, E

    1993-01-01

    Overlapping genomic clones covering the 7.2 kb mouse alpha 1(X) collagen gene, 0.86 kb of promoter and 1.25 kb of 3'-flanking sequences were isolated from two genomic libraries and characterized by nucleotide sequencing. Typical features of the gene include a unique three-exon structure, similar to that in the chick gene, with the entire triple-helical domain of 463 amino acids coded by a single large exon. The highest degree of amino acid and nucleotide sequence conservation was seen in the coding region for the collagenous and C-terminal non-collagenous domains between the mouse and known chick, bovine and human collagen type X sequences. More divergence between the sequences occurred in the N-terminal non-collagenous domain. Similarity between the mammalian collagen X sequences extended into the 3'-untranslated sequence, particularly near the polyadenylation site. The promoter of the mouse collagen X gene was found to contain two TATAA boxes 159 bp apart; primer extension analyses of the transcription start site revealed that both were functional. The promoter has an unusual structure with a very low G + C content of 28% between positions -220 and -1 of the upstream transcription start site. Northern and in situ hybridization analyses confirmed that the expression of the alpha 1(X) collagen gene is restricted to hypertrophic chondrocytes in tissues undergoing endochondral calcification. The detailed sequence information of the gene is useful for studies on the promoter activity of the gene and for generation of transgenic mice. PMID:8424763
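
    A G + C figure for an upstream window such as -220 to -1 can be computed as sketched below. The helper names, the 0-based indexing convention for the transcription start site and the toy sequence are assumptions; the record's actual sequence is not reproduced here.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of a sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T")
    return sum(b in "GC" for b in bases) / len(bases)


def upstream_region(seq, tss_index, length=220):
    """Bases at positions -length..-1 relative to the start site.

    `tss_index` is the 0-based index of position +1 in `seq`.
    """
    if tss_index < length:
        raise ValueError("not enough sequence upstream of the start site")
    return seq[tss_index - length:tss_index]


# Toy example: 4 bases upstream of a start site at index 8.
region = upstream_region("AAAACCCCGGGG", tss_index=8, length=4)
region_gc = gc_content(region)
```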

  10. Nucleotide sequence of the internal transcribed spacers and 5.8S region of ribosomal DNA in Pinus pinea L.

    PubMed

    Marrocco, R; Gelati, M T; Maggini, F

    1996-01-01

    The nucleotide sequences of the first internal transcribed spacer (ITS1) belonging to different ribosomal RNA genes from Pinus pinea are reported. The analyzed ITS1 regions can be distinguished on the basis of their length, one being 2631 bp and the other 271 bp long. Nucleotide comparison of these regions did not show appreciable sequence homology. The larger ITS1 contains five tandemly arranged subrepeats ranging in size between 219 bp and 237 bp. The nucleotide sequences of the 5.8S and the ITS2 regions belonging to the larger ribosomal RNA gene are also reported.

  11. The complete nucleotide sequence of the egg drop syndrome virus: an intermediate between mastadenoviruses and aviadenoviruses.

    PubMed

    Hess, M; Blöcker, H; Brandt, P

    1997-11-10

    The complete nucleotide sequence of an avian adenovirus, the egg drop syndrome (EDS) virus, was determined. The total genome length is 33,213 nucleotides, resulting in a molecular weight of 21.9 x 10(6). The GC content is only 42.5%. Between map units 3.5 and 76.9, the distribution of open reading frames with homology to known genes is similar to that reported for other mammalian and avian adenoviruses. However, no homologies to adenovirus genes such as E1A, pIX, pV, and E3 could be found. Outside this region, several open reading frames were identified without any obvious homology to known adenovirus proteins. In the region organized similarly to other adenoviral genomes, most homologies were found to an ovine adenovirus (OAV strain 287). The highest level of amino acid identity was found for the hexon proteins of EDS and OAV. The virus-associated RNA (VA RNA) was identified through its homology with the VA RNA of fowl adenovirus serotype 1 (FAV1). Similarities with FAV1 were also found in the fiber protein. Our results demonstrate that the avian EDS virus represents an intermediate between mammalian and avian adenoviruses. The nucleotide sequence and genomic organization of the EDS virus reflect the heterogeneity of the aviadenovirus genus and the Adenoviridae family.
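
    The quoted molecular weight is consistent with the common rule of thumb of roughly 660 Da per base pair of double-stranded DNA. The abstract does not say how the figure was derived, so the constant below is an assumption used only to show the arithmetic.

```python
# Assumed average mass of one base pair of double-stranded DNA, in daltons.
AVG_BASE_PAIR_MASS_DA = 660.0


def dsdna_molecular_weight(n_base_pairs, per_bp=AVG_BASE_PAIR_MASS_DA):
    """Approximate molecular weight (Da) of a double-stranded DNA genome."""
    if n_base_pairs <= 0:
        raise ValueError("genome length must be positive")
    return n_base_pairs * per_bp


# 33,213 bp gives a value close to the 21.9 x 10(6) quoted above.
eds_weight = dsdna_molecular_weight(33213)
```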

  12. PEG-Labeled Nucleotides and Nanopore Detection for Single Molecule DNA Sequencing by Synthesis

    PubMed Central

    Kumar, Shiv; Tao, Chuanjuan; Chien, Minchen; Hellner, Brittney; Balijepalli, Arvind; Robertson, Joseph W. F.; Li, Zengmin; Russo, James J.; Reiner, Joseph E.; Kasianowicz, John J.; Ju, Jingyue

    2012-01-01

    We describe a novel single molecule nanopore-based sequencing by synthesis (Nano-SBS) strategy that can accurately distinguish four bases by detecting 4 different sized tags released from 5′-phosphate-modified nucleotides. The basic principle is as follows. As each nucleotide is incorporated into the growing DNA strand during the polymerase reaction, its tag is released and enters a nanopore in release order. This produces a unique ionic current blockade signature due to the tag's distinct chemical structure, thereby determining DNA sequence electronically at single molecule level with single base resolution. As proof of principle, we attached four different length PEG-coumarin tags to the terminal phosphate of 2′-deoxyguanosine-5′-tetraphosphate. We demonstrate efficient, accurate incorporation of the nucleotide analogs during the polymerase reaction, and excellent discrimination among the four tags based on nanopore ionic currents. This approach coupled with polymerase attached to the nanopores in an array format should yield a single-molecule electronic Nano-SBS platform. PMID:23002425

  13. PEG-labeled nucleotides and nanopore detection for single molecule DNA sequencing by synthesis.

    PubMed

    Kumar, Shiv; Tao, Chuanjuan; Chien, Minchen; Hellner, Brittney; Balijepalli, Arvind; Robertson, Joseph W F; Li, Zengmin; Russo, James J; Reiner, Joseph E; Kasianowicz, John J; Ju, Jingyue

    2012-01-01

    We describe a novel single molecule nanopore-based sequencing by synthesis (Nano-SBS) strategy that can accurately distinguish four bases by detecting 4 different sized tags released from 5'-phosphate-modified nucleotides. The basic principle is as follows. As each nucleotide is incorporated into the growing DNA strand during the polymerase reaction, its tag is released and enters a nanopore in release order. This produces a unique ionic current blockade signature due to the tag's distinct chemical structure, thereby determining DNA sequence electronically at single molecule level with single base resolution. As proof of principle, we attached four different length PEG-coumarin tags to the terminal phosphate of 2'-deoxyguanosine-5'-tetraphosphate. We demonstrate efficient, accurate incorporation of the nucleotide analogs during the polymerase reaction, and excellent discrimination among the four tags based on nanopore ionic currents. This approach coupled with polymerase attached to the nanopores in an array format should yield a single-molecule electronic Nano-SBS platform.

  14. Nucleotide sequence of the cell wall proteinase gene of Streptococcus cremoris Wg2.

    PubMed Central

    Kok, J; Leenhouts, K J; Haandrikman, A J; Ledeboer, A M; Venema, G

    1988-01-01

    A 6.5-kilobase HindIII fragment that specifies the proteolytic activity of Streptococcus cremoris Wg2 was sequenced entirely. The nucleotide sequence revealed two open reading frames (ORFs), a small ORF1 with 295 codons and a large ORF2 containing 1,772 codons. For both ORFs, there was no stop codon on the HindIII fragment. A partially overlapping PstI fragment was used to locate the translation stop of the large ORF2. The entire ORF2 contained 1,902 coding triplets, followed by an apparently rho-independent terminator sequence. The inferred amino acid sequence would result in a protein of 200 kilodaltons. Both ORFs have their putative transcription and translation signals in a 345-base-pair ClaI fragment. ORF2 is preceded by a promoter region containing a 15-base-pair complementary direct repeat. Both the truncated 33- and the 200-kilodalton proteins have a signal peptide-like N-terminal amino acid sequence. The protein specified by ORF2 contained regions of extensive homology with serine proteases of the subtilisin family. Specifically, amino acid sequences involved in the formation of the active site (viz., Asp-32, His-64, and Ser-221 of the subtilisins) are well conserved in the S. cremoris Wg2 proteinase. The homologous sequences are separated by nonhomologous regions which contain several inserts, most notably a sequence of approximately 200 amino acids between the His and Ser residues of the active site. PMID:3278687

  15. Mouse Mammary Tumor Virus-Like Nucleotide Sequences in Canine and Feline Mammary Tumors

    PubMed Central

    Hsu, Wei-Li; Lin, Hsing-Yi; Chiou, Shyan-Song; Chang, Chao-Chin; Wang, Szu-Pong; Lin, Kuan-Hsun; Chulakasian, Songkhla; Wong, Min-Liang; Chang, Shih-Chieh

    2010-01-01

    Mouse mammary tumor virus (MMTV) has been speculated to be involved in human breast cancer. Companion animals, dogs, and cats with intimate human contacts may contribute to the transmission of MMTV between mouse and human. The aim of this study was to detect MMTV-like nucleotide sequences in canine and feline mammary tumors by nested PCR. Results showed that the presence of MMTV-like env and LTR sequences in canine malignant mammary tumors was 3.49% (3/86) and 18.60% (16/86), respectively. For feline malignant mammary tumors, the presence of both env and LTR sequences was found to be 22.22% (2/9). Nevertheless, the MMTV-like LTR and env sequences also were detected in normal mammary glands of dogs and cats. In comparisons of the MMTV-like DNA sequences of our findings to those of NIH 3T3 (MMTV-positive murine cell line) and human breast cancer cells, the sequence similarities ranged from 94 to 98%. Phylogenetic analysis revealed that intermixing among sequences identified from tissues of different hosts, i.e., mouse, dog, cat, and human, indicated the MMTV-like DNA existing in these hosts. Moreover, the env transcript was detected in 1 of the 19 MMTV-positive samples by reverse transcription-PCR. Taken together, our study provides evidence for the existence and expression of MMTV-like sequences in neoplastic and normal mammary glands of dogs and cats. PMID:20881168

  16. Mouse mammary tumor virus-like nucleotide sequences in canine and feline mammary tumors.

    PubMed

    Hsu, Wei-Li; Lin, Hsing-Yi; Chiou, Shyan-Song; Chang, Chao-Chin; Wang, Szu-Pong; Lin, Kuan-Hsun; Chulakasian, Songkhla; Wong, Min-Liang; Chang, Shih-Chieh

    2010-12-01

    Mouse mammary tumor virus (MMTV) has been speculated to be involved in human breast cancer. Companion animals, dogs, and cats with intimate human contacts may contribute to the transmission of MMTV between mouse and human. The aim of this study was to detect MMTV-like nucleotide sequences in canine and feline mammary tumors by nested PCR. Results showed that the presence of MMTV-like env and LTR sequences in canine malignant mammary tumors was 3.49% (3/86) and 18.60% (16/86), respectively. For feline malignant mammary tumors, the presence of both env and LTR sequences was found to be 22.22% (2/9). Nevertheless, the MMTV-like LTR and env sequences also were detected in normal mammary glands of dogs and cats. In comparisons of the MMTV-like DNA sequences of our findings to those of NIH 3T3 (MMTV-positive murine cell line) and human breast cancer cells, the sequence similarities ranged from 94 to 98%. Phylogenetic analysis revealed that intermixing among sequences identified from tissues of different hosts, i.e., mouse, dog, cat, and human, indicated the MMTV-like DNA existing in these hosts. Moreover, the env transcript was detected in 1 of the 19 MMTV-positive samples by reverse transcription-PCR. Taken together, our study provides evidence for the existence and expression of MMTV-like sequences in neoplastic and normal mammary glands of dogs and cats.

  17. Avian Retroviruses That Cause Carcinoma and Leukemia: Identification of Nucleotide Sequences Associated with Pathogenicity

    PubMed Central

    Sheiness, Diana; Bister, Klaus; Moscovici, Carlo; Fanshier, Lois; Gonda, Thomas; Bishop, J. Michael

    1980-01-01

    Avian myelocytomatosis virus (MC29V) is a retrovirus that transforms both fibroblasts and macrophages in culture and induces myelocytomatosis, carcinomas, and sarcomas in birds. Previous work identified a sequence of about 1,500 nucleotides (here denoted oncMCV) that apparently derived from a normal cellular sequence and that may encode the oncogenic capacity of MC29V. In an effort to further implicate oncMCV in tumorigenesis, we used molecular hybridization to examine the distribution of nucleotide sequences related to oncMCV among the genomes of various avian retroviruses. In addition, we characterized further the genetic composition of the remainder of the MC29V genome. Our work exploited the availability of radioactive DNAs (cDNA's) complementary to oncMCV (cDNAMCV) or to specific portions of the genome of avian sarcoma virus (ASV). We showed that genomic RNAs of avian erythroblastosis virus (AEV) and avian myeloblastosis virus (AMV) could not hybridize appreciably with cDNAMCV. By contrast, cDNAMCV hybridized extensively (about 75%) and with essentially complete fidelity to the genome of Mill Hill 2 virus (MH2V), whose pathogenicity is very similar to that of MC29V, but different from that of AEV or AMV. Hybridization with the ASV cDNA's demonstrated that the MC29V genome includes about half of the ASV envelope protein gene and that the remainder of the MC29V genome is closely related to nucleotide sequences that are shared among the genomes of many avian leukosis and sarcoma viruses. We conclude that oncMCV probably specifies the unique set of pathogenicities displayed by MC29V and MH2V, whereas the oncogenic potentials of AEV and AMV are presumably encoded by a distinct nucleotide sequence unrelated to oncMCV. The genomes of ASV, MC29V, and other avian oncoviruses thus share a set of common sequences, but apparently owe their various oncogenic potentials to unrelated transforming genes. PMID:6245277

  18. Cloning, nucleotide sequence, and expression of the Pasteurella haemolytica A1 glycoprotease gene.

    PubMed Central

    Abdullah, K M; Lo, R Y; Mellors, A

    1991-01-01

    Pasteurella haemolytica serotype A1 secretes a glycoprotease which is specific for O-sialoglycoproteins such as glycophorin A. The gene encoding the glycoprotease enzyme has been cloned in the recombinant plasmid pPH1, and its nucleotide sequence has been determined. The gene (designated gcp) codes for a protein of 35.2 kDa, and an active enzyme protein of this molecular mass can be observed in Escherichia coli clones carrying pPH1. In vivo labeling of plasmid-encoded proteins in E. coli maxicells demonstrated the expression of a 35-kDa protein from pPH1. The amino-terminal sequence of the heterologously expressed protein corresponds to that predicted from the nucleotide sequence. The glycoprotease is a neutral metalloprotease, and the predicted amino acid sequence of the glycoprotease contains a putative zinc-binding site. The gene shows no significant homology with the genes for other proteases of procaryotic or eucaryotic origin. However, there is substantial homology between gcp and an E. coli gene, orfX, whose product is believed to function in the regulation of macromolecule biosynthesis. PMID:1885539

  19. Total chemical synthesis of a 77-nucleotide-long RNA sequence having methionine-acceptance activity.

    PubMed Central

    Ogilvie, K K; Usman, N; Nicoghosian, K; Cedergren, R J

    1988-01-01

    Chemical synthesis is described of a 77-nucleotide-long RNA molecule that has the sequence of an Escherichia coli Ado-47-containing tRNA(fMet) species in which the modified nucleosides have been substituted by their unmodified parent nucleosides. The sequence was assembled on a solid-phase, controlled-pore glass support in a stepwise manner with an automated DNA synthesizer. The ribonucleotide building blocks used were fully protected 5'-monomethoxytrityl-2'-silyl-3'-N,N-diisopropylaminophosphoramidites. p-Nitro-phenylethyl groups were used to protect the O6 of guanine residues. The fully deprotected tRNA analogue was characterized by polyacrylamide gel electrophoresis (sizing), terminal nucleotide analysis, sequencing, and total enzyme degradation, all of which indicated that the sequence was correct and contained only 3'-5' linkages. The 77-mer was then assayed for amino acid acceptor activity by using E. coli methionyl-tRNA synthetase. The results indicated that the synthetic product, lacking modified bases, is a substrate for the enzyme and has an amino acid acceptance 11% of that of the major native species, tRNA(fMet) containing 7-methylguanosine at position 47. PMID:3413059

  20. Mitochondrial DNA in the sea urchin Arbacia lixula: evolutionary inferences from nucleotide sequence analysis.

    PubMed

    De Giorgi, C; Lanave, C; Musci, M D; Saccone, C

    1991-07-01

    From the stirodont Arbacia lixula we determined the sequence of 5,127 nucleotides of mitochondrial DNA (mtDNA) encompassing 18 tRNAs, two complete coding genes, parts of three other coding genes, and part of the 12S ribosomal RNA (rRNA). The sequence confirms that the organization of mtDNA is conserved within echinoids. Furthermore, it underlines the following peculiar features of sea urchin mtDNA: the clustering of tRNAs, the short noncoding regulatory sequence, and the separation by the ND1 and ND2 genes of the two rRNA genes. Comparison with the orthologous sequences from the camarodont species Paracentrotus lividus and Strongylocentrotus purpuratus revealed that (1) echinoids have an extra piece on the amino terminus of the ND5 gene that is probably the remnant of an old leucine tRNA gene; (2) third-position codon nucleotide usage has diverged between A. lixula and the camarodont species to a significant extent, implying different directional mutational pressures; and (3) the stirodont-camarodont divergence occurred twice as long ago as did the P. lividus-S. purpuratus divergence.

  1. Nucleotide and deduced amino acid sequences of rat myosin binding protein H (MyBP-H).

    PubMed

    Jung, J; Oh, J; Lee, K

    1998-12-01

    The complete nucleotide sequence of the cDNA clone encoding rat skeletal muscle myosin-binding protein H (MyBP-H) was determined, and the amino acid sequence was deduced from the nucleotide sequence (GenBank accession number AF077338). The full-length cDNA of 1782 base pairs (bp) contains a single open reading frame of 1454 bp encoding a rat MyBP-H protein with a predicted molecular mass of 52.7 kDa and includes the common consensus 'CA__TG' protein binding motif. The cDNA sequence of rat MyBP-H shows 92%, 84% and 41% homology with those of mouse, human and chicken, respectively. The protein contains a tandem array of internal motifs (-FN III-Ig C2-FN III-Ig C2-) in the C-terminal region, which resemble the immunoglobulin superfamily C2 and fibronectin type III motifs. The amino acid sequence of the C-terminal Ig C2 motif is highly conserved among the MyBP family and other thick filament binding proteins, suggesting that the C-terminal Ig C2 motif might play an important role in its function. All members of the MyBP-H group contain an 'RKPS' sequence, which is assumed to be a cAMP- and cGMP-dependent protein kinase A phosphorylation site. Computer analysis of the primary sequence of rat MyBP-H predicted 11 protein kinase C (PKC) phosphorylation sites, 7 casein kinase II (CK2) phosphorylation sites and 4 N-myristoylation sites.

  2. The complete nucleotide sequence and genome organization of a novel carmovirus - Honeysuckle ringspot virus isolated from honeysuckle.

    USDA-ARS's Scientific Manuscript database

    A virus associated with yellow to purple ringspot on honeysuckle plants has been detected and tentatively named as Honeysuckle ringspot virus (HnRSV). The complete nucleotide sequence of HnRSV has been determined from infected honeysuckle. The genomic RNA of HnRSV is 3,956 nucleotides in length and ...

  3. The complete nucleotide sequence of goat (Capra hircus) mitochondrial genome. Goat mitochondrial genome.

    PubMed

    Parma, Pietro; Feligini, Maria; Greppi, Gianfranco; Enne, Giuseppe

    2003-06-01

    The goat mtDNA sequences reported to date are fragmentary. By using both an in silico cloning procedure and conventional molecular biology techniques, we have determined the complete nucleotide sequence of the goat (Capra hircus) mitochondrial genome. The length of the sequence was 16,640 bp. Genes for the 12S and 16S rRNAs, 22 tRNAs and 13 protein-coding regions were found. The genome organization conforms to that of other mitochondrial genomes. Comparison of the 13 protein-coding genes of goat, cow and sheep reveals that the differences range from 1.2 to 12.2%, with a mean of 7.3%, between goat and cow and from 0 to 15.6% (mean 4.7%) between goat and sheep.

  4. Nucleotide sequence of yeast GDH1 encoding nicotinamide adenine dinucleotide phosphate-dependent glutamate dehydrogenase.

    PubMed

    Moye, W S; Amuro, N; Rao, J K; Zalkin, H

    1985-07-15

    The yeast GDH1 gene encodes NADP-dependent glutamate dehydrogenase. This gene was isolated by complementation of an Escherichia coli glutamate auxotroph. NADP-dependent glutamate dehydrogenase was overproduced 6-10-fold in Saccharomyces cerevisiae bearing GDH1 on a multicopy plasmid. The nucleotide sequence of the 1362-base pair coding region and 5' and 3' flanking sequences were determined. Transcription start sites were located by S1 nuclease mapping. Regulation of GDH1 was not maintained when the gene was present on a multicopy plasmid. Protein secondary structure predictions identified a region with potential to form the dinucleotide-binding domain. The amino acid sequences of the yeast and Neurospora crassa enzymes are 63% conserved. Unlike the N. crassa gene, yeast GDH1 has no introns.

  5. Conservation of nucleotide sequences for molecular diagnosis of Middle East respiratory syndrome coronavirus, 2015.

    PubMed

    Furuse, Yuki; Okamoto, Michiko; Oshitani, Hitoshi

    2015-11-01

    Infection due to the Middle East respiratory syndrome coronavirus (MERS-CoV) is widespread. The present study was performed to assess the protocols used for the molecular diagnosis of MERS-CoV by analyzing the nucleotide sequences of viruses detected between 2012 and 2015, including sequences from the large outbreak in eastern Asia in 2015. Although the diagnostic protocols were established only 2 years ago, mismatches between the sequences of primers/probes and viruses were found for several of the assays. Such mismatches could lead to a lower sensitivity of the assay, thereby leading to false-negative diagnosis. A slight modification in the primer design is suggested. Protocols for the molecular diagnosis of viral infections should be reviewed regularly after they are established, particularly for viruses that pose a great threat to public health such as MERS-CoV.

  6. Identification of shark species in seafood products by forensically informative nucleotide sequencing (FINS).

    PubMed

    Blanco, M; Pérez-Martín, R I; Sotelo, C G

    2008-11-12

    The identification of commercial shark species is a relevant issue to ensure the correct labeling of seafood products, to maintain consumer confidence in seafood, and to enhance the knowledge of the species and volumes that are at present being captured, thus improving the management of shark fisheries. The polymerase chain reaction was employed to obtain a 423 bp amplicon from the mitochondrial cytochrome b gene. The sequences from this fragment, belonging to 63 authentic individuals of 23 species, were analyzed using a genetic distance method. Nine different samples of commercial fresh, frozen, and convenience food were obtained in local and international markets to validate the methodology. These samples were analyzed, and sequences were employed for species identification, showing that forensically informative nucleotide sequencing (FINS) is a suitable technique for identification of processed seafood containing shark as an ingredient. The results also showed that incorrect labeling practices may occur regarding shark products, probably because of incorrect labeling at the production point.
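
    The distance-based identification step described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the abstract does not name the distance measure, so the sketch uses the uncorrected p-distance on pre-aligned sequences of equal length, and the species labels and short sequences are made-up placeholders rather than data from the study.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance: fraction of aligned sites that differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)
    return differences / len(seq_a)


def nearest_reference(query: str, references: dict) -> tuple:
    """Return (label, distance) for the reference closest to the query."""
    distances = {name: p_distance(query, seq) for name, seq in references.items()}
    best = min(distances, key=distances.get)
    return best, distances[best]


# Hypothetical toy alignment; a real analysis would use the full amplicon.
references = {
    "species_A": "ATGGCCCTAACCATCCGAAA",
    "species_B": "ATGGCTCTTACTATTCGTAA",
}
query = "ATGGCCCTAACCATCCGTAA"
label, distance = nearest_reference(query, references)
print(label, distance)
```

    In this toy case the query differs from species_A at one of 20 sites and from species_B at four, so it is assigned to species_A. A real workflow would also want a distance threshold above which no assignment is made.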

  7. Nucleotide sequence of the bean strain of southern bean mosaic virus.

    PubMed

    Othman, Y; Hull, R

    1995-01-10

    The genome of the bean strain of southern bean mosaic virus (SBMV-B) comprises 4109 nucleotides and thus is slightly shorter than those of the two other sequenced sobemoviruses (southern bean mosaic virus, cowpea strain (SBMV-C) and rice yellow mottle virus (RYMV)). SBMV-B has an overall sequence similarity with SBMV-C of 55% and with RYMV of 45%. Three potential open reading frames (ORFs) were recognized in SBMV-B which were in similar positions in the genomes of SBMV-C and RYMV. However, there was no analog of SBMV-C and RYMV ORF 3. From a comparison of the predicted sequences of the ORFs of these three sobemoviruses and of the noncoding regions, it is suggested that the two SBMV strains differ from one another as much as they do from RYMV and that they should be considered as different viruses.

  8. Nucleotide sequence of a satellite RNA associated with carrot motley dwarf in parsley and carrot.

    PubMed

    Menzel, Wulf; Maiss, Edgar; Vetten, H Josef

    2009-02-01

    Carrot motley dwarf (CMD) is known to result from a mixed infection by two viruses, the polerovirus Carrot red leaf virus and one of the umbraviruses Carrot mottle mimic virus or Carrot mottle virus. Some umbraviruses have been shown to be associated with small satellite (sat) RNAs, but none have been reported for the latter two. A CMD-affected parsley plant was used for sap transmission to test plants, which were then used for dsRNA isolation. The presence of a 0.8-kbp dsRNA indicated the occurrence of a hitherto unrecognized satRNA associated with CMD. The satRNAs of the CMD isolate from parsley and an isolate from carrot have been sequenced and showed 94% sequence identity. Nucleotide sequences and putative translation products had no significant similarities to GenBank entries. To our knowledge, this is the first report of satRNAs associated with CMD.

  9. Single nucleotide polymorphisms from Theobroma cacao expressed sequence tags associated with witches' broom disease in cacao.

    PubMed

    Lima, L S; Gramacho, K P; Carels, N; Novais, R; Gaiotto, F A; Lopes, U V; Gesteira, A S; Zaidan, H A; Cascardo, J C M; Pires, J L; Micheli, F

    2009-07-14

    In order to increase the efficiency of cacao tree resistance to witches' broom disease, which is caused by Moniliophthora perniciosa (Tricholomataceae), we looked for molecular markers that could help in the selection of resistant cacao genotypes. Among the different markers useful for developing marker-assisted selection, single nucleotide polymorphisms (SNPs) constitute the most common type of sequence difference between alleles and can be easily detected by in silico analysis from expressed sequence tag libraries. We report the first detection and analysis of SNPs from cacao-M. perniciosa interaction expressed sequence tags, using bioinformatics. Selection based on analysis of these SNPs should be useful for developing cacao varieties resistant to this devastating disease.

  10. On the identification of group II introns in nucleotide sequence data.

    PubMed

    Knoop, V; Kloska, S; Brennicke, A

    1994-09-30

    Four different consensus sequences (GTI, group II identifiers) have been derived from domains V of known group II introns and are used as query input sequences for sensitive database screenings with the FASTA and LFASTA programs. The set of four GTI sequences can identify all domains V of the 96 known group II introns in the completely sequenced chloroplast genomes of Marchantia polymorpha, Epifagus virginiana, Oryza sativa, Nicotiana tabacum and the completely sequenced mitochondrial genomes of Saccharomyces cerevisiae, Podospora anserina, Schizosaccharomyces pombe and Marchantia polymorpha. Seven moderately high-scoring hits can easily be rejected as false-positives since they do not fulfil secondary structure requirements. Large FASTA outputs obtained after screening the entire nucleotide sequence database are evaluated in a second step by a program (D5SCAN) that allows the assignment of variable selection criteria for potential domain V secondary structures. Database searches with these routines yield evidence for several group II intron sequences previously unrecognized. These include novel intron structures in the cyanobacterium Synechocystis and in the mitochondrial genomes of Marchantia, soybean, pea, broad bean, sugar beet and a heterobasidiomycete. Potential intron remnants are found contributing to the secondary structure of rRNAs in several trypanosome species. At a given sensitivity of 95% positively identified true domains V, the search routine produces one false positive hit per 10,000 kb.

  11. Nucleotide-sequence-specific de novo methylation in a somatic murine cell line.

    PubMed Central

    Szyf, M; Schimmer, B P; Seidman, J G

    1989-01-01

    DNA fragments encoding the mouse steroid 21-hydroxylase (C21 or Cyp21A1) gene are de novo methylated when introduced into the mouse adrenocortical tumor cell line Y1 by DNA-mediated gene transfer. Although CCGG sequences within the C21 gene are de novo methylated, CCGG sites within flanking vector sequences, other mammalian gene sequences driven by the C21 promoter, and the neomycin-resistance gene, which was cotransfected with the C21 gene, do not become methylated. At least two separate signals for de novo methylation are encoded within the gene since three fragments derived from the C21 gene were methylated de novo. Specific de novo methylation of C21-derived sequences does not occur in L cells or Y1 kin8 cells; this suggests that the cellular factors needed for de novo methylation of the C21 gene are not ubiquitous. Most DNA sequences are not de novo methylated when introduced into somatic cells and DNA sequences other than the C21 gene are not de novo methylated when introduced into Y1 cells. Several groups have suggested that de novo methylation occurs in early embryonic cells and that somatic cells strictly maintain their methylation pattern by a semiconservative methyltransferase. Our results suggest that de novo methylation of specific nucleotide sequences can occur in some mammalian somatic cells. PMID:2789380

  12. Patterns of nucleotide misincorporations during enzymatic amplification and direct large-scale sequencing of ancient DNA.

    PubMed

    Stiller, M; Green, R E; Ronan, M; Simons, J F; Du, L; He, W; Egholm, M; Rothberg, J M; Keates, S G; Ovodov, N D; Antipina, E E; Baryshnikov, G F; Kuzmin, Y V; Vasilevski, A A; Wuenschell, G E; Termini, J; Hofreiter, M; Jaenicke-Després, V; Pääbo, S

    2006-09-12

    Whereas evolutionary inferences derived from present-day DNA sequences are by necessity indirect, ancient DNA sequences provide a direct view of past genetic variants. However, base lesions that accumulate in DNA over time may cause nucleotide misincorporations when ancient DNA sequences are replicated. By repeated amplifications of mitochondrial DNA sequences from a large number of ancient wolf remains, we show that C/G-to-T/A transitions are the predominant type of such misincorporations. Using a massively parallel sequencing method that allows large numbers of single DNA strands to be sequenced, we show that modifications of C, as well as to a lesser extent of G, residues cause such misincorporations. Experiments where oligonucleotides containing modified bases are used as templates in amplification reactions suggest that both of these types of misincorporations can be caused by deamination of the template bases. New DNA sequencing methods in conjunction with knowledge of misincorporation processes have now, in principle, opened the way for the determination of complete genomes from organisms that became extinct during and after the last glaciation.
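
    The kind of tally behind a statement such as "C/G-to-T/A transitions are the predominant type" can be illustrated with a short sketch. It assumes an ungapped alignment of a reference and a read of equal length; the function names and the toy sequences are hypothetical and are not taken from the study.

```python
from collections import Counter


def tally_substitutions(reference: str, read: str) -> Counter:
    """Count each 'reference base > read base' mismatch in an ungapped alignment."""
    if len(reference) != len(read):
        raise ValueError("reference and read must have the same length")
    counts = Counter()
    for ref_base, read_base in zip(reference.upper(), read.upper()):
        # Skip matches and any ambiguous symbols such as N.
        if ref_base != read_base and ref_base in "ACGT" and read_base in "ACGT":
            counts[f"{ref_base}>{read_base}"] += 1
    return counts


def cg_to_ta_fraction(counts: Counter) -> float:
    """Fraction of all mismatches that are C>T or G>A."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["C>T"] + counts["G>A"]) / total


# Hypothetical toy sequences, not data from the study.
reference = "ACGTCCGATCGA"
read = "ATGTCTGCTCAA"
counts = tally_substitutions(reference, read)
fraction = cg_to_ta_fraction(counts)
print(dict(counts), fraction)
```

    Here three of the four mismatches fall into the C>T or G>A class. Real data would need alignment with gaps and a way to separate true variants from damage-induced misincorporations, which this sketch does not attempt.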

  13. Developing Single Nucleotide Polymorphism (SNP) markers from transcriptome sequences for the identification of longan (Dimocarpus longan) germplasm

    USDA-ARS's Scientific Manuscript database

    Longan (Dimocarpus longan Lour.) is an important tropical fruit tree crop. Accurate varietal identification is essential for germplasm management and breeding. Using longan transcriptome sequences from public databases, we developed single nucleotide polymorphism (SNP) markers; validated 60 SNPs in...

  14. Complete Nucleotide Sequence of an Australian Isolate of Turnip mosaic virus before and after Seven Years of Serial Passaging

    PubMed Central

    Pretorius, Lara; Moyle, Richard L.; Dalton-Morgan, Jessica; Hussein, Nasser

    2016-01-01

    The complete genome sequence of an Australian isolate of Turnip mosaic virus was determined by Sanger sequencing. After seven years of serial passaging by mechanical inoculation, the isolate was resequenced by RNA sequencing (RNA-Seq). Eighteen single nucleotide polymorphisms were identified between the isolates. Both isolates had 96% identity to isolate AUST10. PMID:27856582

  15. PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences.

    PubMed

    Mirarab, Siavash; Nguyen, Nam; Guo, Sheng; Wang, Li-San; Kim, Junhyong; Warnow, Tandy

    2015-05-01

    We introduce PASTA, a new multiple sequence alignment algorithm. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy and scalability of the leading alignment methods (including SATé). We also show that trees estimated on PASTA alignments are highly accurate--slightly better than SATé trees, but with substantial improvements relative to other methods. Finally, PASTA is faster than SATé, highly parallelizable, and requires relatively little memory.

  16. Genomic DNA Enrichment Using Sequence Capture Microarrays: a Novel Approach to Discover Sequence Nucleotide Polymorphisms (SNP) in Brassica napus L

    PubMed Central

    Clarke, Wayne E.; Parkin, Isobel A.; Gajardo, Humberto A.; Gerhardt, Daniel J.; Higgins, Erin; Sidebottom, Christine; Sharpe, Andrew G.; Snowdon, Rod J.; Federico, Maria L.; Iniguez-Luy, Federico L.

    2013-01-01

    Targeted genomic selection methodologies, or sequence capture, allow for DNA enrichment and large-scale resequencing and characterization of natural genetic variation in species with complex genomes, such as rapeseed canola (Brassica napus L., AACC, 2n=38). The main goal of this project was to combine sequence capture with next generation sequencing (NGS) to discover single nucleotide polymorphisms (SNPs) in specific areas of the B. napus genome historically associated (via quantitative trait loci –QTL– analysis) to traits of agronomical and nutritional importance. A 2.1 million feature sequence capture platform was designed to interrogate DNA sequence variation across 47 specific genomic regions, representing 51.2 Mb of the Brassica A and C genomes, in ten diverse rapeseed genotypes. All ten genotypes were sequenced using the 454 Life Sciences chemistry and to assess the effect of increased sequence depth, two genotypes were also sequenced using Illumina HiSeq chemistry. As a result, 589,367 potentially useful SNPs were identified. Analysis of sequence coverage indicated a four-fold increased representation of target regions, with 57% of the filtered SNPs falling within these regions. Sixty percent of discovered SNPs corresponded to transitions while 40% were transversions. Interestingly, fifty eight percent of the SNPs were found in genic regions while 42% were found in intergenic regions. Further, a high percentage of genic SNPs was found in exons (65% and 64% for the A and C genomes, respectively). Two different genotyping assays were used to validate the discovered SNPs. Validation rates ranged from 61.5% to 84% of tested SNPs, underpinning the effectiveness of this SNP discovery approach. Most importantly, the discovered SNPs were associated with agronomically important regions of the B. napus genome generating a novel data resource for research and breeding this crop species. PMID:24312619
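
    The split of SNPs into transitions and transversions reported above follows from a simple rule: a substitution within the purines (A, G) or within the pyrimidines (C, T) is a transition, and any purine-pyrimidine exchange is a transversion. A minimal sketch, with hypothetical function names and a made-up list of five SNPs:

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_snp(ref: str, alt: str) -> str:
    """Classify a biallelic SNP as a transition or a transversion."""
    pair = {ref.upper(), alt.upper()}
    if len(pair) != 2 or not pair <= (PURINES | PYRIMIDINES):
        raise ValueError("expected two different bases from A, C, G, T")
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def transition_share(snps) -> float:
    """Fraction of SNPs in the list that are transitions."""
    classes = [classify_snp(ref, alt) for ref, alt in snps]
    if not classes:
        raise ValueError("no SNPs given")
    return classes.count("transition") / len(classes)


# Hypothetical (ref, alt) pairs, not SNPs from the study.
snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]
share = transition_share(snps)
print(share)
```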

  17. Genomic DNA enrichment using sequence capture microarrays: a novel approach to discover sequence nucleotide polymorphisms (SNP) in Brassica napus L.

    PubMed

    Clarke, Wayne E; Parkin, Isobel A; Gajardo, Humberto A; Gerhardt, Daniel J; Higgins, Erin; Sidebottom, Christine; Sharpe, Andrew G; Snowdon, Rod J; Federico, Maria L; Iniguez-Luy, Federico L

    2013-01-01

    Targeted genomic selection methodologies, or sequence capture, allow for DNA enrichment and large-scale resequencing and characterization of natural genetic variation in species with complex genomes, such as rapeseed canola (Brassica napus L., AACC, 2n=38). The main goal of this project was to combine sequence capture with next generation sequencing (NGS) to discover single nucleotide polymorphisms (SNPs) in specific areas of the B. napus genome historically associated (via quantitative trait loci -QTL- analysis) to traits of agronomical and nutritional importance. A 2.1 million feature sequence capture platform was designed to interrogate DNA sequence variation across 47 specific genomic regions, representing 51.2 Mb of the Brassica A and C genomes, in ten diverse rapeseed genotypes. All ten genotypes were sequenced using the 454 Life Sciences chemistry and to assess the effect of increased sequence depth, two genotypes were also sequenced using Illumina HiSeq chemistry. As a result, 589,367 potentially useful SNPs were identified. Analysis of sequence coverage indicated a four-fold increased representation of target regions, with 57% of the filtered SNPs falling within these regions. Sixty percent of discovered SNPs corresponded to transitions while 40% were transversions. Interestingly, fifty eight percent of the SNPs were found in genic regions while 42% were found in intergenic regions. Further, a high percentage of genic SNPs was found in exons (65% and 64% for the A and C genomes, respectively). Two different genotyping assays were used to validate the discovered SNPs. Validation rates ranged from 61.5% to 84% of tested SNPs, underpinning the effectiveness of this SNP discovery approach. Most importantly, the discovered SNPs were associated with agronomically important regions of the B. napus genome generating a novel data resource for research and breeding this crop species.

  18. Infectivity and complete nucleotide sequence of cucumber fruit mottle mosaic virus isolate Cm cDNA.

    PubMed

    Rhee, Sun-Ju; Hong, Jin-Sung; Lee, Gung Pyo

    2014-07-01

    Three isolates of cucumber fruit mottle mosaic virus (CFMMV) were collected from melon, cucumber, and pumpkin plants in Korea. A full-length cDNA clone of CFMMV-Cm (melon isolate) was produced and evaluated for infectivity after T7 transcription in vitro (pT7CF-Cmflc). The complete CFMMV genome sequence of the infectious clone pT7CF-Cmflc was determined. The genome of CFMMV-Cm consisted of 6,571 nucleotides and shared high nucleotide sequence identity (98.8 %) with the Israel isolate of CFMMV. Based on the infectious clone pT7CF-Cmflc, a CaMV 35S-promoter driven cDNA clone (p35SCF-Cmflc) was subsequently constructed and sequenced. Mechanical inoculation with RNA transcripts of pT7CF-Cmflc and agro-inoculation with p35SCF-Cmflc resulted in systemic infection of cucumber and melon, producing symptoms similar to those produced by CFMMV-Cm. Progeny virus in infected plants was detected by RT-PCR, western blot assay, and transmission electron microscopy.

  19. Structure and nucleotide sequence of the rat intestinal vitamin D-dependent calcium binding protein gene.

    PubMed Central

    Krisinger, J; Darwish, H; Maeda, N; DeLuca, H F

    1988-01-01

    The vitamin D-dependent intestinal calcium binding protein (ICaBP, 9 kDa) is under transcriptional regulation by 1,25-dihydroxyvitamin D3 [1,25-(OH)2D3], the hormonally active form of the vitamin. To study the mechanism of gene regulation by 1,25-(OH)2D3, we isolated the rat ICaBP gene by using a cDNA probe. Its nucleotide sequence revealed 3 exons separated by 2 introns within approximately 3 kilobases. The first exon represents only noncoding sequences, while the second and third encode the two calcium binding domains of the protein. The gene contains a 15-base-pair imperfect palindrome in the first intron that shows high homology to the estrogen-responsive element. This sequence may represent the vitamin D-responsive element involved in the regulation of the ICaBP gene. The second intron shows an 84-base-pair-long simple nucleotide repeat that implicates Z-DNA formation. Genomic Southern analysis shows that the rat gene is represented as a single copy. PMID:3194402

  20. Complete nucleotide sequence and genome organization of a Cactus virus X strain from Hylocereus undatus (Cactaceae).

    PubMed

    Liou, M R; Chen, Y R; Liou, R F

    2004-05-01

    The complete nucleotide sequence of a strain of Cactus virus X (CVX-Hu) isolated from Hylocereus undatus (Cactaceae) has been determined. Excluding the poly(A) tail, the sequence is 6614 nucleotides in length and contains seven open reading frames (ORFs). The genome organization of CVX is similar to that of other potexviruses. ORF1 encodes the putative viral replicase with conserved methyltransferase, helicase, and polymerase motifs. Within ORF1, two other ORFs, designated ORF6 and ORF7, were located separately in the +2 reading frame. ORF2, 3, and 4, which form the "triple gene block" characteristic of the potexviruses, encode proteins with molecular masses of 25, 12, and 7 kDa, respectively. ORF5 encodes the coat protein with an estimated molecular mass of 24 kDa. Sequence analysis indicated that proteins encoded by ORF1-5 display a certain degree of homology to the corresponding proteins of other potexviruses. The putative product of ORF6, however, shows no significant similarity to those of other potexviruses. Phylogenetic analyses based on the replicase (the methyltransferase, helicase, and polymerase domains) and coat protein demonstrated a closer relationship of CVX with Bamboo mosaic virus, Cassava common mosaic virus, Foxtail mosaic virus, Papaya mosaic virus, and Plantago asiatica mosaic virus.

  1. The nucleotide sequence of sacbrood virus of the honey bee: an insect picorna-like virus.

    PubMed

    Ghosh, R C; Ball, B V; Willcocks, M M; Carter, M J

    1999-06-01

    We have determined the nucleotide sequence of sacbrood virus (SBV), which causes a fatal infection of honey bee larvae. The genomic RNA of SBV is longer than that of typical mammalian picornaviruses (8832 nucleotides) and contains a single, large open reading frame (179-8752) encoding a polyprotein of 2858 amino acids. Sequence comparison with other virus polyproteins revealed regions of similarity to characterized helicase, protease and RNA-dependent RNA polymerase domains; structural genes were located at the 5' terminus with non-structural genes at the 3' end. Picornavirus-like agents of insects have two distinct genomic organizations; some resemble mammalian picornaviruses with structural genes at the 5' end and non-structural genes at the 3' end, and others resemble caliciviruses in which this order is reversed; SBV thus belongs to the former type. Sequence comparison suggested that SBV is distantly related to infectious flacherie virus (IFV) of the silk worm, which possesses an RNA of similar size and gene order.
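
    The figures quoted above are internally consistent, which a short calculation makes explicit. The sketch below assumes the ORF coordinates are 1-based and inclusive (the usual convention, though the abstract does not state it); the function name is hypothetical.

```python
def orf_codon_count(start: int, end: int) -> int:
    """Number of complete codons in an ORF given 1-based inclusive coordinates."""
    length = end - start + 1
    if length <= 0:
        raise ValueError("end must not precede start")
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return length // 3


genome_length = 8832          # nucleotides, from the abstract
orf_start, orf_end = 179, 8752  # ORF coordinates, from the abstract

codons = orf_codon_count(orf_start, orf_end)
five_prime_flank = orf_start - 1           # nucleotides before the ORF
three_prime_flank = genome_length - orf_end  # nucleotides after the ORF
print(codons, five_prime_flank, three_prime_flank)  # 2858 178 80
```

    The span 179-8752 covers 8574 nucleotides, or 2858 codons, which matches the stated polyprotein length of 2858 amino acids. One reading of this, offered as an inference rather than something the abstract states, is that the quoted coordinates do not include the stop codon.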

  2. Nucleotide sequence and expression of the Enterobacter aerogenes alpha-acetolactate decarboxylase gene in brewer's yeast.

    PubMed Central

    Sone, H; Fujii, T; Kondo, K; Shimizu, F; Tanaka, J; Inoue, T

    1988-01-01

    The nucleotide sequence of a 1.4-kilobase DNA fragment containing the alpha-acetolactate decarboxylase gene of Enterobacter aerogenes was determined. The sequence contains an entire protein-coding region of 780 nucleotides which encodes an alpha-acetolactate decarboxylase of 260 amino acids. The DNA sequence coding for alpha-acetolactate decarboxylase was placed under the control of the alcohol dehydrogenase I promoter of the yeast Saccharomyces cerevisiae in a plasmid capable of autonomous replication in both S. cerevisiae and Escherichia coli. Brewer's yeast cells transformed by this plasmid showed alpha-acetolactate decarboxylase activity and were used in laboratory-scale fermentation experiments. These experiments revealed that the diacetyl concentration in wort fermented by the plasmid-containing yeast strain was significantly lower than that in wort fermented by the parental strain. These results indicated that the alpha-acetolactate decarboxylase activity produced by brewer's yeast cells degraded alpha-acetolactate and that this degradation caused a decrease in diacetyl production. PMID:3278689

  3. Determination of Single-Nucleotide Polymorphisms by Real-time Pyrophosphate DNA Sequencing

    PubMed Central

    Alderborn, Anders; Kristofferson, Anna; Hammerling, Ulf

    2000-01-01

    The characterization of naturally occurring variations in the human genome has evoked an immense interest during recent years. Variations known as biallelic Single-Nucleotide Polymorphisms (SNPs) have become increasingly popular markers in molecular genetics because of their wide application both in evolutionary relationship studies and in the identification of susceptibility to common diseases. We have addressed the issue of SNP genotype determination by investigating variations within the Renin–Angiotensin–Aldosterone System (RAAS) using pyrosequencing, a real-time pyrophosphate detection technology. The method is based on indirect luminometric quantification of the pyrophosphate that is released as a result of nucleotide incorporation onto an amplified template. The technical platform employed comprises a highly automated sequencing instrument that allows the analysis of 96 samples within 10 to 20 minutes. In addition to each studied polymorphic position, 5–10 downstream bases were sequenced for acquisition of reference signals. Evaluation of pyrogram data was accomplished by comparison of peak heights, which are proportional to the number of incorporated nucleotides. Analysis of the pyrograms that resulted from alternate allelic configurations for each addressed SNP revealed a highly discriminating pattern. Homozygous samples produced clear-cut single base peaks in the expected position, whereas heterozygous counterparts were characterized by distinct half-height peaks representing both allelic positions. Whenever any of the allelic bases of an SNP formed a homopolymer with adjacent bases, the nonallelic signal was added to those of the SNP. This feature did not, however, influence SNP readability. Furthermore, the multibase reading capacity of the described system provides extensive flexibility in regard to the positioning of sequencing primers and allows the determination of several closely located SNPs in a single run. PMID:10958643
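
    The peak-height logic described above (full-height single-base peaks for homozygotes, half-height peaks at both allelic positions for heterozygotes) can be sketched as a simple genotype call. Everything here is an assumption made for illustration: the function name, the normalization against a single-base reference peak, and the tolerance of 0.2 are not taken from the paper, and the sketch ignores the homopolymer case in which a neighbouring base adds to the allelic signal.

```python
def call_genotype(peak_a: float, peak_b: float, single_base_peak: float,
                  tolerance: float = 0.2) -> str:
    """Call a biallelic SNP from the two allele peak heights.

    Peaks are normalized to the height of a reference single-base peak,
    so 1.0 means one incorporated nucleotide and 0.5 means half of that.
    """
    if single_base_peak <= 0:
        raise ValueError("reference peak height must be positive")
    a = peak_a / single_base_peak
    b = peak_b / single_base_peak

    def near(value: float, target: float) -> bool:
        return abs(value - target) <= tolerance

    if near(a, 1.0) and near(b, 0.0):
        return "homozygous A"
    if near(a, 0.0) and near(b, 1.0):
        return "homozygous B"
    if near(a, 0.5) and near(b, 0.5):
        return "heterozygous"
    return "no call"


# Hypothetical peak heights in arbitrary luminescence units.
print(call_genotype(98, 3, 100))
print(call_genotype(52, 47, 100))
print(call_genotype(75, 75, 100))
```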

  4. The ChEMBL bioactivity database: an update

    PubMed Central

    Bento, A. Patrícia; Gaulton, Anna; Hersey, Anne; Bellis, Louisa J.; Chambers, Jon; Davies, Mark; Krüger, Felix A.; Light, Yvonne; Mak, Lora; McGlinchey, Shaun; Nowotka, Michal; Papadatos, George; Santos, Rita; Overington, John P.

    2014-01-01

    ChEMBL is an open large-scale bioactivity database (https://www.ebi.ac.uk/chembl), previously described in the 2012 Nucleic Acids Research Database Issue. Since then, a variety of new data sources and improvements in functionality have contributed to the growth and utility of the resource. In particular, more comprehensive tracking of compounds from research stages through clinical development to market is provided through the inclusion of data from United States Adopted Name applications; a new richer data model for representing drug targets has been developed; and a number of methods have been put in place to allow users to more easily identify reliable data. Finally, access to ChEMBL is now available via a new Resource Description Framework format, in addition to the web-based interface, data downloads and web services. PMID:24214965

  5. The ChEMBL bioactivity database: an update.

    PubMed

    Bento, A Patrícia; Gaulton, Anna; Hersey, Anne; Bellis, Louisa J; Chambers, Jon; Davies, Mark; Krüger, Felix A; Light, Yvonne; Mak, Lora; McGlinchey, Shaun; Nowotka, Michal; Papadatos, George; Santos, Rita; Overington, John P

    2014-01-01

    ChEMBL is an open large-scale bioactivity database (https://www.ebi.ac.uk/chembl), previously described in the 2012 Nucleic Acids Research Database Issue. Since then, a variety of new data sources and improvements in functionality have contributed to the growth and utility of the resource. In particular, more comprehensive tracking of compounds from research stages through clinical development to market is provided through the inclusion of data from United States Adopted Name applications; a new richer data model for representing drug targets has been developed; and a number of methods have been put in place to allow users to more easily identify reliable data. Finally, access to ChEMBL is now available via a new Resource Description Framework format, in addition to the web-based interface, data downloads and web services.

  6. Nucleotide sequence of the transforming gene of m1 murine sarcoma virus.

    PubMed Central

    Brow, M A; Sen, A; Sutcliffe, J G

    1984-01-01

    The v-mosm1 nucleotide sequence codes for a protein that is 376 amino acids long. Although the N-terminus is homologous with that of the v-mos124 protein, the C-terminus is substantially different from the C-termini of all other examined mos proteins, suggesting that this region is nonessential and perhaps cleaved. Overall, v-mosm1 has greater homology with c-mos than does v-mos124, but mutually exclusive differences between c-mos and each of the v-mos genes preclude linear descent and suggest a common ancestral murine sarcoma virus. PMID:6319757

  7. The Complete Nucleotide Sequence of the Mitochondrial Genome of Bactrocera minax (Diptera: Tephritidae)

    PubMed Central

    Zhang, Bin; Nardi, Francesco; Hull-Sanders, Helen; Wan, Xuanwu; Liu, Yinghong

    2014-01-01

    The complete 16,043 bp mitochondrial genome (mitogenome) of Bactrocera minax (Diptera: Tephritidae) has been sequenced. The genome encodes 37 genes usually found in insect mitogenomes. The mitogenome information for B. minax was compared to the homologous sequences of Bactrocera oleae, Bactrocera tryoni, Bactrocera philippinensis, Bactrocera carambolae, Bactrocera papayae, Bactrocera dorsalis, Bactrocera correcta, Bactrocera cucurbitae and Ceratitis capitata. The analysis indicated the structure and organization are typical of, and similar to, the nine closely related species mentioned above, although it contains the lowest genome-wide A+T content (67.3%). Four short intergenic spacers with a high degree of conservation among the nine tephritid species mentioned above and B. minax were observed, which also have clear counterparts in the control regions (CRs). Correlation analysis among these ten tephritid species revealed close positive correlation among the A+T content of zero-fold degenerate sites (P0FD), the ratio of nucleotide substitution frequency at P0FD sites to all degenerate sites (zero-fold degenerate sites, two-fold degenerate sites and four-fold degenerate sites) and amino acid sequence distance (ASD). Further, significant positive correlation was observed between the A+T content of four-fold degenerate sites (P4FD) and the ratio of nucleotide substitution frequency at P4FD sites to all degenerate sites; however, we found significant negative correlation between ASD and the A+T content of P4FD, and the ratio of nucleotide substitution frequency at P4FD sites to all degenerate sites. A higher nucleotide substitution frequency at non-synonymous sites compared to synonymous sites was observed in nad4, the first time that has been observed in an insect mitogenome. A poly(T) stretch at the 5′ end of the CR followed by a [TA(A)]n-like stretch was also found. In addition, a highly conserved G+A-rich sequence block was observed in front of the

  8. The complete nucleotide sequence of the mitochondrial DNA of the dogfish, Scyliorhinus canicula.

    PubMed Central

    Delarbre, C; Spruyt, N; Delmarre, C; Gallut, C; Barriel, V; Janvier, P; Laudet, V; Gachelin, G

    1998-01-01

    We have determined the complete nucleotide sequence of the mitochondrial DNA (mtDNA) of the dogfish, Scyliorhinus canicula. The 16,697-bp-long mtDNA possesses a gene organization identical to that of the Osteichthyes, but different from that of the sea lamprey Petromyzon marinus. The main features of the mtDNA of osteichthyans were thus established in the common ancestor to chondrichthyans and osteichthyans. The phylogenetic analysis confirms that the Chondrichthyes are the sister group of the Osteichthyes. PMID:9725850

  9. Within-Host Nucleotide Diversity of Virus Populations: Insights from Next-Generation Sequencing

    PubMed Central

    Nelson, Chase W.; Hughes, Austin L.

    2014-01-01

    Next-generation sequencing (NGS) technology offers new opportunities for understanding the evolution and dynamics of viral populations within individual hosts over the course of infection. We review simple methods for estimating synonymous and nonsynonymous nucleotide diversity in viral genes from NGS data without the need for inferring linkage. We discuss the potential usefulness of these data for addressing questions of both practical and theoretical interest, including fundamental questions regarding the effective population sizes of within-host viral populations and the modes of natural selection acting on them. PMID:25481279
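
    As an illustration of the kind of linkage-free estimate described above, the following minimal Python sketch computes per-site nucleotide diversity as the fraction of read pairs that differ at a site. The abstract does not give its formulas, so this pairwise-difference estimator, the function names and the toy read counts are all assumptions for illustration; partitioning sites into synonymous and nonsynonymous classes would additionally need codon annotation, which is not shown.

```python
from itertools import combinations


def site_diversity(counts):
    """Pairwise nucleotide diversity at one site.

    counts: dict mapping nucleotide -> number of reads carrying it.
    Returns the fraction of read pairs that differ at this site.
    """
    n = sum(counts.values())
    if n < 2:
        return 0.0
    # Pairs of reads carrying two different nucleotides.
    differing_pairs = sum(a * b for a, b in combinations(counts.values(), 2))
    total_pairs = n * (n - 1) / 2
    return differing_pairs / total_pairs


def mean_diversity(pileup):
    """Average per-site diversity over a list of per-site count dicts."""
    if not pileup:
        return 0.0
    return sum(site_diversity(c) for c in pileup) / len(pileup)


# Hypothetical read counts at three sites of a viral gene.
pileup = [
    {"A": 90, "G": 10},   # low-frequency variant
    {"C": 100},           # monomorphic site
    {"T": 50, "C": 50},   # balanced polymorphism
]
print(mean_diversity(pileup))
```

    Because each site is treated independently, no haplotype reconstruction is needed, which is the property the abstract emphasizes.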

  10. The complete nucleotide sequence of the mitochondrial genome of Bactrocera minax (Diptera: Tephritidae).

    PubMed

    Zhang, Bin; Nardi, Francesco; Hull-Sanders, Helen; Wan, Xuanwu; Liu, Yinghong

    2014-01-01

    The complete 16,043 bp mitochondrial genome (mitogenome) of Bactrocera minax (Diptera: Tephritidae) has been sequenced. The genome encodes 37 genes usually found in insect mitogenomes. The mitogenome information for B. minax was compared to the homologous sequences of Bactrocera oleae, Bactrocera tryoni, Bactrocera philippinensis, Bactrocera carambolae, Bactrocera papayae, Bactrocera dorsalis, Bactrocera correcta, Bactrocera cucurbitae and Ceratitis capitata. The analysis indicated the structure and organization are typical of, and similar to, the nine closely related species mentioned above, although it contains the lowest genome-wide A+T content (67.3%). Four short intergenic spacers with a high degree of conservation among the nine tephritid species mentioned above and B. minax were observed, which also have clear counterparts in the control regions (CRs). Correlation analysis among these ten tephritid species revealed close positive correlation among the A+T content of zero-fold degenerate sites (P0FD), the ratio of nucleotide substitution frequency at P0FD sites to all degenerate sites (zero-fold degenerate sites, two-fold degenerate sites and four-fold degenerate sites) and amino acid sequence distance (ASD). Further, significant positive correlation was observed between the A+T content of four-fold degenerate sites (P4FD) and the ratio of nucleotide substitution frequency at P4FD sites to all degenerate sites; however, we found significant negative correlation between ASD and the A+T content of P4FD, and the ratio of nucleotide substitution frequency at P4FD sites to all degenerate sites. A higher nucleotide substitution frequency at non-synonymous sites compared to synonymous sites was observed in nad4, the first time that has been observed in an insect mitogenome. A poly(T) stretch at the 5' end of the CR followed by a [TA(A)]n-like stretch was also found. In addition, a highly conserved G+A-rich sequence block was observed in front of the

  11. Nanoparticle-Based Discrimination of Single-Nucleotide Polymorphism in Long DNA Sequences.

    PubMed

    Sanromán-Iglesias, María; Lawrie, Charles H; Liz-Marzán, Luis M; Grzelczak, Marek

    2017-04-19

    Circulating DNA (ctDNA), and specifically the detection of cancer-associated mutations in liquid biopsies, promises to revolutionize cancer detection. The main difficulty, however, is that typical ctDNA fragments (∼150 bases) are long enough to form secondary structures, potentially obscuring the mutated fragment from detection. We show that an assay based on gold nanoparticles (65 nm) stabilized with DNA (Au@DNA) can discriminate single nucleotide polymorphisms in clinically relevant ssDNA sequences (70-140 bases). The preincubation step was crucial to this process, allowing sequential bridging of Au@DNA, so that a single base mutation can be discriminated down to 100 pM concentration.

  12. Complete nucleotide sequence of a virus associated with rusty mottle disease of sweet cherry (Prunus avium).

    PubMed

    Villamor, D V; Druffel, K L; Eastwell, K C

    2013-08-01

    Cherry rusty mottle is a disease of sweet cherries first described in 1940 in western North America. Because of the graft-transmissible nature of the disease, a viral nature of the disease was assumed. Here, the complete genomic nucleotide sequences of virus isolates from two trees expressing cherry rusty mottle disease symptoms are characterized; the virus is designated cherry rusty mottle associated virus (CRMaV). The biological and molecular characteristics of this virus in comparison to those of cherry necrotic rusty mottle virus (CNRMV) and cherry green ring mottle virus (CGRMV) are described. CRMaV was subsequently detected in additional sweet cherry trees expressing symptoms of cherry rusty mottle disease.

  13. Complete nucleotide sequences of two begomoviruses infecting Madagascar periwinkle (Catharanthus roseus) from Pakistan.

    PubMed

    Ilyas, Muhammad; Nawaz, Kiran; Shafiq, Muhammad; Haider, Muhammad Saleem; Shahid, Ahmad Ali

    2013-02-01

    Though Catharanthus roseus (Madagascar periwinkle) is an ornamental plant, it is famous for its medicinal value. Its alkaloids are known for anti-cancerous properties, and this plant is studied mainly for its alkaloids. Here, this plant has been studied for its viral diseases. Complete DNA sequences of two begomoviruses infecting C. roseus originating from Pakistan were determined. The sequence of one begomovirus (clone KN4) shows the highest level of nucleotide sequence identity (86.5 %) to an unpublished virus, chili leaf curl India virus (ChiLCIV), and then (84.4 % identity) to papaya leaf curl virus (PaLCV), and thus represents a new species, for which the name "Catharanthus yellow mosaic virus" (CYMV) is proposed. The sequence of another begomovirus (clone KN6) shows the highest level of sequence identity (95.9 % to 99 %) to a newly reported virus from India, papaya leaf crumple virus (PaLCrV). Sequence analysis shows that KN4 and KN6 are recombinants of Pedilanthus leaf curl virus (PedLCV) and croton yellow vein mosaic virus (CrYVMV).

  14. Mining for single nucleotide polymorphisms and insertions/deletions in maize expressed sequence tag data.

    PubMed

    Batley, Jacqueline; Barker, Gary; O'Sullivan, Helen; Edwards, Keith J; Edwards, David

    2003-05-01

    We have developed a computer based method to identify candidate single nucleotide polymorphisms (SNPs) and small insertions/deletions from expressed sequence tag data. Using a redundancy-based approach, valid SNPs are distinguished from erroneous sequence by their representation multiple times in an alignment of sequence reads. A second measure of validity was also calculated based on the cosegregation of the SNP pattern between multiple SNP loci in an alignment. The utility of this method was demonstrated by applying it to 102,551 maize (Zea mays) expressed sequence tag sequences. A total of 14,832 candidate polymorphisms were identified with an SNP redundancy score of two or greater. Segregation of these SNPs with haplotype indicates that candidate SNPs with high redundancy and cosegregation confidence scores are likely to represent true SNPs. This was confirmed by validation of 264 candidate SNPs from 27 loci, with a range of redundancy and cosegregation scores, in four inbred maize lines. The SNP transition/transversion ratio and insertion/deletion size frequencies correspond to those observed by direct sequencing methods of SNP discovery and suggest that the majority of predicted SNPs and insertion/deletions identified using this approach represent true genetic variation in maize.
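
    The redundancy idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not define the score precisely, so here a column is reported as a candidate when at least two different bases are each seen in `min_redundancy` or more reads; the function name, the threshold parameter and the toy reads are hypothetical, and the cosegregation measure is not shown.

```python
from collections import Counter


def candidate_snps(alignment, min_redundancy=2):
    """Return (column, counts) for columns where at least two different
    bases are each supported by min_redundancy or more reads."""
    if not alignment:
        return []
    candidates = []
    length = min(len(read) for read in alignment)
    for col in range(length):
        # Ignore gaps and ambiguity codes when counting bases.
        counts = Counter(read[col] for read in alignment if read[col] in "ACGT")
        supported = {base: n for base, n in counts.items() if n >= min_redundancy}
        if len(supported) >= 2:
            candidates.append((col, supported))
    return candidates


# Toy alignment of five reads (already aligned, equal length).
reads = [
    "ACGTAC",
    "ACGTAC",
    "ACTTAC",
    "ACTTAC",
    "ACGTAG",  # lone mismatch in the last column: treated as likely error
]
print(candidate_snps(reads))
```

    In this toy input only column 2 qualifies, because the G/T difference is seen in several reads, whereas the single G in the last column is not repeated.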

  15. A simple ABO genotyping by PCR using sequence-specific primers with mismatched nucleotides.

    PubMed

    Taki, Takashi; Kibayashi, Kazuhiko

    2014-05-01

    In forensics, the specific ABO blood group is often determined by analyzing the ABO gene. Among various methods used, PCR employing sequence-specific primers (PCR-SSP) is simpler than other methods for ABO typing. When performing the PCR-SSP, the pseudo-positive signals often lead to errors in ABO typing. We introduced mismatched nucleotides at the second and the third positions from the 3'-end of the primers for the PCR-SSP method and examined whether reliable typing could be achieved by suppressing pseudo-positive signals. Genomic DNA was extracted from nail clippings of 27 volunteers, and the ABO gene was examined with PCR-SSP employing primers with and without mismatched nucleotides. The ABO blood group of the nail clippings was also analyzed serologically, and these results were compared with those obtained using PCR-SSP. When mismatched primers were employed for amplification, the results of the ABO typing matched with those obtained by the serological method. When primers without mismatched nucleotides were used for PCR-SSP, pseudo-positive signals were observed. Thus our method may be used for achieving more reliable ABO typing.

  16. Optimizing nucleotide sequence ensembles for combinatorial protein libraries using a genetic algorithm.

    PubMed

    Craig, Roger A; Lu, Jin; Luo, Jinquan; Shi, Lei; Liao, Li

    2010-01-01

    Protein libraries are essential to the field of protein engineering. Increasingly, probabilistic protein design is being used to synthesize combinatorial protein libraries, which allow the protein engineer to explore a vast space of amino acid sequences, while at the same time placing restrictions on the amino acid distributions. To this end, if site-specific amino acid probabilities are input as the target, then the codon nucleotide distributions that match this target distribution can be used to generate a partially randomized gene library. However, it turns out to be a highly nontrivial computational task to find the codon nucleotide distributions that exactly match a given target distribution of amino acids. We first showed that for any given target distribution an exact solution may not exist at all. Formulating the task as a constrained optimization problem, we then developed a genetic algorithm-based approach to find codon nucleotide distributions that match the target amino acid distribution as closely as possible. As compared with the previous gradient descent method on various objective functions, the new method consistently gave more optimized distributions as measured by the relative entropy between the calculated and the target distributions. To simulate the actual lab solutions, new objective functions were designed to allow for two separate sets of codons in seeking a better match to the target amino acid distribution.
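
    To make the objective concrete, the sketch below computes the amino acid distribution implied by independent nucleotide probabilities at the three codon positions, then scores it against a target by relative entropy. It uses the standard genetic code. The function names, the direction of the divergence (target relative to calculated) and the example probabilities are assumptions, and the genetic algorithm that would search over nucleotide distributions is not shown.

```python
from itertools import product
from math import log

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def amino_acid_distribution(position_probs):
    """position_probs: three dicts (base -> probability), one per codon position.
    Returns dict amino acid -> probability; stop codons are pooled under '*'."""
    dist = {}
    for codon, aa in CODON_TABLE.items():
        p = 1.0
        for base, probs in zip(codon, position_probs):
            p *= probs.get(base, 0.0)
        if p > 0:
            dist[aa] = dist.get(aa, 0.0) + p
    return dist


def relative_entropy(target, calculated):
    """D(target || calculated) in nats; infinite if a targeted residue is unreachable."""
    total = 0.0
    for aa, t in target.items():
        if t == 0:
            continue
        c = calculated.get(aa, 0.0)
        if c == 0:
            return float("inf")
        total += t * log(t / c)
    return total


# Hypothetical degenerate codon: G at position 1, C or A at position 2, T at position 3.
probs = [{"G": 1.0}, {"C": 0.5, "A": 0.5}, {"T": 1.0}]
calculated = amino_acid_distribution(probs)   # GCT -> A, GAT -> D
target = {"A": 0.75, "D": 0.25}
print(calculated, relative_entropy(target, calculated))
```

    A search procedure such as the genetic algorithm mentioned above would vary the per-position probabilities to drive this score toward zero.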

  17. Mining of haplotype-based expressed sequence tag single nucleotide polymorphisms in citrus.

    PubMed

    Chen, Chunxian; Gmitter, Fred G

    2013-11-01

    Single nucleotide polymorphisms (SNPs), the most abundant variations in a genome, have been widely used in various studies. Detection and characterization of citrus haplotype-based expressed sequence tag (EST) SNPs will greatly facilitate further utilization of these gene-based resources. In this paper, haplotype-based SNPs were mined out of publicly available citrus expressed sequence tags (ESTs) from different citrus cultivars (genotypes) individually and collectively for comparison. There were a total of 567,297 ESTs belonging to 27 cultivars in varying numbers and consequentially yielding different numbers of haplotype-based quality SNPs. Sweet orange (SO) had the most (213,830) ESTs, generating 11,182 quality SNPs in 3,327 out of 4,228 usable contigs. Summed from all the individually mining results, a total of 25,417 quality SNPs were discovered - 15,010 (59.1%) were transitions (AG and CT), 9,114 (35.9%) were transversions (AC, GT, CG, and AT), and 1,293 (5.0%) were insertion/deletions (indels). A vast majority of SNP-containing contigs consisted of only 2 haplotypes, as expected, but the percentages of 2 haplotype contigs varied widely in these citrus cultivars. BLAST of the 25,417 25-mer SNP oligos to the Clementine reference genome scaffolds revealed 2,947 SNPs had "no hits found", 19,943 had 1 unique hit / alignment, 1,571 had one hit and 2+ alignments per hit, and 956 had 2+ hits and 1+ alignment per hit. Of the total 24,293 scaffold hits, 23,955 (98.6%) were on the main scaffolds 1 to 9, and only 338 were on 87 minor scaffolds. Most alignments had 100% (25/25) or 96% (24/25) nucleotide identities, accounting for 93% of all the alignments. Considering almost all the nucleotide discrepancies in the 24/25 alignments were at the SNP sites, it served well as in silico validation of these SNPs, in addition to and consistent with the rate (81%) validated by sequencing and SNaPshot assay. 
    High-quality EST-SNPs from different citrus genotypes were detected, and

  18. Mining of haplotype-based expressed sequence tag single nucleotide polymorphisms in citrus

    PubMed Central

    2013-01-01

    Background: Single nucleotide polymorphisms (SNPs), the most abundant variations in a genome, have been widely used in various studies. Detection and characterization of citrus haplotype-based expressed sequence tag (EST) SNPs will greatly facilitate further utilization of these gene-based resources. Results: In this paper, haplotype-based SNPs were mined out of publicly available citrus expressed sequence tags (ESTs) from different citrus cultivars (genotypes) individually and collectively for comparison. There were a total of 567,297 ESTs belonging to 27 cultivars in varying numbers and consequentially yielding different numbers of haplotype-based quality SNPs. Sweet orange (SO) had the most (213,830) ESTs, generating 11,182 quality SNPs in 3,327 out of 4,228 usable contigs. Summed from all the individually mining results, a total of 25,417 quality SNPs were discovered – 15,010 (59.1%) were transitions (AG and CT), 9,114 (35.9%) were transversions (AC, GT, CG, and AT), and 1,293 (5.0%) were insertion/deletions (indels). A vast majority of SNP-containing contigs consisted of only 2 haplotypes, as expected, but the percentages of 2 haplotype contigs varied widely in these citrus cultivars. BLAST of the 25,417 25-mer SNP oligos to the Clementine reference genome scaffolds revealed 2,947 SNPs had “no hits found”, 19,943 had 1 unique hit / alignment, 1,571 had one hit and 2+ alignments per hit, and 956 had 2+ hits and 1+ alignment per hit. Of the total 24,293 scaffold hits, 23,955 (98.6%) were on the main scaffolds 1 to 9, and only 338 were on 87 minor scaffolds. Most alignments had 100% (25/25) or 96% (24/25) nucleotide identities, accounting for 93% of all the alignments. Considering almost all the nucleotide discrepancies in the 24/25 alignments were at the SNP sites, it served well as in silico validation of these SNPs, in addition to and consistent with the rate (81%) validated by sequencing and SNaPshot assay. Conclusions: High-quality EST-SNPs from different

  19. Nucleotide sequence alignment of hdcA from Gram-positive bacteria.

    PubMed

    Diaz, Maria; Ladero, Victor; Redruello, Begoña; Sanchez-Llana, Esther; Del Rio, Beatriz; Fernandez, Maria; Martin, Maria Cruz; Alvarez, Miguel A

    2016-03-01

    The decarboxylation of histidine, carried out mainly by some Gram-positive bacteria, yields the toxic dietary biogenic amine histamine (Ladero et al. 2010 〈10.2174/157340110791233256〉 [1], Linares et al. 2016 〈http://dx.doi.org/10.1016/j.foodchem.2015.11.013〉 [2]). The reaction is catalyzed by a pyruvoyl-dependent histidine decarboxylase (Linares et al. 2011 〈10.1080/10408398.2011.582813〉 [3]), which is encoded by the gene hdcA. In order to locate conserved regions in the hdcA gene of Gram-positive bacteria, this article provides a nucleotide sequence alignment of all the hdcA sequences from Gram-positive bacteria present in databases. For further utility and discussion, see 〈http://dx.doi.org/10.1016/j.foodcont.2015.11.035〉 [4].

  20. Nucleotide sequence alignment of hdcA from Gram-positive bacteria

    PubMed Central

    Diaz, Maria; Ladero, Victor; Redruello, Begoña; Sanchez-Llana, Esther; del Rio, Beatriz; Fernandez, Maria; Martin, Maria Cruz; Alvarez, Miguel A.

    2016-01-01

    The decarboxylation of histidine, carried out mainly by some Gram-positive bacteria, yields the toxic dietary biogenic amine histamine (Ladero et al. 2010 〈10.2174/157340110791233256〉 [1], Linares et al. 2016 〈http://dx.doi.org/10.1016/j.foodchem.2015.11.013〉 [2]). The reaction is catalyzed by a pyruvoyl-dependent histidine decarboxylase (Linares et al. 2011 〈10.1080/10408398.2011.582813〉 [3]), which is encoded by the gene hdcA. In order to locate conserved regions in the hdcA gene of Gram-positive bacteria, this article provides a nucleotide sequence alignment of all the hdcA sequences from Gram-positive bacteria present in databases. For further utility and discussion, see 〈http://dx.doi.org/10.1016/j.foodcont.2015.11.035〉 [4]. PMID:26958625

  1. Nucleotide sequences of three tRNA(Ser) from Drosophila melanogaster reading the six serine codons.

    PubMed

    Cribbs, D L; Gillam, I C; Tener, G M

    1987-10-05

    The nucleotide sequences of three serine tRNAs from Drosophila melanogaster, together capable of decoding the six serine codons, were determined. tRNA(Ser)2b has the anticodon GCU, tRNA(Ser)4 has CGA and tRNA(Ser)7 has IGA. tRNA(Ser)2b differs from the last two by about 25%. However, tRNA(Ser)4 and tRNA(Ser)7 are 96% homologous, differing only at the first position of the anticodon and two other sites. This unusual sequence relationship suggests, together with similar pairs in the yeasts Schizosaccharomyces pombe and Saccharomyces cerevisiae, that eukaryotic tRNA(Ser)UCN may be undergoing concerted evolution.

  2. Complete nucleotide sequence of a new variant of grapevine fanleaf virus from northeastern China.

    PubMed

    Zhou, Jun; Fan, Xudong; Dong, Yafeng; Zhang, Zunping; Ren, Fang; Hu, Guojun; Li, Zhengnan

    2017-02-01

    The complete RNA1 and RNA2 sequences of a new grapevine fanleaf virus isolate (GFLV-SDHN) from northeastern China were determined. The two RNAs are 7,367 and 3,788 nucleotides (nt) in length, respectively, excluding the poly(A) tails. Compared to other GFLV isolates, GFLV-SDHN has a 22- to 24-nt insertion in the RNA1 5' untranslated region, and there was 19.1 %-20.1 % and 11.7 %-13.0 % sequence divergence in RNA1, and 15.5 %-20.5 % and 8.5 %-13.5 % in RNA2, at the nt and amino acid level, respectively. Phylogenetic analysis revealed that the origins of GFLV-SDHN are distinct from those of other GFLV isolates. One recombination event was identified in the 2A(HP) region of RNA2 in GFLV-SDHN.

  3. The complete nucleotide sequence and genome organization of pea streak virus (genus Carlavirus).

    PubMed

    Su, Li; Li, Zhengnan; Bernardy, Mike; Wiersma, Paul A; Cheng, Zhihui; Xiang, Yu

    2015-10-01

    Pea streak virus (PeSV) is a member of the genus Carlavirus in the family Betaflexiviridae. Here, the first complete genome sequence of PeSV was determined by deep sequencing of a cDNA library constructed from dsRNA extracted from a PeSV-infected sample and Rapid Amplification of cDNA Ends (RACE) PCR. The PeSV genome consists of 8041 nucleotides excluding the poly(A) tail and contains six open reading frames (ORFs). The putative peptide encoded by the PeSV ORF6 has an estimated molecular mass of 6.6 kDa and shows no similarity to any known proteins. This differs from typical carlaviruses, whose ORF6 encodes a 12- to 18-kDa cysteine-rich nucleic-acid-binding protein.

  4. Nucleotide-resolution DNA double-strand break mapping by next-generation sequencing.

    PubMed

    Crosetto, Nicola; Mitra, Abhishek; Silva, Maria Joao; Bienko, Magda; Dojer, Norbert; Wang, Qi; Karaca, Elif; Chiarle, Roberto; Skrzypczak, Magdalena; Ginalski, Krzysztof; Pasero, Philippe; Rowicka, Maga; Dikic, Ivan

    2013-04-01

    We present a genome-wide approach to map DNA double-strand breaks (DSBs) at nucleotide resolution by a method we termed BLESS (direct in situ breaks labeling, enrichment on streptavidin and next-generation sequencing). We validated and tested BLESS using human and mouse cells and different DSBs-inducing agents and sequencing platforms. BLESS was able to detect telomere ends, Sce endonuclease-induced DSBs and complex genome-wide DSB landscapes. As a proof of principle, we characterized the genomic landscape of sensitivity to replication stress in human cells, and we identified >2,000 nonuniformly distributed aphidicolin-sensitive regions (ASRs) overrepresented in genes and enriched in satellite repeats. ASRs were also enriched in regions rearranged in human cancers, with many cancer-associated genes exhibiting high sensitivity to replication stress. Our method is suitable for genome-wide mapping of DSBs in various cells and experimental conditions, with a specificity and resolution unachievable by current techniques.

  5. Overproduction and nucleotide sequence of the respiratory D-lactate dehydrogenase of Escherichia coli.

    PubMed Central

    Rule, G S; Pratt, E A; Chin, C C; Wold, F; Ho, C

    1985-01-01

    Recombinant DNA plasmids containing the gene for the membrane-bound D-lactate dehydrogenase (D-LDH) of Escherichia coli linked to the promoter PL from lambda were constructed. After induction, the levels of D-LDH were elevated 300-fold over that of the wild type and amounted to 35% of the total cellular protein. The nucleotide sequence of the D-LDH gene was determined and shown to agree with the amino acid composition and the amino-terminal sequence of the purified enzyme. Removal of the amino-terminal formyl-Met from D-LDH was not inhibited in cells which contained these high levels of D-LDH. PMID:3882663

  6. Using mitochondrial nucleotide sequences to investigate diversity and genealogical relationships within common carp (Cyprinus carpio L.).

    PubMed

    Thai, B T; Burridge, C P; Pham, T A; Austin, C M

    2005-02-01

    Direct sequencing of mitochondrial DNA (mtDNA) D-loop (745 bp) and MTATPase6/MTATPase8 (857 bp) regions was used to investigate genetic variation within common carp and develop a global genealogy of common carp strains. The D-loop region was more variable than the MTATPase6/MTATPase8 region, but given the wide distribution of carp the overall levels of sequence divergence were low. Levels of haplotype diversity varied widely among countries with Chinese, Indonesian and Vietnamese carp showing the greatest diversity whereas Japanese Koi and European carp had undetectable nucleotide variation. A genealogical analysis supports a close relationship between Vietnamese, Koi and Chinese Color carp strains and to a lesser extent, European carp. Chinese and Indonesian carp strains were the most divergent, and their relationships do not support the evolution of independent Asian and European lineages and current taxonomic treatments.

  7. [Nucleotide sequence of HLA-DQA1 promoter region (QAP) in a lung cancer patient].

    PubMed

    Qiu, C; Zhou, W; Song, C

    1996-06-01

    The HLA-DQA1 allele and the nucleotide sequence of the HLA-DQA1 promoter region (QAP) in a patient with IDDM complicated by lung cancer were identified by PCR/SSCP, PCR/SSCP and PCR/sequencing. The results showed that: (1) the lung cancer patient and all of his family members carried HLA-DQA1*0301/0501 alleles; (2) a single base substitution G-->A at position -155 and a deletion of CAA at positions -161 to -163 occurred in the patient. These results suggest that mutation of the HLA-DQA1 promoter region may modulate HLA-DQA1 gene expression through trans-acting factors binding to variant cis-acting elements and may be responsible for the pathogenesis of lung cancer.

  8. Unique nucleotide sequence (UNS)-guided assembly of repetitive DNA parts for synthetic biology applications

    PubMed Central

    Torella, Joseph P.; Lienert, Florian; Boehm, Christian R.; Chen, Jan-Hung; Way, Jeffrey C.; Silver, Pamela A.

    2016-01-01

    Recombination-based DNA construction methods, such as Gibson assembly, have made it possible to easily and simultaneously assemble multiple DNA parts and hold promise for the development and optimization of metabolic pathways and functional genetic circuits. Over time, however, these pathways and circuits have become more complex, and the increasing need for standardization and insulation of genetic parts has resulted in sequence redundancies — for example repeated terminator and insulator sequences — that complicate recombination-based assembly. We and others have recently developed DNA assembly methods that we refer to collectively as unique nucleotide sequence (UNS)-guided assembly, in which individual DNA parts are flanked with UNSs to facilitate the ordered, recombination-based assembly of repetitive sequences. Here we present a detailed protocol for UNS-guided assembly that enables researchers to convert multiple DNA parts into sequenced, correctly-assembled constructs, or into high-quality combinatorial libraries in only 2–3 days. If the DNA parts must be generated from scratch, an additional 2–5 days are necessary. This protocol requires no specialized equipment and can easily be implemented by a student with experience in basic cloning techniques. PMID:25101822

  9. Nucleotide sequence of ermA, a macrolide-lincosamide-streptogramin B determinant in Staphylococcus aureus.

    PubMed

    Murphy, E

    1985-05-01

    The complete nucleotide sequence of ermA, the prototype macrolide-lincosamide-streptogramin B resistance gene from Staphylococcus aureus, has been determined. The sequence predicts a 243-amino-acid protein that is homologous to those specified by ermC, ermAM, and ermD, resistance determinants from Staphylococcus aureus, Streptococcus sanguis, and Bacillus licheniformis, respectively. The ermA transcript, identified by Northern analysis and S1 mapping, contains a 5' leader sequence of 211 bases which has the potential to encode two short peptides of 15 and 19 amino acids; the second, longer peptide has 13 amino acids in common with the putative regulatory leader peptide of ermC. The coding sequence for this peptide is deleted in several mutants in which macrolide-lincosamide-streptogramin B resistance is constitutively expressed. Potential secondary structures available to the leader sequence of the wild-type (inducible) transcript and to constitutive deletion, insertion, and point mutations provide additional support for the translational attenuation model for induction of macrolide-lincosamide-streptogramin B resistance.

  10. Nucleotide sequence of ermA, a macrolide-lincosamide-streptogramin B determinant in Staphylococcus aureus.

    PubMed Central

    Murphy, E

    1985-01-01

    The complete nucleotide sequence of ermA, the prototype macrolide-lincosamide-streptogramin B resistance gene from Staphylococcus aureus, has been determined. The sequence predicts a 243-amino-acid protein that is homologous to those specified by ermC, ermAM, and ermD, resistance determinants from Staphylococcus aureus, Streptococcus sanguis, and Bacillus licheniformis, respectively. The ermA transcript, identified by Northern analysis and S1 mapping, contains a 5' leader sequence of 211 bases which has the potential to encode two short peptides of 15 and 19 amino acids; the second, longer peptide has 13 amino acids in common with the putative regulatory leader peptide of ermC. The coding sequence for this peptide is deleted in several mutants in which macrolide-lincosamide-streptogramin B resistance is constitutively expressed. Potential secondary structures available to the leader sequence of the wild-type (inducible) transcript and to constitutive deletion, insertion, and point mutations provide additional support for the translational attenuation model for induction of macrolide-lincosamide-streptogramin B resistance. PMID:2985541

  11. Computational generation and screening of RNA motifs in large nucleotide sequence pools

    PubMed Central

    Kim, Namhee; Izzo, Joseph A.; Elmetwaly, Shereef; Gan, Hin Hark; Schlick, Tamar

    2010-01-01

    Although identification of active motifs in large random sequence pools is central to RNA in vitro selection, no systematic computational equivalent of this process has yet been developed. We develop a computational approach that combines target pool generation, motif scanning and motif screening using secondary structure analysis for applications to 10^12–10^14-sequence pools; large pool sizes are made possible using program redesign and supercomputing resources. We use the new protocol to search for aptamer and ribozyme motifs in pools up to experimental pool size (10^14 sequences). We show that motif scanning, structure matching and flanking sequence analysis, respectively, reduce the initial sequence pool by 6–8, 1–2 and 1 orders of magnitude, consistent with the rare occurrence of active motifs in random pools. The final yields match the theoretical yields from probability theory for simple motifs and overestimate experimental yields, which constitute lower bounds, for aptamers because screening analyses beyond secondary structure information are not considered systematically. We also show that designed pools using our nucleotide transition probability matrices can produce higher yields for RNA ligase motifs than random pools. Our methods for generating, analyzing and designing large pools can help improve RNA design via simulation of aspects of in vitro selection. PMID:20448026
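
    The comparison between scanned and theoretical yields can be illustrated for the simplest possible case: a fixed primary-sequence motif in a pool of uniformly random sequences. The Python sketch below is an assumption-laden toy, not the authors' protocol: it ignores secondary structure entirely, assumes equal base probabilities, and uses hypothetical function names and a tiny pool.

```python
import random


def expected_motif_hits(pool_size, seq_len, motif_len):
    """Expected occurrences of one fixed motif in a uniform random pool:
    pool_size * (number of start positions) * (1/4)^motif_len."""
    if motif_len > seq_len:
        return 0.0
    return pool_size * (seq_len - motif_len + 1) * 0.25 ** motif_len


def count_motif_hits(pool, motif):
    """Count occurrences of motif across the pool, overlaps included."""
    hits = 0
    for seq in pool:
        start = seq.find(motif)
        while start != -1:
            hits += 1
            start = seq.find(motif, start + 1)
    return hits


def random_pool(pool_size, seq_len, rng):
    """Generate pool_size random RNA sequences of length seq_len."""
    return ["".join(rng.choice("ACGU") for _ in range(seq_len)) for _ in range(pool_size)]


rng = random.Random(0)          # fixed seed so the run is repeatable
pool = random_pool(2000, 30, rng)
print("theoretical:", expected_motif_hits(2000, 30, 4))
print("observed:   ", count_motif_hits(pool, "GGAC"))
```

    Each additional fixed motif position lowers the expected yield fourfold, which is why realistic motifs are rare even in very large pools.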

  12. Nucleotide sequence analysis of beta tubulin gene in a wide range of dermatophytes.

    PubMed

    Rezaei-Matehkolaei, Ali; Mirhendi, Hossein; Makimura, Koichi; de Hoog, G Sybren; Satoh, Kazuo; Najafzadeh, Mohammad Javad; Shidfar, Mohammad Reza

    2014-10-01

    We investigated the resolving power of the beta tubulin protein-coding gene (BT2) for systematic study of dermatophyte fungi. Initially, 144 standard and clinical strains belonging to 26 species in the genera Trichophyton, Microsporum, and Epidermophyton were identified by internal transcribed spacer (ITS) sequencing. Subsequently, BT2 was partially amplified in all strains, and sequence analysis was performed after construction of a BT2 database, which showed that length ranged from approximately 723 (T. ajelloi) to 808 nucleotides (M. persicolor) in different species. Intraspecific sequence variation was found in some species, but T. tonsurans, T. equinum, T. concentricum, T. verrucosum, T. rubrum, T. violaceum, T. eriotrephon, E. floccosum, M. canis, M. ferrugineum, and M. audouinii were invariant. The sequences were found to be relatively conserved among different strains of the same species. The species with the closest resemblance were Arthroderma benhamiae and T. concentricum and T. tonsurans and T. equinum with 100% and 99.8% identity, respectively; the most distant species were M. persicolor and M. amazonicum. The dendrogram obtained from BT2 topology was almost compatible with the species concept based on ITS sequencing, and similar clades and species were distinguished in the BT2 tree. Here, beta tubulin was characterized in a wide range of dermatophytes in order to assess intra- and interspecies variation and resolution and was found to be a taxonomically valuable gene.

  13. Organization and nucleotide sequence analysis of a ribosomal RNA gene cluster from Streptomyces ambofaciens.

    PubMed

    Pernodet, J L; Boccard, F; Alegre, M T; Gagnat, J; Guérineau, M

    1989-06-30

    The Streptomyces ambofaciens genome contains four rRNA gene clusters. These copies are called rrnA, B, C and D. The complete nucleotide (nt) sequence of rrnD has been determined. These genes possess striking similarity with other eubacterial rRNA genes. Comparison with other rRNA sequences allowed the putative localization of the sequences encoding mature rRNAs. The structural genes are arranged in the order 16S-23S-5S and are tightly linked. The mature rRNAs are predicted to contain 1528, 3120 and 120 nt, for the 16S, 23S and 5S rRNAs, respectively. The 23S rRNA is, to our knowledge, the longest of all sequenced prokaryotic 23S rRNAs. When compared to other large rRNAs it shows insertions at positions where they are also present in archaebacterial and in eukaryotic large rRNAs. Secondary structure models of S. ambofaciens rRNAs are proposed, based upon those existing for other bacterial rRNAs. Positions of putative transcription start points and of a termination signal are suggested. The corresponding putative primary transcript, containing the 16S, 23S and 5S rRNAs plus flanking regions, was folded into a secondary structure, and sequences possibly involved in rRNA maturation are described. The G + C content of the rRNA gene cluster is low (57%) compared with the overall G + C content of Streptomyces DNA (73%).
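
    Illustrative sketch (not part of the cited abstract): the G + C percentages quoted above are the fraction of G and C bases among all bases of a sequence. The function name gc_content and the toy sequence are assumptions made for this example; the real rrnD sequence is not reproduced here.

```python
def gc_content(seq: str) -> float:
    """Return the G+C fraction (0.0 to 1.0) of a nucleotide sequence.

    Only unambiguous bases (A, C, G, T, U) are counted; gaps and
    ambiguity codes such as N are ignored.
    """
    counted = [base for base in seq.upper() if base in "ACGTU"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return gc / len(counted)


# Made-up 10-nt sequence, used only to show the call.
toy_sequence = "GCGCATATGC"
print(f"G+C content: {gc_content(toy_sequence):.0%}")
```

    Applied to a full gene cluster and to whole-genome sequence, the same calculation would give the two figures that the abstract compares.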

  14. Nucleotide and deduced amino acid sequences of Torpedo californica acetylcholine receptor gamma subunit.

    PubMed Central

    Claudio, T; Ballivet, M; Patrick, J; Heinemann, S

    1983-01-01

    The nucleotide sequence has been determined of a cDNA clone that codes for the 60,000-dalton gamma subunit of Torpedo californica acetylcholine receptor. The length of the cDNA clone is 2,010 base pairs. The 5' and 3' untranslated regions have respective lengths of 31 and 461 base pairs. Data suggest that the putative polyadenylylation consensus sequence A-A-T-A-A-A may not be required for polyadenylylation of the mRNA corresponding to the cDNA clone described in this study. From the DNA sequence data, the amino acid sequence of the gamma subunit was deduced. The subunit is composed of 489 amino acids giving a molecular mass of 56,600 daltons. The deduced amino acid sequence data also indicate the presence of a 17-amino acid extension or signal peptide on this subunit. From these data, structural predictions for the gamma subunit are made such as potential membrane-spanning regions, possible asparagine-linked glycosylation sites, and the assignment of regions of the protein to the extracellular, internal, and cytoplasmic domains of the lipid bilayer. Images PMID:6573658
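
    Illustrative sketch (not part of the cited abstract): the lengths reported above can be cross-checked with simple arithmetic. The helper name coding_codons is an assumption for this example, and so is the reading that the two untranslated regions account for everything outside the coding codons; the abstract does not say how the stop codon is counted.

```python
def coding_codons(cdna_bp: int, utr5_bp: int, utr3_bp: int) -> int:
    """Number of codons left after removing both untranslated regions."""
    coding_bp = cdna_bp - utr5_bp - utr3_bp
    if coding_bp <= 0 or coding_bp % 3 != 0:
        raise ValueError("coding length is not a positive multiple of 3")
    return coding_bp // 3


# Figures quoted in the abstract.
codons = coding_codons(cdna_bp=2010, utr5_bp=31, utr3_bp=461)
mature_residues, signal_residues = 489, 17

print(codons)                              # codons between the two UTRs
print(mature_residues + signal_residues)   # residues in the precursor
print(round(56600 / mature_residues, 1))   # average daltons per residue
```

    Under these assumptions the coding region and the precursor (subunit plus signal peptide) come to the same number of codons, which suggests that the reported lengths are internally consistent.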

  15. Nucleotide sequence of a cloned cDNA for proopiomelanocortin precursor of chum salmon, Onchorynchus keta.

    PubMed Central

    Soma, G I; Kitahara, N; Nishizawa, T; Nanami, H; Kotake, C; Okazaki, H; Andoh, T

    1984-01-01

    We have isolated a cDNA clone encoding salmon proopiomelanocortin precursor. Polyadenylated RNA was isolated from pituitary neurointermediate lobes and used to construct a cDNA library. The library was screened with 17 mer of oligodeoxyribonucleotides specific for the hexapeptide sequence in salmon beta-endorphin I, Phe-Met-Lys-Pro-Tyr-Thr at positions 4-9 excluding the third nucleotide. One positive clone, pSSM17 containing an insert of 1303 base pairs (bp) was characterized. Sequence determination revealed that it possessed sequences covering the entire regions encoding ACTH and beta-lipotropin and that the mRNA had the same overall organization as those of other mammalian species, i.e., the following peptide hormones were arranged in order from 5' upstream, ACTH including alpha-melanotropin and corticotropin-like intermediate lobe peptide, beta-lipotropin including gamma-lipotropin, beta-melanotropin and beta-endorphin. Amino acid sequences for putative salmon ACTH, beta-, and gamma-lipotropin were predicted. Comparison of the salmon mRNA sequence with those of mammals showed that the regions of alpha- and beta-MSH are relatively homologous, but other regions are much less so, especially in the 3' nontranslated region where it is much longer and completely heterologous. Images PMID:6095185

  16. Unique nucleotide sequence-guided assembly of repetitive DNA parts for synthetic biology applications

    SciTech Connect

    Torella, JP; Lienert, F; Boehm, CR; Chen, JH; Way, JC; Silver, PA

    2014-08-07

    Recombination-based DNA construction methods, such as Gibson assembly, have made it possible to easily and simultaneously assemble multiple DNA parts, and they hold promise for the development and optimization of metabolic pathways and functional genetic circuits. Over time, however, these pathways and circuits have become more complex, and the increasing need for standardization and insulation of genetic parts has resulted in sequence redundancies (for example, repeated terminator and insulator sequences) that complicate recombination-based assembly. We and others have recently developed DNA assembly methods, which we refer to collectively as unique nucleotide sequence (UNS)-guided assembly, in which individual DNA parts are flanked with UNSs to facilitate the ordered, recombination-based assembly of repetitive sequences. Here we present a detailed protocol for UNS-guided assembly that enables researchers to convert multiple DNA parts into sequenced, correctly assembled constructs, or into high-quality combinatorial libraries in only 2-3 d. If the DNA parts must be generated from scratch, an additional 2-5 d are necessary. This protocol requires no specialized equipment and can easily be implemented by a student with experience in basic cloning techniques.

  17. The nucleotide sequence surrounding the replication origin of the cop3 mutant of the bacteriocinogenic plasmid Clo DF13.

    PubMed Central

    Stuitje, A R; Veltkamp, E; Maat, J; Heyneker, H L

    1980-01-01

    The nucleotide sequence from about 100 base-pairs downstream to about 600 base pairs upstream the CloDF13 replication origin has been determined. A comparison of this sequence with the corresponding ColE1 origin sequence reveals that: The sequence at the origin of replication is conserved. There are large differences in the nucleotide sequence downstream the replication origin, whereas there is a large homology in the region of about 410 base-pairs upstream the replication origin. This conserved region might code for a largely homologous basic, arginine rich polypeptide of about 45 amino-acids, for both ColE1 and CloDF13. Although there are large differences in the primary structure of the region coding for the 100 nucleotide RNA, the secondary structure of this region seems to be conserved. Images PMID:6253936

  18. The nucleotide sequence surrounding the replication origin of the cop3 mutant of the bacteriocinogenic plasmid Clo DF13.

    PubMed

    Stuitje, A R; Veltkamp, E; Maat, J; Heyneker, H L

    1980-04-11

    The nucleotide sequence from about 100 base-pairs downstream to about 600 base pairs upstream the CloDF13 replication origin has been determined. A comparison of this sequence with the corresponding ColE1 origin sequence reveals that: The sequence at the origin of replication is conserved. There are large differences in the nucleotide sequence downstream the replication origin, whereas there is a large homology in the region of about 410 base-pairs upstream the replication origin. This conserved region might code for a largely homologous basic, arginine rich polypeptide of about 45 amino-acids, for both ColE1 and CloDF13. Although there are large differences in the primary structure of the region coding for the 100 nucleotide RNA, the secondary structure of this region seems to be conserved.

  19. Mapping DNA methylation by transverse current sequencing: Reduction of noise from neighboring nucleotides

    NASA Astrophysics Data System (ADS)

    Alvarez, Jose; Massey, Steven; Kalitsov, Alan; Velev, Julian

    Nanopore sequencing via transverse current has emerged as a competitive candidate for mapping DNA methylation without the need for bisulfite treatment, fluorescent tags, or PCR amplification. By eliminating the error-producing amplification step, long read lengths become feasible, which greatly simplifies the assembly process and reduces the time and the cost inherent in current technologies. However, due to the large error rates of nanopore sequencing, single base resolution has not been reached. A very important source of noise is the intrinsic structural noise in the electric signature of the nucleotide arising from the influence of neighboring nucleotides. In this work we perform calculations of the tunneling current through DNA molecules in nanopores using the non-equilibrium electron transport method within an effective multi-orbital tight-binding model derived from first-principles calculations. We develop a base-calling algorithm accounting for the correlations of the current through neighboring bases, which in principle can reduce the error rate below any desired precision. Using this method we show that we can clearly distinguish DNA methylation and other base modifications based on the reading of the tunneling current.

  20. Mapping Nucleotide Sequences that Encode Complex Binary Disease Traits with HapMap

    PubMed Central

    Cui, Yuehua; Fu, Wenjiang; Sun, Kelian; Romero, Roberto; Wu, Rongling

    2007-01-01

    Detecting the patterns of DNA sequence variants across the human genome is a crucial step for unraveling the genetic basis of complex human diseases. The human HapMap constructed by single nucleotide polymorphisms (SNPs) provides efficient sequence variation information that can speed up the discovery of genes related to common diseases. In this article, we present a generalized linear model for identifying specific nucleotide variants that encode complex human diseases. A novel approach is derived to group haplotypes to form composite diplotypes, which largely reduces the model degrees of freedom for an association test and hence increases the power when multiple SNP markers are involved. An efficient two-stage estimation procedure based on the expectation-maximization (EM) algorithm is derived to estimate parameters. Non-genetic environmental or clinical risk factors can also be fitted into the model. Computer simulations show that our model has reasonable power and type I error rate with appropriate sample size. It is also suggested through simulations that a balanced design with approximately equal number of cases and controls should be preferred to maintain small estimation bias and reasonable testing power. To illustrate the utility, we apply the method to a genetic association study of large for gestational age (LGA) neonates. The model provides a powerful tool for elucidating the genetic basis of complex binary diseases. PMID:19384427

  1. Haplotype structure and population genetic inferences from nucleotide-sequence variation in human lipoprotein lipase.

    PubMed Central

    Clark, A G; Weiss, K M; Nickerson, D A; Taylor, S L; Buchanan, A; Stengård, J; Salomaa, V; Vartiainen, E; Perola, M; Boerwinkle, E; Sing, C F

    1998-01-01

    Allelic variation in 9.7 kb of genomic DNA sequence from the human lipoprotein lipase gene (LPL) was scored in 71 healthy individuals (142 chromosomes) from three populations: African Americans (24) from Jackson, MS; Finns (24) from North Karelia, Finland; and non-Hispanic Whites (23) from Rochester, MN. The sequences had a total of 88 variable sites, with a nucleotide diversity (site-specific heterozygosity) of 0.002 ± 0.001 across this 9.7-kb region. The frequency spectrum of nucleotide variation exhibited a slight excess of heterozygosity, but, in general, the data fit expectations of the infinite-sites model of mutation and genetic drift. Allele-specific PCR helped resolve linkage phases, and a total of 88 distinct haplotypes were identified. For 1,410 (64%) of the 2,211 site pairs, all four possible gametes were present in these haplotypes, reflecting a rich history of past recombination. Despite the strong evidence for recombination, extensive linkage disequilibrium was observed. The number of haplotypes generally is much greater than the number expected under the infinite-sites model, but there was sufficient multisite linkage disequilibrium to reveal two major clades, which appear to be very old. Variation in this region of LPL may depart from the variation expected under a simple, neutral model, owing to complex historical patterns of population founding, drift, selection, and recombination. These data suggest that the design and interpretation of disease-association studies may not be as straightforward as often is assumed. PMID:9683608
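
    Illustrative sketch (not part of the cited abstract): the statement that all four possible gametes were present for a share of site pairs corresponds to the standard four-gamete test. The code below is a minimal version that assumes biallelic sites coded as '0' and '1'; the function name and the toy haplotypes are made up for this example and are not the LPL data.

```python
from itertools import combinations


def four_gamete_fraction(haplotypes):
    """Fraction of site pairs at which all four two-site gametes occur.

    haplotypes: equal-length strings over {'0', '1'}, one per chromosome.
    Under the infinite-sites model without recombination, a pair of
    biallelic sites can show at most three of the four gametes.
    """
    n_sites = len(haplotypes[0])
    if any(len(h) != n_sites for h in haplotypes):
        raise ValueError("haplotypes must have equal length")
    pairs = list(combinations(range(n_sites), 2))
    if not pairs:
        raise ValueError("need at least two sites")
    hits = 0
    for i, j in pairs:
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            hits += 1
    return hits / len(pairs)


# Made-up haplotypes over three sites.
toy = ["000", "011", "101", "110"]
print(four_gamete_fraction(toy))

# The percentage quoted in the abstract follows from its own counts.
print(round(100 * 1410 / 2211))
```

    A high fraction, as reported for LPL, is conventionally read as evidence of past recombination (or recurrent mutation) between the sites concerned.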

  2. Nucleotide sequence of a yeast Ty element: evidence for an unusual mechanism of gene expression.

    PubMed Central

    Clare, J; Farabaugh, P

    1985-01-01

    We have determined the DNA sequence of the transposable element Ty912 of yeast. The 5918-base-pair element encodes two genes, tya912 and tyb912, which specify proteins similar to sequence-specific DNA-binding proteins of Escherichia coli and retroviral reverse transcriptases, respectively. The tyb912 gene is atypical of eukaryotic genes since (i) it begins 1336 nucleotides into the Ty912 mRNA (i.e., downstream of the tya912 gene) and (ii) the first in-frame AUG is 921 nucleotides into the coding frame. Protein blot analysis of Ty-lacZ fusions shows that the tyb912 gene is translated starting at the 5' end of the tya912 gene and that the primary translational product is a tya912::tyb912 fusion protein. We have shown that synthesis of this fusion protein probably does not occur by RNA splicing. The data are consistent with a mechanism of translational frameshifting occurring within the region of overlap between the 3' end of tya912 and the 5' end of tyb912. Images PMID:2581255

  3. Evidence for Balancing Selection from Nucleotide Sequence Analyses of Human G6PD

    PubMed Central

    Verrelli, Brian C.; McDonald, John H.; Argyropoulos, George; Destro-Bisol, Giovanni; Froment, Alain; Drousiotou, Anthi; Lefranc, Gerard; Helal, Ahmed N.; Loiselet, Jacques; Tishkoff, Sarah A.

    2002-01-01

    Glucose-6-phosphate dehydrogenase (G6PD) mutations that result in reduced enzyme activity have been implicated in malarial resistance and constitute one of the best examples of selection in the human genome. In the present study, we characterize the nucleotide diversity across a 5.2-kb region of G6PD in a sample of 160 Africans and 56 non-Africans, to determine how selection has shaped patterns of DNA variation at this gene. Our global sample of enzymatically normal B alleles and A, A−, and Med alleles with reduced enzyme activities reveals many previously uncharacterized silent-site polymorphisms. In comparison with the absence of amino acid divergence between human and chimpanzee G6PD sequences, we find that the number of G6PD amino acid polymorphisms in human populations is significantly high. Unlike many other G6PD-activity alleles with reduced activity, we find that the age of the A variant, which is common in Africa, may not be consistent with the recent emergence of severe malaria and therefore may have originally had a historically different adaptive function. Overall, our observations strongly support previous genotype-phenotype association studies that proposed that balancing selection maintains G6PD deficiencies within human populations. The present study demonstrates that nucleotide sequence analyses can reveal signatures of both historical and recent selection in the genome and may elucidate the impact that infectious disease has had during human evolution. PMID:12378426

  4. Nucleotide sequence at the termini of the DNA of Bacillus subtilis phage phi 29.

    PubMed Central

    Escarmís, C; Salas, M

    1981-01-01

    Phage phi 29 DNA cannot be phosphorylated with polynucleotide kinase and [gamma-32P]ATP because of the presence of a viral protein covalently linked to the 5' termini. The 5' ends can, however, be made susceptible to phosphorylation by treatment with alkali and alkaline phosphatase. Restriction fragments Hpa II C and Hpa II F, corresponding to the right and left ends of phi 29 DNA, respectively, were labeled at the 5' ends with polynucleotide kinase and [gamma-32P]ATP or at the 3' ends with terminal transferase and [alpha-32P]ATP or [alpha-32P]cordycepin 5'-triphosphate. After a secondary cleavage of the labeled fragments, the sequence of the first 150-180 nucleotides at the termini of phi 29 DNA was determined by the method of Maxam and Gilbert. The ends of phi 29 DNA are flush, and a six-nucleotides-long inverted terminal repetition was found. The functional implications of the sequences determined are discussed. Images PMID:6262800

  5. Molecular detection and nucleotide sequence analysis of a new Aichi virus closely related to canine kobuvirus in sewage samples.

    PubMed

    Yamashita, Teruo; Adachi, Hirokazu; Hirose, Emi; Nakamura, Noriko; Ito, Miyabi; Yasui, Yoshihiro; Kobayashi, Shinichi; Minagawa, Hiroko

    2014-05-01

    Between 2001 and 2005, 207 raw sewage samples were collected at the inflow of a sewage treatment plant in Aichi Prefecture, Japan. Of the 207 sewage samples, 137 (66.2 %) were found to be positive for amplification of Aichi virus (AiV) nucleotide using reverse transcription (RT)-PCR with 10 forward and 10 reverse primers in the 3D region corresponding to the nucleotide sequence of all kobuviruses. AiV genotype A sequences were detected in all 137 samples. New sequences of AiV were detected in nine samples, exhibiting 83 % similarity with AiV A846/88, but 95 % similarity with canine kobuvirus (CKV) US-PC0082 in this region. The nucleotide sequences from the VP3 region to the 3' untranslated region (UTR) of sewage sample Y12/2004 were determined. The number of nucleotides in each region was the same as that of CKV. The nucleotide (amino acid) identity of the complete VP1 region was 90.5 % (94.8 %) between Y12/2004 and CKV US-PC0082. The phylogenetic analyses based on the nucleotide and the deduced amino acid sequences of VP1 and 3D showed that Y12/2004 was independent from AiV, but closely related to CKV. These results suggested that CKV is present in Aichi Prefecture, Japan.

  6. QGRS-H Predictor: a web server for predicting homologous quadruplex forming G-rich sequence motifs in nucleotide sequences

    PubMed Central

    Menendez, Camille; Frees, Scott; Bagga, Paramjeet S.

    2012-01-01

    Naturally occurring G-quadruplex structural motifs, formed by guanine-rich nucleic acids, have been reported in telomeric, promoter and transcribed regions of mammalian genomes. G-quadruplex structures have received significant attention because of growing evidence for their role in important biological processes, human disease and as therapeutic targets. Lately, there has been much interest in the potential roles of RNA G-quadruplexes as cis-regulatory elements of post-transcriptional gene expression. Large-scale computational genomics studies on G-quadruplexes have difficulty validating their predictions without laborious testing in ‘wet’ labs. We have developed a bioinformatics tool, QGRS-H Predictor that can map and analyze conserved putative Quadruplex forming 'G'-Rich Sequences (QGRS) in mRNAs, ncRNAs and other nucleotide sequences, e.g. promoter, telomeric and gene flanking regions. Identifying conserved regulatory motifs helps validate computations and enhances accuracy of predictions. The QGRS-H Predictor is particularly useful for mapping homologous G-quadruplex forming sequences as cis-regulatory elements in the context of 5′- and 3′-untranslated regions, and CDS sections of aligned mRNA sequences. QGRS-H Predictor features highly interactive graphic representation of the data. It is a unique and user-friendly application that provides many options for defining and studying G-quadruplexes. The QGRS-H Predictor can be freely accessed at: http://quadruplex.ramapo.edu/qgrs/app/start. PMID:22576365
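
    Illustrative sketch (not part of the cited abstract, and not the QGRS-H Predictor algorithm): putative quadruplex-forming G-rich sequences are often located with a simple pattern search. The version below assumes one commonly used folding rule, four runs of at least three G separated by loops of 1 to 7 nucleotides; the pattern, the function name find_qgrs and the test sequence are assumptions for this example, and the server itself offers other, configurable definitions.

```python
import re

# Assumed motif: G{3,} followed three times by (loop of 1-7 nt + G{3,}).
# Loops are matched lazily so that the shortest loops are preferred.
QUADRUPLEX = re.compile(r"G{3,}(?:[ACGTU]{1,7}?G{3,}){3}")


def find_qgrs(seq: str):
    """Return (start, motif) for each non-overlapping match in seq."""
    seq = seq.upper()
    return [(m.start(), m.group()) for m in QUADRUPLEX.finditer(seq)]


# Telomere-like repeat flanked by two A residues on each side.
for start, motif in find_qgrs("AAGGGTTAGGGTTAGGGTTAGGGAA"):
    print(start, motif)
```

    Because re.finditer reports non-overlapping matches only, overlapping candidates would need a lookahead-based scan; comparing matches across aligned homologous mRNAs, as the server does, is outside the scope of this sketch.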

  7. Identification of protein-interacting nucleotides in a RNA sequence using composition profile of tri-nucleotides.

    PubMed

    Panwar, Bharat; Raghava, Gajendra P S

    2015-04-01

    RNA-protein interactions play diverse roles in cells, so identifying the RNA-protein interface is essential for biologists to understand their function. In the past, several methods have been developed for predicting RNA-interacting residues in proteins, but limited efforts have been made to identify protein-interacting nucleotides in RNAs. In order to discriminate protein-interacting and non-interacting nucleotides, we used various classifiers (NaiveBayes, NaiveBayesMultinomial, BayesNet, ComplementNaiveBayes, MultilayerPerceptron, J48, SMO, RandomForest, SMO and SVM(light)) for prediction model development using various features and achieved a highest performance of 83.92% sensitivity, 84.82% specificity, 84.62% accuracy and a Matthews correlation coefficient of 0.62 with SVM(light)-based models. We observed that certain tri-nucleotides such as ACA, ACC, AGA, CAC, CCA, GAG, UGA, and UUU are preferred in protein interaction. All the models have been developed using a non-redundant dataset and are evaluated using a five-fold cross-validation technique. A web-server called RNApin has been developed for the scientific community (http://crdd.osdd.net/raghava/rnapin/).
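
    Illustrative sketch (not part of the cited abstract, and not the RNApin implementation): a tri-nucleotide composition profile can be computed as the relative frequency of each of the 64 possible triplets in overlapping windows, and the Matthews correlation coefficient follows from the confusion-matrix counts. All names and inputs below are assumptions for this example.

```python
from collections import Counter
from itertools import product
from math import sqrt

# The 64 possible RNA tri-nucleotides, in a fixed feature order.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGU", repeat=3)]


def trinucleotide_profile(rna: str):
    """Relative frequency of each tri-nucleotide over overlapping windows."""
    rna = rna.upper().replace("T", "U")
    counts = Counter(rna[i:i + 3] for i in range(len(rna) - 2))
    total = sum(counts[t] for t in TRINUCLEOTIDES)
    if total == 0:
        return [0.0] * len(TRINUCLEOTIDES)
    return [counts[t] / total for t in TRINUCLEOTIDES]


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Made-up sequence: windows are ACA, CAA, AAC, ACA.
profile = trinucleotide_profile("ACAACA")
print(profile[TRINUCLEOTIDES.index("ACA")])

# Made-up confusion matrix for a perfect classifier.
print(mcc(tp=50, tn=50, fp=0, fn=0))
```

    In a setting like the one described, such a 64-value vector would be computed around each nucleotide and passed to a classifier; the window size and classifier settings are not given in the abstract.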

  8. Identification and nucleotide sequence of the glycoprotein gB gene of equine herpesvirus 4.

    PubMed Central

    Riggio, M P; Cullinane, A A; Onions, D E

    1989-01-01

    The nucleotide sequence of the glycoprotein gB gene of equine herpesvirus 4 (EHV-4) was determined. The gene was located within a BamHI genomic library by a combination of Southern and dot-blot hybridization with probes derived from the herpes simplex virus type 1 (HSV-1) gB DNA sequence. The predominant portion of the coding sequences was mapped to a 2.95-kilobase BamHI-EcoRI subfragment at the left-hand end of BamHI-C. Potential TATA box, CAT box, and mRNA start site sequences and the translational initiation codon were located in the BamHI M fragment of the virus, which is located immediately to the left of BamHI-C. A polyadenylation signal, AATAAA, occurs nine nucleotides past the chain termination codon. Translation of these sequences would give a 110-kilodalton protein possessing a 5' hydrophobic signal sequence, a hydrophilic surface domain containing 11 potential N-linked glycosylation sites, a hydrophobic transmembrane domain, and a 3' highly charged cytoplasmic domain. A potential internal proteolytic cleavage site, Arg-Arg/Ser, was identified at residues 459 to 461. Analysis of this protein revealed amino acid sequence homologies of 47% with HSV-1 gB, 54% with pseudorabies virus gpII, 51% with varicella-zoster virus gpII, 29% with human cytomegalovirus gB, and 30% with Epstein-Barr virus gB. Alignment of EHV-4 gB with HSV-1 (KOS) gB further revealed that four potential N-linked glycosylation sites and all 10 cysteine residues on the external surface of the molecules are perfectly conserved, suggesting that the proteins possess similar secondary and tertiary structures. Thus, we showed that EHV-4 gB is highly conserved with the gB and gpII glycoproteins of other herpesviruses, suggesting that this glycoprotein has a similar overall function in each virus. Images PMID:2915378

  9. The bioinformatics of nucleotide sequence coding for proteins requiring metal coenzymes and proteins embedded with metals

    NASA Astrophysics Data System (ADS)

    Tremberger, G.; Dehipawala, Sunil; Cheung, E.; Holden, T.; Sullivan, R.; Nguyen, A.; Lieberman, D.; Cheung, T.

    2015-09-01

    All metallo-proteins need post-translation metal incorporation. In fact, the isotope ratio of Fe, Cu, and Zn in physiology and oncology have emerged as an important tool. The nickel containing F430 is the prosthetic group of the enzyme methyl coenzyme M reductase which catalyzes the release of methane in the final step of methano-genesis, a prime energy metabolism candidate for life exploration space mission in the solar system. The 3.5 Gyr early life sulfite reductase as a life switch energy metabolism had Fe-Mo clusters. The nitrogenase for nitrogen fixation 3 billion years ago had Mo. The early life arsenite oxidase needed for anoxygenic photosynthesis energy metabolism 2.8 billion years ago had Mo and Fe. The selection pressure in metal incorporation inside a protein would be quantifiable in terms of the related nucleotide sequence complexity with fractal dimension and entropy values. Simulation model showed that the studied metal-required energy metabolism sequences had at least ten times more selection pressure relatively in comparison to the horizontal transferred sequences in Mealybug, guided by the outcome histogram of the correlation R-sq values. The metal energy metabolism sequence group was compared to the circadian clock KaiC sequence group using magnesium atomic level bond shifting mechanism in the protein, and the simulation model would suggest a much higher selection pressure for the energy life switch sequence group. The possibility of using Kepler 444 as an example of ancient life in Galaxy with the associated exoplanets has been proposed and is further discussed in this report. Examples of arsenic metal bonding shift probed by Synchrotron-based X-ray spectroscopy data and Zn controlled FOXP2 regulated pathways in human and chimp brain studied tissue samples are studied in relationship to the sequence bioinformatics. The analysis results suggest that relatively large metal bonding shift amount is associated with low probability correlation R

  10. The nucleotide sequence of a Polish isolate of Tomato torrado virus.

    PubMed

    Budziszewska, Marta; Obrepalska-Steplowska, Aleksandra; Wieczorek, Przemysław; Pospieszny, Henryk

    2008-12-01

    A new virus was isolated from greenhouse tomato plants showing symptoms of leaf and apex necrosis in Wielkopolska province in Poland in 2003. The observed symptoms and the virus morphology resembled those of viruses previously reported in Spain, called Tomato torrado virus (ToTV), and in Mexico, called Tomato marchitez virus (ToMarV). The complete genome of the Polish isolate Wal'03 was determined by RT-PCR amplification using oligonucleotide primers developed against the ToTV sequences deposited in GenBank, followed by cloning, sequencing, and comparison with the sequence of the type isolate. Phylogenetic analyses, performed on the basis of fragments of polyprotein sequences, established the relationship of Polish isolate Wal'03 with Spanish ToTV and Mexican ToMarV, as well as with other viruses from the Sequivirus, Sadwavirus, and Cheravirus genera, reported to be the most similar to the new tomato viruses. The Wal'03 genome strands have the same organization as, and very high homology with, the ToTV type isolate, showing only some nucleotide and deduced amino acid changes, in contrast to ToMarV, which was significantly different. The phylogenetic tree clustered the aforementioned viruses in the same group, indicating that they have a common origin.

  11. Nucleotide sequence divergence and functional constraint in mRNA evolution.

    PubMed Central

    Miyata, T; Yasunaga, T; Nishida, T

    1980-01-01

    Comparison of about 50 pairs of homologous nucleotide sequences for different genes revealed that the substitutions between synonymous codons occurred at much higher rates than did amino acid substitutions. Furthermore, five pairs of mRNA sequences for different genes were compared in species that had diverged at the same time. The evolutionary rate of synonymous substitution was estimated to be 5.1 × 10^-9 per site per year on the average and is approximately constant among different genes. It also is suggested that this property would be suitable for a molecular clock to determine the evolutionary relationships and branching order of duplicated genes. Each functional block of the noncoding region evolves with a rate that is almost constant, regardless of the types of genes. The intervening sequence and the 5' portion of the 3' noncoding region show considerable divergence, the extent of which is almost comparable to that in the synonymous codon sites, whereas the other blocks consisting of the 5' noncoding region and the 3' portion of the 3' noncoding region are strongly conserved, showing approximately half of the divergence of the synonymous sites. This strong sequence preservation might be due to the functional requirements for transcription and modification of mRNA. PMID:6938980
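
    Illustrative sketch (not part of the cited abstract): if the synonymous rate is roughly constant, a divergence time can be estimated from the synonymous distance between two sequences as T = K / (2r), where the factor 2 reflects change accumulating along both lineages. The function name and the input distance are hypothetical; K is assumed to be already corrected for multiple substitutions at the same site.

```python
SYNONYMOUS_RATE = 5.1e-9  # per site per year, the average quoted in the abstract


def divergence_time_years(k_syn: float, rate: float = SYNONYMOUS_RATE) -> float:
    """Estimate divergence time from synonymous substitutions per site.

    k_syn: corrected number of synonymous substitutions per synonymous
           site between the two sequences (hypothetical input here).
    rate:  substitutions per site per year along a single lineage.
    """
    if k_syn < 0 or rate <= 0:
        raise ValueError("k_syn must be >= 0 and rate must be > 0")
    return k_syn / (2.0 * rate)


# Hypothetical pair of duplicated genes with K = 0.51.
print(f"{divergence_time_years(0.51):.2e} years")
```

    Ranking several duplicated gene pairs by such estimates is one way to read the abstract's suggestion of using synonymous sites to infer branching order.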

  12. Increased functional protein expression using nucleotide sequence features enriched in highly expressed genes in zebrafish.

    PubMed

    Horstick, Eric J; Jordan, Diana C; Bergeron, Sadie A; Tabor, Kathryn M; Serpe, Mihaela; Feldman, Benjamin; Burgess, Harold A

    2015-04-20

    Many genetic manipulations are limited by difficulty in obtaining adequate levels of protein expression. Bioinformatic and experimental studies have identified nucleotide sequence features that may increase expression; however, it is difficult to assess the relative influence of these features. Zebrafish embryos are rapidly injected with calibrated doses of mRNA, enabling the effects of multiple sequence changes to be compared in vivo. Using RNAseq and microarray data, we identified a set of genes that are highly expressed in zebrafish embryos and systematically analyzed them for enrichment of sequence features correlated with levels of protein expression. We then tested enriched features by embryo microinjection and functional tests of multiple protein reporters. Codon selection, releasing factor recognition sequence and specific introns and 3' untranslated regions each increased protein expression between 1.5- and 3-fold. These results suggested principles for increasing protein yield in zebrafish through biomolecular engineering. We implemented these principles for rational gene design in software for codon selection (CodonZ) and plasmid vectors incorporating the most active non-coding elements. Rational gene design thus significantly boosts expression in zebrafish, and a similar approach will likely elevate expression in other animal models.

  13. Nucleotide sequence of the glucoamylase gene GLU1 in the yeast Saccharomycopsis fibuligera.

    PubMed Central

    Itoh, T; Ohtsuki, I; Yamashita, I; Fukui, S

    1987-01-01

    The complete nucleotide sequence of the glucoamylase gene GLU1 from the yeast Saccharomycopsis fibuligera has been determined. The GLU1 DNA hybridized to a polyadenylated RNA of 2.1 kilobases. A single open reading frame codes for a 519-amino-acid protein which contains four potential N-glycosylation sites. The putative precursor begins with a hydrophobic segment that presumably acts as a signal sequence for secretion. Glucoamylase was purified from a culture fluid of the yeast Saccharomyces cerevisiae which had been transformed with a plasmid carrying GLU1. The molecular weight of the protein was 57,000 by both gel filtration and acrylamide gel electrophoresis. The protein was glycosylated with asparagine-linked glycosides whose molecular weight was 2,000. The amino-terminal sequence of the protein began from the 28th amino acid residue from the first methionine of the putative precursor. The amino acid composition of the purified protein matched the predicted amino acid composition. These results confirmed that GLU1 encodes glucoamylase. A comparison of the amino acid sequence of glucoamylases from several fungi and yeast shows five highly conserved regions. One homology region is absent from the yeast enzyme and so may not be essential to glucoamylase function. PMID:3114236

  14. Feasibility of mini-sequencing schemes based on nucleotide polymorphisms for microbial identification and population analyses.

    PubMed

    Araujo, Ricardo; Eusebio, Nadia; Caramalho, Rita

    2015-03-01

    Practical schemes based on single nucleotide polymorphisms (SNP) have been proposed as alternatives to simplify and replace the molecular methodologies based on the extensive sequencing analysis of genes. SNaPshot mini-sequencing has been increasingly used during the last decade and represents a fast and robust strategy to analyze critical polymorphisms. Such assays have been proposed to characterize some bacteria and microbial eukaryotes, and their feasibility is reviewed in the present manuscript. The mini-sequencing schemes showed high discriminatory power and competence for identification of microorganisms, but some specificity errors were still found, particularly for species of the Burkholderia cepacia complex and mycobacteria. SNP assays designed for other goals, e.g., comparison of strains, detection of serotypes, virulence, epidemic, and phylogenetic-related subgroups of isolates, can be very useful by facilitating the investigation of large collections of isolates. The next generation of SNP assays might consider the inclusion of a large number of markers to fully characterize microbial taxonomy and strains; nevertheless, these new technologies are still prone to errors and can largely benefit from integration with well-established mini-sequencing assays. Newly proposed molecular tools should be systematically tested in collections of isolates with high indexes of diversity and guarantee interlaboratory validation.

  15. Power Spectrum and Mutual Information Analyses of DNA Base (Nucleotide) Sequences

    NASA Astrophysics Data System (ADS)

    Isohata, Yasuhiko; Hayashi, Masaki

    2003-03-01

    On the basis of the power spectrum analyses for the base (nucleotide) sequences of various genes, we have studied long-range correlations in total base sequences, which are expressed as $1/f^{\alpha}$, the behaviour of the exponent $\alpha$ for the accumulated base sequences, as well as periodicities at short range. In particular, from the analysis of content rate distributions of $\alpha$ we have obtained the average values $\bar{\alpha}=0.40\pm 0.01$ and $\bar{\alpha}=0.20\pm 0.01$ for the human genes and S. cerevisiae genes, respectively. We have also performed the analyses using the mutual information function. We show that there exists a clear difference between the content rate distributions of correlation lengths for the sample human genes and the S. cerevisiae genes. We are led to a conjecture that the elongation of the correlation length in the base sequences of genes from the early eukaryote (S. cerevisiae) to the late eukaryote (human) should be the definite reflection of the evolutionary process.
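
    The mutual information function mentioned above can be sketched as follows. This is the textbook definition (mutual information, in bits, between bases separated by a fixed lag), not necessarily the estimator the authors used; the function name and the test sequence are illustrative assumptions, and the step of extracting a correlation length from the decay of the function is not shown.

```python
import math
from collections import Counter

def mutual_information(seq: str, lag: int) -> float:
    """Mutual information (in bits) between bases separated by `lag` positions."""
    if lag < 1 or lag >= len(seq):
        raise ValueError("lag must satisfy 1 <= lag < len(seq)")
    pairs = list(zip(seq, seq[lag:]))          # (base at i, base at i + lag)
    n = len(pairs)
    joint = Counter(pairs)                     # joint counts of base pairs
    left = Counter(a for a, _ in pairs)        # marginal counts, first base
    right = Counter(b for _, b in pairs)       # marginal counts, second base
    info = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts
        info += p_ab * math.log2(p_ab * n * n / (left[a] * right[b]))
    return info

# Hypothetical, perfectly periodic sequence: each base fully predicts its partner.
print(mutual_information("ACGT" * 50, 4))
```

    For a real gene one would evaluate the function over a range of lags and examine how quickly it decays towards zero.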

  16. Proteus mirabilis ambient-temperature fimbriae: cloning and nucleotide sequence of the aft gene cluster.

    PubMed Central

    Massad, G; Fulkerson, J F; Watson, D C; Mobley, H L

    1996-01-01

    Uropathogenic Proteus mirabilis produces at least four types of fimbriae. Amino acid sequences from two peptides, derived by tryptic digestion of the structural subunit of one type of these fimbriae, the ambient-temperature fimbriae, were determined: NVVPGQPSSTQ and LIEGENQLNYNA. PCR primers, based on these sequences and that of the N terminus, were used to amplify a 359-bp fragment. A cosmid clone, isolated from a P. mirabilis genomic library by hybridization with the 359-bp PCR product, was used to determine the nucleotide sequence of the atf gene cluster. A 3,903-bp region encodes three polypeptides: AtfA, the structural subunit; AtfB, the chaperone; and AtfC, the outer membrane molecular usher. No fimbria-related genes are evident either 5' or 3' to the three contiguous genes. AtfA demonstrates significant amino acid sequence identity with type 1 major fimbrial subunits of several enteric species. The 359-bp PCR product hybridized strongly with all Proteus isolates (n = 9) and 25% of 355 Escherichia coli isolates but failed to hybridize with any of 26 isolates among nine other uropathogenic species. Ambient-temperature fimbriae of P. mirabilis may represent a novel type of fimbriae of enteric species. PMID:8926119

  17. Proteus mirabilis ambient-temperature fimbriae: cloning and nucleotide sequence of the aft gene cluster.

    PubMed

    Massad, G; Fulkerson, J F; Watson, D C; Mobley, H L

    1996-10-01

    Uropathogenic Proteus mirabilis produces at least four types of fimbriae. Amino acid sequences from two peptides, derived by tryptic digestion of the structural subunit of one type of these fimbriae, the ambient-temperature fimbriae, were determined: NVVPGQPSSTQ and LIEGENQLNYNA. PCR primers, based on these sequences and that of the N terminus, were used to amplify a 359-bp fragment. A cosmid clone, isolated from a P. mirabilis genomic library by hybridization with the 359-bp PCR product, was used to determine the nucleotide sequence of the atf gene cluster. A 3,903-bp region encodes three polypeptides: AtfA, the structural subunit; AtfB, the chaperone; and AtfC, the outer membrane molecular usher. No fimbria-related genes are evident either 5' or 3' to the three contiguous genes. AtfA demonstrates significant amino acid sequence identity with type 1 major fimbrial subunits of several enteric species. The 359-bp PCR product hybridized strongly with all Proteus isolates (n = 9) and 25% of 355 Escherichia coli isolates but failed to hybridize with any of 26 isolates among nine other uropathogenic species. Ambient-temperature fimbriae of P. mirabilis may represent a novel type of fimbriae of enteric species.

  18. The Complete Nucleotide Sequence and Biotype Variability of Papaya leaf distortion mosaic virus.

    PubMed

    Maoka, Tetsuo; Hataya, Tatsuji

    2005-02-01

    The complete nucleotide sequence of the genome of Papaya leaf distortion mosaic virus (PLDMV) was determined. The viral RNA genome of strain LDM (leaf distortion mosaic) comprised 10,153 nucleotides, excluding the poly(A) tail, and contained one long open reading frame encoding a polyprotein of 3,269 amino acids (molecular weight 373,347). The polyprotein contained nine putative proteolytic cleavage sites and some motifs conserved in other potyviral polyproteins with 44 to 50% identities, indicating that PLDMV is a distinct species in the genus Potyvirus. Like the W biotype of Papaya ringspot virus (PRSV), the non-papaya-infecting biotype of PLDMV (PLDMV-C) was found in plants of the family Cucurbitaceae. The coat protein (CP) sequence of PLDMV-C in naturally infected Trichosanthes bracteata was compared with those of three strains of the P biotype (PLDMV-P), LDM and two additional strains M (mosaic) and YM (yellow mosaic), which are biologically different from each other. The CP sequences of three strains of PLDMV-P share high identities of 95 to 97%, while they share lower identities of 88 to 89% with that of PLDMV-C. Significant changes in hydrophobicity and a deletion of two amino acids at the N-terminal region of the CP of PLDMV-C were observed. The finding of two biotypes of PLDMV implies the possibility that the papaya-infecting biotype evolved from the Cucurbitaceae-infecting potyvirus, as has been previously suggested for PRSV. In addition, a similar evolutionary event acquiring infectivity to papaya may arise frequently in viruses in the family Cucurbitaceae.

  19. The nucleotide sequences of 5S rRNAs from a fern Dryopteris acuminata and a horsetail Equisetum arvense.

    PubMed Central

    Hori, H; Osawa, S; Takaiwa, F; Sugiura, M

    1984-01-01

    The nucleotide sequences of 5S rRNAs from two Pteridophyta species, a fern Dryopteris acuminata and a horsetail Equisetum arvense, have been determined. These two sequences are more closely related to those of the Bryophyta species (88% identity on average) than to those of seed plants (84% identity on average). PMID:6538332

  20. Cloning and genomic nucleotide sequence of the matrix attachment region binding protein from the halotolerant alga Dunaliella salina.

    PubMed

    Wang, Peng-Ju; Wang, Tian-Yun; Wang, Ya-Feng; Yang, Rui; Li, Zhao-Xi

    2013-07-01

    In our previous study, the sequence of a matrix attachment region binding protein (MBP) cDNA was cloned from the unicellular green alga Dunaliella salina. However, the nucleotide sequence of this gene has not been reported so far. In this paper, the nucleotide sequence of MBP was cloned and characterized, and its gene copy number was determined. The MBP nucleotide sequence is 5641 bp long, and interrupted by 12 introns ranging from 132 to 562 bp. All the introns in the D. salina MBP gene have orthodox splice sites, exhibiting GT at the 5' end and AG at the 3' end. Southern blot analysis showed that MBP only has one copy in the D. salina genome.
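
    The "orthodox splice site" observation (GT at the 5' end of each intron, AG at the 3' end) is easy to check programmatically. The sketch below is illustrative only: the function name and the example sequences are hypothetical and are not taken from the D. salina MBP gene.

```python
def has_canonical_splice_sites(intron: str) -> bool:
    """Return True if the intron starts with GT and ends with AG (GT-AG rule)."""
    s = intron.upper()
    return len(s) >= 4 and s.startswith("GT") and s.endswith("AG")

# Hypothetical intron sequences, not taken from the D. salina MBP gene.
introns = ["GTAAGTCCTTTCAG", "GCAAGTCCTTTCAG"]
print([has_canonical_splice_sites(i) for i in introns])
```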

  1. Complete nucleotide sequences of two isolates of cherry green ring mottle virus from peach (Prunus persica) in China.

    PubMed

    Wang, Lihui; Jiang, Dongmei; Niu, Feiqing; Lu, Meiguang; Wang, Hongqing; Li, Shifang

    2013-03-01

    Two complete nucleotide sequences of cherry green ring mottle virus (CGRMV) isolated from peach in Hebei (Hs10) and Fujian (F9) Provinces, China, were determined. Five open reading frames (ORFs) were found in the genomes of both isolates. The F9 and Hs10 isolates shared 82.2 % and 83.4-94.4 % nucleotide sequence identity, respectively, with two CGRMV isolates from cherry. Analysis of the nucleotide and amino acid sequences from the five ORFs of both isolates showed that Hs10 shares the greatest sequence identity with P1A (GenBank AJ291761) from cherry. Phylogenetic analysis indicated that CGRMV isolates from peach and cherry are closely related to members of the genus Foveavirus.

  2. The nucleotide sequence of the human int-1 mammary oncogene; evolutionary conservation of coding and non-coding sequences.

    PubMed Central

    van Ooyen, A; Kwee, V; Nusse, R

    1985-01-01

    The mouse mammary tumor virus can induce mammary tumors in mice by proviral activation of an evolutionarily conserved cellular oncogene called int-1. Here we present the nucleotide sequence of the human homologue of int-1, and compare it with the mouse gene. Like the mouse gene, the human homologue contains a reading frame of 370 amino acids, with only four substitutions. The amino acid changes are all in the hydrophobic leader domain of the int-1 encoded protein, and do not significantly alter its hydropathic index. The conservation between the mouse and the human int-1 genes is not restricted to exons; extensive parts of the introns are also homologous. Thus, int-1 ranks among the most conserved genes known, a property shared with other oncogenes. PMID:2998762

  3. Single nucleotide polymorphism analysis of Korean native chickens using next generation sequencing data.

    PubMed

    Seo, Dong-Won; Oh, Jae-Don; Jin, Shil; Song, Ki-Duk; Park, Hee-Bok; Heo, Kang-Nyeong; Shin, Younhee; Jung, Myunghee; Park, Junhyung; Jo, Cheorun; Lee, Hak-Kyo; Lee, Jun-Heon

    2015-02-01

    There are five native chicken lines in Korea, which are mainly classified by plumage colors (black, white, red, yellow, gray). These five lines are very important genetic resources in the Korean poultry industry. Based on a next generation sequencing technology, whole genome sequence and reference assemblies were performed using Gallus_gallus_4.0 (NCBI) with whole genome sequences from these lines to identify common and novel single nucleotide polymorphisms (SNPs). We obtained 36,660,731,136 ± 1,257,159,120 bp of raw sequence and average 26.6-fold of 25-29 billion reference assembly sequences representing 97.288 % coverage. Also, 4,006,068 ± 97,534 SNPs were observed from 29 autosomes and the Z chromosome and, of these, 752,309 SNPs are the common SNPs across lines. Among the identified SNPs, the number of novel- and known-location assigned SNPs was 1,047,951 ± 14,956 and 2,948,648 ± 81,414, respectively. The number of unassigned known SNPs was 1,181 ± 150 and unassigned novel SNPs was 8,238 ± 1,019. Synonymous SNPs, non-synonymous SNPs, and SNPs having character changes were 26,266 ± 1,456, 11,467 ± 604, 8,180 ± 458, respectively. Overall, 443,048 ± 26,389 SNPs in each bird were identified by comparing with dbSNP in NCBI. The presently obtained genome sequence and SNP information in Korean native chickens have wide applications for further genome studies such as genetic diversity studies to detect causative mutations for economic and disease related traits.

  4. Plastid sequence evolution: a new pattern of nucleotide substitutions in the Cucurbitaceae.

    PubMed

    Decker-Walters, Deena S; Chung, Sang-Min; Staub, Jack E

    2004-05-01

    Nucleotide substitutions (i.e., point mutations) are the primary driving force in generating DNA variation upon which selection can act. Substitutions called transitions, which entail exchanges between purines (A = adenine, G = guanine) or pyrimidines (C = cytosine, T = thymine), typically outnumber transversions (e.g., exchanges between a purine and a pyrimidine) in a DNA strand. With an increasing number of plant studies revealing a transversion rather than transition bias, we chose to perform a detailed substitution analysis for the plant family Cucurbitaceae using data from several short plastid DNA sequences. We generated a phylogenetic tree for 19 taxa of the tribe Benincaseae and related genera and then scored conservative substitution changes (e.g., those not exhibiting homoplasy or reversals) from the unambiguous branches of the tree. Neither the transition nor (A+T)/(G+C) biases found in previous studies were supported by our overall data. More importantly, we found a novel and symmetrical substitution bias in which Gs had been preferentially replaced by A, As by C, Cs by T, and Ts by G, resulting in the G-->A-->C-->T-->G substitution series. Understanding this pattern will lead to new hypotheses concerning plastid evolution, which in turn will affect the choices of substitution models and other tree-building algorithms for phylogenetic analyses based on nucleotide data.
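
    The distinction between transitions and transversions, and the idea of tallying directed changes such as G-->A, can be sketched as below. The function names and the list of changes are hypothetical; the authors' actual scoring procedure on the phylogenetic tree is not described in enough detail here to reproduce.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify(ancestral: str, derived: str) -> str:
    """Label a substitution as a 'transition' or a 'transversion'."""
    a, d = ancestral.upper(), derived.upper()
    if a == d or {a, d} - (PURINES | PYRIMIDINES):
        raise ValueError("expected two different bases from A, C, G, T")
    # Transition: both bases are purines, or both are pyrimidines.
    same_class = {a, d} <= PURINES or {a, d} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

def substitution_spectrum(changes):
    """Count directed changes such as ('G', 'A') from (ancestral, derived) pairs."""
    return Counter((a.upper(), d.upper()) for a, d in changes)

# Hypothetical list of inferred changes along tree branches.
changes = [("G", "A"), ("A", "C"), ("C", "T"), ("T", "G"), ("G", "A")]
spectrum = substitution_spectrum(changes)
ts = sum(n for (a, d), n in spectrum.items() if classify(a, d) == "transition")
tv = sum(spectrum.values()) - ts
print(spectrum.most_common(1), ts, tv)
```

    Keeping the direction of each change, rather than only its class, is what allows a series such as G-->A-->C-->T-->G to be detected at all.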

  5. Nucleotide sequences related to the transforming gene of avian sarcoma virus are present in DNA of uninfected vertebrates.

    PubMed

    Spector, D H; Varmus, H E; Bishop, J M

    1978-09-01

    We have detected nucleotide sequences related to the transforming gene of avian sarcoma virus (ASV) in the DNA of uninfected vertebrates. Purified radioactive DNA (cDNAsarc) complementary to most or all of the gene (src) required for transformation of fibroblasts by ASV was annealed with DNA from a variety of normal species. Under conditions that facilitate pairing of partially matched nucleotide sequences (1.5 M NaCl, 59 degrees), cDNAsarc formed duplexes with chicken, human, calf, mouse, and salmon DNA but not with DNA from sea urchin, Drosophila, or Escherichia coli. The kinetics of duplex formation indicated that cDNAsarc was reacting with nucleotide sequences present in a single copy or at most a few copies per cell. In contrast to the preceding findings, nucleotide sequences complementary to the remainder of the ASV genome were observed only in chicken DNA. Thermal denaturation studies of the duplexes formed with cDNAsarc indicated a high degree of conservation of the nucleotide sequences related to src in vertebrate DNAs; the reductions in melting temperature suggested about 3--4% mismatching of cDNAsarc with chicken DNA and 8--10% mismatching of cDNAsarc with the other vertebrate DNAs.

  6. The EBI Search engine: providing search and retrieval functionality for biological data from EMBL-EBI.

    PubMed

    Squizzato, Silvano; Park, Young Mi; Buso, Nicola; Gur, Tamer; Cowley, Andrew; Li, Weizhong; Uludag, Mahmut; Pundir, Sangya; Cham, Jennifer A; McWilliam, Hamish; Lopez, Rodrigo

    2015-07-01

    The European Bioinformatics Institute (EMBL-EBI; https://www.ebi.ac.uk) provides free and unrestricted access to data across all major areas of biology and biomedicine. Searching and extracting knowledge across these domains requires a fast and scalable solution that addresses the requirements of domain experts as well as casual users. We present the EBI Search engine, referred to here as 'EBI Search', an easy-to-use fast text search and indexing system with powerful data navigation and retrieval capabilities. API integration provides access to analytical tools, allowing users to further investigate the results of their search. The interconnectivity that exists between data resources at EMBL-EBI provides easy, quick and precise navigation and a better understanding of the relationship between different data types including sequences, genes, gene products, proteins, protein domains, protein families, enzymes and macromolecular structures, together with relevant life science literature.

  7. Developing single nucleotide polymorphism (SNP) markers from transcriptome sequences for identification of longan (Dimocarpus longan) germplasm

    PubMed Central

    Wang, Boyi; Tan, Hua-Wei; Fang, Wanping; Meinhardt, Lyndel W; Mischke, Sue; Matsumoto, Tracie; Zhang, Dapeng

    2015-01-01

    Longan (Dimocarpus longan Lour.) is an important tropical fruit tree crop. Accurate varietal identification is essential for germplasm management and breeding. Using longan transcriptome sequences from public databases, we developed single nucleotide polymorphism (SNP) markers; validated 60 SNPs in 50 longan germplasm accessions, including cultivated varieties and wild germplasm; and designated 25 SNP markers that unambiguously identified all tested longan varieties with high statistical rigor (P<0.0001). Multiple trees from the same clone were verified and off-type trees were identified. Diversity analysis revealed genetic relationships among analyzed accessions. Cultivated varieties differed significantly from wild populations (Fst=0.300; P<0.001), demonstrating untapped genetic diversity for germplasm conservation and utilization. Within cultivated varieties, apparent differences between varieties from China and those from Thailand and Hawaii indicated geographic patterns of genetic differentiation. These SNP markers provide a powerful tool to manage longan genetic resources and breeding, with accurate and efficient genotype identification. PMID:26504559

  8. Nucleotide sequence of the gene ereA encoding the erythromycin esterase in Escherichia coli.

    PubMed

    Ounissi, H; Courvalin, P

    1985-01-01

    We have cloned and determined the nucleotide sequence of the gene ereA of plasmid pIP1100 which confers high-level resistance to erythromycin (Em) in Escherichia coli. The gene was defined by initiation and termination codons and by in vitro insertion-inactivation into an open reading frame (ORF) of 1032 bp corresponding to a product with an Mr of 37 765. However, the enzyme, an Em esterase, displayed an apparent Mr of 43 000 upon electrophoresis of a minicell extract on the SDS-polyacrylamide gels. The G + C content (50.5%) of the gene ereA and the preferential codon usage in its ORF suggest that this resistance determinant should be indigenous to E. coli.
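
    The G + C content cited as evidence for the gene's origin is a simple base-composition statistic. A minimal sketch of the calculation follows; the function name and the example fragment are hypothetical, and ambiguous bases are simply ignored, which is an assumption rather than anything stated in the abstract.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases of a DNA sequence."""
    s = seq.upper()
    counted = sum(s.count(b) for b in "ACGT")   # ignore N and other symbols
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (s.count("G") + s.count("C")) / counted

# Hypothetical fragment; the ereA sequence itself is not reproduced here.
print(round(gc_content("ATGCGCTTAGGC"), 1))
```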

  9. Isolation and nucleotide sequence of the methanol dehydrogenase structural gene from Paracoccus denitrificans.

    PubMed Central

    Harms, N; de Vries, G E; Maurer, K; Hoogendijk, J; Stouthamer, A H

    1987-01-01

    A genomic clone bank of Paracoccus denitrificans DNA has been constructed in the expression vector set pEX1, pEX2, and pEX3. Screening of this clone bank with antibodies raised against P. denitrificans methanol dehydrogenase resulted in the isolation of a clone, pNH3, that synthesized methanol dehydrogenase cross-reactive proteins. The nucleotide sequence of the P. denitrificans DNA fragment inserted in this clone has been determined and shown to contain the full methanol dehydrogenase structural gene. DNA cross-hybridization was found with DNA fragments which have been reported to contain the methanol dehydrogenase structural genes from Methylobacterium sp. strain AM1 and Methylobacterium organophilum. PMID:3114231

  10. Cloning, overexpression and nucleotide sequence of a thermostable DNA ligase-encoding gene.

    PubMed

    Barany, F; Gelfand, D H

    1991-12-20

    Thermostable DNA ligase has been harnessed for the detection of single-base genetic diseases using the ligase chain reaction [Barany, Proc. Natl. Acad. Sci. USA 88 (1991) 189-193]. The Thermus thermophilus (Tth) DNA ligase-encoding gene (ligT) was cloned in Escherichia coli by genetic complementation of a ligts 7 defect in an E. coli host. Nucleotide sequence analysis of the gene revealed a single chain of 676 amino acid residues with 47% identity to the E. coli ligase. Under phoA promoter control, Tth ligase was overproduced to greater than 10% of E. coli cellular proteins. Adenylated and deadenylated forms of the purified enzyme were distinguished by apparent molecular weights of 81 kDa and 78 kDa, respectively, after separation via sodium dodecyl sulfate-polyacrylamide-gel electrophoresis.

  11. Nucleotide sequence of the genetic loci encoding subunits of Bradyrhizobium japonicum uptake hydrogenase.

    PubMed Central

    Sayavedra-Soto, L A; Powell, G K; Evans, H J; Morris, R O

    1988-01-01

    An indispensable part of the hydrogen-recycling system in Bradyrhizobium japonicum is the uptake hydrogenase, which is composed of 34.5- and 65.9-kDa subunits. The gene encoding the large subunit is located on a 5.9-kilobase fragment of the H2-uptake-complementing cosmid pHU52 [Zuber, M., Harker, A.R., Sultana, M.A. & Evans, H.J. (1986) Proc. Natl. Acad. Sci. USA 83, 7668-7672]. We have now determined that the structural genes for both subunits are present on this fragment. Two open reading frames are present that correspond in size and deduced amino acid sequence to the hydrogenase subunits, except that the small-subunit coding region contains a leader peptide of 46 amino acids. The two genes are separated by a 32-nucleotide intergenic region and likely constitute an operon. Comparison of the deduced amino acid sequences of the B. japonicum genes with those from Desulfovibrio gigas, Desulfovibrio baculatus, and Rhodobacter capsulatus indicates significant sequence identity. PMID:3054886

  12. Nucleotide sequence and the encoded amino acids of human apolipoprotein A-I mRNA.

    PubMed Central

    Law, S W; Brewer, H B

    1984-01-01

    The cDNA clones encoding the precursor form of human liver apolipoprotein A-I (apoA-I), preproapoA-I, have been isolated from a cDNA library. A 17-base synthetic oligonucleotide based on residues 108-113 of apoA-I and a 26-base primer-extended, dideoxynucleotide-terminated cDNA were used as hybridization probes to select for recombinant plasmids bearing the apoA-I sequence. The complete nucleic acid sequence of human liver preproapoA-I has been determined by analysis of the cloned cDNA. The sequence is composed of 801 nucleotides encoding 267 amino acid residues. PreproapoA-I contains an 18-amino-acid prepeptide and a 6-amino-acid propeptide connected to the amino terminus of the 243-amino acid mature apoA-I. Southern blotting analysis of chromosomal DNA obtained from peripheral blood indicated the apoA-I gene is contained in a 2.1-kilobase-pair Pst I fragment and there is no gross difference in structural organization between the normal apoA-I gene and the Tangier disease apoA-I gene. PMID:6198645
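
    The residue and nucleotide counts in this abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes that the 801 nucleotides cover only the coding codons, with the stop codon not counted; the variable names are illustrative.

```python
PRE, PRO, MATURE = 18, 6, 243   # residue counts quoted in the abstract
CODING_NT = 801                 # coding nucleotides quoted in the abstract

total_residues = PRE + PRO + MATURE
assert total_residues == 267               # prepeptide + propeptide + mature protein
assert total_residues * 3 == CODING_NT     # three nucleotides per codon
print(total_residues, total_residues * 3)
```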

  13. Cloning and nucleotide sequence of a specific DNA fragment from Paracoccidioides brasiliensis.

    PubMed Central

    Goldani, L Z; Maia, A L; Sugar, A M

    1995-01-01

    We cloned and sequenced a species-specific 110-bp DNA fragment from Paracoccidioides brasiliensis. The DNA fragment was generated by PCR with primers complementary to the rat beta-actin gene under a low annealing temperature. Comparison of the nucleotide sequence, after excluding the primers, with those in the GenBank database identified approximately 60% homology with an exon of a major surface glycoprotein gene from Pneumocystis carinii and a fragment of unknown function in Saccharomyces cerevisiae chromosome VIII. By Southern hybridization analysis, the 32P-labelled fragment detected 1.0- and 1.9-kb restriction fragments within whole-cell genomic DNA of P. brasiliensis digested with HindIII and PstI, respectively, but failed to hybridize to genomic DNAs from Candida albicans, Blastomyces dermatitidis, Cryptococcus neoformans, Aspergillus fumigatus, Saccharomyces cerevisiae, Pneumocystis carinii, rat tissue, or humans under low-stringency hybridization conditions. Additionally, the specific DNA fragment from three different P. brasiliensis isolates (Pb18, RP18, RP17) was amplified by PCR with primers mostly complementary to nonactin sequences of the 110-bp DNA fragment. In contrast, there were no amplified products from other fungus genomic DNAs previously tested, including Histoplasma capsulatum. To date, this is the first species-specific DNA fragment cloned from P. brasiliensis which might be useful as a diagnostic marker for the identification and classification of different P. brasiliensis isolates. PMID:7650207

  14. Complete nucleotide sequence of the mitochondrial genome of a salamander, Mertensiella luschani.

    PubMed

    Zardoya, Rafael; Malaga-Trillo, Edward; Veith, Michael; Meyer, Axel

    2003-10-23

    The complete nucleotide sequence (16,650 bp) of the mitochondrial genome of the salamander Mertensiella luschani (Caudata, Amphibia) was determined. This molecule conforms to the consensus vertebrate mitochondrial gene order. However, it is characterized by a long non-coding intervening sequence with two 124-bp repeats between the tRNA(Thr) and tRNA(Pro) genes. The new sequence data were used to reconstruct a phylogeny of jawed vertebrates. Phylogenetic analyses of all mitochondrial protein-coding genes at the amino acid level recovered a robust vertebrate tree in which lungfishes are the closest living relatives of tetrapods, salamanders and frogs are grouped together to the exclusion of caecilians (the Batrachia hypothesis) in a monophyletic amphibian clade, turtles show diapsid affinities and are placed as sister group of crocodiles+birds, and the marsupials are grouped together with monotremes and basal to placental mammals. The deduced phylogeny was used to characterize the molecular evolution of vertebrate mitochondrial proteins. Amino acid frequencies were analyzed across the main lineages of jawed vertebrates, and leucine and cysteine were found to be the most and least abundant amino acids in mitochondrial proteins, respectively. Patterns of amino acid replacements were conserved among vertebrates. Overall, cartilaginous fishes showed the least variation in amino acid frequencies and replacements. Constancy of rates of evolution among the main lineages of jawed vertebrates was rejected.

  15. Modulation of base excision repair of 8-oxoguanine by the nucleotide sequence.

    PubMed

    Allgayer, Julia; Kitsera, Nataliya; von der Lippen, Carina; Epe, Bernd; Khobta, Andriy

    2013-10-01

    8-Oxoguanine (8-oxoG) is a major product of oxidative DNA damage, which induces replication errors and interferes with transcription. By varying the position of single 8-oxoG in a functional gene and manipulating the nucleotide sequence surrounding the lesion, we found that the degree of transcriptional inhibition is independent of the distance from the transcription start or the localization within the transcribed or the non-transcribed DNA strand. However, it is strongly dependent on the sequence context and also proportional to cellular expression of 8-oxoguanine DNA glycosylase (OGG1)-demonstrating that transcriptional arrest does not take place at unrepaired 8-oxoG and proving a causal connection between 8-oxoG excision and the inhibition of transcription. We identified the 5'-CAGGGC[8-oxoG]GACTG-3' motif as having only minimal transcription-inhibitory potential in cells, based on which we predicted that 8-oxoG excision is particularly inefficient in this sequence context. This anticipation was fully confirmed by direct biochemical assays. Furthermore, in DNA containing a bistranded Cp[8-oxoG]/Cp[8-oxoG] clustered lesion, the excision rates differed between the two strands at least by a factor of 9, clearly demonstrating that the excision preference is defined by the DNA strand asymmetry rather than the overall geometry of the double helix or local duplex stability.

  16. Mining for single nucleotide polymorphisms and insertions / deletions in expressed sequence tag libraries of oil palm.

    PubMed

    Riju, Aykkal; Chandrasekar, Arumugam; Arunachalam, Vadivel

    2007-01-01

    The oil palm is a tropical oil-bearing tree. EST-derived SNPs and SSRs are a free by-product of the currently expanding EST (Expressed Sequence Tag) databases. The development of high-throughput methods for the detection of SNPs (Single Nucleotide Polymorphisms) and small indels (insertions/deletions) has led to a revolution in their use as molecular markers. The available oil palm EST sequences (5452) were mined from dbEST of NCBI. The CAP3 program was used to assemble EST sequences into contigs. Candidate SNPs and indel polymorphisms were detected using the Perl script auto_snip version 1.0, which used 576 ESTs for detecting SNP and indel sites. We found 1180 SNP sites and 137 indel polymorphisms, with a frequency of 1.36 SNPs / 100 bp. Among the six tissues from which the EST libraries had been generated, mesocarp had a high frequency of 2.91 SNPs and indels per 100 bp, whereas zygotic embryos had the lowest frequency of 0.15 per 100 bp. We also used the Shannon index to analyze the proportion of ten possible types of SNP/indels. ESTs from tissues of normal apex showed the highest value of the Shannon index (0.60), whereas abnormal apex had the lowest value (0.02). The present report deals with the use of the Shannon index for comparing SNP/indel frequencies mined from EST libraries and also confirms that SNPs occur in oil palm frequently enough to be used as markers for genetic studies.
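
    The two summary statistics used in this abstract, polymorphic sites per 100 bp and the Shannon index over SNP/indel types, can be sketched as follows. The abstract does not state the logarithm base or any normalisation, so the natural-log form used here is an assumption and its output is not directly comparable with the reported values of 0.60 and 0.02. The function names and the counts are hypothetical.

```python
import math

def shannon_index(counts) -> float:
    """Shannon index H = -sum(p * ln p) over categories with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("need at least one observation")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def per_100_bp(sites: int, bases: int) -> float:
    """Polymorphic sites per 100 bp of sequence examined."""
    return 100.0 * sites / bases

# Hypothetical counts for ten SNP/indel types in one tissue library.
counts = [12, 9, 7, 5, 4, 3, 2, 1, 1, 0]
print(round(shannon_index(counts), 3), per_100_bp(34, 2500))
```

    A library dominated by a single polymorphism type gives an index near zero, while an even spread across types gives a higher value.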

  17. Complete Nucleotide Sequence of Watermelon Chlorotic Stunt Virus Originating from Oman

    PubMed Central

    Khan, Akhtar J.; Akhtar, Sohail; Briddon, Rob W.; Ammara, Um; Al-Matrooshi, Abdulrahman M.; Mansoor, Shahid

    2012-01-01

    Watermelon chlorotic stunt virus (WmCSV) is a bipartite begomovirus (genus Begomovirus, family Geminiviridae) that causes economic losses to cucurbits, particularly watermelon, across the Middle East and North Africa. Recently squash (Cucurbita moschata) grown in an experimental field in Oman was found to display symptoms such as leaf curling, yellowing and stunting, typical of a begomovirus infection. Sequence analysis of the virus isolated from squash showed 97.6–99.9% nucleotide sequence identity to previously described WmCSV isolates for the DNA A component and 93–98% identity for the DNA B component. Agrobacterium-mediated inoculation to Nicotiana benthamiana resulted in the development of symptoms fifteen days post inoculation. This is the first bipartite begomovirus identified in Oman. Overall the Oman isolate showed the highest levels of sequence identity to a WmCSV isolate originating from Iran, which was confirmed by phylogenetic analysis. This suggests that WmCSV present in Oman has been introduced from Iran. The significance of this finding is discussed. PMID:22852046

  18. The complete nucleotide sequence and genomic characterization of grapevine asteroid mosaic associated virus.

    PubMed

    Vargas-Asencio, José; Wojciechowska, Klaudia; Baskerville, Maia; Gomez, Annika L; Perry, Keith L; Thompson, Jeremy R

    2017-01-02

    In analyzing grapevine clones infected with grapevine red blotch associated virus, we identified a small number of isometric particles of approximately 30 nm in diameter from an enriched fraction of leaf extract. A dominant protein of 25 kDa was isolated from this fraction using SDS-PAGE and was identified by mass spectrometry as belonging to grapevine asteroid mosaic associated virus (GAMaV). Using a combination of three methods (RNA-Seq, sRNA-Seq, and Sanger sequencing of RT- and RACE-PCR products), we obtained a full-length genome sequence consisting of 6719 nucleotides without the poly(A) tail. The virus possesses all of the typical conserved functional domains concordant with the genus Marafivirus and lies evolutionarily between citrus sudden death associated virus and oat blue dwarf virus. A large shift in RNA-Seq coverage coincided with the predicted location of the subgenomic RNA involved in coat protein (CP) expression. Genus-wide sequence alignments confirmed the cleavage motif LxG(G/A) to be dominant between the helicase and RNA-dependent RNA polymerase (RdRp), and the RdRp and CP domains. A putative overlapping protein (OP) ORF lacking a canonical translational start codon was identified with a reading frame context more consistent with the putative OPs of tymoviruses and fig fleck associated virus than with those of marafiviruses. BLAST analysis of the predicted GAMaV OP showed a unique relatedness to the OPs of members of the genus Tymovirus.
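
    As an illustration of how a motif written as LxG(G/A) can be located in a protein sequence, the following is a minimal sketch using a regular expression. The function name `find_cleavage_motifs` and the example sequence are hypothetical; the sketch only finds candidate matches and says nothing about whether a site is actually cleaved.

```python
import re

# LxG(G/A): leucine, any residue, glycine, then glycine or alanine.
# A lookahead is used so that overlapping candidates are also reported.
MOTIF = re.compile(r"(?=(L[A-Z]G[GA]))")

def find_cleavage_motifs(protein):
    """Return (0-based start, matched tetrapeptide) for every LxG(G/A) candidate."""
    protein = protein.upper()
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(protein)]

# Hypothetical sequence fragment, not taken from the GAMaV polyprotein.
example = "MKLAGGTTLSGAQ"
print(find_cleavage_motifs(example))
```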

  19. Cloning and nucleotide sequence of a specific DNA fragment from Paracoccidioides brasiliensis.

    PubMed

    Goldani, L Z; Maia, A L; Sugar, A M

    1995-06-01

    We cloned and sequenced a species-specific 110-bp DNA fragment from Paracoccidioides brasiliensis. The DNA fragment was generated by PCR with primers complementary to the rat beta-actin gene under a low annealing temperature. Comparison of the nucleotide sequence, after excluding the primers, with those in the GenBank database identified approximately 60% homology with an exon of a major surface glycoprotein gene from Pneumocystis carinii and a fragment of unknown function in Saccharomyces cerevisiae chromosome VIII. By Southern hybridization analysis, the 32P-labelled fragment detected 1.0- and 1.9-kb restriction fragments within whole-cell genomic DNA of P. brasiliensis digested with HindIII and PstI, respectively, but failed to hybridize to genomic DNAs from Candida albicans, Blastomyces dermatitidis, Cryptococcus neoformans, Aspergillus fumigatus, Saccharomyces cerevisiae, Pneumocystis carinii, rat tissue, or humans under low-stringency hybridization conditions. Additionally, the specific DNA fragment from three different P. brasiliensis isolates (Pb18, RP18, RP17) was amplified by PCR with primers mostly complementary to nonactin sequences of the 110-bp DNA fragment. In contrast, there were no amplified products from other fungus genomic DNAs previously tested, including Histoplasma capsulatum. To date, this is the first species-specific DNA fragment cloned from P. brasiliensis which might be useful as a diagnostic marker for the identification and classification of different P. brasiliensis isolates.

  20. High-throughput nucleotide sequence analysis of diverse bacterial communities in leachates of decomposing pig carcasses

    PubMed Central

    Yang, Seung Hak; Lim, Joung Soo; Khan, Modabber Ahmed; Kim, Bong Soo; Choi, Dong Yoon; Lee, Eun Young; Ahn, Hee Kwon

    2015-01-01

    The leachate generated by the decomposition of animal carcass has been implicated as an environmental contaminant surrounding the burial site. High-throughput nucleotide sequencing was conducted to investigate the bacterial communities in leachates from the decomposition of pig carcasses. We acquired 51,230 reads from six different samples (1, 2, 3, 4, 6 and 14 week-old carcasses) and found that sequences representing the phylum Firmicutes predominated. The diversity of bacterial 16S rRNA gene sequences in the leachate was the highest at 6 weeks, in contrast to those at 2 and 14 weeks. The relative abundance of Firmicutes was reduced, while the proportion of Bacteroidetes and Proteobacteria increased from 3–6 weeks. The representation of phyla was restored after 14 weeks. However, the community structures between the samples taken at 1–2 and 14 weeks differed at the bacterial classification level. The trend in pH was similar to the changes seen in bacterial communities, indicating that the pH of the leachate could be related to the shift in the microbial community. The results indicate that the composition of bacterial communities in leachates of decomposing pig carcasses shifted continuously during the study period and might be influenced by the burial site. PMID:26500442

  1. The complete nucleotide sequence and genome organization of tomato chlorosis virus.

    PubMed

    Wintermantel, W M; Wisler, G C; Anchieta, A G; Liu, H-Y; Karasev, A V; Tzanetakis, I E

    2005-11-01

    The crinivirus tomato chlorosis virus (ToCV) was discovered initially in diseased tomato and has since been identified as a serious problem for tomato production in many parts of the world, particularly in the United States, Europe and Southeast Asia. The complete nucleotide sequence of ToCV was determined and compared with related crinivirus species. RNA 1 is organized into four open reading frames (ORFs), and encodes proteins involved in replication, based on homology to other viral replication factors. RNA 2 is composed of nine ORFs including genes that encode a HSP70 homolog and two proteins involved in encapsidation of viral RNA, referred to as the coat protein and minor coat protein. Sequence homology between ToCV and other criniviruses varies throughout the viral genome. The minor coat protein (CPm) of ToCV, which forms part of the "rattlesnake tail" of virions and may be involved in determining the unique, broad vector transmissibility of ToCV, is larger than the CPm of lettuce infectious yellows virus (LIYV) by 217 amino acids. Among sequenced criniviruses, considerable variability exists in the size of some viral proteins. Analysis of these differences with respect to biological function may provide insight into the role crinivirus proteins play in virus infection and transmission.

  2. Investigation of single nucleotide polymorphisms based on the intronic sequences of the propylene alcohol dehydrogenase gene in Chinese tobacco genotypes

    PubMed Central

    Wei, Ji-Cheng; Qiu, En-Jian; Guo, Hui-Yan; Hao, Ai-Ping; Chen, Rong-Ping

    2014-01-01

    A pair of primers was designed to amplify the propylene alcohol dehydrogenase gene sequence based on the cDNA sequence of the tobacco allyl-alcohol dehydrogenase gene. All introns were sequenced using traditional polymerase chain reaction (PCR) methods and T-A cloning. The sequences from common tobacco (Nicotiana tabacum L.) and rustica tobacco (Nicotiana rustica L.) were analysed between the third intron and the fourth intron of the propylene alcohol dehydrogenase gene. The results showed that the alcohol dehydrogenase gene is a low-copy nuclear gene. The intron sequences have a combination of single nucleotide polymorphisms and length polymorphisms between common tobacco and rustica tobacco, which are suitable to identify the different germplasms. Furthermore, there are some single nucleotide polymorphism sites in the target sequence within common tobacco that can be used to distinguish intraspecific varieties. PMID:26740754

  3. Rat mammary-gland transferrin: nucleotide sequence, phylogenetic analysis and glycan structure.

    PubMed Central

    Escrivá, H; Pierce, A; Coddeville, B; González, F; Benaissa, M; Léger, D; Wieruszeski, J M; Spik, G; Pamblanco, M

    1995-01-01

    The complete cDNA for rat mammary-gland transferrin (Tf) has been sequenced and also the native protein isolated from milk in order to analyse the structure of the main glycan variants present. A lactating-rat mammary-gland cDNA library in lambda gt10 was screened with a partial cDNA copy of rat liver Tf and subsequently rescreened with 5' fragments of the longest clones. This produced a 2275 bp insert coding for an open reading frame of 695 amino acid residues. This includes a 19-amino acid signal sequence and the mature protein containing 676 amino acids and one N-glycosylation site in the C-terminal domain at residue 490. Phylogenetic analysis was carried out using 14 translated Tf nucleotide sequences, and the derived evolutionary tree shows that at least three gene duplication events have occurred during Tf evolution, one of which generated the N- and C-terminal domains and occurred before separation of arthropods and chordates. The two halves of human melanotransferrin are more similar to each other than to any other sequence, which contrasts with the pattern shown by the remaining sequences. Native rat milk Tf is separated into four bands on native PAGE that differ only in their sialic acid content: one biantennary glycan is present containing either no sialic acid residues or up to three. The complete structures of the two major variants were determined by methylation, m.s. and 400 MHz 1H-n.m.r. spectroscopy. They contain either one or two neuraminic acid residues (alpha 2-->6)-linked to galactose in conventional biantennary N-acetyl-lactosamine-type glycans. Most contain fucose (alpha 1-->6)-linked to the terminal non-reducing N-acetylglucosamine. PMID:7717992

  4. Human ribosomal RNA gene: nucleotide sequence of the transcription initiation region and comparison of three mammalian genes.

    PubMed Central

    Financsek, I; Mizumoto, K; Mishima, Y; Muramatsu, M

    1982-01-01

    The transcription initiation site of the human ribosomal RNA gene (rDNA) was located by using the single-strand specific nuclease protection method and by determining the first nucleotide of the in vitro capped 45S preribosomal RNA. The sequence of 1,211 nucleotides surrounding the initiation site was determined. The sequenced region was found to consist of 75% G and C and to contain a number of short direct and inverted repeats and palindromes. By comparison of the corresponding initiation regions of three mammalian species, several conserved sequences were found upstream and downstream from the transcription starting point. Two short A + T-rich sequences are present on human, mouse, and rat ribosomal RNA genes between the initiation site and 40 nucleotides upstream, and a C + T cluster is located at a position around -60. At and downstream from the initiation site, a common sequence, T-AG-C-T-G-A-C-A-C-G-C-T-G-T-C-C-T-CT-T, was found in the three genes from position -1 through +18. The strong conservation of these sequences suggests their functional significance in rDNA. The S1 nuclease protection experiments with cloned rDNA fragments indicated the presence in human 45S RNA of molecules several hundred nucleotides shorter than the supposed primary transcript. The first 19 nucleotides of these molecules appear identical--except for one mismatch--to the nucleotide sequence of the 5' end of a supposed early processing product of the mouse 45S RNA. PMID:6954460
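
    A base-composition figure such as the 75% G and C reported here is a simple ratio of counts. The sketch below shows one way to compute it; `gc_fraction` is a hypothetical helper and the example string is made up, not the rDNA sequence from the paper.

```python
def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T, U) in `seq`.

    Ambiguity codes such as N are ignored rather than counted in the denominator;
    that choice is an assumption of this sketch.
    """
    bases = [b for b in seq.upper() if b in "ACGTU"]
    if not bases:
        return 0.0
    gc = sum(1 for b in bases if b in "GC")
    return gc / len(bases)

# Hypothetical fragment used only to show the call.
print(gc_fraction("GGCGCCTA"))  # 6 of 8 bases are G or C
```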

  5. Evaluation of Ancestral Sequence Reconstruction Methods to Infer Nonstationary Patterns of Nucleotide Substitution.

    PubMed

    Matsumoto, Tomotaka; Akashi, Hiroshi; Yang, Ziheng

    2015-07-01

    Inference of gene sequences in ancestral species has been widely used to test hypotheses concerning the process of molecular sequence evolution. However, the approach may produce spurious results, mainly because using the single best reconstruction while ignoring the suboptimal ones creates systematic biases. Here we implement methods to correct for such biases and use computer simulation to evaluate their performance when the substitution process is nonstationary. The methods we evaluated include parsimony and likelihood using the single best reconstruction (SBR), averaging over reconstructions weighted by the posterior probabilities (AWP), and a new method called expected Markov counting (EMC) that produces maximum-likelihood estimates of substitution counts for any branch under a nonstationary Markov model. We simulated base composition evolution on a phylogeny for six species, with different selective pressures on G+C content among lineages, and compared the counts of nucleotide substitutions recorded during simulation with the inference by different methods. We found that large systematic biases resulted from (i) the use of parsimony or likelihood with SBR, (ii) the use of a stationary model when the substitution process is nonstationary, and (iii) the use of the Hasegawa-Kishino-Yano (HKY) model, which is too simple to adequately describe the substitution process. The nonstationary general time reversible (GTR) model, used with AWP or EMC, accurately recovered the substitution counts, even in cases of complex parameter fluctuations. We discuss model complexity and the compromise between bias and variance and suggest that the new methods may be useful for studying complex patterns of nucleotide substitution in large genomic data sets.
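
    The contrast between counting substitutions from the single best reconstruction (SBR) and averaging over reconstructions weighted by posterior probabilities (AWP) can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes that joint posterior probabilities for the (ancestor, descendant) states at each site of one branch are already available, and the values in `sites` are invented.

```python
from collections import defaultdict

def sbr_counts(site_posteriors):
    """Count changes using only the most probable (ancestor, descendant) pair per site."""
    counts = defaultdict(float)
    for joint in site_posteriors:
        (anc, desc), _ = max(joint.items(), key=lambda kv: kv[1])
        if anc != desc:
            counts[(anc, desc)] += 1.0
    return dict(counts)

def awp_counts(site_posteriors):
    """Expected counts: every possible pair contributes in proportion to its posterior."""
    counts = defaultdict(float)
    for joint in site_posteriors:
        for (anc, desc), prob in joint.items():
            if anc != desc:
                counts[(anc, desc)] += prob
    return dict(counts)

# Hypothetical joint posteriors for two sites on a single branch.
sites = [
    {("A", "A"): 0.6, ("A", "G"): 0.4},  # best reconstruction: no change
    {("C", "T"): 0.7, ("C", "C"): 0.3},  # best reconstruction: C -> T
]

print(sbr_counts(sites))  # the 0.4 chance of A -> G at the first site is dropped
print(awp_counts(sites))  # suboptimal reconstructions contribute fractional counts
```

    In this toy case SBR discards the suboptimal A-to-G reconstruction entirely, which is the kind of systematic omission the abstract identifies as a source of bias.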

  6. The nucleotide sequence of cysteine transfer ribonucleic acid from baker's yeast. Identification of the products from partial degradation of the molecule and derivation of the complete sequence.

    PubMed Central

    Holness, N J; Atfield, G

    1976-01-01

    1. A series of large oligonucleotide fragments derived from tRNA Cys were separated chromatographically and the sequence of each was deduced by examination of the products of digestion with pancreatic and T1 ribonucleases. 2. The location of the specific cleavage points in the nucleotide chain was similar to that produced by brief treatment with pancreatic ribonuclease. 3. The fragments could be arranged into two alternative sequences. The correct sequence was deduced by the sequential removal and identification of the first nine nucleotides from the 3'-end of the terminal half of the molecules. PMID:819006

  7. Nucleotide sequence of the capsid protein gene and 3' non-coding region of papaya mosaic virus RNA.

    PubMed

    Abouhaidar, M G

    1988-01-01

    The nucleotide sequences of cDNA clones corresponding to the 3' OH end of papaya mosaic virus RNA have been determined. The 3'-terminal sequence obtained was 900 nucleotides in length, excluding the poly(A) tail, and contained an open reading frame capable of giving rise to a protein of 214 amino acid residues with an Mr of 22930. This protein was identified as the viral capsid protein. The 3' non-coding region of PMV genome RNA was about 121 nucleotides long [excluding the poly(A) tail] and homologous to the complementary sequence of the non-coding region at the 5' end of PMV RNA. A long open reading frame was also found in the predicted 5' end region of the negative strand.

  8. Complete nucleotide sequence and genome organization of peach virus D, a putative new member of the genus Marafivirus.

    PubMed

    Igori, Davaajargal; Lim, Seungmo; Baek, Dasom; Kim, San Yeong; Seo, Euncheol; Cho, In-Sook; Choi, Gug-Seoun; Lim, Hyoun-Sub; Moon, Jae Sun

    2017-06-01

    The complete nucleotide sequence of peach virus D (PeVD) from Prunus persica was determined. The PeVD genome consists of 6,612 nucleotides excluding the 3' poly(A) tail and contains a single open reading frame coding for a polyprotein of 227 kDa. Sequence comparisons and phylogenetic analysis revealed that PeVD is most closely related to viruses in the genus Marafivirus, family Tymoviridae. The complete nucleotide and CP amino acid sequences of PeVD were most similar (51.1-57.8% and 32.2-48.0%, respectively) to members of the genus Marafivirus, suggesting that PeVD is a new member of this genus.

  9. Detection and quantitation of single nucleotide polymorphisms, DNA sequence variations, DNA mutations, DNA damage and DNA mismatches

    DOEpatents

    McCutchen-Maloney, Sandra L.

    2002-01-01

    DNA mutation binding proteins alone and as chimeric proteins with nucleases are used with solid supports to detect DNA sequence variations, DNA mutations and single nucleotide polymorphisms. The solid supports may be flow cytometry beads, DNA chips, glass slides or DNA dipsticks. DNA molecules are coupled to solid supports to form DNA-support complexes. Labeled DNA is used with unlabeled DNA mutation binding proteins such as TthMutS to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by binding which gives an increase in signal. Unlabeled DNA is utilized with labeled chimeras to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by nuclease activity of the chimera which gives a decrease in signal.

  10. Real-time single-molecule electronic DNA sequencing by synthesis using polymer-tagged nucleotides on a nanopore array

    PubMed Central

    Fuller, Carl W.; Kumar, Shiv; Porel, Mintu; Chien, Minchen; Bibillo, Arek; Stranges, P. Benjamin; Dorwart, Michael; Tao, Chuanjuan; Li, Zengmin; Guo, Wenjing; Shi, Shundi; Korenblum, Daniel; Trans, Andrew; Aguirre, Anne; Liu, Edward; Harada, Eric T.; Pollard, James; Bhat, Ashwini; Cech, Cynthia; Yang, Alexander; Arnold, Cleoma; Palla, Mirkó; Hovis, Jennifer; Chen, Roger; Morozova, Irina; Kalachikov, Sergey; Russo, James J.; Kasianowicz, John J.; Davis, Randy; Roever, Stefan; Church, George M.; Ju, Jingyue

    2016-01-01

    DNA sequencing by synthesis (SBS) offers a robust platform to decipher nucleic acid sequences. Recently, we reported a single-molecule nanopore-based SBS strategy that accurately distinguishes four bases by electronically detecting and differentiating four different polymer tags attached to the 5′-phosphate of the nucleotides during their incorporation into a growing DNA strand catalyzed by DNA polymerase. Further developing this approach, we report here the use of nucleotides tagged at the terminal phosphate with oligonucleotide-based polymers to perform nanopore SBS on an α-hemolysin nanopore array platform. We designed and synthesized several polymer-tagged nucleotides using tags that produce different electrical current blockade levels and verified they are active substrates for DNA polymerase. A highly processive DNA polymerase was conjugated to the nanopore, and the conjugates were complexed with primer/template DNA and inserted into lipid bilayers over individually addressable electrodes of the nanopore chip. When an incoming complementary-tagged nucleotide forms a tight ternary complex with the primer/template and polymerase, the tag enters the pore, and the current blockade level is measured. The levels displayed by the four nucleotides tagged with four different polymers captured in the nanopore in such ternary complexes were clearly distinguishable and sequence-specific, enabling continuous sequence determination during the polymerase reaction. Thus, real-time single-molecule electronic DNA sequencing data with single-base resolution were obtained. The use of these polymer-tagged nucleotides, combined with polymerase tethering to nanopores and multiplexed nanopore sensors, should lead to new high-throughput sequencing methods. PMID:27091962

  11. Real-time single-molecule electronic DNA sequencing by synthesis using polymer-tagged nucleotides on a nanopore array.

    PubMed

    Fuller, Carl W; Kumar, Shiv; Porel, Mintu; Chien, Minchen; Bibillo, Arek; Stranges, P Benjamin; Dorwart, Michael; Tao, Chuanjuan; Li, Zengmin; Guo, Wenjing; Shi, Shundi; Korenblum, Daniel; Trans, Andrew; Aguirre, Anne; Liu, Edward; Harada, Eric T; Pollard, James; Bhat, Ashwini; Cech, Cynthia; Yang, Alexander; Arnold, Cleoma; Palla, Mirkó; Hovis, Jennifer; Chen, Roger; Morozova, Irina; Kalachikov, Sergey; Russo, James J; Kasianowicz, John J; Davis, Randy; Roever, Stefan; Church, George M; Ju, Jingyue

    2016-05-10

    DNA sequencing by synthesis (SBS) offers a robust platform to decipher nucleic acid sequences. Recently, we reported a single-molecule nanopore-based SBS strategy that accurately distinguishes four bases by electronically detecting and differentiating four different polymer tags attached to the 5'-phosphate of the nucleotides during their incorporation into a growing DNA strand catalyzed by DNA polymerase. Further developing this approach, we report here the use of nucleotides tagged at the terminal phosphate with oligonucleotide-based polymers to perform nanopore SBS on an α-hemolysin nanopore array platform. We designed and synthesized several polymer-tagged nucleotides using tags that produce different electrical current blockade levels and verified they are active substrates for DNA polymerase. A highly processive DNA polymerase was conjugated to the nanopore, and the conjugates were complexed with primer/template DNA and inserted into lipid bilayers over individually addressable electrodes of the nanopore chip. When an incoming complementary-tagged nucleotide forms a tight ternary complex with the primer/template and polymerase, the tag enters the pore, and the current blockade level is measured. The levels displayed by the four nucleotides tagged with four different polymers captured in the nanopore in such ternary complexes were clearly distinguishable and sequence-specific, enabling continuous sequence determination during the polymerase reaction. Thus, real-time single-molecule electronic DNA sequencing data with single-base resolution were obtained. The use of these polymer-tagged nucleotides, combined with polymerase tethering to nanopores and multiplexed nanopore sensors, should lead to new high-throughput sequencing methods.

  12. Nucleotide sequence analysis and DNA hybridization studies of the ant(4')-IIa gene from Pseudomonas aeruginosa.

    PubMed Central

    Shaw, K J; Munayyer, H; Rather, P N; Hare, R S; Miller, G H

    1993-01-01

    The ant(4')-IIa gene was previously cloned from Pseudomonas aeruginosa on a 1.6-kb DNA fragment (G. A. Jacoby, M. J. Blaser, P. Santanam, H. Hächler, F. H. Kayser, R. S. Hare, and G. H. Miller, Antimicrob. Agents Chemother. 34:2381-2386, 1990). In the current study, the ant(4')-IIa gene was localized by gamma-delta mutagenesis. A region of approximately 600 nucleotides which contained the ant(4')-IIa gene was identified, and DNA sequence analysis revealed two overlapping open reading frames (ORFs) within this region. Northern (RNA) blot analysis demonstrated expression of both ORFs in P. aeruginosa; therefore, site-directed mutagenesis was used to identify the ORF which encodes the ant(4')-IIa gene. No homology was found between ant(4')-IIa and ant(4')-Ia DNA sequences. Hybridization experiments confirmed that the ant(4')-Ia probe hybridized only to gram-positive presumptive ANT(4')-I strains and that the ant(4')-IIa probe hybridized only to gram-negative strains presumed to carry ANT(4')-II. Seven gram-negative strains which had been classified as having ANT(4')-II resistance profiles did not hybridize with probes for either ant(4')-Ia or ant(4')-IIa, suggesting that at least one additional ant(4') gene may exist. The predicted amino-terminal sequences of the ANT(4')-Ia and ANT(4')-IIa proteins showed significant sequence similarity between residues 38 and 63 of the ANT(4')-Ia protein and residues 26 and 51 of the ANT(4')-IIa protein. PMID:8494365

  13. Human secreted carbonic anhydrase: cDNA cloning, nucleotide sequence, and hybridization histochemistry

    SciTech Connect

    Aldred, P.; Fu, Ping; Barrett, G.; Penschow, J.D.; Wright, R.D.; Coghlan, J.P.; Fernley, R.T.

    1991-01-01

    Complementary DNA clones coding for the human secreted carbonic anhydrase isozyme (CA VI) have been isolated and their nucleotide sequences determined. These clones identify a 1.45-kb mRNA that is present in high levels in parotid and submandibular salivary glands but absent in other tissues such as the sublingual gland, kidney, liver, and prostate gland. Hybridization histochemistry of human salivary glands shows mRNA for CA VI located in the acinar cells of these glands. The cDNA clones encode a protein of 308 amino acids that includes a 17 amino acid leader sequence typical of secreted proteins. The mature protein has 291 amino acids compared to 259 or 260 for the cytoplasmic isozymes, with most of the extra amino acids present as a carboxyl terminal extension. In comparison, sheep CA VI has a 45 amino acid extension. Overall the human CA VI protein has a sequence identity of 35% with human CA II, while residues involved in the active site of the enzymes have been conserved. The human and sheep secreted carbonic anhydrases have a sequence identity of 72%. This includes the two cysteine residues that are known to be involved in an intramolecular disulfide bond in the sheep CA VI. The enzyme is known to be glycosylated and three potential N-glycosylation sites (Asn-X-Thr/Ser) have been identified. Two of these are known to be glycosylated in sheep CA VI. Southern analysis of human DNA indicates that there is only one gene coding for CA VI.

  14. Analysis of Complete Nucleotide Sequences of 12 Gossypium Chloroplast Genomes: Origin and Evolution of Allotetraploids

    PubMed Central

    Xu, Qin; Xiong, Guanjun; Li, Pengbo; He, Fei; Huang, Yi; Wang, Kunbo; Li, Zhaohu; Hua, Jinping

    2012-01-01

    Gossypium is the maternal source of extant allotetraploid species and allotetraploids have a monophyletic origin. G. hirsutum AD1 lineages have experienced more sequence variations than other allotetraploids in intergenic regions. The available complete nucleotide sequences of 12 Gossypium chloroplast genomes should facilitate studies to uncover the molecular mechanisms of compartmental co-evolution and speciation of Gossypium allotetraploids. PMID:22876273

  15. Nucleotide sequence and transcriptional analysis of the type A2 neurotoxin gene cluster in Clostridium botulinum.

    PubMed

    Dineen, Sean S; Bradshaw, Marite; Karasek, Charles E; Johnson, Eric A

    2004-06-01

    The nucleotide sequences of the upstream regions of the botulinum neurotoxin type A1 (BoNT/A1) cluster of Clostridium botulinum strain NCTC 2916 and the BoNT/A2 cluster of strain Kyoto-F were determined. A novel gene, designated orfx3, was identified following the orfx2 gene in both clusters. ORF-X2 and ORF-X3 exhibit similarity to the BoNT cluster associated P-47 protein. The BoNT/A1 and BoNT/A2 clusters share a similar gene arrangement, but exhibit differences in the spacing between certain genes. Sequences with similarity to transposases were identified in these intergenic regions, suggesting that these differences arose from an ancestral insertion event. Transcriptional analysis of the BoNT/A2 cluster revealed that the genes of the cluster are primarily synthesized as three polycistronic transcripts. Two divergent polycistronic transcripts, one encoding the orfx1, orfx2, and orfx3 genes, the second encoding the p47, ntnh, and bont/a2 genes, are transcribed from conserved BoNT cluster promoters. The third polycistronic transcript, expressed at low levels, encodes the positive regulatory botR gene and the orfx genes. This is the first complete analysis of a botulinum toxin A2 cluster.

  16. Nucleotide sequence and structural organization of the human vasopressin pituitary receptor (V3) gene.

    PubMed

    René, P; Lenne, F; Ventura, M A; Bertagna, X; de Keyzer, Y

    2000-01-04

    In the pituitary, vasopressin triggers ACTH release through a specific receptor subtype, termed V3 or V1b. We cloned the V3 cDNA and showed that its expression was almost exclusive to pituitary corticotrophs and some corticotroph tumors. To study the determinants of this tissue specificity, we have now cloned the gene for the human (h) V3 receptor and characterized its structure. It is composed of two exons, spanning 10 kb, with the coding region interrupted between transmembrane domains 6 and 7. We established that the transcription initiation site is located 498 nucleotides upstream of the initiator codon and showed that two polyadenylation sites may be used, of which the most downstream is the most frequently used. Sequence analysis of the promoter region showed no TATA box but identified consensus binding motifs for Sp1, CREB, and half sites of the estrogen receptor binding site. However, comparison with another corticotroph-specific gene, proopiomelanocortin, did not identify common regulatory elements in the two promoters except for a short GC-rich region. Unexpectedly, hV3 gene analysis revealed that a formerly cloned 'artifactual' hV3 cDNA indeed corresponded to a spliced antisense transcript, overlapping the 5' part of the coding sequence in exon 1 and the promoter region. This transcript, hV3rev, was detected in normal pituitary and in many corticotroph tumors expressing hV3 sense mRNA and may therefore play a role in hV3 gene expression.

  17. Nucleotide sequence and phylogenetic analysis of a new potexvirus: Malva mosaic virus.

    PubMed

    Côté, Fabien; Paré, Christine; Majeau, Nathalie; Bolduc, Marilène; Leblanc, Eric; Bergeron, Michel G; Bernardy, Michael G; Leclerc, Denis

    2008-01-01

    A filamentous virus isolated from Malva neglecta Wallr. (common mallow) and propagated in Chenopodium quinoa was grown and cloned, and its complete nucleotide sequence was determined (GenBank accession # DQ660333). The genomic RNA is 6858 nt in length and contains five major open reading frames (ORFs). The genomic organization is similar to that of members of the genus Potexvirus in the family Flexiviridae, and the virus-encoded proteins share homology with those of this group. Phylogenetic analysis revealed a close relationship with narcissus mosaic virus (NMV), scallion virus X (ScaVX) and, to a lesser extent, with Alstroemeria virus X (AlsVX) and pepino mosaic virus (PepMV). A novel putative pseudoknot structure is predicted in the 3'-UTR of a subgroup of potexviruses, including this newly described virus. The consensus GAAAA sequence is detected at the 5'-end of the genomic RNA and experimental data strongly suggest that this motif could be a distinctive hallmark of this genus. The name Malva mosaic virus is proposed.

  18. Complete nucleotide sequence analysis of the norovirus GII.4 Sydney variant in South Korea.

    PubMed

    Park, Ji-Sun; Lee, Sung-Geun; Jin, Ji-Young; Cho, Han-Gil; Jheong, Weon-Hwa; Paik, Soon-Young

    2015-01-01

    Norovirus is the primary cause of acute gastroenteritis in individuals of all ages. In Australia, a new strain of norovirus (GII.4) was identified in March 2012, and this strain has spread rapidly around the world. In August 2012, this new GII.4 strain was identified in patients in South Korea. Therefore, to examine the characteristics of the epidemic norovirus GII.4 2012 variant in South Korea, we conducted a full-length genomic analysis of the gg-12-08-04 strain (KM272334). The genome of the gg-12-08-04 strain consisted of 7,558 bp and contained three open reading frames (ORFs): ORF1 (5,100 bp), ORF2 (1,623 bp), and ORF3 (807 bp). Phylogenetic analyses showed that gg-12-08-04 belonged to the GII.4 Sydney 2012 variant, sharing 98.92% nucleotide similarity with this variant strain. According to SimPlot analysis, the gg-12-08-04 strain was a recombinant strain with a breakpoint at the ORF1/2 junction between the Osaka 2007 and Apeldoorn 2008 strains. This study is the first report of the complete sequence of the GII.4 Sydney 2012 strain in South Korea. This sequence may therefore represent the standard sequence of the norovirus GII.4 2012 variant in South Korea and could be useful for the development of norovirus vaccines.

  19. JNSViewer—A JavaScript-based Nucleotide Sequence Viewer for DNA/RNA secondary structures

    PubMed Central

    Dong, Min; Graham, Mitchell; Yadav, Nehul

    2017-01-01

    Many tools are available for visualizing RNA or DNA secondary structures, but there is scarce implementation in JavaScript that provides seamless integration with the increasingly popular web computational platforms. We have developed JNSViewer, a highly interactive web service, which is bundled with several popular tools for DNA/RNA secondary structure prediction and can provide precise and interactive correspondence among nucleotides, dot-bracket data, secondary structure graphs, and genic annotations. In JNSViewer, users can perform RNA secondary structure predictions with different programs and settings, add customized genic annotations in GFF format to structure graphs, search for specific linear motifs, and extract relevant structure graphs of sub-sequences. JNSViewer also allows users to choose a transcript or specific segment of Arabidopsis thaliana genome sequences and predict the corresponding secondary structure. Popular genome browsers (i.e., JBrowse and BrowserGenome) were integrated into JNSViewer to provide powerful visualizations of chromosomal locations, genic annotations, and secondary structures. In addition, we used StructureFold with default settings to predict some RNA structures for Arabidopsis by incorporating in vivo high-throughput RNA structure profiling data and stored the results in our web server, which might be a useful resource for RNA secondary structure studies in plants. JNSViewer is available at http://bioinfolab.miamioh.edu/jnsviewer/index.html. PMID:28582416
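
    The dot-bracket data mentioned in this abstract encode a secondary structure as a string in which matching parentheses mark paired nucleotides and dots mark unpaired ones. The following is a minimal Python sketch of how such a string can be turned into a list of base pairs. It is not taken from JNSViewer; the function name `dot_bracket_pairs` and the example hairpin are hypothetical and chosen only for illustration, and pseudoknot symbols such as `[` and `]` are deliberately not handled.

```python
def dot_bracket_pairs(structure: str) -> list[tuple[int, int]]:
    """Return sorted 0-based (i, j) index pairs for matched parentheses.

    Hypothetical helper for illustration only: '(' opens a pair,
    ')' closes the most recently opened pair, '.' is unpaired.
    """
    open_positions: list[int] = []   # stack of indices of unmatched '('
    pairs: list[tuple[int, int]] = []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(j)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((open_positions.pop(), j))
        elif symbol != ".":
            raise ValueError(f"unsupported symbol {symbol!r} at position {j}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


# Made-up 10-nt hairpin: a three-base-pair stem closing a four-base loop.
sequence = "GGGAAAUCCC"
structure = "(((....)))"
for i, j in dot_bracket_pairs(structure):
    print(f"{sequence[i]}{i + 1} pairs with {sequence[j]}{j + 1}")
```

    A stack suffices here because plain dot-bracket strings describe nested (non-crossing) pairs only; a viewer linking nucleotides to structure graphs would need some such index mapping, though how JNSViewer implements it is not stated in the abstract.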

  20. Complete nucleotide sequence of rose yellow leaf virus, a new member of the family Tombusviridae.

    PubMed

    Mollov, Dimitre; Lockhart, Ben; Zlesak, David C

    2014-10-01

    The genome of the rose yellow leaf virus (RYLV) has been determined to be 3918 nucleotides long and to contain seven open reading frames (ORFs). ORF1 encodes a 27-kDa peptide (p27). ORF2 shares a common start codon with ORF1 and continues through the amber stop codon of p27 to encode an 87-kDa (p87) protein that has amino acid similarity to the RNA-dependent RNA polymerase (RdRp) of members of the family Tombusviridae. ORFs 3 and 4 have no significant amino acid similarity to known functional viral ORFs. ORF5 encodes a 6-kDa (p6) protein that has similarity to movement proteins of members of the Tombusviridae. ORF5A has no conventional start codon and overlaps with p6. A putative +1 frameshift mechanism allows p6 translation to continue through the stop codon and results in a 12-kDa protein that has high homology to the carmovirus p13 movement protein. The 37-kDa protein encoded by ORF6 has amino acid sequence similarity to coat proteins (CP) of members of the Tombusviridae. ORF7 has no significant amino acid similarity to known viral ORFs. Phylogenetic analysis of the RdRp amino acid sequences grouped RYLV together with the unclassified Rosa rugosa leaf distortion virus (RrLDV), pelargonium line pattern virus (PLPV), and pelargonium chlorotic ring pattern virus (PCRPV) in a distinct subgroup of the family Tombusviridae.

  1. Viral to metazoan marine plankton nucleotide sequences from the Tara Oceans expedition.

    PubMed

    Alberti, Adriana; Poulain, Julie; Engelen, Stefan; Labadie, Karine; Romac, Sarah; Ferrera, Isabel; Albini, Guillaume; Aury, Jean-Marc; Belser, Caroline; Bertrand, Alexis; Cruaud, Corinne; Da Silva, Corinne; Dossat, Carole; Gavory, Frédérick; Gas, Shahinaz; Guy, Julie; Haquelle, Maud; Jacoby, E'krame; Jaillon, Olivier; Lemainque, Arnaud; Pelletier, Eric; Samson, Gaëlle; Wessner, Mark; Acinas, Silvia G; Royo-Llonch, Marta; Cornejo-Castillo, Francisco M; Logares, Ramiro; Fernández-Gómez, Beatriz; Bowler, Chris; Cochrane, Guy; Amid, Clara; Hoopen, Petra Ten; De Vargas, Colomban; Grimsley, Nigel; Desgranges, Elodie; Kandels-Lewis, Stefanie; Ogata, Hiroyuki; Poulton, Nicole; Sieracki, Michael E; Stepanauskas, Ramunas; Sullivan, Matthew B; Brum, Jennifer R; Duhaime, Melissa B; Poulos, Bonnie T; Hurwitz, Bonnie L; Pesant, Stéphane; Karsenti, Eric; Wincker, Patrick

    2017-08-01

    A unique collection of oceanic samples was gathered by the Tara Oceans expeditions (2009-2013), targeting plankton organisms ranging from viruses to metazoans, and providing rich environmental context measurements. Thanks to recent advances in the field of genomics, extensive sequencing has been performed for a deep genomic analysis of this huge collection of samples. A strategy based on different approaches, such as metabarcoding, metagenomics, single-cell genomics and metatranscriptomics, has been chosen for analysis of size-fractionated plankton communities. Here, we provide detailed procedures applied for genomic data generation, from nucleic acids extraction to sequence production, and we describe registries of genomics datasets available at the European Nucleotide Archive (ENA, www.ebi.ac.uk/ena). The association of these metadata to the experimental procedures applied for their generation will help the scientific community to access these data and facilitate their analysis. This paper complements other efforts to provide a full description of experiments and open science resources generated from the Tara Oceans project, further extending their value for the study of the world's planktonic ecosystems.

  2. NASP: a parallel program for identifying evolutionarily conserved nucleic acid secondary structures from nucleotide sequence alignments.

    PubMed

    Semegni, J Y; Wamalwa, M; Gaujoux, R; Harkins, G W; Gray, A; Martin, D P

    2011-09-01

    Many natural nucleic acid sequences have evolutionarily conserved secondary structures with diverse biological functions. A reliable computational tool for identifying such structures would be very useful in guiding experimental analyses of their biological functions. NASP (Nucleic Acid Structure Predictor) is a program that takes into account thermodynamic stability, Boltzmann base pair probabilities, alignment uncertainty, covarying sites and evolutionary conservation to identify biologically relevant secondary structures within multiple sequence alignments. Unique to NASP is the consideration of all this information together with a recursive permutation-based approach to progressively identify and list the most conserved probable secondary structures that are likely to have the greatest biological relevance. By focusing on identifying only evolutionarily conserved structures, NASP forgoes the prediction of complete nucleotide folds but outperforms various other secondary structure prediction methods in its ability to selectively identify actual base pairings. Downloadable and web-based versions of NASP are freely available at http://web.cbio.uct.ac.za/~yves/nasp_portal.php. Contact: yves@cbio.uct.ac.za. Supplementary data are available at Bioinformatics online.

  3. Complete Nucleotide Sequence Analysis of the Norovirus GII.4 Sydney Variant in South Korea

    PubMed Central

    Park, Ji-Sun; Lee, Sung-Geun; Cho, Han-Gil; Jheong, Weon-Hwa; Paik, Soon-Young

    2015-01-01

    Norovirus is the primary cause of acute gastroenteritis in individuals of all ages. In Australia, a new strain of norovirus (GII.4) was identified in March 2012, and this strain has spread rapidly around the world. In August 2012, this new GII.4 strain was identified in patients in South Korea. Therefore, to examine the characteristics of the epidemic norovirus GII.4 2012 variant in South Korea, we conducted KM272334 full-length genomic analysis. The genome of the gg-12-08-04 strain consisted of 7,558 bp and contained three open reading frame (ORF) composites throughout the whole genome: ORF1 (5,100 bp), ORF2 (1,623 bp), and ORF3 (807 bp). Phylogenetic analyses showed that gg-12-08-04 belonged to the GII.4 Sydney 2012 variant, sharing 98.92% nucleotide similarity with this variant strain. According to SimPlot analysis, the gg-12-08-04 strain was a recombinant strain with breakpoint at the ORF1/2 junction between Osaka 2007 and Apeldoorn 2008 strains. This study is the first report of the complete sequence of the GII.4 Sydney 2012 strain in South Korea. Therefore, this may represent the standard sequence of the norovirus GII.4 2012 variant in South Korea and could therefore be useful for the development of norovirus vaccines. PMID:25688356

  4. Complete nucleotide sequence of a Spanish isolate of Parietaria mottle virus infecting tomato.

    PubMed

    Galipienso, Luis; Rubio, Luis; López, Luis; Soler, Salvador; Aramburu, José

    2009-10-01

    The genome of a Spanish isolate of Parietaria mottle virus (PMoV) obtained from tomato (strain PMoV-T) was completely sequenced. Protein motifs conserved for RNA viruses were identified: the p1 protein contained a methyltransferase domain in its N-terminal half and a triphosphatase/helicase domain in its C-terminal half; the p2 protein contained an RNA polymerase domain; the 3a protein contained an RNA-binding domain with α-helix and β-sheet secondary structures. In addition, stem-loop structures with potential capacity for protein interactions were predicted in the untranslated terminal regions. Comparison with the other sequenced PMoV isolate showed nucleotide identities of 93, 90, and 93% for genomic RNAs 1, 2 and 3, respectively, and amino acid identities ranging from 88 to 97% for the different proteins. A cytosine deletion was detected at position 1,366 of RNA 3, involving a start codon for the coat protein (CP) gene different from the other PMoV isolate, resulting in a CP 16 amino acids shorter. Comparison of synonymous and nonsynonymous mutations revealed different selective constraints along the genome.

  5. Viral to metazoan marine plankton nucleotide sequences from the Tara Oceans expedition

    PubMed Central

    Alberti, Adriana; Poulain, Julie; Engelen, Stefan; Labadie, Karine; Romac, Sarah; Ferrera, Isabel; Albini, Guillaume; Aury, Jean-Marc; Belser, Caroline; Bertrand, Alexis; Cruaud, Corinne; Da Silva, Corinne; Dossat, Carole; Gavory, Frédérick; Gas, Shahinaz; Guy, Julie; Haquelle, Maud; Jacoby, E'krame; Jaillon, Olivier; Lemainque, Arnaud; Pelletier, Eric; Samson, Gaëlle; Wessner, Mark; Bazire, Pascal; Beluche, Odette; Bertrand, Laurie; Besnard-Gonnet, Marielle; Bordelais, Isabelle; Boutard, Magali; Dubois, Maria; Dumont, Corinne; Ettedgui, Evelyne; Fernandez, Patricia; Garcia, Espérance; Aiach, Nathalie Giordanenco; Guerin, Thomas; Hamon, Chadia; Brun, Elodie; Lebled, Sandrine; Lenoble, Patricia; Louesse, Claudine; Mahieu, Eric; Mairey, Barbara; Martins, Nathalie; Megret, Catherine; Milani, Claire; Muanga, Jacqueline; Orvain, Céline; Payen, Emilie; Perroud, Peggy; Petit, Emmanuelle; Robert, Dominique; Ronsin, Murielle; Vacherie, Benoit; Acinas, Silvia G.; Royo-Llonch, Marta; Cornejo-Castillo, Francisco M.; Logares, Ramiro; Fernández-Gómez, Beatriz; Bowler, Chris; Cochrane, Guy; Amid, Clara; Hoopen, Petra Ten; De Vargas, Colomban; Grimsley, Nigel; Desgranges, Elodie; Kandels-Lewis, Stefanie; Ogata, Hiroyuki; Poulton, Nicole; Sieracki, Michael E.; Stepanauskas, Ramunas; Sullivan, Matthew B.; Brum, Jennifer R.; Duhaime, Melissa B.; Poulos, Bonnie T.; Hurwitz, Bonnie L.; Acinas, Silvia G.; Bork, Peer; Boss, Emmanuel; Bowler, Chris; De Vargas, Colomban; Follows, Michael; Gorsky, Gabriel; Grimsley, Nigel; Hingamp, Pascal; Iudicone, Daniele; Jaillon, Olivier; Kandels-Lewis, Stefanie; Karp-Boss, Lee; Karsenti, Eric; Not, Fabrice; Ogata, Hiroyuki; Pesant, Stéphane; Raes, Jeroen; Sardet, Christian; Sieracki, Michael E.; Speich, Sabrina; Stemmann, Lars; Sullivan, Matthew B.; Sunagawa, Shinichi; Wincker, Patrick; Pesant, Stéphane; Karsenti, Eric; Wincker, Patrick

    2017-01-01

    A unique collection of oceanic samples was gathered by the Tara Oceans expeditions (2009–2013), targeting plankton organisms ranging from viruses to metazoans, and providing rich environmental context measurements. Thanks to recent advances in the field of genomics, extensive sequencing has been performed for a deep genomic analysis of this huge collection of samples. A strategy based on different approaches, such as metabarcoding, metagenomics, single-cell genomics and metatranscriptomics, has been chosen for analysis of size-fractionated plankton communities. Here, we provide detailed procedures applied for genomic data generation, from nucleic acids extraction to sequence production, and we describe registries of genomics datasets available at the European Nucleotide Archive (ENA, www.ebi.ac.uk/ena). The association of these metadata to the experimental procedures applied for their generation will help the scientific community to access these data and facilitate their analysis. This paper complements other efforts to provide a full description of experiments and open science resources generated from the Tara Oceans project, further extending their value for the study of the world’s planktonic ecosystems. PMID:28763055

  6. Complete nucleotide sequence of an immunoglobulin VH gene homologue from Caiman, a phylogenetically ancient reptile.

    PubMed

    Litman, G W; Berger, L; Murphy, K; Litman, R; Hinds, K; Jahn, C L; Erickson, B W

    1983-05-26

    Immunoglobulin variable (V) gene regions typify extensive multigenic families in terms of overall size, chromosomal arrangement and presence of large numbers of apparent pseudogenes. A unique mechanism of somatic reorganization involving recombination of VH, D and JH or VL and JL segments accompanies the differentiation of lymphoid cells and together with somatic mutation and other types of recombination accounts for V-region diversity. Although these processes have been well characterized in higher mammals, little is known concerning their origin and diversification during phylogenetic time. Previously, we described the blot-hybridization characteristics of murine VHIII probes with restriction enzyme-digested genomic DNA isolated from several phylogenetically critical species, including Caiman crocodylus, a modern representative of an ancient reptilian subclass. Here we have used a murine probe, S107V, to select homologous clones from a library of Caiman genomic DNA constructed in a lambda bacteriophage. The complete nucleotide sequence of a Caiman gene homologous to the murine VH gene and its adjacent 5' and 3' region is described. Comparison of the sequence with mammalian prototypes shows evidence of considerable organizational and structural homology extending outside the presumed VH-coding region and including elements believed to be involved in somatic recombination. Inferences about the evolution of this multigenic family can now be extended to the level of phylogenetic class.

  7. Nucleotide sequence of the Kaposi sarcoma-associated herpesvirus (HHV8)

    PubMed Central

    Russo, James J.; Bohenzky, Roy A.; Chien, Ming-Cheng; Chen, Jing; Yan, Ming; Maddalena, Dawn; Parry, J. Preston; Peruzzi, Daniela; Edelman, Isidore S.; Chang, Yuan; Moore, Patrick S.

    1996-01-01

    The genome of the Kaposi sarcoma-associated herpesvirus (KSHV or HHV8) was mapped with cosmid and phage genomic libraries from the BC-1 cell line. Its nucleotide sequence was determined except for a 3-kb region at the right end of the genome that was refractory to cloning. The BC-1 KSHV genome consists of a 140.5-kb-long unique coding region flanked by multiple G+C-rich 801-bp terminal repeat sequences. A genomic duplication that apparently arose in the parental tumor is present in this cell culture-derived strain. At least 81 ORFs, including 66 with homology to herpesvirus saimiri ORFs, and 5 internal repeat regions are present in the long unique region. The virus encodes homologs to complement-binding proteins, three cytokines (two macrophage inflammatory proteins and interleukin 6), dihydrofolate reductase, bcl-2, interferon regulatory factors, interleukin 8 receptor, neural cell adhesion molecule-like adhesin, and a D-type cyclin, as well as viral structural and metabolic proteins. Terminal repeat analysis of virus DNA from a KS lesion suggests a monoclonal expansion of KSHV in the KS tumor. PMID:8962146

  8. A simple and reliable assay for detecting specific nucleotide sequences in plants using optical thin-film biosensor chips.

    PubMed

    Bai, Su-Lan; Zhong, Xiaobo; Ma, Ligeng; Zheng, Wenjie; Fan, Liu-Min; Wei, Ning; Deng, Xing Wang

    2007-01-01

    Here we report the adaptation and optimization of an efficient, accurate and inexpensive assay that employs custom-designed silicon-based optical thin-film biosensor chips to detect unique transgenes in genetically modified (GM) crops and SNP markers in model plant genomes. Briefly, aldehyde-attached sequence-specific single-stranded oligonucleotide probes are arrayed and covalently attached to a hydrazine-derivatized biosensor chip surface. Unique DNA sequences (or genes) are detected by hybridizing biotinylated PCR amplicons of the DNA sequences to probes on the chip surface. In the SNP assay, target sequences (PCR amplicons) are hybridized in the presence of a mixture of biotinylated detector probes and a thermostable DNA ligase. Only perfect matches between the probe and target sequences, but not those with even a single nucleotide mismatch, can be covalently fixed on the chip surface. In both cases, the presence of specific target sequences is signified by a color change on the chip surface (gold to blue/purple) after brief incubation with an anti-biotin IgG horseradish peroxidase (HRP) to generate a precipitable product from an HRP substrate. Highly sensitive and accurate identification of PCR targets can be completed within 30 min. This assay is extremely robust, exhibits high sensitivity and specificity, and is flexible from low to high throughput and very economical. This technology can be customized for any nucleotide sequence-based identification assay and widely applied in crop breeding, trait mapping, and other work requiring positive detection of specific nucleotide sequences.

  9. Nucleotide sequence of the 3'-noncoding region of alfalfa mosaic virus RNA 4 and its homology with the genomic RNAs.

    PubMed Central

    Koper-Zwarthoff, E C; Brederode, F T; Walstra, P; Bol, J F

    1979-01-01

    A 226-nucleotide fragment was derived from alfalfa mosaic virus RNA 4 (ALMV RNA 4), the subgenomic messenger for viral coat protein, and its sequence was deduced by in vitro labeling with polynucleotide kinase and application of RNA sequencing techniques. The fragment contains the 3'-terminal 45 nucleotides of the coat protein cistron and the complete 3'-noncoding region of 182 nucleotides. The total length of RNA 4 was calculated to be 881 nucleotides. AlMV RNAs 1, 2 and 3 were elongated with a 3'-terminal poly(A) stretch and subjected to sequence analysis by using a specific primer, reverse transcriptase and chain terminators. This revealed an extensive homology between the 3'-terminal 140 to 150 nucleotides of all four ALMV RNAs. Despite a number of base substitutions, the secondary structure of the homologous region is highly conserved. The observed homology indicates that, as with RNA 4, the sites with a high affinity for the viral coat protein are located at the 3'-termini of the genomic RNAs. PMID:537914

  10. Coding and 3' non-coding nucleotide sequence of chalcone synthase mRNA and assignment of amino acid sequence of the enzyme

    PubMed Central

    Reimold, Ursula; Kröger, Manfred; Kreuzaler, Fritz; Hahlbrock, Klaus

    1983-01-01

    The nucleotide sequence of an almost complete cDNA copy of chalcone synthase mRNA from cultured parsley cells (Petroselinum hortense) has been determined. The cDNA copy comprised the complete coding sequence for chalcone synthase, a short A-rich stretch of the 5' non-coding region and the complete 3' non-coding region including a poly(A) tail. The amino acid sequence deduced from the nucleotide sequence of the cDNA is consistent with a partial N-terminal sequence analysis, the total amino acid composition, the cyanogen bromide cleavage pattern, and the apparent mol. wt. of the subunit of the purified enzyme. PMID:16453477

  11. ANTICALIgN: visualizing, editing and analyzing combined nucleotide and amino acid sequence alignments for combinatorial protein engineering.

    PubMed

    Jarasch, Alexander; Kopp, Melanie; Eggenstein, Evelyn; Richter, Antonia; Gebauer, Michaela; Skerra, Arne

    2016-07-01

    ANTICALIgN is an interactive software developed to simultaneously visualize, analyze and modify alignments of DNA and/or protein sequences that arise during combinatorial protein engineering, design and selection. ANTICALIgN combines powerful functions known from currently available sequence analysis tools with unique features for protein engineering, in particular the possibility to display and manipulate nucleotide sequences and their translated amino acid sequences at the same time. ANTICALIgN offers both template-based multiple sequence alignment (MSA), using the unmutated protein as reference, and conventional global alignment, to compare sequences that share an evolutionary relationship. The application of similarity-based clustering algorithms facilitates the identification of duplicates or of conserved sequence features among a set of selected clones. Imported nucleotide sequences from DNA sequence analysis are automatically translated into the corresponding amino acid sequences and displayed, offering numerous options for selecting reading frames, highlighting of sequence features and graphical layout of the MSA. The MSA complexity can be reduced by hiding the conserved nucleotide and/or amino acid residues, thus putting emphasis on the relevant mutated positions. ANTICALIgN is also able to handle suppressed stop codons or even to incorporate non-natural amino acids into a coding sequence. We demonstrate crucial functions of ANTICALIgN in an example of Anticalins selected from a lipocalin random library against the fibronectin extradomain B (ED-B), an established marker of tumor vasculature. Apart from engineered protein scaffolds, ANTICALIgN provides a powerful tool in the area of antibody engineering and for directed enzyme evolution.

  12. PPDMs-a resource for mapping small molecule bioactivities from ChEMBL to Pfam-A protein domains.

    PubMed

    Kruger, Felix A; Gaulton, Anna; Nowotka, Michal; Overington, John P

    2015-03-01

    PPDMs is a resource that maps small molecule bioactivities to protein domains from the Pfam-A collection of protein families. Small molecule bioactivities mapped to protein domains add important precision to approaches that use protein sequence searches and alignments to assist applications in computational drug discovery and systems and chemical biology. We have previously proposed a mapping heuristic for a subset of bioactivities stored in ChEMBL with the Pfam-A domain most likely to mediate small molecule binding. We have since refined this mapping using a manual procedure. Here, we present a resource that provides up-to-date mappings and the possibility to review assigned mappings as well as to participate in their assignment and curation. We also describe how mappings provided through the PPDMs resource are made accessible through the main schema of the ChEMBL database. The PPDMs resource and curation interface is available at https://www.ebi.ac.uk/chembl/research/ppdms/pfam_maps. The source code for PPDMs is available under the Apache license at https://github.com/chembl/pfam_maps. Source code is available at https://github.com/chembl/pfam_map_loader to demonstrate the integration process with the main schema of ChEMBL.

  13. Gene-based single nucleotide polymorphism discovery in bovine muscle using next-generation transcriptomic sequencing

    PubMed Central

    2013-01-01

    Background: Genetic information based on molecular markers has increasingly been used in cattle breeding improvement programmes, as a means to improve conventional phenotypic selection. Advances in molecular genetics have led to the identification of several genetic markers associated with genes affecting economic traits. Until recently, the identification of the causative genetic variants involved in the phenotypes of interest has remained a difficult task. The advent of novel sequencing technologies now offers a new opportunity for the identification of such variants. Despite sequencing costs plummeting, sequencing whole genomes or large targeted regions is still too expensive for most laboratories. A transcriptomic-based sequencing approach offers a cheaper alternative to identify a large number of polymorphisms and possibly to discover causative variants. In the present study, we performed a gene-based single nucleotide polymorphism (SNP) discovery analysis in bovine Longissimus thoraci, using RNA-Seq. To our knowledge, this represents the first study done in bovine muscle. Results: Messenger RNAs from Longissimus thoraci from three Limousin bull calves were subjected to high-throughput sequencing. Approximately 36–46 million paired-end reads were obtained per library. A total of 19,752 transcripts were identified and 34,376 different SNPs were detected. Fifty-five percent of the SNPs were found in coding regions and ~22% resulted in an amino acid change. Applying a very stringent SNP quality threshold, we detected 8,407 different high-confidence SNPs, 18% of which are non-synonymous coding SNPs. To analyse the accuracy of RNA-Seq technology for SNP detection, 48 SNPs were selected for validation by genotyping. No discrepancies were observed when using the highest SNP probability threshold. To test the usefulness of the identified SNPs, the 48 selected SNPs were assessed by genotyping 93 bovine samples, representing mostly the nine major breeds used in France

  14. Single nucleotide polymorphism discovery in rainbow trout by deep sequencing of a reduced representation library.

    PubMed

    Sánchez, Cecilia Castaño; Smith, Timothy P L; Wiedmann, Ralph T; Vallejo, Roger L; Salem, Mohamed; Yao, Jianbo; Rexroad, Caird E

    2009-11-25

    To enhance capabilities for genomic analyses in rainbow trout, such as genomic selection, a large suite of polymorphic markers that are amenable to high-throughput genotyping protocols must be identified. Expressed Sequence Tags (ESTs) have been used for single nucleotide polymorphism (SNP) discovery in salmonids. In those strategies, the salmonid semi-tetraploid genomes often led to assemblies of paralogous sequences and therefore resulted in a high rate of false positive SNP identification. Sequencing genomic DNA using primers identified from ESTs proved to be an effective but time consuming methodology of SNP identification in rainbow trout, therefore not suitable for high throughput SNP discovery. In this study, we employed a high-throughput strategy that used pyrosequencing technology to generate data from a reduced representation library constructed with genomic DNA pooled from 96 unrelated rainbow trout that represent the National Center for Cool and Cold Water Aquaculture (NCCCWA) broodstock population. The reduced representation library consisted of 440 bp fragments resulting from complete digestion with the restriction enzyme HaeIII; sequencing produced 2,000,000 reads providing an average 6 fold coverage of the estimated 150,000 unique genomic restriction fragments (300,000 fragment ends). Three independent data analyses identified 22,022 to 47,128 putative SNPs on 13,140 to 24,627 independent contigs. A set of 384 putative SNPs, randomly selected from the sets produced by the three analyses were genotyped on individual fish to determine the validation rate of putative SNPs among analyses, distinguish apparent SNPs that actually represent paralogous loci in the tetraploid genome, examine Mendelian segregation, and place the validated SNPs on the rainbow trout linkage map. Approximately 48% (183) of the putative SNPs were validated; 167 markers were successfully incorporated into the rainbow trout linkage map. In addition, 2% of the sequences from the

  15. Compilation of 5S rRNA and 5S rRNA gene sequences

    PubMed Central

    Specht, Thomas; Wolters, Jörn; Erdmann, Volker A.

    1990-01-01

    The BERLIN RNA DATABANK as of December 31, 1989, contains a total of 667 sequences of 5S rRNAs or their genes, which is an increase of 114 new sequence entries over the last compilation (1). It covers sequences from 44 archaebacteria, 267 eubacteria, 20 plastids, 6 mitochondria, 319 eukaryotes and 11 eukaryotic pseudogenes. The hardcopy shows only the list (Table 1) of those organisms whose sequences have been determined. The BERLIN RNA DATABANK uses the format of the EMBL Nucleotide Sequence Data Library complemented by a Sequence Alignment (SA) field including secondary structure information. PMID:1692116
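
    As a quick consistency check on the figures quoted in this abstract, the per-group counts can be summed and compared with the stated total. The short Python sketch below does only that arithmetic; the dictionary keys and variable names are illustrative, and the size of the previous compilation is inferred here by subtraction rather than stated in the abstract.

```python
# Per-group sequence counts as quoted in the abstract.
counts = {
    "archaebacteria": 44,
    "eubacteria": 267,
    "plastids": 20,
    "mitochondria": 6,
    "eukaryotes": 319,
    "eukaryotic pseudogenes": 11,
}

stated_total = 667   # total number of sequences given in the abstract
new_entries = 114    # increase over the last compilation, as stated

total = sum(counts.values())
# Inferred, not stated in the abstract: size of the previous compilation.
previous_total = stated_total - new_entries

print(f"sum of groups: {total} (stated total: {stated_total})")
print(f"implied size of previous compilation: {previous_total}")
```

    The group counts add up to the stated total, so the breakdown in the abstract is internally consistent.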

  16. Complete Nucleotide Sequences and Genome Organization of Two Pepper Mild Mottle Virus Isolates from Capsicum annuum in South Korea

    PubMed Central

    Choi, Seung-Kook; Choi, Gug-Seoun; Kwon, Sun-Jung

    2016-01-01

    The complete genome sequences of pepper mild mottle virus (PMMoV)-P2 and -P3 were determined by the Sanger sequencing method. Although PMMoV-P2 and PMMoV-P3 have different pathogenicity in some pepper cultivars, the complete genome sequences of PMMoV-P2 and -P3 are composed of 6,356 nucleotides (nt). In this study, we report the complete genome sequences and genome organization of PMMoV-P2 and -P3 isolates from pepper species in South Korea. PMID:27198033

  17. Complete nucleotide sequence of a begomovirus associated with satellites molecules infecting a new host Tagetes patula in India.

    PubMed

    Marwal, Avinash; Sahu, Anurag Kumar; Choudhary, Devendra Kumar; Gaur, R K

    2013-08-01

    In the year 2012 leaf curl disease was observed on Marigold (Tagetes patula) in Lakshmangrh, Sikar province of India. Affected plants were severely stunted with apical leaf curl and crinkled leaves, symptoms typical of begomovirus infection. This is the first report of complete nucleotide sequence of a begomovirus associated with satellites molecules infecting a new host Tagetes patula in India.

  18. A high-density simple sequence repeat and single nucleotide polymorphism genetic map of the tetraploid cotton genome

    USDA-ARS's Scientific Manuscript database

    Cotton genome complexity was investigated with a saturated molecular genetic map that combined several sets of microsatellites or simple sequence repeats (SSR) and the first major public set of single nucleotide polymorphism (SNP) markers in cotton genomes (Gossypium spp.), and that was constructed ...

  19. Comparing genotyping-by-sequencing and Single Nucleotide Polymorphism chip genotyping in Quantitative Trait Loci mapping in wheat

    USDA-ARS's Scientific Manuscript database

    Array- or chip-based single nucleotide polymorphism (SNP) markers are widely used in genomic studies because of their abundance in a genome and cost less per data point compared to older marker technologies. Genotyping by sequencing (GBS), a relatively newer approach of genotyping, suggests equal or...

  20. [Polymorphism of DNA nucleotide sequence as a source of enhancement of the discrimination potential of the STR-markers].

    PubMed

    Zemskova, E Yu; Timoshenko, T V; Leonov, S N; Ivanov, P L

    2016-01-01

    The objective of the present pilot investigation was to reveal and to study polymorphism of nucleotide sequence in the alleles of STR loci of human autosomal DNA with special reference to the role of this phenomenon as a source of the differences between homonymous allelic variants. The secondary objective was to evaluate the possibility of using the data thus obtained for the enhancement of the informative value of the forensic medical genotyping of STR loci by means of identification of single nucleotide polymorphisms (SNP) for the purpose of extending their allelic spectrum. The methodological basis of the study was the comprehensive amplified fragment length polymorphism (AFLP) analysis and amplified fragment sequence polymorphism (AFSP) analysis of DNA with the use of the PLEX-ID™ analytical mass-spectrometry platform (Abbott Molecular, USA). The study has demonstrated that polymorphism of DNA nucleotide sequence can be regarded as a possible source of enhancement of the discriminating potential of STR markers. It means that the analysis of polymorphism of DNA nucleotide sequence for genotyping AFLP-type markers of chromosomal DNA can considerably increase the effectiveness of their application as individualizing markers for the purpose of molecular genetic expert examinations.

  1. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... is DNA, RNA, or PRT (protein). If a nucleotide sequence contains both DNA and RNA fragments, the type shall be “DNA.” In addition, the combined DNA/RNA molecule shall be further described in the to feature... combined DNA/RNA” Name/Key Provide appropriate identifier for feature, preferably from WIPO Standard ST.25...

  2. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... is DNA, RNA, or PRT (protein). If a nucleotide sequence contains both DNA and RNA fragments, the type shall be “DNA.” In addition, the combined DNA/RNA molecule shall be further described in the to feature... combined DNA/RNA” Name/Key Provide appropriate identifier for feature, preferably from WIPO Standard ST.25...

  3. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... is DNA, RNA, or PRT (protein). If a nucleotide sequence contains both DNA and RNA fragments, the type shall be “DNA.” In addition, the combined DNA/RNA molecule shall be further described in the to feature... combined DNA/RNA” Name/Key Provide appropriate identifier for feature, preferably from WIPO Standard ST.25...

  4. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... is DNA, RNA, or PRT (protein). If a nucleotide sequence contains both DNA and RNA fragments, the type shall be “DNA.” In addition, the combined DNA/RNA molecule shall be further described in the to feature... combined DNA/RNA” Name/Key Provide appropriate identifier for feature, preferably from WIPO Standard ST.25...

  5. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... is DNA, RNA, or PRT (protein). If a nucleotide sequence contains both DNA and RNA fragments, the type shall be “DNA.” In addition, the combined DNA/RNA molecule shall be further described in the to feature... combined DNA/RNA” Name/Key Provide appropriate identifier for feature, preferably from WIPO Standard ST.25...

  6. Molecular cloning and nucleotide sequence of a transforming gene detected by transfection of chicken B-cell lymphoma DNA

    NASA Astrophysics Data System (ADS)

    Goubin, Gerard; Goldman, Debra S.; Luce, Judith; Neiman, Paul E.; Cooper, Geoffrey M.

    1983-03-01

    A transforming gene detected by transfection of chicken B-cell lymphoma DNA has been isolated by molecular cloning. It is homologous to a conserved family of sequences present in normal chicken and human DNAs but is not related to transforming genes of acutely transforming retroviruses. The nucleotide sequence of the cloned transforming gene suggests that it encodes a protein that is partially homologous to the amino terminus of transferrin and related proteins although only about one tenth the size of transferrin.

  7. The Coding of Biological Information: From Nucleotide Sequence to Protein Recognition

    NASA Astrophysics Data System (ADS)

    Štambuk, Nikola

    The paper reviews the classic results of Swanson, Dayhoff, Grantham, Blalock and Root-Bernstein, which link genetic code nucleotide patterns to the protein structure, evolution and molecular recognition. Symbolic representation of the binary addresses defining particular nucleotide and amino acid properties is discussed, with consideration of: structure and metric of the code, direct correspondence between amino acid and nucleotide information, and molecular recognition of the interacting protein motifs coded by the complementary DNA and RNA strands.

  8. Molecular cloning and nucleotide sequence of the alpha and beta subunits of allophycocyanin from the cyanelle genome of Cyanophora paradoxa.

    PubMed Central

    Bryant, D A; de Lorimier, R; Lambert, D H; Dubbs, J M; Stirewalt, V L; Stevens, S E; Porter, R D; Tam, J; Jay, E

    1985-01-01

    The genes for the alpha- and beta-subunit apoproteins of allophycocyanin (AP) were isolated from the cyanelle genome of Cyanophora paradoxa and subjected to nucleotide sequence analysis. The AP beta-subunit apoprotein gene was localized to a 7.8-kilobase-pair Pst I restriction fragment from cyanelle DNA by hybridization with a tetradecameric oligonucleotide probe. Sequence analysis using that oligonucleotide and its complement as primers for the dideoxy chain-termination sequencing method confirmed the presence of both AP alpha- and beta-subunit genes on this restriction fragment. Additional oligonucleotide primers were synthesized as sequencing progressed and were used to determine rapidly the nucleotide sequence of a 1336-base-pair region of this cloned fragment. This strategy allowed the sequencing to be completed without a detailed restriction map and without extensive and time-consuming subcloning. The sequenced region contains two open reading frames whose deduced amino acid sequences are 81-85% homologous to cyanobacterial and red algal AP subunits whose amino acid sequences have been determined. The two open reading frames are in the same orientation and are separated by 39 base pairs. AP alpha is 5' to AP beta and both coding sequences are preceded by a polypurine, Shine-Dalgarno-type sequence. Sequences upstream from AP alpha closely resemble the Escherichia coli consensus promoter sequences and also show considerable homology to promoter sequences for several chloroplast-encoded psbA genes. A 56-base-pair palindromic sequence downstream from the AP beta gene could play a role in the termination of transcription or translation. The allophycocyanin apoprotein subunit genes are located on the large single-copy region of the cyanelle genome. PMID:2987916

  9. [Molecular phylogenetic analysis of the genus Abies (Pinaceae) based on the nucleotide sequence of chloroplast DNA].

    PubMed

    Semerikova, S A; Semerikov, V L

    2014-01-01

    A phylogenetic study of firs (Abies Mill.) was conducted using nucleotide sequences of several chloroplast DNA regions with a total length of 5580 bp. The analysis included 37 taxa, which represented the main evolutionary lineages of the genus, and Keteleeria davidiana. According to the phylogenetic reconstruction, the Abies species were subdivided into six main groups, generally corresponding to their geographic distribution. The phylogenetic tree had three basal clades. All of these clades contained American species, and only one of them contained Eurasian species. The divergence time calibrations, based on paleobotanical data and the chloroplast DNA mutation rate estimates in Pinaceae, produced similar results. The age of diversification among the clades of present-day Abies was estimated as the end of the Oligocene to the beginning of the Miocene. The age of the separation of Mediterranean firs from the Asian-North American branch corresponds to the Miocene. The age of diversification within the young groups of Mediterranean, Asian, and boreal American firs (A. lasiocarpa, A. balsamea, A. fraseri) was estimated as the Pliocene-Pleistocene. Based on the phylogenetic reconstruction obtained, the most plausible biogeographic scenarios were suggested. It is noted that the existing systematic classification of the genus Abies strongly contradicts the phylogenetic reconstruction and requires revision.

  10. Whole-genome sequencing identifies genomic heterogeneity at a nucleotide and chromosomal level in bladder cancer

    PubMed Central

    Morrison, Carl D.; Liu, Pengyuan; Woloszynska-Read, Anna; Zhang, Jianmin; Luo, Wei; Qin, Maochun; Bshara, Wiam; Conroy, Jeffrey M.; Sabatini, Linda; Vedell, Peter; Xiong, Donghai; Liu, Song; Wang, Jianmin; Shen, He; Li, Yinwei; Omilian, Angela R.; Hill, Annette; Head, Karen; Guru, Khurshid; Kunnev, Dimiter; Leach, Robert; Eng, Kevin H.; Darlak, Christopher; Hoeflich, Christopher; Veeranki, Srividya; Glenn, Sean; You, Ming; Pruitt, Steven C.; Johnson, Candace S.; Trump, Donald L.

    2014-01-01

    Using complete genome analysis, we sequenced five bladder tumors accrued from patients with muscle-invasive transitional cell carcinoma of the urinary bladder (TCC-UB) and identified a spectrum of genomic aberrations. In three tumors, complex genotype changes were noted. All three had tumor protein p53 mutations and a relatively large number of single-nucleotide variants (SNVs; average of 11.2 per megabase), structural variants (SVs; average of 46), or both. This group was best characterized by chromothripsis and the presence of subclonal populations of neoplastic cells or intratumoral mutational heterogeneity. Here, we provide evidence that the process of chromothripsis in TCC-UB is mediated by nonhomologous end-joining using kilobase, rather than megabase, fragments of DNA, which we refer to as “stitchers,” to repair this process. We postulate that a potential unifying theme among tumors with the more complex genotype group is a defective replication–licensing complex. A second group (two bladder tumors) had no chromothripsis, and a simpler genotype, WT tumor protein p53, had relatively few SNVs (average of 5.9 per megabase) and only a single SV. There was no evidence of a subclonal population of neoplastic cells. In this group, we used a preclinical model of bladder carcinoma cell lines to study a unique SV (translocation and amplification) of the gene glutamate receptor ionotropic N-methyl D-aspartate as a potential new therapeutic target in bladder cancer. PMID:24469795

  11. Whole-genome sequencing identifies genomic heterogeneity at a nucleotide and chromosomal level in bladder cancer.

    PubMed

    Morrison, Carl D; Liu, Pengyuan; Woloszynska-Read, Anna; Zhang, Jianmin; Luo, Wei; Qin, Maochun; Bshara, Wiam; Conroy, Jeffrey M; Sabatini, Linda; Vedell, Peter; Xiong, Donghai; Liu, Song; Wang, Jianmin; Shen, He; Li, Yinwei; Omilian, Angela R; Hill, Annette; Head, Karen; Guru, Khurshid; Kunnev, Dimiter; Leach, Robert; Eng, Kevin H; Darlak, Christopher; Hoeflich, Christopher; Veeranki, Srividya; Glenn, Sean; You, Ming; Pruitt, Steven C; Johnson, Candace S; Trump, Donald L

    2014-02-11

    Using complete genome analysis, we sequenced five bladder tumors accrued from patients with muscle-invasive transitional cell carcinoma of the urinary bladder (TCC-UB) and identified a spectrum of genomic aberrations. In three tumors, complex genotype changes were noted. All three had tumor protein p53 mutations and a relatively large number of single-nucleotide variants (SNVs; average of 11.2 per megabase), structural variants (SVs; average of 46), or both. This group was best characterized by chromothripsis and the presence of subclonal populations of neoplastic cells or intratumoral mutational heterogeneity. Here, we provide evidence that the process of chromothripsis in TCC-UB is mediated by nonhomologous end-joining using kilobase, rather than megabase, fragments of DNA, which we refer to as "stitchers," to repair this process. We postulate that a potential unifying theme among tumors with the more complex genotype group is a defective replication-licensing complex. A second group (two bladder tumors) had no chromothripsis, and a simpler genotype, WT tumor protein p53, had relatively few SNVs (average of 5.9 per megabase) and only a single SV. There was no evidence of a subclonal population of neoplastic cells. In this group, we used a preclinical model of bladder carcinoma cell lines to study a unique SV (translocation and amplification) of the gene glutamate receptor ionotropic N-methyl D-aspartate as a potential new therapeutic target in bladder cancer.

  12. Proteus mirabilis MR/P fimbrial operon: genetic organization, nucleotide sequence, and conditions for expression.

    PubMed

    Bahrani, F K; Mobley, H L

    1994-06-01

    Proteus mirabilis, an agent of urinary tract infection, expresses at least four fimbrial types. Among these are the MR/P (mannose-resistant/Proteus-like) fimbriae. MrpA, the structural subunit, is optimally expressed at 37 degrees C in Luria broth cultured statically for 48 h by each of seven strains examined. Genes encoding this fimbria were isolated, and the complete nucleotide sequence was determined. The mrp gene cluster encoded by 7,293 bp predicts eight polypeptides: MrpI (22,133 Da), MrpA (17,909 Da), MrpB (19,632 Da), MrpC (96,823 Da), MrpD (27,886 Da), MrpE (19,470 Da), MrpF (17,363 Da), and MrpG (13,169 Da). mrpI is upstream of the gene encoding the major structural subunit gene mrpA and is transcribed in the direction opposite to that of the rest of the operon. All predicted polypeptides share ≥ 25% amino acid identity with at least one other enteric fimbrial gene product encoded by the pap, fim, smf, fan, or mrk gene clusters.

  13. Proteus mirabilis MR/P fimbrial operon: genetic organization, nucleotide sequence, and conditions for expression.

    PubMed Central

    Bahrani, F K; Mobley, H L

    1994-01-01

    Proteus mirabilis, an agent of urinary tract infection, expresses at least four fimbrial types. Among these are the MR/P (mannose-resistant/Proteus-like) fimbriae. MrpA, the structural subunit, is optimally expressed at 37 degrees C in Luria broth cultured statically for 48 h by each of seven strains examined. Genes encoding this fimbria were isolated, and the complete nucleotide sequence was determined. The mrp gene cluster encoded by 7,293 bp predicts eight polypeptides: MrpI (22,133 Da), MrpA (17,909 Da), MrpB (19,632 Da), MrpC (96,823 Da), MrpD (27,886 Da), MrpE (19,470 Da), MrpF (17,363 Da), and MrpG (13,169 Da). mrpI is upstream of the gene encoding the major structural subunit gene mrpA and is transcribed in the direction opposite to that of the rest of the operon. All predicted polypeptides share ≥ 25% amino acid identity with at least one other enteric fimbrial gene product encoded by the pap, fim, smf, fan, or mrk gene clusters. PMID:7910820

  14. Nucleotide sequence of a lysine tRNA from Bacillus subtilis.

    PubMed Central

    Yamada, Y; Ishikura, H

    1977-01-01

    A lysine tRNA (tRNA1Lys) was purified from Bacillus subtilis W168 by a consecutive use of several column chromatographic systems. The nucleotide sequence was determined to be pG-A-G-C-C-A-U-U-A-G-C-U-C-A-G-U-D-G-G-D-A-G-A-G-C-A-U-C-U-G-A-C-U-U(U*)-U-U-K-A-psi-C-A-G-A-G-G-m7G(G)-U-C-G-A-A-G-G-T-psi-C-G-A-G-U-C-C-U-U-C-A-U-G-G-C-U-C-A-C-C-AOH, where K and U* are unidentified nucleosides. The nucleosides of U34 and m7G46 were partially substituted with U* and G, respectively. The binding ability of lysyl-tRNA1Lys to Escherichia coli ribosomes was stimulated with ApApA as well as ApApG. PMID:414208

  15. Quantitative theory of entropic forces acting on constrained nucleotide sequences applied to viruses

    PubMed Central

    Greenbaum, Benjamin D.; Cocco, Simona; Levine, Arnold J.; Monasson, Rémi

    2014-01-01

    We outline a theory to quantify the interplay of entropic and selective forces on nucleotide organization and apply it to the genomes of single-stranded RNA viruses. We quantify these forces as intensive variables that can easily be compared between sequences, outline a computationally efficient transfer-matrix method for their calculation, and apply this method to influenza and HIV viruses. We find viruses altering their dinucleotide motif use under selective forces, with these forces on CpG dinucleotides growing stronger in influenza the longer it replicates in humans. For a subset of genes in the human genome, many involved in antiviral innate immunity, the forces acting on CpG dinucleotides are even greater than the forces observed in viruses, suggesting that both effects are in response to similar selective forces involving the innate immune system. We further find that the dynamics of entropic forces balancing selective forces can be used to predict how long it will take a virus to adapt to a new host, and that it would take H1N1 several centuries to adapt to humans from birds, typically contributing many of its synonymous substitutions to the forcible removal of CpG dinucleotides. By examining the probability landscape of dinucleotide motifs, we predict where motifs are likely to appear using only a single-force parameter and uncover the localization of UpU motifs in HIV. Essentially, we extend the natural language and concepts of statistical physics, such as entropy and conjugated forces, to understanding viral sequences and, more generally, constrained genome evolution. PMID:24639520

  16. Quantitative theory of entropic forces acting on constrained nucleotide sequences applied to viruses.

    PubMed

    Greenbaum, Benjamin D; Cocco, Simona; Levine, Arnold J; Monasson, Rémi

    2014-04-01

    We outline a theory to quantify the interplay of entropic and selective forces on nucleotide organization and apply it to the genomes of single-stranded RNA viruses. We quantify these forces as intensive variables that can easily be compared between sequences, outline a computationally efficient transfer-matrix method for their calculation, and apply this method to influenza and HIV viruses. We find viruses altering their dinucleotide motif use under selective forces, with these forces on CpG dinucleotides growing stronger in influenza the longer it replicates in humans. For a subset of genes in the human genome, many involved in antiviral innate immunity, the forces acting on CpG dinucleotides are even greater than the forces observed in viruses, suggesting that both effects are in response to similar selective forces involving the innate immune system. We further find that the dynamics of entropic forces balancing selective forces can be used to predict how long it will take a virus to adapt to a new host, and that it would take H1N1 several centuries to adapt to humans from birds, typically contributing many of its synonymous substitutions to the forcible removal of CpG dinucleotides. By examining the probability landscape of dinucleotide motifs, we predict where motifs are likely to appear using only a single-force parameter and uncover the localization of UpU motifs in HIV. Essentially, we extend the natural language and concepts of statistical physics, such as entropy and conjugated forces, to understanding viral sequences and, more generally, constrained genome evolution.

  17. Nucleotide sequences and genetic analysis of hydrogen oxidation (hox) genes in Azotobacter vinelandii.

    PubMed Central

    Menon, A L; Mortenson, L E; Robson, R L

    1992-01-01

    Azotobacter vinelandii contains a heterodimeric, membrane-bound [NiFe]hydrogenase capable of catalyzing the reversible oxidation of H2. The beta and alpha subunits of the enzyme are encoded by the structural genes hoxK and hoxG, respectively, which appear to form part of an operon that contains at least one further potential gene (open reading frame 3 [ORF3]). In this study, determination of the nucleotide sequence of a region of 2,344 bp downstream of ORF3 revealed four additional closely spaced or overlapping ORFs. These ORFs, ORF4 through ORF7, potentially encode polypeptides with predicted masses of 22.8, 11.4, 16.3, and 31 kDa, respectively. Mutagenesis of the chromosome of A. vinelandii in the area sequenced was carried out by introduction of antibiotic resistance gene cassettes. Disruption of hoxK and hoxG by a kanamycin resistance gene abolished whole-cell hydrogenase activity coupled to O2 and led to loss of the hydrogenase alpha subunit. Insertional mutagenesis of ORF3 through ORF7 with a promoterless lacZ-Kmr cassette established that the region is transcriptionally active and involved in H2 oxidation. We propose to call ORF3 through ORF7 hoxZ, hoxM, hoxL, hoxO, and hoxQ, respectively. The predicted hox gene products resemble those encoded by genes from hydrogenase-related operons in other bacteria, including Escherichia coli and Alcaligenes eutrophus. PMID:1624446

  18. Postzygotic single-nucleotide mosaicisms in whole-genome sequences of clinically unremarkable individuals

    PubMed Central

    Huang, August Y; Xu, Xiaojing; Ye, Adam Y; Wu, Qixi; Yan, Linlin; Zhao, Boxun; Yang, Xiaoxu; He, Yao; Wang, Sheng; Zhang, Zheng; Gu, Bowen; Zhao, Han-Qing; Wang, Meng; Gao, Hua; Gao, Ge; Zhang, Zhichao; Yang, Xiaoling; Wu, Xiru; Zhang, Yuehua; Wei, Liping

    2014-01-01

    Postzygotic single-nucleotide mutations (pSNMs) have been studied in cancer and a few other overgrowth human disorders at whole-genome scale and found to play critical roles. However, in clinically unremarkable individuals, pSNMs have never been identified at whole-genome scale largely due to technical difficulties and lack of matched control tissue samples, and thus the genome-wide characteristics of pSNMs remain unknown. We developed a new Bayesian-based mosaic genotyper and a series of effective error filters, using which we were able to identify 17 SNM sites from ∼80× whole-genome sequencing of peripheral blood DNAs from three clinically unremarkable adults. The pSNMs were thoroughly validated using pyrosequencing, Sanger sequencing of individual cloned fragments, and multiplex ligation-dependent probe amplification. The mutant allele fraction ranged from 5%-31%. We found that C→T and C→A were the predominant types of postzygotic mutations, similar to the somatic mutation profile in tumor tissues. Simulation data showed that the overall mutation rate was an order of magnitude lower than that in cancer. We detected varied allele fractions of the pSNMs among multiple samples obtained from the same individuals, including blood, saliva, hair follicle, buccal mucosa, urine, and semen samples, indicating that pSNMs could affect multiple sources of somatic cells as well as germ cells. Two of the adults have children who were diagnosed with Dravet syndrome. We identified two non-synonymous pSNMs in SCN1A, a causal gene for Dravet syndrome, from these two unrelated adults and found that the mutant alleles were transmitted to their children, highlighting the clinical importance of detecting pSNMs in genetic counseling. PMID:25312340

  19. Methylation levels of the "long interspersed nucleotide element-1" repetitive sequences predict survival of melanoma patients.

    PubMed

    Sigalotti, Luca; Fratta, Elisabetta; Bidoli, Ettore; Covre, Alessia; Parisi, Giulia; Colizzi, Francesca; Coral, Sandra; Massarut, Samuele; Kirkwood, John M; Maio, Michele

    2011-05-26

    The prognosis of cutaneous melanoma (CM) differs for patients with identical clinico-pathological stage, and no molecular markers discriminating the prognosis of stage III individuals have been established. Genome-wide alterations in DNA methylation are a common event in cancer. This study aimed to define the prognostic value of genomic DNA methylation levels in stage III CM patients. Overall level of genomic DNA methylation was measured using bisulfite pyrosequencing at three CpG sites (CpG1, CpG2, CpG3) of the Long Interspersed Nucleotide Element-1 (LINE-1) sequences in short-term CM cultures from 42 stage IIIC patients. The impact of LINE-1 methylation on overall survival (OS) was assessed using Cox regression and Kaplan-Meier analysis. Hypomethylation (i.e., methylation below median) at CpG2 and CpG3 sites was significantly associated with improved prognosis of CM, CpG3 showing the strongest association. Patients with hypomethylated CpG3 had increased OS (P = 0.01, log-rank = 6.39) by Kaplan-Meier analysis. Median OS of patients with hypomethylated or hypermethylated CpG3 were 31.9 and 11.5 months, respectively. The 5 year OS for patients with hypomethylated CpG3 was 48% compared to 7% for patients with hypermethylated sequences. Among the variables examined by Cox regression analysis, LINE-1 methylation at CpG2 and CpG3 was the only predictor of OS (Hazard Ratio = 2.63, for hypermethylated CpG3; 95% Confidence Interval: 1.21-5.69; P = 0.01). LINE-1 methylation is identified as a molecular marker of prognosis for CM patients in stage IIIC. Evaluation of LINE-1 promises to represent a key tool for driving the most appropriate clinical management of stage III CM patients.
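
    The Kaplan-Meier analysis mentioned in this abstract is built on the product-limit estimator of the survival function. The following is only a minimal sketch of that estimator, not the authors' analysis: the function name `kaplan_meier` and the follow-up times and event flags are hypothetical and are not data from the study.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times  -- follow-up time for each patient (any consistent unit)
    events -- 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) at each time with a death.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set.
        n_at_risk -= removed
    return curve


# Hypothetical follow-up in months; 0 marks a censored observation.
curve = kaplan_meier([5, 8, 8, 12, 20], [1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

    Comparing two such curves (for example hypomethylated versus hypermethylated groups) would additionally need a log-rank test, which this sketch does not implement.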

  20. Phylogenetic analysis of beta-papillomaviruses as inferred from nucleotide and amino acid sequence data.

    PubMed

    Gottschling, Marc; Köhler, Anja; Stockfleth, Eggert; Nindl, Ingo

    2007-01-01

    Human papillomaviruses (HPV) of the beta-group seem to be involved in the pathogenesis of non-melanoma skin cancer. Papillomaviruses are host specific and are considered to be closely co-evolving with their hosts. Evolutionary incongruence between early genes and late genes has been reported among oncogenic genital alpha-papillomaviruses and considerably challenges phylogenetic reconstructions. We investigated the relationships of 29 beta-HPV (25 types plus four putative new types, subtypes, or variants) as inferred from codon aligned and amino acid sequence data of the genes E1, E2, E6, E7, L1, and L2 using likelihood, distance, and parsimony approaches. An analysis of a L1 fragment included additional nucleotide and amino acid sequences from seven non-human beta-papillomaviruses. Early gene and late gene evolution did not conflict significantly in beta-papillomaviruses based on partition homogeneity tests (p ≥ 0.001). As inferred from the complete genome analyses, beta-papillomaviruses were monophyletic and segregated into four highly supported monophyletic assemblages corresponding to the species 1, 2, 3, and fused 4/5. They basically split into the species 1 and the remainder of beta-papillomaviruses, whose species 3, 4, and 5 constituted the sistergroup of species 2. beta-Papillomaviruses have been isolated from humans, apes, and monkeys, and phylogenetic analyses of the L1 fragment showed non-human papillomaviruses to be highly polyphyletic, nesting within the HPV species. Thus, host and virus phylogenies were not congruent in beta-papillomaviruses, and multiple invasions across species borders may contribute (in addition to host-linked evolution) to their diversification.

  1. Evaluation of the flanking nucleotide sequences of sarcomeric hypertrophic cardiomyopathy substitution mutations.

    PubMed

    Meurs, Kathryn M; Mealey, Katrina L

    2008-07-03

    Hypertrophic cardiomyopathy (HCM) is a familial myocardial disease with a prevalence of 1 in 500. More than 400 causative mutations have been identified in 13 sarcomeric and myofilament related genes; 350 of these are substitution mutations within eight sarcomeric genes. Within a population, examples of recurring identical disease causing mutations that appear to have arisen independently have been noted as well as those that appear to have been inherited from a common ancestor. The large number of novel HCM mutations could suggest a mechanism of increased mutability within the sarcomeric genes. The objective of this study was to evaluate the most commonly reported HCM genes, beta myosin heavy chain (MYH7), myosin binding protein C, troponin I, troponin T, cardiac regulatory myosin light chain, cardiac essential myosin light chain, alpha tropomyosin and cardiac alpha-actin for sequence patterns surrounding the substitution mutations that may suggest a mechanism of increased mutability. The mutations as well as the 10 flanking nucleotides were evaluated for frequency of di-, tri- and tetranucleotides containing the mutation as well as for the presence of certain tri- and tetranucleotide motifs. The most common substitutions were guanine (G) to adenine (A) and cytosine (C) to thymidine (T). The CG dinucleotide had a significantly higher relative mutability than any other dinucleotide (p<0.05). The relative mutability of each possible trinucleotide and tetranucleotide sequence containing the mutation was calculated; none were at a statistically higher frequency than the others. The large number of G to A and C to T mutations as well as the relative mutability of CG may suggest that deamination of methylated CpG is an important mechanism for mutation development in at least some of these cardiac genes.
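
    The tallying step described in this abstract, counting the dinucleotides that contain each mutated base, can be illustrated with a short sketch. It is an assumption-laden illustration rather than the authors' procedure: the helper names and the three example sequences are hypothetical, and the normalization needed to turn raw counts into relative mutability is not shown.

```python
from collections import Counter


def dinucleotides_containing(seq, pos):
    """Return the dinucleotides that include the base at 0-based index pos."""
    seq = seq.upper()
    found = []
    if pos > 0:
        found.append(seq[pos - 1:pos + 1])   # mutated base is the 3' member
    if pos < len(seq) - 1:
        found.append(seq[pos:pos + 2])       # mutated base is the 5' member
    return found


def dinucleotide_counts(sites):
    """Tally dinucleotides over (reference_sequence, mutated_index) pairs."""
    counts = Counter()
    for seq, pos in sites:
        counts.update(dinucleotides_containing(seq, pos))
    return counts


# Hypothetical reference windows with the index of the substituted base.
sites = [("AACGTT", 2), ("TTCGAA", 3), ("GGATCC", 2)]
counts = dinucleotide_counts(sites)
print(counts.most_common())
```

    The same windowing idea extends to tri- and tetranucleotides by widening the slices around `pos`.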

  2. Methylation levels of the "long interspersed nucleotide element-1" repetitive sequences predict survival of melanoma patients

    PubMed Central

    2011-01-01

    Background: The prognosis of cutaneous melanoma (CM) differs for patients with identical clinico-pathological stage, and no molecular markers discriminating the prognosis of stage III individuals have been established. Genome-wide alterations in DNA methylation are a common event in cancer. This study aimed to define the prognostic value of genomic DNA methylation levels in stage III CM patients. Methods: Overall level of genomic DNA methylation was measured using bisulfite pyrosequencing at three CpG sites (CpG1, CpG2, CpG3) of the Long Interspersed Nucleotide Element-1 (LINE-1) sequences in short-term CM cultures from 42 stage IIIC patients. The impact of LINE-1 methylation on overall survival (OS) was assessed using Cox regression and Kaplan-Meier analysis. Results: Hypomethylation (i.e., methylation below median) at CpG2 and CpG3 sites was significantly associated with improved prognosis of CM, CpG3 showing the strongest association. Patients with hypomethylated CpG3 had increased OS (P = 0.01, log-rank = 6.39) by Kaplan-Meier analysis. Median OS of patients with hypomethylated or hypermethylated CpG3 were 31.9 and 11.5 months, respectively. The 5 year OS for patients with hypomethylated CpG3 was 48% compared to 7% for patients with hypermethylated sequences. Among the variables examined by Cox regression analysis, LINE-1 methylation at CpG2 and CpG3 was the only predictor of OS (Hazard Ratio = 2.63, for hypermethylated CpG3; 95% Confidence Interval: 1.21-5.69; P = 0.01). Conclusion: LINE-1 methylation is identified as a molecular marker of prognosis for CM patients in stage IIIC. Evaluation of LINE-1 promises to represent a key tool for driving the most appropriate clinical management of stage III CM patients. PMID:21615918

  3. Full length nucleotide sequences of 30 common SLC44A2 alleles encoding human neutrophil antigen-3 (HNA-3)

    PubMed Central

    Chen, Qing; Srivastava, Kshitij; Ardinski, Stefanie C.; Lam, Kevin; Huvard, Michael J.; Schmid, Pirmin; Flegel, Willy A.

    2015-01-01

    Background: HNA-3a alloantibodies can cause severe transfusion-related acute lung injury (TRALI). The frequencies of the single nucleotide polymorphisms (SNPs) indicative of the two clinically relevant HNA-3a/b antigens are known in many populations. In the present study, we determined the full length nucleotide sequence of common SLC44A2 alleles encoding the choline transporter-like protein-2 (CTL2) that harbors HNA-3a/b antigens. Study design and methods: A method was devised to determine the full length coding sequence and adjacent intron sequences from genomic DNA by 8 polymerase chain reaction (PCR) amplifications covering all 22 SLC44A2 exons. Samples from 200 African American, 96 Caucasian, 2 Hispanic and 4 Asian blood donors were analyzed. We developed a decision tree to determine alleles (confirmed haplotypes) from the genotype data. Results: A total of 10 SNPs were detected in the SLC44A2 coding sequence. The non-coding sequences harbored an additional 28 SNPs (1 in the 5'-untranslated region (UTR); 23 in the introns; and 4 in the 3'-UTR). No SNP indicative of a non-functional allele was detected. The nucleotide sequences for 30 SLC44A2 alleles (haplotypes) were confirmed. There may be 66 haplotypes among the 604 chromosomes screened. Conclusions: We found 38 SNPs, including 1 novel SNP, in 8192 nucleotides covering the coding sequence of the SLC44A2 gene among 302 blood donors. Population frequencies of these SNPs were established for African Americans and Caucasians. Because alleles encoding HNA-3b are more common than non-functional SLC44A2 alleles, we confirmed our previous postulate that African American donors are less likely to form HNA-3a antibodies compared to Caucasians. PMID:26437811
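
    The counts reported in this abstract are internally consistent, which a few lines of arithmetic can confirm. This is only a reader's check on the published totals; the variable names are illustrative, and the factor of two assumes each donor contributes two copies of the autosomal SLC44A2 gene.

```python
# Donor counts as stated in the abstract.
donors = {"African American": 200, "Caucasian": 96, "Hispanic": 2, "Asian": 4}
total_donors = sum(donors.values())

# Assumption: two chromosomes (gene copies) screened per diploid donor.
chromosomes = 2 * total_donors

# SNP counts as stated: coding, then 5'-UTR + intronic + 3'-UTR.
coding_snps = 10
noncoding_snps = 1 + 23 + 4
total_snps = coding_snps + noncoding_snps

print(total_donors, chromosomes, noncoding_snps, total_snps)
```

    The totals match the 302 donors, 604 chromosomes, 28 non-coding SNPs and 38 SNPs overall given in the abstract.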

  4. The primary structure of E. coli RNA polymerase, Nucleotide sequence of the rpoC gene and amino acid sequence of the beta'-subunit.

    PubMed

    Ovchinnikov YuA; Monastyrskaya, G S; Gubanov, V V; Guryev, S O; Salomatina, I S; Shuvaeva, T M; Lipkin, V M; Sverdlov, E D

    1982-07-10

    The primary structure of the E. coli rpoC gene (5321 base pairs) coding the beta'-subunit of RNA polymerase as well as its adjacent segment have been determined. The structure analysis of the peptides obtained by cleavage of the protein with cyanogen bromide and trypsin has confirmed the amino acid sequence of the beta'-subunit deduced from the nucleotide sequence analysis. The beta'-subunit of E. coli RNA polymerase contains 1407 amino acid residues. Its translation is initiated by codon GUG and terminated by codon TAA. It has been detected that the sequence following the terminating codon is strikingly homologous to known sequences of rho-independent terminators.

  5. Sequence-Specific Incorporation of Enzyme-Nucleotide Chimera by DNA Polymerases.

    PubMed

    Welter, Moritz; Verga, Daniela; Marx, Andreas

    2016-08-16

    DNA polymerases select the right nucleotide for the growing polynucleotide chain based on the shape and geometry of the nascent nucleotide pairs and thereby ensure high DNA replication selectivity. High-fidelity DNA polymerases are believed to possess tight active sites that allow little deviation from the canonical structures. However, DNA polymerases are known to use nucleotides with small modifications as substrates, which is key for numerous core biotechnology applications. We show that even high-fidelity DNA polymerases are capable of efficiently using nucleotide chimera modified with a large protein like horseradish peroxidase as substrates for template-dependent DNA synthesis, despite this "cargo" being more than 100-fold larger than the natural substrates. We exploited this capability for the development of systems that enable naked-eye detection of DNA and RNA at single nucleotide resolution. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  6. The nucleotide sequence of blue-green algae phenylalanine-tRNA and the evolutionary origin of chloroplasts.

    PubMed Central

    Hecker, L I; Barnett, W E; Lin, F K; Furr, T D; Heckman, J E; RajBhandary, U L; Chang, S H

    1982-01-01

    Phenylalanine tRNA from the blue-green alga, Agmenellum quadruplicatum, has been purified to homogeneity. The nucleotide sequence of this tRNA was determined to be: (see text). Comparisons of the sequence and the modified nucleosides of this tRNA with those of other tRNAPhes thus far sequenced indicate that this blue-green algal tRNAPhe is typically prokaryotic and closely resembles the chloroplast tRNAPhes of higher plants and Euglena. The significance of this observation to the evolutionary origin of chloroplasts is discussed. PMID:6817301

  7. Nucleotide sequence of a cluster of early and late genes in a conserved segment of the vaccinia virus genome.

    PubMed Central

    Plucienniczak, A; Schroeder, E; Zettlmeissl, G; Streeck, R E

    1985-01-01

    The nucleotide sequence of a 7.6 kb vaccinia DNA segment from a genomic region conserved among different orthopoxviruses has been determined. This segment contains a tight cluster of 12 partly overlapping open reading frames, most of which can be correlated with previously identified early and late proteins and mRNAs. Regulatory signals used by vaccinia virus have been studied. Presumptive promoter regions are rich in A and T and carry the consensus sequences TATA and AATAA spaced at 20-24 base pairs. Tandem repeats of a CTATTC consensus sequence are proposed to be involved in the termination of early transcription. PMID:2987815

  8. Nucleotide sequence of a complementary DNA encoding pea cytosolic copper/zinc superoxide dismutase. [Pisum sativum L

    SciTech Connect

    White, D.A.; Zilinskas, B.A.

    1991-08-01

    The authors now report the nucleotide sequence of the cytosolic Cu/Zn SOD cloned from a λgt11 cDNA library constructed from mRNA extracted from leaves of 7- to 10-d pea seedlings (Pisum sativum L.). The clone was isolated using a 22-base synthetic oligonucleotide complementary to the amino acid sequence CGIIGLQG. This sequence, found at the protein's carboxy terminus, is highly conserved among plant cytosolic Cu/Zn SODs but not chloroplastic Cu/Zn SODs. The 738-base pair sequence contains an open reading frame specifying 152 codons and a predicted M_r of 18,024 D. The deduced amino acid sequence is highly homologous (79-82% identity) with the sequences of other known plant cytosolic Cu/Zn SODs but less highly conserved (63-65%) when compared with several chloroplastic Cu/Zn SODs including pea (10).

  9. ChEMBL web services: streamlining access to drug discovery data and utilities

    PubMed Central

    Davies, Mark; Nowotka, Michał; Papadatos, George; Dedman, Nathan; Gaulton, Anna; Atkinson, Francis; Bellis, Louisa; Overington, John P.

    2015-01-01

    ChEMBL is now a well-established resource in the fields of drug discovery and medicinal chemistry research. The ChEMBL database curates and stores standardized bioactivity, molecule, target and drug data extracted from multiple sources, including the primary medicinal chemistry literature. Programmatic access to ChEMBL data has been improved by a recent update to the ChEMBL web services (version 2.0.x, https://www.ebi.ac.uk/chembl/api/data/docs), which exposes significantly more data from the underlying database and introduces new functionality. To complement the data-focused services, a utility service (version 1.0.x, https://www.ebi.ac.uk/chembl/api/utils/docs), which provides RESTful access to commonly used cheminformatics methods, has also been concurrently developed. The ChEMBL web services can be used together or independently to build applications and data processing workflows relevant to drug discovery and chemical biology. PMID:25883136
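    As a rough illustration of the programmatic access described above, the sketch below only builds request URLs and performs no network calls. The base path is taken from the documentation URL quoted in the abstract. The helper name `chembl_url`, the resource names, the `.json` suffix and the example filter keys are assumptions made for illustration and should be checked against the service documentation before use.

```python
from urllib.parse import urlencode

# Base path derived from the documentation URL quoted in the abstract.
CHEMBL_DATA_BASE = "https://www.ebi.ac.uk/chembl/api/data"


def chembl_url(resource, chembl_id=None, **filters):
    """Build a URL for one record or a filtered list (hypothetical helper).

    `resource` (e.g. "molecule") and the ".json" suffix are assumptions
    about the service layout, not details stated in the abstract.
    """
    path = f"{CHEMBL_DATA_BASE}/{resource}"
    if chembl_id is not None:
        path += f"/{chembl_id}"
    # Sort the filters so the generated URL is deterministic.
    query = urlencode(sorted(filters.items()))
    return f"{path}.json" + (f"?{query}" if query else "")


print(chembl_url("molecule", "CHEMBL25"))
print(chembl_url("activity", target_chembl_id="CHEMBL240", limit=5))
```

    The resulting strings could be passed to any HTTP client; keeping URL construction separate from the request makes the sketch easy to check offline.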

  10. Single nucleotide polymorphism discovery from expressed sequence tags in the waterflea Daphnia magna

    PubMed Central

    2011-01-01

    Background: Daphnia (Crustacea: Cladocera) plays a central role in standing aquatic ecosystems, has a well known ecology and is widely used in population studies and environmental risk assessments. Daphnia magna is, especially in Europe, intensively used to study stress responses of natural populations to pollutants, climate change, and antagonistic interactions with predators and parasites, which have all been demonstrated to induce micro-evolutionary and adaptive responses. Although its ecology and evolutionary biology are intensively studied, little is known about the functional genomics underpinning phenotypic responses to environmental stressors. The aim of the present study was to find genes expressed in the presence of environmental stressors, and to target such genes for single nucleotide polymorphism (SNP) marker development. Results: We developed three expressed sequence tag (EST) libraries using clonal lineages of D. magna exposed to ecological stressors, namely fish predation, parasite infection and pesticide exposure. We used these newly developed ESTs and other Daphnia ESTs retrieved from NCBI GenBank to mine for SNP markers targeting synonymous as well as non-synonymous genetic variation. We validated the developed SNPs in six natural populations of D. magna distributed at regional scale. Conclusions: A large proportion (47%) of the produced ESTs are Daphnia lineage specific genes, which are potentially involved in responses to environmental stress rather than to general cellular functions and metabolic activities, or reflect the arthropod's aquatic lifestyle. The characterization of genes expressed under stress and the validation of their SNPs for population genetic study is important for identifying ecologically responsive genes in D. magna. PMID:21668940

  11. Detection of nasopharyngeal carcinoma susceptibility with single nucleotide polymorphism analysis using next-generation sequencing technology

    PubMed Central

    Wu, Mu-Yun; Huang, Shu-Jing; Yang, Fan; Qin, Xin-Tian; Liu, Dong; Ding, Ying; Yang, Shu; Wang, Xi-Cheng

    2017-01-01

    Nasopharyngeal carcinoma (NPC) is a head and neck cancer with high incidence in South China and East Asia. To provide a theoretical basis for NPC risk screening and early prevention, we conducted a meta-analysis of relevant literature on the association of single nucleotide polymorphisms (SNPs) with NPC susceptibility. Further, expression of 15 candidate SNPs identified in the meta-analysis was evaluated in a cohort of NPC patients and healthy volunteers using next-generation sequencing technology. Among the 15 SNPs detected in the meta-analysis, miR-146a (rs2910164, C>G), HCG9 (rs3869062, A>G), HCG9 (rs16896923, T>C), MMP2 (rs243865, C>T), GABBR1 (rs2076483, T>C), and TP53 (rs1042522, C>G) were associated with decreased susceptibility to NPC, while GSTM1 (+/DEL), IL-10 (rs1800896, A>G), MDM2 (rs2279744, T>G), MDS1-EVI1 (rs6774494, G>A), XPC (rs2228000, C>T), HLA-F (rs3129055, T>C), SPLUNC1 (rs2752903, T>C; and rs750064, A>G), and GABBR1 (rs29232, G>A) were associated with increased susceptibility to NPC. In our case-control study, an association with increased risk for NPC was found for the AG vs AA genotype in HCG9 (rs3869062, A>G). In addition, heterozygous deletion of the GSTM1 allele was associated with increased susceptibility to NPC, while an SNP in GABBR1 (rs29232, G>A) was associated with decreased risk, and might thus have a protective role in NPC carcinogenesis. This work provides the first comprehensive assessment of SNP expression and its relationship to NPC risk. It suggests the need for well-designed, larger confirmatory studies to validate its findings. PMID:28881764

  12. Nucleotide sequence and characterization of a Bacillus subtilis gene encoding a flagellar switch protein.

    PubMed Central

    Zuberi, A R; Bischoff, D S; Ordal, G W

    1991-01-01

    The nucleotide sequence of the Bacillus subtilis fliM gene has been determined. This gene encodes a 38-kDa protein that is homologous to the FliM flagellar switch proteins of Escherichia coli and Salmonella typhimurium. Expression of this gene in Che+ cells of E. coli and B. subtilis interferes with normal chemotaxis. The nature of the chemotaxis defect is dependent upon the host used. In B. subtilis, overproduction of FliM generates mostly nonmotile cells. Those cells that are motile switch less frequently. Expression of B. subtilis FliM in E. coli also generates nonmotile cells. However, those cells that are motile have a tumble bias. The B. subtilis fliM gene cannot complement an E. coli fliM mutant. A frameshift mutation was constructed in the fliM gene, and the mutation was transferred onto the B. subtilis chromosome. The mutant has a Fla- phenotype. This phenotype is consistent with the hypothesis that the FliM protein encodes a component of the flagellar switch in B. subtilis. Additional characterization of the fliM mutant suggests that the hag and mot loci are not expressed. These loci are regulated by the SigD form of RNA polymerase. We also did not observe any methyl-accepting chemotaxis proteins in an in vivo methylation experiment. The expression of these proteins is also dependent upon SigD. It is possible that a functional basal body-hook complex may be required for the expression of SigD-regulated chemotaxis and motility genes. Images PMID:1898932

  13. MISHIMA--a new method for high speed multiple alignment of nucleotide sequences of bacterial genome scale data.

    PubMed

    Kryukov, Kirill; Saitou, Naruya

    2010-03-18

    Large nucleotide sequence datasets are becoming increasingly common objects of comparison. Complete bacterial genomes are reported almost every day. This creates challenges for developing new multiple sequence alignment methods. Conventional multiple alignment methods are based on pairwise alignment and/or progressive alignment techniques. These approaches have performance problems when the number of sequences is large and when dealing with genome scale sequences. We present a new method of multiple sequence alignment, called MISHIMA (Method for Inferring Sequence History In terms of Multiple Alignment), that does not depend on pairwise sequence comparison. A new algorithm is used to quickly find rare oligonucleotide sequences shared by all sequences. A divide-and-conquer approach is then applied to break the sequences into fragments that can be aligned independently by an external alignment program. These partial alignments are assembled together to form a complete alignment of the original sequences. MISHIMA provides improved performance compared to the commonly used multiple alignment methods. As an example, six complete genome sequences of the bacterial species Helicobacter pylori (about 1.7 Mb each) were successfully aligned in about 6 hours using a single PC.
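    The anchor-and-split idea in this abstract can be sketched as follows. This is a simplified illustration, not the MISHIMA implementation: it treats "rare" as "occurs exactly once in every sequence", uses a fixed word length k, and all function names are hypothetical.

```python
from collections import Counter


def unique_kmers(seq, k):
    """Map each k-mer that occurs exactly once in seq to its position."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer: seq.find(kmer) for kmer, n in counts.items() if n == 1}


def shared_anchors(seqs, k):
    """Find k-mers that are unique within every sequence and present in all.

    Returns a sorted list of (positions, kmer), with one position per sequence.
    """
    per_seq = [unique_kmers(s, k) for s in seqs]
    common = set(per_seq[0]).intersection(*per_seq[1:])
    return sorted((tuple(p[kmer] for p in per_seq), kmer) for kmer in common)


def split_at_anchor(seqs, positions, k):
    """Divide step: cut every sequence around one shared anchor."""
    left = [s[:p] for s, p in zip(seqs, positions)]
    right = [s[p + k:] for s, p in zip(seqs, positions)]
    return left, right


# Toy sequences invented for illustration.
seqs = ["AAGATTACACC", "TTGATTACAGG", "CGATTACAT"]
(positions, kmer), = shared_anchors(seqs, 7)
print(kmer, positions)
print(split_at_anchor(seqs, positions, 7))
```

    Each left and right fragment set is much shorter than the originals and could be handed to an external aligner independently, which is where the speed-up described in the abstract would come from.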

  14. HLA-C locus allelic dropout in Sanger sequence-based typing due to intronic single nucleotide polymorphism.

    PubMed

    Cheng, Christopher; Kashi, Zahra Mehdizadeh; Martin, Russell; Woodruff, Gillian; Dinauer, David; Agostini, Tina

    2014-12-01

    We report a novel HLA-C allele that was identified during routine HLA typing using sequence-based methods. The patient was initially typed as a C*06:02, 06:04 with two nucleotide mismatches in exon 3, (C to T and T to G changes) which would have resulted in a non-synonymous mutation of a leucine residue being replaced with tryptophan. Further resolution of the patient's type by using sequence-specific primers (SSP) revealed that the companion allele to C*06:02 was a novel C*17:01. Confirmation of the existence of the new allele was performed across multiple platforms: Sanger sequencing, SSP, and Next Generation Sequencing (NGS) on the original sample and allele-specific clones for the entire HLA-C locus. The investigation revealed a single nucleotide mismatch within the Sanger sequencing primer binding site in intron 3. The mutation caused the initial C*17 dropout in exons 2 and 3. Further analysis of the Sanger and NGS data revealed that the C*17 had two additional unique positions in introns 2 and 7. The companion C*06:02 allele also possessed a novel position at intron 3. On August 31, 2013, the WHO nomenclature committee officially named the novel C*17:01 allele sequence as C*17:01:01:03 and the novel C*06:02 allele sequence as C*06:02:01:03.

  15. Nucleotide Sequence and Genetic Structure of a Novel Carbaryl Hydrolase Gene (cehA) from Rhizobium sp. Strain AC100

    PubMed Central

    Hashimoto, Masayuki; Fukui, Mitsuru; Hayano, Kouichi; Hayatsu, Masahito

    2002-01-01

    Rhizobium sp. strain AC100, which is capable of degrading carbaryl (1-naphthyl-N-methylcarbamate), was isolated from soil treated with carbaryl. This bacterium hydrolyzed carbaryl to 1-naphthol and methylamine. Carbaryl hydrolase from the strain was purified to homogeneity, and its N-terminal sequence, molecular mass (82 kDa), and enzymatic properties were determined. The purified enzyme hydrolyzed 1-naphthyl acetate and 4-nitrophenyl acetate, indicating that the enzyme is an esterase. We then cloned the carbaryl hydrolase gene (cehA) from the plasmid DNA of the strain and determined the nucleotide sequence of the 10-kb region containing cehA. No homologous sequences were found by a database homology search using the nucleotide and deduced amino acid sequences of the cehA gene. Six open reading frames including the cehA gene were found in the 10-kb region, and sequencing analysis shows that the cehA gene is flanked by two copies of an insertion sequence-like sequence, suggesting that it forms part of a composite transposon. PMID:11872471

  16. Cloning and nucleotide sequence analysis of the colH gene from Clostridium histolyticum encoding a collagenase and a gelatinase.

    PubMed Central

    Yoshihara, K; Matsushita, O; Minami, J; Okabe, A

    1994-01-01

    The colH gene encoding a collagenase was cloned from Clostridium histolyticum JCM 1403. Nucleotide sequencing showed a major open reading frame encoding a 116-kDa protein of 1,021 amino acid residues. The deduced amino acid sequence contains a putative signal sequence and a zinc metalloprotease consensus sequence, HEXXH. A 116-kDa collagenase and a 98-kDa gelatinase were copurified from culture supernatants of C. histolyticum. While the former degraded both native and denatured collagen, the latter degraded only denatured collagen. Peptide mapping with V8 protease showed that all peptide fragments, except a few minor ones, liberated from the two enzymes coincided with each other. Analysis of the N-terminal amino acid sequence of the two enzymes revealed that their first 24 amino acid residues were identical and coincided with those deduced from the nucleotide sequence. These results indicate that the 98-kDa gelatinase is generated from the 116-kDa collagenase by cleaving off the C-terminal region, which could be responsible for binding or increasing the accessibility of the collagenase to native collagen fibers. The role of the C-terminal region in the functional and evolutional aspects of the collagenase was further studied by comparing the amino acid sequence of the C. histolyticum collagenase with those of three homologous enzymes: the collagenases from Clostridium perfringens and Vibrio alginolyticus and Achromobacter lyticus protease I. Images PMID:7961400
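    The zinc metalloprotease consensus mentioned above, HEXXH (where X is any residue), is the kind of motif that can be located with a short pattern scan. The sketch below is a generic illustration, not the authors' analysis; the function name and the toy peptide are invented.

```python
import re

# H-E-any-any-H; the lookahead lets overlapping matches be reported too.
HEXXH = re.compile(r"(?=(HE..H))")


def find_hexxh(protein):
    """Return (1-based position, matched residues) for every HEXXH motif."""
    return [(m.start() + 1, m.group(1)) for m in HEXXH.finditer(protein.upper())]


# Toy peptide invented for illustration; it is not the ColH sequence.
print(find_hexxh("MKTAHEYTHAGG"))
```

    A hit only flags a candidate site; whether it functions as a zinc-binding motif depends on its structural context.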

  17. Nucleotide sequences and mapping of novel heterogenous 5'-termini of adenovirus 2 early region 4 mRNA.

    PubMed Central

    Hashimoto, S; Pursley, M H; Green, M

    1981-01-01

    The major 5'-termini of human adenovirus type 2 early gene block 4 mRNA were sequenced. Poly(A+) polyribosomal RNA was isolated from Ad2 early infected cells, the 5'-terminal m7GPPP removed and the 5'-OH of the penultimate 2'-O-methylated nucleotide labeled with [gamma-32P]ATP using polynucleotide kinase. Ad2 E4 mRNA was purified by hybridization to the Ad2 EcoRI-C fragment and was digested with RNase T1. The resulting oligonucleotides were resolved by two-dimensional paper electrophoresis-homochromatography. Four major and 3-4 minor 5'-terminal sequences were identified and characterized. The sequences of the four major 5'-terminal structures are: (1) m7GpppUmU(m)UUACACUGp, (2) m7GpppUmU(m)UACACUGp, (3) m7GpppUmU(m)ACACUGp, and (4) m7Gppp(m6)AmC(m)ACUGp. These major 5'-terminal sequences were aligned with nucleotides 325, 326, 327, and 329 from the righthand end of the known Ad2 DNA sequence (1) in the region mapped as the 5'-terminus of E4 mRNA by electron microscopy (2,3) and S1 nuclease-gel (4) mapping. Two potential ribosomal binding sites and an initiator codon were found at 40 to 65 nucleotides and about 80 nucleotides, respectively, from these heterogenous 5'-termini. Ad2 E4 major mRNA species appear to be unique since mRNA molecules initiate at a pyrimidine, perhaps by RNA polymerase stuttering, or they are products of an unusual type of RNA processing. PMID:6164992

  18. In silico development and characterization of tri-nucleotide simple sequence repeat markers in hazelnut (Corylus avellana L.)

    PubMed Central

    2017-01-01

    Plant genomes are now sequenced rapidly and inexpensively. In silico approaches allow efficient development of simple sequence repeat markers, also known as microsatellite markers, from these sequences. A search of the genome sequence of 'Jefferson' hazelnut (Corylus avellana L.) identified 8,708 tri-nucleotide simple sequence repeats with at least five repeat units, and stepwise removal of the less promising sequences led to the development of 150 polymorphic markers. Fragments in the 'Jefferson' sequence containing tri-nucleotide repeats were used as references and aligned with genomic sequences from seven other cultivars. Following in silico alignment, sequences that showed variation in number of repeat units were selected and primer pairs were designed for 243 of them. Screening on agarose gels identified 173 as polymorphic. Removal of duplicate and previously published sequences reduced the number to 150, for which fluorescent primers and capillary electrophoresis were used for amplicon sizing. These were characterized using 50 diverse hazelnut accessions. Of the 150, 132 generated the expected one or two alleles per accession while 18 amplified more than two amplicons in at least one accession. Diversity parameters of the 132 marker loci averaged 4.73 for number of alleles, 0.51 for expected heterozygosity (He), 0.49 for observed heterozygosity (Ho), 0.46 for polymorphism information content (PIC), and 0.04 for frequency of null alleles. The clustering of the 50 accessions in a dendrogram constructed from the 150 markers confirmed the wide genetic diversity and presence of three of the four major geographic groups: Central European, Black Sea, and Spanish-Italian. In the mapping population, 105 loci segregated, of which 101 were assigned to a linkage group (LG), with positions well-dispersed across all 11 LGs. 
These new markers will be useful for cultivar fingerprinting, diversity studies, genome comparisons, mapping, and alignment of the linkage map with the
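    The diversity parameters reported for the marker loci can be computed from allele frequencies. The sketch below uses the standard textbook definitions, He = 1 - Σ p_i² and PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j²; that the study used exactly these estimators is an assumption, and the frequencies shown are invented.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """He = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def polymorphism_information_content(freqs):
    """PIC = He minus the sum over allele pairs of 2 * p_i^2 * p_j^2."""
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return expected_heterozygosity(freqs) - cross


# Invented frequencies for one locus with four equally common alleles.
freqs = [0.25, 0.25, 0.25, 0.25]
print(expected_heterozygosity(freqs))           # 0.75
print(polymorphism_information_content(freqs))  # 0.703125
```

    PIC is always at most He, and the gap shrinks as the number of alleles grows.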

  19. Nucleotide sequence and analysis of the 58.3 to 65.5-kb early region of bacteriophage T4.

    PubMed Central

    Valerie, K; Stevens, J; Lynch, M; Henderson, E E; de Riel, J K

    1986-01-01

    The complete 7.2-kb nucleotide sequence from the 58.3 to 65.5-kb early region of bacteriophage T4 has been determined by Maxam and Gilbert sequencing. Computer analysis revealed at least 20 open reading frames (ORFs) within this sequence. All major ORFs are transcribed from the left strand, suggesting that they are expressed early during infection. Among the ORFs, we have identified the ipIII, ipII, denV and tk genes. The ORFs are very tightly spaced, even overlapping in some instances, and when ORF interspacing occurs, promoter-like sequences can be implicated. Several of the sequences preceding the ORFs, in particular those at ipIII, ipII, denV, and orf61.9, can potentially form stable stem-loop structures. PMID:3024113
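    The kind of computer analysis that reports open reading frames can be sketched as a scan of all six reading frames for ATG...stop stretches. This is a minimal generic illustration, not the program used in the study; the function names, the minimum-length threshold and the toy sequence are assumptions.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def orfs_in_strand(dna, min_codons):
    """Return (start, end) of ATG...stop ORFs in the three frames of one strand."""
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                # Count coding codons only; the stop codon is excluded.
                if (i - start) // 3 >= min_codons:
                    found.append((start, i + 3))
                start = None
    return found


def find_orfs(dna, min_codons=50):
    """Scan both strands; '-' coordinates refer to the reverse complement."""
    dna = dna.upper()
    forward = [("+", s, e) for s, e in orfs_in_strand(dna, min_codons)]
    reverse = [("-", s, e) for s, e in orfs_in_strand(reverse_complement(dna), min_codons)]
    return forward + reverse


# Toy sequence invented for illustration.
print(find_orfs("CCATGAAATTTTAGGG", min_codons=3))
```

    Real phage genomes also use alternative start codons such as GTG, which this sketch ignores.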

  20. Complete nucleotide sequence and analysis of the putative polyprotein of maize dwarf mosaic virus genomic RNA (Bulgarian isolate).

    PubMed

    Kong, P; Steinbiss, H H

    1998-01-01

    The complete nucleotide sequence of maize dwarf mosaic virus Bulgarian isolate (MDMV-Bg) was determined. The viral genome was 9515 nt and contained an open reading frame encoding 3042 amino acids, flanked by 3'- and 5'-UTRs of 139 and 250 nucleotides, respectively. MDMV-Bg was more conserved in the coding region (52.9%) than in the UTRs (45.8%) when compared to the 15 other potyviruses. Of ten putative gene products of MDMV-Bg, the P1 was the most variable protein (24.9%) while the NIb was the most conserved protein (67.3%). Several sequence variations were observed between MDMV-Bg and Johnson grass mosaic virus (JGMV), and more between MDMV-Bg and the dicot potyviruses. Phylogenetic analysis suggested that MDMV-Bg was the most closely related to JGMV.

  1. The nucleotide sequence of the coat protein genes of satsuma dwarf virus and naval orange infectious mottling virus.

    PubMed

    Iwanami, T; Kondo, Y; Makita, Y; Azeyanagi, C; Ieki, H

    1998-01-01

    The sequences of the 3'-terminal 4320 and 2409 nucleotides were determined for RNA2 of satsuma dwarf virus (SDV) and navel orange infectious mottling virus (NIMV), respectively. Both sequences contained a part of a long open reading frame which encodes larger and smaller coat proteins (CPs) at the 3'-terminus, followed by a 3' non-coding region upstream of a poly(A) tail. Amino acid sequence identity for larger and smaller CPs ranged from 81-84% and 68-78%, respectively, among SDV, NIMV and the previously sequenced citrus mosaic virus (CiMV). No significant sequence similarity was found between the CPs of SDV or NIMV and those of the como-, nepo- or other viruses. The nucleotide sequence identities of the 3' non-coding region of RNA2 were 68%-78% among SDV, CiMV and NIMV. These results suggest that SDV, CiMV and NIMV are distinct, though related, viruses. They may be assigned as members of a new genus, which is close to the genera Comovirus and Nepovirus.

  2. A weighted sampling algorithm for the design of RNA sequences with targeted secondary structure and nucleotide distribution.

    PubMed

    Reinharz, Vladimir; Ponty, Yann; Waldispühl, Jérôme

    2013-07-01

    The design of RNA sequences folding into predefined secondary structures is a milestone for many synthetic biology and gene therapy studies. Most of the current software uses similar local search strategies (i.e. a random seed is progressively adapted to acquire the desired folding properties) and, more importantly, does not allow the user to control explicitly the nucleotide distribution, such as the GC-content, in their sequences. However, the latter is an important criterion for large-scale applications as it could presumably be used to design sequences with better transcription rates and/or structural plasticity. In this article, we introduce IncaRNAtion, a novel algorithm to design RNA sequences folding into target secondary structures with a predefined nucleotide distribution. IncaRNAtion uses a global sampling approach and weighted sampling techniques. We show that our approach is fast (i.e. running time comparable or better than local search methods), seedless (we remove the bias of the seed in local search heuristics) and successfully generates high-quality sequences (i.e. thermodynamically stable) for any GC-content. To complete this study, we develop a hybrid method combining our global sampling approach with local search strategies. Remarkably, our glocal methodology outperforms both local and global approaches for sampling sequences with a specific GC-content and target structure. IncaRNAtion is available at csb.cs.mcgill.ca/incarnation/. Supplementary data are available at Bioinformatics online.
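    The role of a weight in steering nucleotide composition can be illustrated with a much simpler sampler than IncaRNAtion itself. The sketch below is not the published algorithm and ignores folding energies: it draws sequences compatible with a dot-bracket structure and gives each G or C a multiplicative weight w, so that w > 1 favours GC-rich sequences and w < 1 GC-poor ones. All names and the example structure are assumptions.

```python
import random

PAIRS = ["GC", "CG", "AU", "UA", "GU", "UG"]  # canonical and wobble pairs
BASES = "ACGU"


def gc_weight(bases, w):
    """Weight w for every G or C among the given bases."""
    return w ** sum(b in "GC" for b in bases)


def pair_table(structure):
    """Map each '(' index to its matching ')' index in a dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs[stack.pop()] = i
    return pairs


def sample_sequence(structure, w, rng):
    """Draw one structure-compatible sequence with GC-weighted probabilities."""
    seq = [None] * len(structure)
    for i, j in pair_table(structure).items():
        weights = [gc_weight(p, w) for p in PAIRS]
        seq[i], seq[j] = rng.choices(PAIRS, weights=weights)[0]
    for i, ch in enumerate(structure):
        if ch == ".":
            weights = [gc_weight(b, w) for b in BASES]
            seq[i] = rng.choices(BASES, weights=weights)[0]
    return "".join(seq)


def gc_content(seq):
    return sum(b in "GC" for b in seq) / len(seq)


rng = random.Random(1)
structure = "((((....)))).."
for w in (0.2, 1.0, 5.0):
    mean_gc = sum(gc_content(sample_sequence(structure, w, rng)) for _ in range(200)) / 200
    print(w, round(mean_gc, 2))
```

    Adjusting w until the mean GC-content of the samples reaches a target value mirrors, in a very reduced form, how a weighted sampler can be tuned toward a requested nucleotide distribution.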

  3. Cloning and nucleotide sequence of the gene coding for aspartokinase II from a thermophilic methylotrophic Bacillus sp.

    PubMed Central

    Schendel, F J; Flickinger, M C

    1992-01-01

    The structural gene coding for the lysine-sensitive aspartokinase II of the methylotrophic thermotolerant Bacillus sp. strain MGA3 was cloned from a genomic library by complementation of an Escherichia coli auxotrophic mutant lacking all three aspartokinase isozymes. The nucleotide sequence of the entire 2.2-kb PstI fragment was determined, and a single open reading frame coding for the aspartokinase II enzyme was found. Aspartokinase II was shown to be an alpha 2 beta 2 tetramer (M(r) 122,000) with the beta subunit (M(r) 18,000) encoded within the alpha subunit (M(r) 45,000) in the same reading frame. The enzyme was purified, and the N-terminal sequences of the alpha and beta subunits were identical with those predicted from the gene sequences. The predicted amino acid sequence was 76% identical with the sequence of the Bacillus subtilis aspartokinase II. The transcription initiation site was located approximately 350 bp upstream of the translation start site, and putative promoter regions at -10 (TATGCT) and -35 (ATGACA) were identified. A 300-nucleotide intervening sequence between the transcription initiation and translational start sites suggests a possible attenuation mechanism for the regulation of transcription of this enzyme in the presence of lysine. PMID:1444390

  4. Complete nucleotide sequence of the haemagglutinin gene from a human influenza virus of the Hong Kong subtype.

    PubMed Central

    Both, G W; Sleigh, M J

    1980-01-01

    The complete nucleotide sequence has been determined for a cloned double-stranded DNA copy of the haemagglutinin gene from the human influenza strain A/NT/60/68/29C, a laboratory-isolated variant of A/NT/60/68, an early strain of the Hong Kong subtype. The gene is 1765 nucleotides long and contains information sufficient to code for a protein of 566 amino acids, which includes a hydrophobic leader peptide (16 residues), HA1 (328), HA2 (221) and an arginine residue which joins the HA subunits. Comparison of the predicted amino acid sequence for 29C haemagglutinin with protein sequence data available for HA from other influenza strains shows that no potential coding information is lost by processing of the mRNA. A comparison of the amino acid sequences predicted from the gene sequences for 29C and fowl plague virus haemagglutinins (1) indicates the extent to which changes can occur in the primary sequence of different regions of the protein, while maintaining essential structure and function. PMID:6253883

  5. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms

    PubMed Central

    Bertolini, Francesca; Scimone, Concetta; Geraci, Claudia; Schiavo, Giuseppina; Utzeri, Valerio Joe; Chiofalo, Vincenzo; Fontanesi, Luca

    2015-01-01

    Few studies have investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared the obtained sequence information with the available donkey draft genome (and the Illumina reads from which it originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than the number aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~ 0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionarily important in equids. Comparing Y-chromosome regions, we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated by combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources. PMID:26151450

  6. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms.

    PubMed

    Bertolini, Francesca; Scimone, Concetta; Geraci, Claudia; Schiavo, Giuseppina; Utzeri, Valerio Joe; Chiofalo, Vincenzo; Fontanesi, Luca

    2015-01-01

    Few studies have investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared the obtained sequence information with the available donkey draft genome (and the Illumina reads from which it originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than the number aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~ 0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionarily important in equids. Comparing Y-chromosome regions, we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated by combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources.

  7. The Nucleotide Capture Region of Alpha Hemolysin: Insights into Nanopore Design for DNA Sequencing from Molecular Dynamics Simulations.

    PubMed

    Manara, Richard M A; Tomasio, Susana; Khalid, Syma

    2015-01-27

    Nanopore technology for DNA sequencing is constantly being refined and improved. In strand sequencing a single strand of DNA is fed through a nanopore and subsequent fluctuations in the current are measured. A major hurdle is that the DNA is translocated through the pore at a rate that is too fast for the current measurement systems. An alternative approach is "exonuclease sequencing", in which an exonuclease is attached to the nanopore that is able to process the strand, cleaving off one base at a time. The bases then flow through the nanopore and the current is measured. This method has the advantage of potentially solving the translocation rate problem, as the speed is controlled by the exonuclease. Here we consider the practical details of exonuclease attachment to the protein alpha hemolysin. We employ molecular dynamics simulations to determine the ideal (a) distance from alpha-hemolysin, and (b) the orientation of the monophosphate nucleotides upon release from the exonuclease such that they will enter the protein. Our results indicate an almost linear decrease in the probability of entry into the protein with increasing distance of nucleotide release. The nucleotide orientation is less significant for entry into the protein.

  8. A close relationship between primary nucleotides sequence structure and the composition of functional genes in the genome of prokaryotes.

    PubMed

    Garcia, Juan A L; Fernández-Guerra, Antoni; Casamayor, Emilio O

    2011-12-01

    Comparative genomics is an essential tool to unravel how genomes change over evolutionary time and to gain clues on the links between functional genomics and evolution. In prokaryotes, the large number of good-quality genome sequences available in public databases and the recently developed large-scale computational methods offer an unprecedented view on the ecology and evolution of microorganisms through comparative genomics. In this work, we examined the links among genome structure (i.e., the sequential distribution of nucleotides itself by detrended fluctuation analysis, DFA) and genomic diversity (i.e., gene functionality by Clusters of Orthologous Genes, COGs) in 828 fully sequenced prokaryotic genomes from 548 different bacteria and archaea species. DFA scaling exponent α indicated persistent long-range correlations (fractality) in each genome analyzed. Higher resolution power was found when considering the sequential succession of purine (AG) vs. pyrimidine (CT) bases than either keto (GT) to amino (AC) forms or strongly (GC) vs. weakly (AT) bonded nucleotides. Interestingly, the phyla Aquificae, Fusobacteria, Dictyoglomi, Nitrospirae, and Thermotogae were closer to archaea than to their bacterial counterparts. A strong significant correlation was found between scaling exponent α and COGs distribution, and we consistently observed that the larger the α, the more heterogeneous was the gene distribution within each functional category, suggesting a close relationship between primary nucleotides sequence structure and functional genes composition.
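
    The DFA scaling exponent α mentioned above can be illustrated with a generic first-order DFA on a purine/pyrimidine walk. This is a sketch under stated assumptions, not the authors' pipeline: the function names, the window sizes and the random test sequence are all hypothetical choices made for illustration.

```python
import math
import random


def purine_walk(seq):
    """Map DNA to steps: +1 for purines (A/G), -1 for pyrimidines (C/T)."""
    steps = []
    for base in seq.upper():
        if base in "AG":
            steps.append(1.0)
        elif base in "CT":
            steps.append(-1.0)
        # Ambiguous symbols such as N are skipped.
    return steps


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def dfa_alpha(steps, window_sizes):
    """Estimate the DFA-1 scaling exponent from a list of numeric steps."""
    mean = sum(steps) / len(steps)
    profile, running = [], 0.0
    for s in steps:  # integrated, mean-centred profile
        running += s - mean
        profile.append(running)
    log_n, log_f = [], []
    for n in window_sizes:
        n_windows = len(profile) // n
        if n < 4 or n_windows == 0:
            continue
        xs = list(range(n))
        sq = 0.0
        for k in range(n_windows):  # detrend each non-overlapping window
            window = profile[k * n:(k + 1) * n]
            slope, intercept = fit_line(xs, window)
            sq += sum((y - (slope * x + intercept)) ** 2
                      for x, y in zip(xs, window))
        f = math.sqrt(sq / (n_windows * n))  # fluctuation F(n)
        if f > 0:
            log_n.append(math.log(n))
            log_f.append(math.log(f))
    if len(log_n) < 2:
        raise ValueError("need at least two usable window sizes")
    # alpha is the slope of log F(n) against log n.
    return fit_line(log_n, log_f)[0]


# Hypothetical uncorrelated test sequence; alpha should be close to 0.5.
random.seed(0)
seq = "".join(random.choice("ACGT") for _ in range(4096))
alpha = dfa_alpha(purine_walk(seq), [8, 16, 32, 64, 128])
print(round(alpha, 2))
```

    An uncorrelated sequence gives α near 0.5, whereas the persistent long-range correlations reported in the abstract correspond to α above 0.5.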

  9. [Analysis on the preference of synonymous codon in VP1 nucleotide sequence of the EV71 based on RSCU method].

    PubMed

    Qi, Bin; Zhao, Jing-Jing; Gao, Lei; Zhu, Ping

    2009-11-01

    Using the relative synonymous codon usage (RSCU) method, we analyzed codon usage preference in VP1 nucleotide sequences of EV71 isolated in the Chinese mainland and the Taiwan region from 1998 to 2008. RSCU values calculated from EV71 VP1 sequences show an obvious differentiation over time in the two regions, which is more pronounced in the Taiwan region. According to the diversity of RSCU, the years can be divided into 2 intervals in the Chinese mainland and 4 intervals in the Taiwan region; the number of intervals in a region correlates positively with the variation activity of EV71 in that region. Changes in codon usage preference in VP1 nucleotide sequences significantly reflect the variation of EV71, so analysis of codon usage preference in VP1 nucleotide sequences can be used to predict the possible variation trend of EV71.
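
    For readers unfamiliar with the statistic, RSCU is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The following is a minimal sketch assuming the standard genetic code and an in-frame coding sequence; the function name `rscu` and the toy sequence are illustrative and not taken from the study.

```python
from collections import Counter

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds):
    """Return {codon: RSCU} for an in-frame coding sequence.

    Amino acids absent from the sequence are omitted, since RSCU is
    undefined for them; stop codons are ignored.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    # Group synonymous codons by the amino acid they encode.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        expected = total / len(codons)  # equal-usage expectation
        for c in codons:
            values[c] = counts[c] / expected
    return values


# Toy sequence: four alanine codons (GCT, GCT, GCC, GCA).
result = rscu("GCTGCTGCCGCA")
print(result["GCT"], result["GCC"], result["GCA"], result["GCG"])
```

    Values above 1 indicate a codon used more often than expected under equal usage, and values below 1 indicate one used less often.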

  10. Complete nucleotide sequence and gene rearrangement of the mitochondrial genome of the bell-ring frog, Buergeria buergeri (family Rhacophoridae).

    PubMed

    Sano, Naomi; Kurabayashi, Atsushi; Fujii, Tamotsu; Yonekawa, Hiromichi; Sumida, Masayuki

    2004-06-01

    In this study we determined the complete nucleotide sequence (19,959 bp) of the mitochondrial DNA of the rhacophorid frog Buergeria buergeri. The gene content, nucleotide composition, and codon usage of B. buergeri conformed to those of typical vertebrate patterns. However, due to an accumulation of lengthy repetitive sequences in the D-loop region, this species possesses the largest mitochondrial genome among all the vertebrates examined so far. Comparison of the gene organizations among amphibian species (Rana, Xenopus, salamanders and caecilians) revealed that the positioning of four tRNA genes and the ND5 gene in the mtDNA of B. buergeri diverged from the common vertebrate gene arrangement shared by Xenopus, salamanders and caecilians. The unique positions of the tRNA genes in B. buergeri are shared by ranid frogs, indicating that the rearrangements of the tRNA genes occurred in a common ancestral lineage of ranids and rhacophorids. On the other hand, the novel position of the ND5 gene seems to have arisen in a lineage leading to rhacophorids (and other closely related taxa) after ranid divergence. Phylogenetic analysis based on nucleotide sequence data of all mitochondrial genes also supported the gene rearrangement pathway.

  11. Nucleotide Sequence of the blaRTG-2 (CARB-5) Gene and Phylogeny of a New Group of Carbenicillinases

    PubMed Central

    Choury, Daniele; Szajnert, Marie-France; Joly-Guillou, Marie-Laure; Azibi, Kemal; Delpech, Marc; Paul, Gérard

    2000-01-01

    We determined the nucleotide sequence of the bla gene for the Acinetobacter calcoaceticus β-lactamase previously described as CARB-5. Alignment of the deduced amino acid sequence with those of known β-lactamases revealed that CARB-5 possesses an RTG triad in box VII, as described for the Proteus mirabilis GN79 enzyme, instead of the RSG consensus characteristic of the other carbenicillinases. Phylogenetic studies showed that these RTG enzymes constitute a new, separate group, possibly ancestors of the carbenicillinase family. PMID:10722515

  12. Testing evolutionary models to explain the process of nucleotide substitution in gut bacterial 16S rRNA gene sequences.

    PubMed

    Garcia-Mazcorro, Jose F

    2013-09-01

    The 16S rRNA gene has been widely used as a marker of gut bacterial diversity and phylogeny, yet we do not know the model of evolution that best explains the differences in its nucleotide composition within and among taxa. Over 46 000 good-quality near-full-length 16S rRNA gene sequences from five bacterial phyla were obtained from the ribosomal database project (RDP) by study and, when possible, by within-study characteristics (e.g. anatomical region). Using alignments (RDPX and MUSCLE) of unique sequences, the FINDMODEL tool available at http://www.hiv.lanl.gov/ was utilized to find the model of character evolution (28 models were available) that best describes the input sequence data, based on the Akaike information criterion. The results showed variable levels of agreement (from 33% to 100%) in the chosen models between the RDP-based and the MUSCLE-based alignments among the taxa. Moreover, subgroups of sequences (using either alignment method) from the same study were often explained by different models. Nonetheless, the different representatives of the gut microbiota were explained by different proportions of the available models. This is the first report using evolutionary models to explain the process of nucleotide substitution in gut bacterial 16S rRNA gene sequences. © 2013 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved.
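
    Model selection by the Akaike information criterion, as described above, amounts to computing AIC = 2k - 2 ln L for each candidate model and keeping the smallest value. The sketch below shows only that final comparison step. The log-likelihood values are invented for illustration, the parameter counts cover substitution-model parameters only, and nothing here reproduces the FINDMODEL tool itself.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits):
    """Return the model name with the lowest AIC.

    `fits` maps a model name to (log-likelihood, free parameters).
    """
    return min(fits, key=lambda name: aic(*fits[name]))


# Hypothetical fits: (ln L, number of free substitution parameters).
# The ln L values are made up; only the parameter counts are standard.
fits = {
    "JC69": (-5200.0, 0),
    "K80": (-5150.0, 1),
    "HKY85": (-5100.0, 4),
    "GTR": (-5098.0, 8),
}

for name, (lnl, k) in fits.items():
    print(name, aic(lnl, k))
print("best:", best_model(fits))
```

    In this toy case GTR has the highest likelihood, but its four extra parameters are not justified by the small gain, so the criterion prefers HKY85.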

  13. MoD Tools: regulatory motif discovery in nucleotide sequences from co-regulated or homologous genes

    PubMed Central

    Pavesi, Giulio; Mereghetti, Paolo; Zambelli, Federico; Stefani, Marco; Mauri, Giancarlo; Pesole, Graziano

    2006-01-01

    Understanding the complex mechanisms regulating gene expression at the transcriptional and post-transcriptional levels is one of the greatest challenges of the post-genomic era. The MoD (MOtif Discovery) Tools web server comprises a set of tools for the discovery of novel conserved sequence and structure motifs in nucleotide sequences, motifs that in turn are good candidates for regulatory activity. The server includes the following programs: Weeder, for the discovery of conserved transcription factor binding sites (TFBSs) in nucleotide sequences from co-regulated genes; WeederH, for the discovery of conserved TFBSs and distal regulatory modules in sequences from homologous genes; RNAProfile, for the discovery of conserved secondary structure motifs in unaligned RNA sequences whose secondary structure is not known. In this way, a given gene can be compared with other co-regulated genes or with its homologs, or its mRNA can be analyzed for conserved motifs regulating its post-transcriptional fate. The web server thus provides researchers with different strategies and methods to investigate the regulation of gene expression, at both the transcriptional and post-transcriptional levels. Available at and . PMID:16845071

  14. IRE1α nucleotide sequence cleavage specificity in the unfolded protein response.

    PubMed

    Poothong, Juthakorn; Sopha, Pattarawut; Kaufman, Randal J; Tirasophon, Witoon

    2017-01-01

    Inositol-requiring enzyme 1 (IRE1) is a conserved sensor of the unfolded protein response that has protein kinase and endoribonuclease (RNase) enzymatic activities and thereby initiates HAC1/XBP1 splicing. Previous studies demonstrated that human IRE1α (hIRE1α) does not cleave Saccharomyces cerevisiae HAC1 mRNA. Using an in vitro cleavage assay, we show that adenine to cytosine nucleotide substitution at the +1 position in the 3' splice site of HAC1 RNA is required for specific cleavage by hIRE1α. A similar restricted nucleotide specificity in the RNA substrate was observed for XBP1 splicing in vivo. Together these findings underscore the essential role of cytosine nucleotide at +1 in the 3' splice site for determining cleavage specificity of hIRE1α.

  15. SMRT Sequencing of Long Tandem Nucleotide Repeats in SCA10 Reveals Unique Insight of Repeat Expansion Structure

    PubMed Central

    Landrian, Ivette; Godiska, Ronald; Shanker, Savita; Yu, Fahong; Farmerie, William G.; Ashizawa, Tetsuo

    2015-01-01

    A large, non-coding ATTCT repeat expansion causes the neurodegenerative disorder, spinocerebellar ataxia type 10 (SCA10). In a subset of SCA10 patients, interruption motifs are present at the 5’ end of the expansion and strongly correlate with epileptic seizures. Thus, interruption motifs are a predictor of the epileptic phenotype and are hypothesized to act as a phenotypic modifier in SCA10. Yet, the exact internal sequence structure of SCA10 expansions remains unknown due to limitations in current technologies for sequencing across long extended tracts of tandem nucleotide repeats. We used the third generation sequencing technology, Single Molecule Real Time (SMRT) sequencing, to obtain full-length contiguous expansion sequences, ranging from 2.5 to 4.4 kb in length, from three SCA10 patients with different clinical presentations. We obtained sequence spanning the entire length of the expansion and identified the structure of known and novel interruption motifs within the SCA10 expansion. The exact interruption patterns in expanded SCA10 alleles will allow us to further investigate the potential contributions of these interrupting sequences to the pathogenic modification leading to the epilepsy phenotype in SCA10. Our results also demonstrate that SMRT sequencing is useful for deciphering long tandem repeats that pose as “gaps” in the human genome sequence. PMID:26295943

  16. SNUFER: A software for localization and presentation of single nucleotide polymorphisms using a Clustal multiple sequence alignment output file

    PubMed Central

    Mansur, Marco A B; Cardozo, Giovana P; Santos, Elaine V; Marins, Mozart

    2008-01-01

    SNUFER is software for the automatic localization and generation of tables used for the presentation of single nucleotide polymorphisms (SNPs). After input of a fasta file containing the sequences to be analyzed, a multiple sequence alignment is generated using ClustalW run inside SNUFER. The ClustalW output file is then used to generate a table which displays the SNPs detected in the aligned sequences and their degree of similarity. This table can be exported to Microsoft Word, Microsoft Excel or as a single text file, permitting further editing for publication. The software was written using Delphi 7 for programming and FireBird 2.0 for sequence database management. It is freely available for noncommercial use and can be downloaded from http://www.heranza.com.br/bioinformatica2.htm. PMID:19238196
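
    The core idea of locating SNPs in a multiple sequence alignment can be sketched as a scan for columns containing more than one distinct base. This is not SNUFER's code (which was written in Delphi); the function name `snp_columns`, the gap handling and the toy alignment are assumptions made for illustration.

```python
def snp_columns(aligned, gap="-"):
    """Return [(position, column)] for variable alignment columns.

    `aligned` is a list of equal-length aligned sequences. Positions
    are 1-based. Gap characters are ignored when deciding whether a
    column is variable.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must share one length")
    snps = []
    for pos, column in enumerate(zip(*aligned), start=1):
        observed = {base.upper() for base in column if base != gap}
        if len(observed) > 1:
            snps.append((pos, "".join(column)))
    return snps


# Toy alignment of three sequences (hypothetical data).
alignment = [
    "ACGTAC",
    "ACGTAC",
    "ATGTCC",
]
print(snp_columns(alignment))  # variable columns at positions 2 and 5
```

    Each reported column lists the bases in input order, which is the information a SNP summary table needs per position.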

  17. Nucleotide sequence and phylogeny of a chloramphenicol acetyltransferase encoded by the plasmid pSCS7 from Staphylococcus aureus.

    PubMed

    Schwarz, S; Cardoso, M

    1991-08-01

    The nucleotide sequence of the chloramphenicol acetyltransferase gene (cat) and its regulatory region, encoded by the plasmid pSCS7 from Staphylococcus aureus, was determined. The structural cat gene encoded a protein of 209 amino acids, which represented one monomer of the enzyme chloramphenicol acetyltransferase (CAT). Comparisons between the amino acid sequences of the pSCS7-encoded CAT from S. aureus and the previously sequenced CAT variants from S. aureus, Staphylococcus intermedius, Staphylococcus haemolyticus, Bacillus pumilus, Clostridium difficile, Clostridium perfringens, Escherichia coli, Shigella flexneri, and Proteus mirabilis were performed. An alignment of CAT amino acid sequences demonstrated the presence of 34 conserved amino acids among all CAT variants. These conserved residues were considered for their possible roles in the structure and function of CAT. On the basis of the alignment, a phylogenetic tree was constructed. It demonstrated relatively large evolutionary distances between the CAT variants of enteric bacteria, Clostridium, Bacillus, and Staphylococcus species.

  18. Complete Nucleotide Sequence and Genetic Organization of the 210-Kilobase Linear Plasmid of Rhodococcus erythropolis BD2

    PubMed Central

    Stecker, Christiane; Johann, Andre; Herzberg, Christina; Averhoff, Beate; Gottschalk, Gerhard

    2003-01-01

    The complete nucleotide sequence of the linear plasmid pBD2 from Rhodococcus erythropolis BD2 comprises 210,205 bp. Sequence analyses of pBD2 revealed 212 putative open reading frames (ORFs), 97 of which had an annotatable function. These ORFs could be assigned to six functional groups: plasmid replication and maintenance, transport and metalloresistance, catabolism, transposition, regulation, and protein modification. Many of the transposon-related sequences were found to flank the isopropylbenzene pathway genes. This finding together with the significant sequence similarities of the ipb genes to genes of the linear plasmid-encoded biphenyl pathway in other rhodococci suggests that the ipb genes were acquired via transposition events and subsequently distributed among the rhodococci via horizontal transfer. PMID:12923100

  19. Organization and nucleotide sequence of a densovirus genome imply a host-dependent evolution of the parvoviruses.

    PubMed Central

    Bando, H; Kusuda, J; Gojobori, T; Maruyama, T; Kawase, S

    1987-01-01

    The genome structure of a densovirus from a silkworm was determined by sequencing more than 85% of the complete genome DNA. This is the first report of the genome organization of an insect parvovirus deduced from the DNA sequence. In the viral genome, two large open reading frames designated 1 and 2 and one smaller open reading frame designated 3 were identified. The first two open reading frames shared the same strand, while the third was found in the complementary sequence. Computer analysis suggested that open reading frame 2 may encode all four structural proteins. The genome organization and a part of the nucleotide sequence were conserved among the insect densovirus, rodent parvoviruses, and a human dependovirus. These viruses may have diverged from a common ancestor. PMID:3027382

  20. Nucleotide sequence analysis of the coat protein genes of two Korean isolates of sweet potato feathery mottle potyvirus.

    PubMed

    Ryu, K H; Kim, S J; Park, W M

    1998-01-01

    The coat protein (CP) genes of the genomic RNA of two Korean isolates of sweet potato feathery mottle potyvirus (SPFMV), SPFMV-K1 and SPFMV-K2, were cloned and their complete nucleotide sequences were determined. Sequence comparisons of the two Korean isolates showed 97.8% amino acid identity in the CP cistron, and 79.9% to 99.0% identity with those of 6 other known SPFMV strains. Of 74 amino acid changes totally among the SPFMV strains, 39 changes were located at the N-terminal region. Pairwise amino acid sequence comparison revealed sequence similarities of 48.6 to 70.2% between SPFMV and 20 other potyviruses, indicating SPFMV to be a quite distinct species. Multiple alignment of the CP cistrons from other potyviruses showed that most of the conserved amino acid residues of the genus Potyvirus are well preserved in the corresponding locations.

  1. Nucleotide sequence of the melA gene, coding for alpha-galactosidase in Escherichia coli K-12.

    PubMed Central

    Liljeström, P L; Liljeström, P

    1987-01-01

    Melibiose uptake and hydrolysis in E.coli is performed by the MelB and MelA proteins, respectively. We report the cloning and sequencing of the melA gene. The nucleotide sequence data showed that melA codes for a 450 amino acid long protein with a molecular weight of 50.6 kd. The sequence data also supported the assumption that the mel locus forms an operon with melA in proximal position. A comparison of MelA with alpha-galactosidase proteins from yeast and human origin showed that these proteins have only limited homology, the yeast and human proteins being more related. However, regions common to all three proteins were found indicating sequences that might comprise the active site of alpha-galactosidase. PMID:3031590

  2. Cloning and nucleotide sequence of the gene coding for enzymatically active fragments of the Bacillus polymyxa beta-amylase.

    PubMed Central

    Kawazu, T; Nakanishi, Y; Uozumi, N; Sasaki, T; Yamagata, H; Tsukagoshi, N; Udaka, S

    1987-01-01

    The gene encoding beta-amylase was cloned from Bacillus polymyxa 72 into Escherichia coli HB101 by inserting HindIII-generated DNA fragments into the HindIII site of pBR322. The 4.8-kilobase insert was shown to direct the synthesis of beta-amylase. A 1.8-kilobase AccI-AccI fragment of the donor strain DNA was sufficient for the beta-amylase synthesis. Homologous DNA was found by Southern blot analysis to be present only in B. polymyxa 72 and not in other bacteria such as E. coli or B. subtilis. B. polymyxa, as well as E. coli harboring the cloned DNA, was found to produce enzymatically active fragments of beta-amylases (70,000, 56,000, or 58,000, and 42,000 daltons), which were detected in situ by sodium dodecyl sulfate-polyacrylamide gel electrophoresis. Nucleotide sequence analysis of the cloned 3.1-kilobase DNA revealed that it contains one open reading frame of 2,808 nucleotides without a translational stop codon. The deduced amino acid sequence for these 2,808 nucleotides encoding a secretory precursor of the beta-amylase protein is 936 amino acids including a signal peptide of 33 or 35 residues at its amino-terminal end. The existence of a beta-amylase of larger than 100,000 daltons, which was predicted on the basis of the results of nucleotide sequence analysis of the gene, was confirmed by examining culture supernatants after various cultivation periods. It existed only transiently during cultivation, but the multiform beta-amylases described above existed for a long time. The large beta-amylase (approximately 160,000 daltons) existed for longer in the presence of a protease inhibitor such as chymostatin, suggesting that proteolytic cleavage is the cause of the formation of multiform beta-amylases. PMID:2435707

  3. Cloning and nucleotide sequence of the gene coding for enzymatically active fragments of the Bacillus polymyxa beta-amylase.

    PubMed

    Kawazu, T; Nakanishi, Y; Uozumi, N; Sasaki, T; Yamagata, H; Tsukagoshi, N; Udaka, S

    1987-04-01

    The gene encoding beta-amylase was cloned from Bacillus polymyxa 72 into Escherichia coli HB101 by inserting HindIII-generated DNA fragments into the HindIII site of pBR322. The 4.8-kilobase insert was shown to direct the synthesis of beta-amylase. A 1.8-kilobase AccI-AccI fragment of the donor strain DNA was sufficient for the beta-amylase synthesis. Homologous DNA was found by Southern blot analysis to be present only in B. polymyxa 72 and not in other bacteria such as E. coli or B. subtilis. B. polymyxa, as well as E. coli harboring the cloned DNA, was found to produce enzymatically active fragments of beta-amylases (70,000, 56,000, or 58,000, and 42,000 daltons), which were detected in situ by sodium dodecyl sulfate-polyacrylamide gel electrophoresis. Nucleotide sequence analysis of the cloned 3.1-kilobase DNA revealed that it contains one open reading frame of 2,808 nucleotides without a translational stop codon. The deduced amino acid sequence for these 2,808 nucleotides encoding a secretory precursor of the beta-amylase protein is 936 amino acids including a signal peptide of 33 or 35 residues at its amino-terminal end. The existence of a beta-amylase of larger than 100,000 daltons, which was predicted on the basis of the results of nucleotide sequence analysis of the gene, was confirmed by examining culture supernatants after various cultivation periods. It existed only transiently during cultivation, but the multiform beta-amylases described above existed for a long time. The large beta-amylase (approximately 160,000 daltons) existed for longer in the presence of a protease inhibitor such as chymostatin, suggesting that proteolytic cleavage is the cause of the formation of multiform beta-amylases.

  4. A novel regucalcin gene promoter region-related protein: comparison of nucleotide and amino acid sequences in vertebrate species.

    PubMed

    Sawada, Natsumi; Yamaguchi, Masayoshi

    2005-01-01

    The molecular cloning and sequencing of the cDNA coding for a novel regucalcin gene promoter region-related protein (RGPR-p117) from bovine, rabbit and chicken livers was investigated using the rapid amplification of cDNA ends (RACE) method. Their nucleotide and amino acid sequences were compared with human, rat and mouse sequences published previously. RGPR-p117 of bovine, rabbit and chicken livers consisted of 1052, 1045, and 929 amino acid residues with calculated molecular mass of 117, 114, and 103 kDa, and estimated pI of 5.64, 5.84, and 5.59, respectively. Comparison analysis revealed that the nucleotide sequences of RGPR-p117 from mammalian species were highly conserved in their coding region, and the homologies were at least 72.9%. The RGPR-p117 proteins in mammalian species consisted of 1045-1060 amino acids, and had 63.1-90.2% identity. Meanwhile, the nucleotide and amino acid sequences of chicken RGPR-p117 had at least 36.4 and 43.7% identities, respectively. Phylogenetic analysis showed that RGPR-p117 in six vertebrates appears to form a single cluster. Mammalian RGPR-p117 conserved a leucine zipper motif. Moreover, the analysis for subcellular localization of RGPR-p117 from six vertebrates showed the probability of nuclear localization >52.2%; the nuclear localization in rat and mouse was 78.3%. This study demonstrates a great conservation of RGPR-p117 genes throughout evolution.

  5. Single nucleotide polymorphism barcoding of cytochrome c oxidase I sequences for discriminating 17 species of Columbidae by decision tree algorithm.

    PubMed

    Yang, Cheng-Hong; Wu, Kuo-Chuan; Dahms, Hans-Uwe; Chuang, Li-Yeh; Chang, Hsueh-Wei

    2017-07-01

    DNA barcodes are widely used in taxonomy, systematics, species identification, food safety, and forensic science. Most of the conventional DNA barcode sequences contain the whole information of a given barcoding gene. Most of the sequence information does not vary and is uninformative for a given group of taxa within a monophylum. We suggest here a method that reduces the amount of noninformative nucleotides in a given barcoding sequence of a major taxon, like the prokaryotes, or eukaryotic animals, plants, or fungi. The actual differences in genetic sequences, called single nucleotide polymorphism (SNP) genotyping, provide a tool for developing a rapid, reliable, and high-throughput assay for the discrimination between known species. Here, we investigated SNPs as robust markers of genetic variation for identifying different pigeon species based on available cytochrome c oxidase I (COI) data. We propose here a decision tree-based SNP barcoding (DTSB) algorithm where SNP patterns are selected from the DNA barcoding sequence of several evolutionarily related species in order to identify a single species with pigeons as an example. This approach can make use of any established barcoding system. We here firstly used as an example the mitochondrial gene COI information of 17 pigeon species (Columbidae, Aves) using DTSB after sequence trimming and alignment. SNPs were chosen which followed the rule of decision tree and species-specific SNP barcodes. The shortest barcode of about 11 bp was then generated for discriminating 17 pigeon species using the DTSB method. This method provides a sequence alignment and tree decision approach to parsimoniously assign a unique and shortest SNP barcode for any known species of a chosen monophyletic taxon where a barcoding sequence is available.

  6. Role of base stacking and sequence context in the inhibition of yeast DNA polymerase eta by pyrene nucleotide.

    PubMed

    Hwang, Hanshin; Taylor, John-Stephen

    2004-11-23

    The Y family DNA polymerase yeast pol eta inserts pyrene deoxyribose monophosphate (dPMP) in preference to A opposite an abasic site, the 3'-T of a thymine dimer, and a normal T with almost equal efficiency. In contrast, pol A family polymerases such as Klenow fragment and T7 DNA polymerase only insert dPMP efficiently opposite an abasic site and the 3'-T of a thymine dimer but not opposite undamaged DNA. Pyrene nucleotide is also an efficient chain-terminating inhibitor of DNA synthesis by pol eta but not by Klenow fragment or T7 DNA polymerase. To better understand the origin of the efficiency and sequence specificity of dPMP insertion by pol eta, the kinetics of dPMP insertion opposite various templates have been determined. In one sequence context, the efficiency of dPMP insertion increases 4.6-fold opposite G < A < T < C, suggesting that the templating nucleotide modulates dPMP insertion efficiency by having to destack prior to dPTP binding. The efficiency of insertion of dPMP opposite T in the same sequence context increases 7-fold for primers terminating in G < A < C < T and is similar to that observed for nontemplated blunt-end extension, suggesting that stacking interactions between the pyrene and the primer terminus are also important. On heterogeneous templates, the average selectivity for dPMP insertion relative to the complementary dNMP decreases in the order of dAMP > dGMP > dTMP > dCMP, from a high of 5.8 when dAMP is to be inserted following a T to a low of 0.5 when dCMP is to be inserted following a C. The relative preference for dPMP insertion at a given site can be largely explained by the energetic cost of destacking the templating base and stacking of pyrene nucleotide relative to that of stacking and base pairing the complementary nucleotide. Thus, pyrene nucleotide represents a novel class of nucleotide-based chain-terminating DNA synthesis inhibitors whose base portion consists of a hydrophobic, non-hydrogen bonding, base-pair mimic.

  7. Development of single-nucleotide polymorphism markers for Bromus tectorum (Poaceae) from a partially sequenced transcriptome

    Treesearch

    Keith R. Merrill; Craig E. Coleman; Susan E. Meyer; Elizabeth A. Leger; Katherine A. Collins

    2016-01-01

    Premise of the study: Bromus tectorum (Poaceae) is an annual grass species that is invasive in many areas of the world but most especially in the U.S. Intermountain West. Single-nucleotide polymorphism (SNP) markers were developed for use in investigating the geospatial and ecological diversity of B. tectorum in the Intermountain West to better understand the...

  8. Complete nucleotide sequence of Rose yellow leaf virus, a new member of the family Tombusviridae

    USDA-ARS's Scientific Manuscript database

    The genome of the Rose yellow leaf virus (RYLV) has been determined to be 3918 nucleotides containing seven open reading frames (ORFs). ORF1 encodes a 27 kDa peptide (p27). ORF2 shares a common start codon with ORF1 and continues through the amber stop codon of p27 to encode a 87 kDa (p87) protein t...

  9. Cloning, sequence, and properties of the soluble pyridine nucleotide transhydrogenase of Pseudomonas fluorescens.

    PubMed Central

    French, C E; Boonstra, B; Bufton, K A; Bruce, N C

    1997-01-01

    The gene encoding the soluble pyridine nucleotide transhydrogenase (STH) of Pseudomonas fluorescens was cloned and expressed in Escherichia coli. STH is related to the flavoprotein disulfide oxidoreductases but lacks one of the conserved redox-active cysteine residues. The gene is highly similar to an E. coli gene of unknown function. PMID:9098078

  10. The Nucleotide Capture Region of Alpha Hemolysin: Insights into Nanopore Design for DNA Sequencing from Molecular Dynamics Simulations

    PubMed Central

    Manara, Richard M. A.; Tomasio, Susana; Khalid, Syma

    2015-01-01

    Nanopore technology for DNA sequencing is constantly being refined and improved. In strand sequencing a single strand of DNA is fed through a nanopore and subsequent fluctuations in the current are measured. A major hurdle is that the DNA is translocated through the pore at a rate that is too fast for the current measurement systems. An alternative approach is “exonuclease sequencing”, in which an exonuclease is attached to the nanopore that is able to process the strand, cleaving off one base at a time. The bases then flow through the nanopore and the current is measured. This method has the advantage of potentially solving the translocation rate problem, as the speed is controlled by the exonuclease. Here we consider the practical details of exonuclease attachment to the protein alpha hemolysin. We employ molecular dynamics simulations to determine the ideal (a) distance from alpha-hemolysin, and (b) the orientation of the monophosphate nucleotides upon release from the exonuclease such that they will enter the protein. Our results indicate an almost linear decrease in the probability of entry into the protein with increasing distance of nucleotide release. The nucleotide orientation is less significant for entry into the protein.

  11. Detection and nucleotide sequence analysis of the speC gene in Swedish clinical group A streptococcal isolates.

    PubMed Central

    Norrby-Teglund, A; Holm, S E; Norgren, M

    1994-01-01

    The production of pyrogenic exotoxins SpeA, SpeB, and SpeC by group A streptococci has been associated with streptococcal toxic shock syndrome. Several epidemiological studies using DNA hybridization and PCR analysis have been performed in attempts to correlate one or several of the toxins with streptococcal toxic shock syndrome. The results reveal great variation in the occurrence of the speA and speC genes among clinical isolates. In this study, we show that the speC gene could be detected by nested PCR in five Swedish T1M1 strains isolated from patients infected with group A streptococci as well as in three Norwegian T1M1 isolates, previously reported to lack speC as determined by dot blot hybridization. To verify the identities of the amplified products, the nucleotide sequences of the PCR fragments from one Swedish T1M1 strain and from the toxin reference strain NY5 were determined. The nucleotide sequences showed that the amplified products were speC and of allele type C2, on the basis of the nucleotides in positions 438 and 456. However, one additional base pair substitution was found in NY5 at position 147 and in the Swedish isolate at position 157, which resulted in nonsynonymous amino acid changes. Thus, these speC genes represent two new allelic variants. PMID:8195383

  12. Identification of essential nucleotides in an upstream repressing sequence of Saccharomyces cerevisiae by selection for increased expression of TRK2.

    PubMed Central

    Vidal, M; Buckley, A M; Yohn, C; Hoeppner, D J; Gaber, R F

    1995-01-01

    The TRK2 gene in Saccharomyces cerevisiae encodes a membrane protein involved in potassium transport and is expressed at extremely low levels. Dominant cis-acting mutations (TRK2D), selected by their ability to confer TRK2-dependent growth on low-potassium medium, identified an upstream repressor element (URS1-TRK2) in the TRK2 promoter. The URS1-TRK2 sequence (5'-AGCCGCACG-3') shares six nucleotides with the ubiquitous URS1 element (5'-AGCCGCCGA-3'), and the protein species binding URS1-CAR1 (URSF) is capable of binding URS1-TRK2 in vitro. Sequence analysis of 17 independent repression-defective TRK2D mutations identified three adjacent nucleotides essential for URS1-mediated repression in vivo. Our results suggest a role for context effects with regard to URS1-related sequences: several mutant alleles of the URS1 element previously reported to have little or no effect when analyzed within the context of a heterologous promoter (CYC1) [Luche, R.M., Sumrada, R. & Cooper, T.G. (1990) Mol. Cell. Biol. 10, 3884-3895] have major effects on repression in the context of their native promoters (TRK2 and CAR1). TRK2D mutations that abolish repression also reveal upstream activating sequence activity either within or adjacent to URS1. Additivity between TRK2D and sin3 delta mutations suggests that SIN3-mediated repression is independent of that mediated by URS1. PMID:7892273
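
    The "shares six nucleotides" statement can be checked by comparing the two 9-nt elements position by position. The sketch below is only an illustration: the helper name shared_positions is hypothetical, and it assumes a simple ungapped comparison, which the abstract does not spell out.

```python
# Ungapped, position-by-position comparison of the two elements quoted in the abstract.
urs1_trk2 = "AGCCGCACG"  # URS1-TRK2 sequence as given in the abstract
urs1 = "AGCCGCCGA"       # ubiquitous URS1 element as given in the abstract


def shared_positions(a: str, b: str) -> list[int]:
    """Return the 1-based positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x == y]


matches = shared_positions(urs1_trk2, urs1)
print(len(matches), matches)  # 6 [1, 2, 3, 4, 5, 6]
```

    Under this assumption the six shared nucleotides are the first six positions, and the two elements differ only in their last three.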

  13. Complete nucleotide sequence of the Actinomyces viscosus T14V sialidase gene: presence of a conserved repeating sequence among strains of Actinomyces spp.

    PubMed Central

    Yeung, M K

    1993-01-01

    The nucleotide sequence of the Actinomyces viscosus T14V sialidase gene (nanH) and flanking regions was determined. An open reading frame of 2,703 nucleotides that encodes a predominately hydrophobic protein of 901 amino acids (M(r), 92,871) was identified. The amino acid sequence at the amino terminus of the predicted protein exhibited properties characteristic of a typical leader peptide. Five 12-amino-acid units that shared between 33 and 67% sequence identity were noted within the central domain of the protein. Each unit contained the sequence Ser-X-Asp-X-Gly-X-Thr-Trp, which is conserved among other bacterial and trypanosoma sp. sialidases. Thus, the A. viscosus T14V nanH gene and the other prokaryotic and eukaryotic sialidase genes evolved from a common ancestor. Southern hybridization analyses under conditions of high stringency revealed the existence of DNA sequences homologous to A. viscosus T14V nanH in the genomes of 18 strains of five Actinomyces species that expressed various levels of sialidase activity. The data demonstrate that the sialidase genes from divergent groups of Actinomyces spp. are highly conserved. PMID:8418033

  14. An Interpretation of the Ancestral Codon from Miller’s Amino Acids and Nucleotide Correlations in Modern Coding Sequences

    PubMed Central

    Carels, Nicolas; de Leon, Miguel Ponce

    2015-01-01

    Purine bias, which is usually referred to as an “ancestral codon”, is known to result in short-range correlations between nucleotides in coding sequences, and it is common in all species. We demonstrate that RWY is a more appropriate pattern than the classical RNY, and purine bias (Rrr) is the product of a network of nucleotide compensations induced by functional constraints on the physicochemical properties of proteins. Through deductions from universal correlation properties, we also demonstrate that amino acids from Miller’s spark discharge experiment are compatible with functional primeval proteins at the dawn of living cell radiation on earth. These amino acids match the hydropathy and secondary structures of modern proteins. PMID:25922573

  15. Nucleotide sequence of the Klebsiella pneumoniae nifD gene and predicted amino acid sequence of the alpha-subunit of nitrogenase MoFe protein.

    PubMed Central

    Ioannidis, I; Buck, M

    1987-01-01

    The nucleotide sequence of the Klebsiella pneumoniae nifD gene is presented and together with the accompanying paper [Holland, Zilberstein, Zamir & Sussman (1987) Biochem. J. 247, 277-285] completes the sequence of the nifHDK genes encoding the nitrogenase polypeptides. The K. pneumoniae nifD gene encodes the 483-amino acid-residue nitrogenase alpha-subunit polypeptide of Mr 54156. The alpha-subunit has five strongly conserved cysteine residues at positions 63, 89, 155, 184 and 275, some occurring in a region showing both primary sequence and potential structural homology to the K. pneumoniae nitrogenase beta-subunit. A comparison with six other alpha-subunit amino acid sequences has been made, which indicates a number of potentially important domains within alpha-subunits. PMID:3322262

  16. The primary structure of E. coli RNA polymerase, Nucleotide sequence of the rpoC gene and amino acid sequence of the beta'-subunit.

    PubMed Central

    Ovchinnikov YuA; Monastyrskaya, G S; Gubanov, V V; Guryev, S O; Salomatina, I S; Shuvaeva, T M; Lipkin, V M; Sverdlov, E D

    1982-01-01

    The primary structure of the E. coli rpoC gene (5321 base pairs) coding the beta'-subunit of RNA polymerase as well as its adjacent segment have been determined. The structure analysis of the peptides obtained by cleavage of the protein with cyanogen bromide and trypsin has confirmed the amino acid sequence of the beta'-subunit deduced from the nucleotide sequence analysis. The beta'-subunit of E. coli RNA polymerase contains 1407 amino acid residues. Its translation is initiated by codon GUG and terminated by codon TAA. It has been detected that the sequence following the terminating codon is strikingly homologous to known sequences of rho-independent terminators. PMID:6287430

  17. [Comparative analysis of nucleotide sequences of lactate dehydrogenase (LDH) gene and LDH epitopes of Plasmodium vivax and Plasmodium falciparum].

    PubMed

    Jiang, Li; Wang, Zhen-yu; Ma, Xiao-jiang; Zhang, Xiao-ping; Cai, Li

    2010-04-01

    The aim was to analyze differences in the nucleotide sequences of the lactate dehydrogenase (LDH) gene and in LDH epitopes between Plasmodium vivax and P. falciparum. Specific primers were designed to amplify the full-length LDH gene sequence of P. vivax and P. falciparum (GenBank accession number: DQ198262 and DQ060151 respectively). The PCR products were sequenced and compared. The epitopes of the target LDH antigens were predicted by SYFPEITHI software. Pv-LDH and Pf-LDH genes were cloned into prokaryotic plasmid pET28a, then expressed in E. coli BL21(DE3) with isopropyl beta-D-1-thiogalactopyranoside (IPTG) induction. The immunogenicity of recombinant Pv-LDH and Pf-LDH was analyzed by Western blotting and neutralization ELISA assays. The Pf-LDH gene was identical to the reference sequence (DQ198262), while there was a single nucleotide difference at position 666 between the Pv-LDH gene and the reference sequence (DQ060151). The coding region of the two genes contained 951 bp encoding 316 amino acid residues. Compared with Pf-LDH, Pv-LDH showed a nucleotide sequence identity of 75.1%, and an amino acid sequence identity of 90.2%. T cell epitope prediction indicated that there were 28 human leukocyte antigen (HLA) types which could recognize pLDH antigen epitopes. The common or similar epitopes accounted for about 75% of the predicted 180 epitopes. The number of specific epitopes of Pv-LDH and Pf-LDH proteins was 38 and 45, respectively. Western blotting analysis showed that the Pv-LDH recombinant antigen reacted with the sera of malaria patients, and the reactivity was much lower than that of immunized rabbit sera. Neutralization ELISA showed that about 70.3% of the reactivity of Pv-LDH polyclonal antibodies could be suppressed by Pv-LDH, but only 30.5% by Pf-LDH. There are differences in the DNA sequences of the LDH gene and in LDH epitopes between P. vivax and P. falciparum. The antibodies induced by the specific epitopes account for a small proportion of the antibody repertoire.
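
    The relation between the 951 bp coding region and the 316 residues it encodes can be reproduced with a short calculation. This is a sketch only: the function name is hypothetical, and it assumes that the 951 bp figure counts the terminal stop codon, which the abstract does not state explicitly.

```python
def residues_from_coding_length(n_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acid residues encoded by a coding region of n_bp base pairs.

    Assumption for illustration: when includes_stop is True, the final codon
    is a stop codon and contributes no residue.
    """
    if n_bp % 3 != 0:
        raise ValueError("coding length must be a multiple of 3")
    codons = n_bp // 3
    return codons - 1 if includes_stop else codons


print(residues_from_coding_length(951))  # 316
```

    Here 951 / 3 gives 317 codons, and subtracting the assumed stop codon leaves the 316 residues reported.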

  18. Partition enrichment of nucleotide sequences (PINS)--a generally applicable, sequence based method for enrichment of complex DNA samples.

    PubMed

    Kvist, Thomas; Sondt-Marcussen, Line; Mikkelsen, Marie Just

    2014-01-01

    The dwindling cost of DNA sequencing is driving transformative changes in various biological disciplines including medicine, thus resulting in an increased need for routine sequencing. Preparation of samples suitable for sequencing is the starting point of any practical application, but enrichment of the target sequence over background DNA is often laborious and of limited sensitivity thereby limiting the usefulness of sequencing. The present paper describes a new method, Probability directed Isolation of Nucleic acid Sequences (PINS), for enrichment of DNA, enabling the sequencing of a large DNA region surrounding a small known sequence. A 275,000-fold enrichment of a target DNA sample containing integrated human papilloma virus is demonstrated. Specifically, a sample containing 0.0028 copies of target sequence per ng of total DNA was enriched to 786 copies per ng. The starting concentration of 0.0028 target copies per ng corresponds to one copy of target in a background of 100,000 complete human genomes. The enriched sample was subsequently amplified using rapid genome walking and the resulting DNA sequence revealed not only the sequence of the truncated virus, but also 1026 base pairs 5' and 50 base pairs 3' to the integration site in chromosome 8. The demonstrated enrichment method is extremely sensitive and selective and requires only minimal knowledge of the sequence to be enriched and will therefore enable sequencing where the target concentration relative to background is too low to allow the use of other sample preparation methods or where significant parts of the target sequence are unknown.
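
    The reported enrichment factor follows from the two concentrations quoted in the abstract. The sketch below simply divides one by the other; the variable names are illustrative and no other data are assumed.

```python
# Concentrations quoted in the abstract, in target copies per ng of total DNA.
start_copies_per_ng = 0.0028
enriched_copies_per_ng = 786.0

# Fold enrichment is the ratio of final to starting target concentration.
fold_enrichment = enriched_copies_per_ng / start_copies_per_ng
print(f"{fold_enrichment:.3g}-fold")
```

    The ratio comes out at roughly 2.8 x 10^5, which agrees with the stated 275,000-fold enrichment to within a few percent; the small difference presumably reflects rounding of the quoted concentrations.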

  19. Determination of the minimal essential nucleotide sequence for diphtheria tox repressor binding by in vitro affinity selection.

    PubMed

    Tao, X; Murphy, J R

    1994-09-27

    The expression of diphtheria toxin in lysogenic toxigenic strains of Corynebacterium diphtheriae is controlled by the heavy metal ion-activated regulatory protein DtxR. In the presence of divalent heavy metal ions, DtxR specifically binds to the diphtheria tox operator and protects a 27-bp interrupted palindromic sequence from DNase I digestion. To determine the consensus DNA sequence for DtxR binding, we have used gel electrophoresis mobility-shift assay and polymerase chain reaction (PCR) amplification for in vitro affinity selection of DNA binding sequences from a universe of 6.9 x 10^10 variants. After 10 rounds of in vitro affinity selection, each round coupled with 30 cycles of PCR amplification, we isolated and characterized a family of DNA sequences that function as DtxR-responsive genetic elements both in vitro and in vivo. Moreover, these DNA sequences were found to bind activated DtxR with an affinity similar to that of the wild-type tox operator. The DNA sequence analysis of 21 unique in vitro affinity-selected binding sites has revealed the minimal essential nucleotide sequence for DtxR binding to be a 9-bp palindrome separated by a single base pair.
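
    The size of the starting pool (about 6.9 x 10^10 variants) matches what a stretch of fully randomized positions would give, since each position can hold any of four bases. The sketch below back-calculates the number of positions; the abstract does not state that number, so the value derived here is an inference, not a reported figure.

```python
import math

reported_pool_size = 6.9e10  # figure quoted in the abstract

# With 4 possible bases per position, n randomized positions give 4**n variants.
# Inferred (not stated in the abstract): solve 4**n ~= reported_pool_size for n.
n_positions = round(math.log(reported_pool_size, 4))
pool_size = 4 ** n_positions

print(n_positions, pool_size)  # 18 68719476736
```

    Eighteen randomized positions give 4^18 = 68,719,476,736 sequences, which rounds to the quoted 6.9 x 10^10.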

  20. Nucleotide sequence polymorphism at the apical membrane antigen-1 locus reveals population history of Plasmodium vivax in Thailand

    PubMed Central

    Putaporntip, Chaturong; Jongwutiwes, Somchai; Grynberg, Priscila; Cui, Liwang; Hughes, Austin L.

    2009-01-01

    Apical membrane antigen-1 is a candidate for inclusion in a vaccine for the human malaria parasite Plasmodium vivax. We collected 231 complete sequences of the gene encoding this antigen (pvama-1) from three regions of Thailand, the most extensive collection to date of sequences at this locus. The domain II loop (previously mentioned as a potential vaccine component) was almost completely conserved, with a single amino acid variant (I313R) observed in a single sequence. The 3′ portion of the gene (domain II through the stop codon) showed significantly lower nucleotide diversity than the 5′ portion (start codon through domain I); and a given domain I sequence might be found in a haplotype with more than one domain II sequence. These results imply a hotspot of recombination between domains I and II. We found significant geographic subdivision among the three regions of Thailand (NW, East, and South) in which collections were made in 2007. Numbers of P. vivax infections have experienced overall declines since 1990 in all three regions; but the decline has been most recent in the NW, and there has been a rebound in numbers of infections in the South since 2000. Consistent with population history, amino acid sequence diversity was greatest in the NW. The South, which had by far the lowest sequence diversity of the three regions, showed signs of a population that has expanded from a small number of founders after a bottleneck. PMID:19643205

  1. Differentiation of Erysipelothrix rhusiopathiae strains by nucleotide sequence analysis of a hypervariable region in the spaA gene: discrimination of a live vaccine strain from field isolates.

    PubMed

    Nagai, Shinya; To, Ho; Kanda, Akira

    2008-05-01

    Erysipelothrix rhusiopathiae causes erysipelas in swine and is considered a reemerging disease contributing substantially to economic losses in the swine industry. Since an attenuated live vaccine was commercialized in 1974 in Japan, outbreaks of acute septicemia or subacute urticaria of erysipelas have decreased dramatically. In contrast, a chronic form of erysipelas found during meat inspections in slaughterhouses has been increasing. In this study, a new strain-typing method was developed based on nucleotide sequencing of a hypervariable region in the surface protective antigen (spaA) gene for discrimination of the live vaccine strain from field isolates. Sixteen strains isolated from arthritic lesions found in slaughtered pigs were segregated into 4 major patterns: 1) identical nucleotide sequence with the vaccine strain: 3 isolates; 2) 1 nucleotide substitution (C to A) at position 555: 5 isolates; 3) 1 nucleotide substitution at various positions: 5 isolates; and 4) 2 nucleotide substitutions: 3 isolates. Isolates with the same nucleotide sequence as the vaccine strain were further characterized by other properties, including the mouse pathogenicity test. One strain isolated from pigs on a farm where the live vaccine had been used was found to be closely related to the vaccine strain. The phylogenetic tree constructed based on the spaA sequence suggests that the evolutionary distance of the isolates is related to the pathogenicity in mice. The new strain-typing system based on nucleotide sequencing of the spaA region is useful to discriminate the vaccine strain from field isolates.

  2. Nucleotide sequence of Zygosaccharomyces bailii virus Z: Evidence for +1 programmed ribosomal frameshifting and for assignment to family Amalgaviridae.

    PubMed

    Depierreux, Delphine; Vong, Minh; Nibert, Max L

    2016-06-02

    Zygosaccharomyces bailii virus Z (ZbV-Z) is a monosegmented dsRNA virus that infects the yeast Zygosaccharomyces bailii and remains unclassified to date despite its discovery >20 years ago. The previously reported nucleotide sequence of ZbV-Z (GenBank AF224490) encompasses two nonoverlapping long ORFs: upstream ORF1 encoding the putative coat protein and downstream ORF2 encoding the RNA-dependent RNA polymerase (RdRp). The lack of overlap between these ORFs raises the question of how the downstream ORF is translated. After examining the previous sequence of ZbV-Z, we predicted that it contains at least one sequencing error to explain the nonoverlapping ORFs, and hence we redetermined the nucleotide sequence of ZbV-Z, derived from the same isolate of Z. bailii as previously studied, to address this prediction. The key finding from our new sequence, which includes several insertions, deletions, and substitutions relative to the previous one, is that ORF2 in fact overlaps ORF1 in the +1 frame. Moreover, a proposed sequence motif for +1 programmed ribosomal frameshifting, previously noted in influenza A viruses, plant amalgaviruses, and others, is also present in the newly identified ORF1-ORF2 overlap region of ZbV-Z. Phylogenetic analyses provided evidence that ZbV-Z represents a distinct taxon most closely related to plant amalgaviruses (genus Amalgavirus, family Amalgaviridae). We conclude that ZbV-Z is the prototype of a new species, which we propose to assign as type species of a new genus of monosegmented dsRNA mycoviruses in family Amalgaviridae. Comparisons involving other unclassified mycoviruses with RdRps apparently related to those of plant amalgaviruses, and having either mono- or bisegmented dsRNA genomes, are also discussed.

  3. Nucleotide sequence relationship between intracisternal type A particles of Mus musculus and an endogenous retrovirus (M432) of Mus cervicolor.

    PubMed

    Kuff, E L; Lueders, K K; Scolnick, E M

    1978-10-01

    Intracisternal type A particles are retrovirus-like structures found in embryonic cells and many tumors of Mus musculus but having no clear relationship with other retroviruses of this mouse species. We have observed a partial nucleotide sequence homology between the high-molecular-weight (32S and 35S) RNA components of intracisternal A-particles from a neuroblastoma cell line and the 70S RNA fraction from M432, a type of retrovirus endogenous to the Asian mouse Mus cervicolor. M432 complementary DNA (cDNA) was hybridized to the extent of 30% by the A-particle RNAs. The hybrids showed a lower thermal stability (ΔTm, 7 °C) than those formed with homologous RNA. The reaction was commensurate with that found between M432 cDNA and divergent sequences in the M. musculus genome. The capacity to hybridize M432 cDNA was closely correlated with the concentration of A-particle sequences in the cytoplasmic RNA of several M. musculus cell types. The major RNA fraction of M432 virus showed a reciprocal partial reaction with the A-particle cDNA's; the virus, which was grown in NIH/3T3 (M. musculus) cells, also contained a small proportion of apparently authentic A-particle nucleotide sequences. A subset of A-particle sequences seemed to be almost totally lacking in the main M432 RNA. The A-particle cDNA's hybridized extensively with divergent sequences in M. cervicolor cellular DNA, indicating that this mouse species may contain not only the partially homologous M432 virogene, but also a more complete genetic equivalent of the intracisternal A-particle.

  4. A new class of cleavable fluorescent nucleotides: synthesis and optimization as reversible terminators for DNA sequencing by synthesis

    PubMed Central

    Turcatti, Gerardo; Romieu, Anthony; Fedurco, Milan; Tairi, Ana-Paula

    2008-01-01

    Fluorescent 2′-deoxynucleotides containing a protecting group at the 3′-O-position are reversible terminators enabling array-based DNA sequencing by synthesis (SBS) approaches. Herein, we describe the synthesis of a new family of 3′-OH unprotected cleavable fluorescent 2′-deoxynucleotides and their evaluation as reversible terminators for high-throughput DNA SBS strategies. In this first version, all four modified nucleotides bearing a cleavable disulfide Alexa Fluor® 594 dye were assayed for their ability to act as a reversible stop for the incorporation of the next labeled base. Their use in SBS led to a signal–no signal output after successive addition of each labeled nucleotide during the sequencing process (binary read-out). Solid-phase immobilized synthetic DNA target sequences were used to optimize the method that has been applied to DNA polymerized colonies or clusters obtained by in situ solid-phase amplification of fragments of genomic DNA templates. PMID:18263613

  5. Genome-wide profiling of RNA polymerase transcription at nucleotide resolution in human cells with native elongating transcript sequencing

    PubMed Central

    Mayer, Andreas; Churchman, L. Stirling

    2017-01-01

    Many features of gene transcription in human cells remain unclear, mainly due to a lack of quantitative approaches to follow genome transcription with nucleotide precision in vivo. Here we present a robust genome-wide approach to study RNA polymerase (Pol) II-mediated transcription in human cells at single-nucleotide resolution by native elongating transcript sequencing (NET-seq). Elongating RNA polymerase and the associated nascent RNA are prepared by cell fractionation, avoiding immunoprecipitation or RNA labeling. The 3′-ends of nascent RNAs are captured through barcode linker ligation and converted into a DNA sequencing library. The identity and abundance of the 3′-ends are determined by high-throughput sequencing, revealing the exact genomic locations of Pol II. Human NET-seq can be applied to study the full spectrum of Pol II transcriptional activities, including the production of unstable RNAs and transcriptional pausing. Using the protocol described here, a NET-seq library can be obtained from human cells in 5 days. PMID:27010758

  6. Nucleotide sequences and operon structure of plasmid-borne genes mediating uptake and utilization of raffinose in Escherichia coli.

    PubMed Central

    Aslanidis, C; Schmid, K; Schmitt, R

    1989-01-01

    The plasmid-borne raf operon encodes functions required for inducible uptake and utilization of raffinose by Escherichia coli. Raf functions include active transport (Raf permease), alpha-galactosidase, and sucrose hydrolase, which are negatively controlled by the Raf repressor. We have defined the order and extent of the three structural genes, rafA, rafB, and rafD; these are contained in a 5,284-base-pair nucleotide sequence. By comparisons of derived primary structures with known subunit molecular weights and an N-terminal peptide sequence, rafA was assigned to alpha-galactosidase (708 amino acids), rafB was assigned to Raf permease (425 amino acids), and rafD was assigned to sucrose hydrolase (476 amino acids). Transcription was shown to initiate 13 nucleotides upstream of rafA; a putative promoter, a ribosome-binding site, and a transcription termination signal were identified. Striking similarities between Raf permease and lacY-encoded lactose permease, revealed by high sequence conservation (76%), overlapping substrate specificities, and similar transport kinetics, suggest a common origin of these transport systems. alpha-Galactosidase and sucrose hydrolase are not related to host enzymes but have their counterparts in other species. We propose a modular origin of the raf operon and discuss selective forces that favored the given gene organization also found in the E. coli lac operon. PMID:2556373

  7. Nucleotide sequence and infectious cDNA clone of the L1 isolate of Pea seed-borne mosaic potyvirus.

    PubMed

    Olsen, B S; Johansen, I E

    2001-01-01

    The complete nucleotide sequence of Pea seed-borne mosaic potyvirus isolate L1 has been determined from cloned virus cDNA. The PSbMV L1 genome is 9895 nucleotides in length excluding the poly(A) tail. Computer analysis of the sequence revealed a single long open reading frame (ORF) of 9594 nucleotides. The ORF potentially encodes a polyprotein of 3198 amino acids with a deduced Mr of 363537. Nine putative proteolytic cleavage sites were identified by analogy to consensus sequences and genome arrangement in other potyviruses. Two full-length cDNA clones, p35S-L1-4 and p35S-L1-5, were assembled under control of an enhanced 35S promoter and nopaline synthase terminator. Clone p35S-L1-4 was constructed with four introns and p35S-L1-5 with five introns inserted in the cDNA. Clone p35S-L1-4 was unstable in Escherichia coli often resulting in amplification of plasmids with deletions. Clone p35S-L1-5 was stable and apparently less toxic to Escherichia coli resulting in larger bacterial colonies and higher plasmid yield. Both clones were infectious upon mechanical inoculation of plasmid DNA on susceptible pea cultivars Fjord, Scout, and Brutus. Eight pea genotypes resistant to L1 virus were also resistant to the cDNA derived L1 virus. Both native PSbMV L1 and the cDNA derived virus infected Chenopodium quinoa systemically giving rise to characteristic necrotic lesions on uninoculated leaves.

  8. Nucleotide sequence analysis of pRS2 and pRS3, two small cryptic plasmids from Oenococcus oeni.

    PubMed

    Mesas, J M; Rodríguez, M C; Alegre, M T

    2001-09-01

    Nucleotide sequence analysis of two cryptic plasmids, pRS2 (2544 bp) and pRS3 (3948 bp), from Oenococcus oeni revealed the presence in both of three major open reading frames with significant similarity to other small cryptic plasmids from O. oeni. The results suggest that those plasmids could be separated into two subfamilies, one represented by pLo13 and pRS3, the other represented by pOg32, pRS1, and pRS2.

  9. Complete nucleotide sequence of cfr-carrying IncX4 plasmid pSD11 from Escherichia coli.

    PubMed

    Sun, Jian; Deng, Hui; Li, Liang; Chen, Mu-Ya; Fang, Liang-Xing; Yang, Qiu-E; Liu, Ya-Hong; Liao, Xiao-Ping

    2015-01-01

    We report the complete nucleotide sequence of a plasmid carrying the multiresistance gene cfr. This plasmid was isolated from an Escherichia coli strain of swine origin in 2011. This 37,672-bp plasmid, pSD11, had an IncX4 backbone similar to those of the IncX4 plasmids obtained from the United States and Australia, in which the cfr gene was flanked by two copies of IS26 and a truncated Tn1331 was inserted.

  10. DNA sequencing by a single molecule detection of labeled nucleotides sequentially cleaved from a single strand of DNA

    SciTech Connect

    Goodwin, P.M.; Schecker, J.A.; Wilkerson, C.W.; Hammond, M.L.; Ambrose, W.P.; Jett, J.H.; Martin, J.C.; Marrone, B.L.; Keller, R.A.; Haces, A.; Shih, P.J.; Harding, J.D.

    1993-01-01

    We are developing a laser-based technique for the rapid sequencing of large DNA fragments (several kb in size) at a rate of 100 to 1000 bases per second. Our approach relies on fluorescent labeling of the bases in a single fragment of DNA, attachment of this labeled DNA fragment to a support, movement of the supported DNA into a flowing sample stream, sequential cleavage of the end nucleotide from the DNA fragment with an exonuclease, and detection of the individual fluorescently labeled bases by laser-induced fluorescence.

  11. DNA sequencing by a single molecule detection of labeled nucleotides sequentially cleaved from a single strand of DNA

    SciTech Connect

    Goodwin, P.M.; Schecker, J.A.; Wilkerson, C.W.; Hammond, M.L.; Ambrose, W.P.; Jett, J.H.; Martin, J.C.; Marrone, B.L.; Keller, R.A.; Haces, A.; Shih, P.J.; Harding, J.D.

    1993-02-01

    We are developing a laser-based technique for the rapid sequencing of large DNA fragments (several kb in size) at a rate of 100 to 1000 bases per second. Our approach relies on fluorescent labeling of the bases in a single fragment of DNA, attachment of this labeled DNA fragment to a support, movement of the supported DNA into a flowing sample stream, sequential cleavage of the end nucleotide from the DNA fragment with an exonuclease, and detection of the individual fluorescently labeled bases by laser-induced fluorescence.

  12. Nucleotide sequence of polypyrimidines from cloned mouse DNA as determined by base-specific blockage of exonuclease action

    SciTech Connect

    Deugau, K.V.; Mitchel, R.E.J.; Birnboim, H.C.

    1983-01-01

    Cloned fragments of mouse DNA have been screened for the presence of long polypyrimidine/polypurine segments. The polypyrimidine portion of one such segment (about 2000 nucleotides in length) has been isolated by acidic depurination of the entire cloned fragment and plasmid vector followed by selective precipitation and 5'-³²P labeling. This polypyrimidine has been used to demonstrate a new procedure for sequencing. Covalent modification of thymine with a water-soluble carbodiimide, or cytosine with glutaric anhydride, at low levels blocked the action of snake venom exonuclease. After deblocking, separation of the products of digestion by polyacrylamide gel electrophoresis yields a sequence ladder which can be used to determine the position of C and T residues as in other sequencing methods. A sequence of 72 residues adjacent to the 5' end has been established, consisting principally of the repeating tetranucleotide (CCTT)n. A low ratio of endonuclease to exonuclease is essential for application of this method to sequences of this size. Accordingly, a very sensitive modification of a fluorometric endonuclease assay was developed and used to optimize pH and Mg²⁺ conditions to favor exonuclease activity over the accompanying endonuclease activity. The results clearly indicate that long polypyrimidine tracts can be efficiently prepared and their sequences determined with this method using commercially available exonuclease preparations without additional purification. 26 references, 5 figures.

  13. Nucleotide sequences of the Pseudomonas savastanoi indoleacetic acid genes show homology with Agrobacterium tumefaciens T-DNA

    PubMed Central

    Yamada, Tetsuji; Palm, Curtis J.; Brooks, Bob; Kosuge, Tsune

    1985-01-01

    We report the nucleotide sequences of iaaM and iaaH, the genetic determinants for, respectively, tryptophan 2-monooxygenase and indoleacetamide hydrolase, the enzymes that catalyze the conversion of L-tryptophan to indoleacetic acid in the tumor-forming bacterium Pseudomonas syringae pv. savastanoi. The sequence analysis indicates that the iaaM locus contains an open reading frame encoding 557 amino acids that would comprise a protein with a molecular weight of 61,783; the iaaH locus contains an open reading frame of 455 amino acids that would comprise a protein with a molecular weight of 48,515. Significant amino acid sequence homology was found between the predicted sequence of the tryptophan monooxygenase of P. savastanoi and the deduced product of the T-DNA tms-1 gene of the octopine-type plasmid pTiA6NC from Agrobacterium tumefaciens. Strong homology was found in the 25 amino acid sequence in the putative FAD-binding region of tryptophan monooxygenase. Homology was also found in the amino acid sequences representing the central regions of the putative products of iaaH and tms-2 T-DNA. The results suggest a strong similarity in the pathways for indoleacetic acid synthesis encoded by genes in P. savastanoi and in A. tumefaciens T-DNA. PMID:16593610

  14. PerPlot & PerScan: tools for analysis of DNA curvature-related periodicity in genomic nucleotide sequences

    PubMed Central

    2011-01-01

    Background: Periodic spacing of short adenine or thymine runs phased with DNA helical period of ~10.5 bp is associated with intrinsic DNA curvature and deformability, which play important roles in DNA-protein interactions and in the organization of chromosomes in both eukaryotes and prokaryotes. Local differences in DNA sequence periodicity have been linked to differences in gene expression in some organisms. Despite the significance of these periodic patterns, there are virtually no publicly accessible tools for their analysis. Results: We present novel tools suitable for assessments of DNA curvature-related sequence periodicity in nucleotide sequences at the genome scale. Utility of the present software is demonstrated on a comparison of sequence periodicities in the genomes of Haemophilus influenzae, Methanocaldococcus jannaschii, Saccharomyces cerevisiae, and Arabidopsis thaliana. The software can be accessed through a web interface and the programs are also available for download. Conclusions: The present software is suitable for comparing DNA curvature-related sequence periodicity among different genomes as well as for analysis of intrachromosomal heterogeneity of the sequence periodicity. It provides a quick and convenient way to detect anomalous regions of chromosomes that could have unusual structural and functional properties and/or distinct evolutionary history. PMID:22587738
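
    The kind of signal described in this record can be illustrated with a minimal sketch. It is not the PerPlot or PerScan algorithm, whose details the abstract does not give; it simply assumes that "short adenine or thymine runs" can be approximated by AA/TT dinucleotides and counts the distances between their start positions. The function names and the toy sequence are hypothetical.

```python
from collections import Counter

def at_dinucleotide_starts(seq):
    """Return the start positions of every 'AA' or 'TT' dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] in ("AA", "TT")]

def spacing_histogram(seq, max_dist=35):
    """Count pairwise distances (1..max_dist) between AA/TT start positions."""
    starts = at_dinucleotide_starts(seq)
    hist = Counter()
    for idx, p in enumerate(starts):
        for q in starts[idx + 1:]:
            d = q - p
            if d > max_dist:
                break  # starts are sorted, so later distances only grow
            hist[d] += 1
    return hist

# Toy input (an assumption for illustration): one 'AA' every 10 bp in a G/C background.
toy = "AAGCGCGCGC" * 8
hist = spacing_histogram(toy)
print(sorted(hist.items()))
```

    In the toy input the histogram has entries only at multiples of 10. In real genomic sequence a curvature-related signal would be expected as broader enrichment near multiples of about 10.5 bp, and it would need to be compared against a background expectation.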

  15. The complete nucleotide sequence of the mitochondrial DNA of the agnathan Lampetra fluviatilis: bearings on the phylogeny of cyclostomes.

    PubMed

    Delarbre, C; Escriva, H; Gallut, C; Barriel, V; Kourilsky, P; Janvier, P; Laudet, V; Gachelin, G

    2000-04-01

    There are two competing theories about the interrelationships of craniates: the cyclostome theory assumes that lampreys and hagfishes are a clade, the cyclostomes, whose sister group is the jawed vertebrates (gnathostomes); the vertebrate theory assumes that lampreys and gnathostomes are a clade, the vertebrates, whose sister group is hagfishes. The vertebrate theory is best supported by a number of unique anatomical and physiological characters. Molecular sequence data from 18S and 28S rRNA genes rather support the cyclostome theory, but mtDNA sequence of Myxine glutinosa rather supports the vertebrate theory. Additional molecular data are thus needed to elucidate this three-taxon problem. We determined the complete nucleotide sequence of the mtDNA of the lamprey Lampetra fluviatilis. The mtDNA of L. fluviatilis possesses the same genomic organization as Petromyzon marinus, which validates this gene order as a synapomorphy of lampreys. The mtDNA sequence of L. fluviatilis was used in combination with relevant mtDNA sequences for an approach to the hagfish/lamprey relationships using the maximum-parsimony, neighbor-joining, and maximum-likelihood methods. Although trees compatible with our present knowledge of the phylogeny of craniates can be reconstructed by using the three methods, the data collected do not support the vertebrate or the cyclostome hypothesis. The present data set does not allow the resolution of this three-taxon problem, and new kinds of data, such as nuclear DNA sequences, need to be collected.

  16. The nucleotide sequence and genomic organization of Citrus leaf blotch virus: candidate type species for a new virus genus.

    PubMed

    Vives, M C; Galipienso, L; Navarro, L; Moreno, P; Guerri, J

    2001-08-15

    The complete nucleotide sequence of Citrus leaf blotch virus (CLBV) was determined. CLBV genomic RNA (gRNA) has 8747 nt, excluding the 3'-terminal poly(A) tail, and contains three open reading frames (ORFs) and untranslated regions (UTR) of 73 and 541 nucleotides at the 5' and 3' termini, respectively. ORF1 potentially encodes a 227.4-kDa polypeptide, which has methyltransferase, papain-like protease, helicase, and RNA-dependent RNA polymerase motifs. ORF2 encodes a 40.2-kDa polypeptide containing a motif characteristic of cell-to-cell movement proteins. The 40.7-kDa polypeptide encoded by ORF3 was identified as the coat protein. The genome organization of CLBV resembles that of viruses in the genus Trichovirus, but they differ in various aspects: (i) in trichoviruses ORF2 overlaps ORFs 1 and 3, whereas in CLBV, ORFs 2 and 3 are separated and ORFs 1 and 2 overlap in one nucleotide; (ii) CLBV gRNA and CP are larger than those of trichoviruses; and (iii) the CLBV 3' UTR is larger than that of trichoviruses. Phylogenetic comparisons based on CP amino acid signatures clearly separate CLBV from trichoviruses. Also contrasting with trichoviruses, CLBV could not be transmitted to Chenopodium quinoa Willd. Considering these singularities, we propose that CLBV should be included in a new virus genus. Copyright 2001 Academic Press.

  17. Molecular cloning, nucleotide sequence, and expression in Escherichia coli of a hemolytic toxin (aerolysin) gene from Aeromonas trota

    SciTech Connect

    Khan, A.A.; Kim, E.; Cerniglia, C.E.

    1998-07-01

    Aeromonas trota AK2, which was derived from ATCC 49659 and produces the extracellular pore-forming hemolytic toxin aerolysin, was mutagenized with the transposon mini-Tn5Km1 to generate a hemolysin-deficient mutant, designated strain AK253. Southern blotting data indicated that an 8.7-kb NotI fragment of the genomic DNA of strain AK253 contained the kanamycin resistance gene of mini-Tn5Km1. The 8.7-kb NotI DNA fragment was cloned into the vector pGEM5Zf(-) by selecting for kanamycin resistance, and the resultant clone, pAK71, showed aerolysin activity in Escherichia coli JM109. The nucleotide sequence of the aerA gene, located on the 1.8-kb ApaI-EcoRI fragment, was determined to consist of 1,479 bp and to have an ATG initiation codon and a TAA termination codon. An in vitro coupled transcription-translation analysis of the 1.8-kb region suggested that the aerA gene codes for a 54-kDa protein, in agreement with nucleotide sequence data. The deduced amino acid sequence of the aerA gene product of A. trota exhibited 99% homology with the amino acid sequence of the aerA product of Aeromonas sobria AB3 and 57% homology with the amino acid sequences of the products of the aerA genes of Aeromonas salmonicida 17-2 and A. sobria 33.

  18. Nucleotide sequence of the FNR-regulated fumarase gene (fumB) of Escherichia coli K-12.

    PubMed Central

    Bell, P J; Andrews, S C; Sivak, M N; Guest, J R

    1989-01-01

    The nucleotide sequence of a 3,162-base-pair (bp) segment of DNA containing the FNR-regulated fumB gene, which encodes the anaerobic class I fumarase (FUMB) of Escherichia coli, was determined. The structural gene was found to comprise 1,641 bp, 547 codons (excluding the initiation and termination codons), and the gene product had a predicted Mr of 59,956. The amino acid sequence of FUMB contained the same number of residues as did that of the aerobic class I fumarase (FUMA), and there were identical amino acids at all but 56 positions (89.8% identity). There was no significant similarity between the class I fumarases and the class II enzyme (FUMC) except in one region containing the following consensus: Gly-Ser-Xxx-Ile-Met-Xxx-Xxx-Lys-Xxx-Asn. Some of the 56 amino acid substitutions must be responsible for the functional preferences of the enzymes for malate dehydration (FUMB) and fumarate hydration (FUMA). Significant similarities between the cysteine-containing sequence of the class I fumarases (FUMA and FUMB) and the mammalian aconitases were detected, and this finding further supports the view that these enzymes are all members of a family of iron-containing hydrolyases. The nucleotide sequence of a 1,142-bp distal sequence of an unidentified gene (genF) located upstream of fumB was also defined and found to encode a product that is homologous to the product of another unidentified gene (genA), located downstream of the neighboring aspartase gene (aspA). PMID:2656658

  19. Rapid DNA Sequencing by Direct Nanoscale Reading of Nucleotide Bases on Individual DNA Chains

    SciTech Connect

    Lee, James Weifu; Meller, Amit

    2007-01-01

    Since the independent invention of DNA sequencing by Sanger and by Gilbert 30 years ago, it has grown from a small scale technique capable of reading several kilobase-pair of sequence per day into today's multibillion dollar industry. This growth has spurred the development of new sequencing technologies that do not involve either electrophoresis or Sanger sequencing chemistries. Sequencing by Synthesis (SBS) involves multiple parallel micro-sequencing addition events occurring on a surface, where data from each round is detected by imaging. New High Throughput Technologies for DNA Sequencing and Genomics is the second volume in the Perspectives in Bioanalysis series, which looks at the electroanalytical chemistry of nucleic acids and proteins, development of electrochemical sensors and their application in biomedicine and in the new fields of genomics and proteomics. The authors have expertly formatted the information for a wide variety of readers, including new developments that will inspire students and young scientists to create new tools for science and medicine in the 21st century. Reviews of complementary developments in Sanger and SBS sequencing chemistries, capillary electrophoresis and microdevice integration, MS sequencing and applications set the framework for the book.

  20. Molecular evolution of a family of resistance gene analogs of nucleotide-binding site sequences in Solanum lycopersicum.

    PubMed

    Liao, Pei-Chun; Lin, Kuan-Hung; Ko, Chin-Ling; Hwang, Shih-Ying

    2011-10-01

    Nucleotide-binding site-leucine-rich repeats (NBS-LRR) gene families are one of the major plant resistance genes. Genomic NBS evolution was studied in many plant species for diverse arrays of NBS gene families. In this study, we focused on one family of NBS sequences in an attempt to understand how closely related NBS sequences evolved in the light of selection in domesticated plant species. A phylogenetic analysis revealed five major clades (A-E) and five subclades (A1-A5) within clade A of cloned NBS sequences. Positive selection was only detected in newly evolved NBS lineages in subclades of clade A. Positively selected codon sites were found among NBS sequences of clade A. A sliding-window analysis revealed that regions with Ka/Ks ratios of >1 were in the inter-motifs when paired clades were compared, but regions with Ka/Ks ratios of >1 were found across NBS sequences when subclades of clade A were compared. Our results based on a family of closely related NBS sequences showed that positive selection was first exerted on specific lineages across all NBS sequences after selective constraints. Subsequently, sequences with mutations in commonly conserved motifs were scrutinized by purifying selection. In the long term, conserved high frequency alleles in commonly conserved motifs and changes in inter-motifs were maintained in the investigated family of NBS sequences. Moreover, codons identified to be under positive selection in the inter-motifs were mainly located in regions involved in functions of ATP binding or hydrolysis.

  1. Complete nucleotide sequence of a gene encoding a functional human class I histocompatibility antigen (HLA-CW3).

    PubMed Central

    Sodoyer, R; Damotte, M; Delovitch, T L; Trucy, J; Jordan, B R; Strachan, T

    1984-01-01

    The HLA-CW3 gene contained in a cosmid clone identified by transfection expression experiments has been completely sequenced. This provides, for the first time, data on the structure of HLA-C locus products and constitutes, together with that of the gene coding for HLA-A3, the first complete nucleotide sequences of genes coding for serologically defined class I HLA molecules. In contrast to the organisation of the two class I HLA pseudogenes whose sequences have previously been determined, the sequence of the HLA-CW3 gene reveals an additional cytoplasmic encoding domain, making the organisation of this gene very similar to that of known H-2 class I genes and also the HLA-A3 gene. The deduced amino acid sequences of HLA-CW3 and HLA-A3 now allow a systematic comparison of such sequences of HLA class I molecules from the three classical transplantation antigen loci A, B, C. The compared sequences include the previously determined partial amino acid sequences of HLA-B7, HLA-B40, HLA-A2 and HLA-A28. The comparisons confirm the extreme polymorphism of HLA classical class I molecules, and permit a study of the level of diversity and the location of sequence differences. The distribution of differences is not uniform, most of them being located in the first and second extracellular domains, the third extracellular domain is extremely conserved, and the cytoplasmic domain is also a variable region. Although it is difficult to determine locus-specific regions, we have identified several candidate positions which may be C locus-specific. PMID:6609813

  2. The nucleotide sequences of 5S rRNAs from two Annelida species, Perinereis brevicirris and Sabellastarte japonica, and an Echiura species, Urechis unicinctus.

    PubMed Central

    Kumazaki, T; Hori, H; Osawa, S

    1983-01-01

    The nucleotide sequences of 5S rRNAs from two Annelida species, Perinereis brevicirris and Sabellastarte japonica, and an Echiura species, Urechis unicinctus, have been determined. Their sequences are all 120 nucleotides long. The percent sequence similarities are 88% (Perinereis/Sabellastarte), 90% (Sabellastarte/Urechis) and 92% (Perinereis/Urechis), indicating that the Echiura is indistinguishable from the Annelida by their 5S rRNAs. The 5S rRNA sequences from the Annelida/Echiura are most closely related to those from the Nemertinea (87%), the Mollusca (87%) and the Rotifera (88%). PMID:6856459

  3. Complete nucleotide sequences of two NDM-1-encoding plasmids from the same sequence type 11 Klebsiella pneumoniae strain.

    PubMed

    Studentova, V; Dobiasova, H; Hedlova, D; Dolejska, M; Papagiannitsis, C C; Hrabak, J

    2015-02-01

    The sequence type 11 Klebsiella pneumoniae strain Kpn-3002cz was confirmed to harbor two NDM-1-encoding plasmids, pB-3002cz and pS-3002cz. pB-3002cz (97,649 bp) displayed extensive sequence similarity with the blaNDM-1-carrying plasmid pKPX-1. pS-3002cz (73,581 bp) was found to consist of an IncR-related sequence (13,535 bp) and a mosaic region (60,046 bp). A 40,233-bp sequence of pS-3002cz was identical to the mosaic region of pB-3002cz, indicating the en bloc acquisition of the NDM-1-encoding region from one plasmid by the other.

  4. The complete nucleotide sequence and genomic characterization of tropical soda apple mosaic virus

    USDA-ARS's Scientific Manuscript database

    Tropical soda apple mosaic virus (TSAMV) was first identified in tropical soda apple (Solanum viarum), a noxious weed, in Florida in 2002. This report provides the first full genome sequence of TSAMV. The full genome sequence of this virus will enable research scientists to develop additional spec...

  5. Nucleotide sequence of satellite DNA contained in the eliminated genome of Ascaris lumbricoides.

    PubMed Central

    Müller, F; Walker, P; Aeby, P; Neuhaus, H; Felder, H; Back, E; Tobler, H

    1982-01-01

    Several restriction endonuclease fragments isolated from highly repetitive satellite DNA of the chromatin eliminating nematode Ascaris lumbricoides var. suum have been cloned. Each type of restriction fragment corresponds to a different variant of the same related ancestral sequence. These variants differ by small deletions, insertions and single base substitutions. Restriction and DBM blot analyses show that members of the same variant class are tandemly linked and therefore are physically separated from other variant classes. A comparison of all the determined sequences establishes a 121 bp long and AT rich consensus sequence. There is evidence for an internal short range periodicity of 11 bp length, indicating that the Ascaris satellite initially may have evolved from an ancestral undecamer sequence. The satellite DNA sequences are mostly but not entirely eliminated from the presumptive somatic cells during chromatin diminution. We have no evidence for transcriptional activity of satellite DNA at any stage or tissue analyzed. PMID:6296780

  6. tuple_plot: fast pairwise nucleotide sequence comparison with noise suppression.

    PubMed

    Szafranski, Karol; Jahn, Niels; Platzer, Matthias

    2006-08-01

    The program tuple_plot identifies and visualizes local similarities between two genomic sequences, typically 100 kb or longer, by applying the well-known dotplot principle. A dictionary of sequence words built from the input sequences serves to construct a task-specific expectancy model that is used to attribute significance values to pairwise word hits. The dictionary-based approach allows fast computation, the computation time scaling to O(N log N), depending on the size of the input sequences. The proposed scoring scheme appreciably increases the signal-to-noise ratio and may help to improve other word-based sequence comparison approaches. tuple_plot is available at http://genome.fli-leibniz.de/software.html and may be used under GNU public license.
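
    The dictionary-based dotplot idea in this record can be sketched as follows. This is only an illustration of word-hit detection with a word dictionary: it does not reproduce tuple_plot's expectancy model or significance scoring, and the function names, the word length k and the toy sequences are all assumptions.

```python
from collections import defaultdict

def word_index(seq, k):
    """Map every k-letter word in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index

def word_hits(seq_a, seq_b, k=4):
    """Return (i, j) pairs where the word at seq_a[i] equals the word at seq_b[j]."""
    index = word_index(seq_a, k)
    hits = []
    for j in range(len(seq_b) - k + 1):
        for i in index.get(seq_b[j:j + k], ()):
            hits.append((i, j))
    return hits

# Toy sequences sharing the segment 'TACGGTC' at different offsets.
a = "ACGTACGGTC"
b = "TTACGGTCAA"
hits = word_hits(a, b, k=4)
print(hits)  # each hit is one dot; a shared segment shows up as a diagonal run
```

    Hits with a constant value of i - j lie on one diagonal of the dotplot. A scoring scheme such as the one the abstract describes would then weight each hit, which this sketch leaves out.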

  7. [Classification of nucleotide sequences over their frequency dictionaries reveals a relation between the structure of sequences and taxonomy of their bearers].

    PubMed

    Gorban', A N; Popova, T G; Sadovskiĭ, M G

    2003-01-01

    Classification of 16S RNA sequences over their frequency dictionaries, both real and transformed, was studied. Two entities were considered structurally close to each other if their frequency dictionaries were close in the Euclidean metric. A transformation procedure for a frequency dictionary has been implemented that reveals the peculiarities of the information structure of a nucleotide sequence. A comparative study of the classification developed over the real frequency dictionaries versus the one developed over the transformed frequency dictionaries was carried out. A strong correlation is revealed between the classification and the taxonomy of the 16S RNA bearers. For the classes isolated, the information-valuable words were identified. These words are the main factors differentiating the classes. The frequency dictionaries containing words of length 3 exhibit the best correlation between a class and a genus. A genus, as a rule, falls within a single class, and exceptions are sporadic. Hierarchical classification over the transformed frequency dictionaries separated one or two taxonomic groups at each stage of classification. The unexpectedly frequent or, on the contrary, unexpectedly rare occurrence of words (of length 3) in the entities under consideration accounts for the structural difference between the classes of nucleotide sequences.
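
    A frequency dictionary of words of length 3, and the Euclidean distance between two such dictionaries, can be sketched as below. This covers only the "real" dictionaries; the transformation procedure mentioned in the abstract is not specified there and is not attempted. The function names and toy sequences are hypothetical.

```python
from collections import Counter
from math import sqrt

def frequency_dictionary(seq, k=3):
    """Return {word: relative frequency} over all overlapping words of length k."""
    words = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(words)
    return {w: c / total for w, c in Counter(words).items()}

def euclidean_distance(fd_a, fd_b):
    """Euclidean distance between two dictionaries; absent words count as 0."""
    keys = set(fd_a) | set(fd_b)
    return sqrt(sum((fd_a.get(w, 0.0) - fd_b.get(w, 0.0)) ** 2 for w in keys))

fd1 = frequency_dictionary("ACGACGACG")
fd2 = frequency_dictionary("ACGACGACG")
fd3 = frequency_dictionary("TTTTTTTTT")

print(euclidean_distance(fd1, fd2))  # identical sequences give distance 0.0
print(euclidean_distance(fd1, fd3))  # sequences with no shared words are far apart
```

    Sequences would then be grouped by these pairwise distances with whatever clustering method is preferred; the abstract does not name one.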

  8. Bayesian Markov models consistently outperform PWMs at predicting motifs in nucleotide sequences

    PubMed Central

    Siebert, Matthias; Söding, Johannes

    2016-01-01

    Position weight matrices (PWMs) are the standard model for DNA and RNA regulatory motifs. In PWMs nucleotide probabilities are independent of nucleotides at other positions. Models that account for dependencies need many parameters and are prone to overfitting. We have developed a Bayesian approach for motif discovery using Markov models in which conditional probabilities of order k − 1 act as priors for those of order k. This Bayesian Markov model (BaMM) training automatically adapts model complexity to the amount of available data. We also derive an EM algorithm for de-novo discovery of enriched motifs. For transcription factor binding, BaMMs achieve significantly (P = 1/16) higher cross-validated partial AUC than PWMs in 97% of 446 ChIP-seq ENCODE datasets and improve performance by 36% on average. BaMMs also learn complex multipartite motifs, improving predictions of transcription start sites, polyadenylation sites, bacterial pause sites, and RNA binding sites by 26–101%. BaMMs never performed worse than PWMs. These robust improvements argue in favour of generally replacing PWMs by BaMMs. PMID:27288444
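
    The independence assumption of a PWM, the baseline that this record compares against, can be made concrete with a short sketch: the score of a site is a sum of per-position terms, each looking at one nucleotide only. The 3-position matrix, the uniform background of 0.25 and the function names are made-up illustrations; the higher-order conditional probabilities of a BaMM are not implemented here.

```python
from math import log2

# Hypothetical 3-position PWM: one probability table per motif position.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform background nucleotide probability

def pwm_log_odds(site, pwm=PWM, background=BACKGROUND):
    """Log-odds score of a site; each position contributes independently."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal motif length")
    return sum(log2(col[base] / background) for col, base in zip(pwm, site))

def best_site(seq, pwm=PWM):
    """Return (score, start) of the highest-scoring window in seq."""
    w = len(pwm)
    scored = [(pwm_log_odds(seq[i:i + w], pwm), i) for i in range(len(seq) - w + 1)]
    return max(scored)

score, pos = best_site("TTAGCTT")
print(pos, round(score, 3))
```

    A model with dependencies would replace each per-position table with probabilities conditioned on the preceding nucleotides, which is where the parameter count grows and a prior becomes useful.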

  9. The nucleotide sequence of the putative transcription initiation site of a cloned ribosomal RNA gene of the mouse.

    PubMed Central

    Urano, Y; Kominami, R; Mishima, Y; Muramatsu, M

    1980-01-01

    Approximately one kilobase pair surrounding and upstream of the transcription initiation site of a cloned ribosomal DNA (rDNA) of the mouse was sequenced. The putative transcription initiation site was determined by two independent methods: nuclease S1 protection and reverse transcriptase elongation mapping, using isolated 45S ribosomal RNA precursor (45S RNA) and appropriate restriction fragments of rDNA. Both methods gave an identical result: 45S RNA had a structure starting from ACTCTTAG---. Characteristically, mouse rDNA had many T clusters (≥5) upstream of the initiation site, the longest being 21 consecutive T's. A pentadecanucleotide, TGCCTCCCGAGTGCA, appeared twice within 260 nucleotides upstream of the putative initiation site. No such characteristic sequences were found downstream of this site. Little similarity was found upstream of the transcription initiation site among the mouse, Xenopus laevis and Saccharomyces cerevisiae rDNA. PMID:6162156

  10. A simple sequence repeat- and single-nucleotide polymorphism-based genetic linkage map of the brown planthopper, Nilaparvata lugens.

    PubMed

    Jairin, Jirapong; Kobayashi, Tetsuya; Yamagata, Yoshiyuki; Sanada-Morimura, Sachiyo; Mori, Kazuki; Tashiro, Kosuke; Kuhara, Satoru; Kuwazaki, Seigo; Urio, Masahiro; Suetsugu, Yoshitaka; Yamamoto, Kimiko; Matsumura, Masaya; Yasui, Hideshi

    2013-02-01

    In this study, we developed the first genetic linkage map for the major rice insect pest, the brown planthopper (BPH, Nilaparvata lugens). The linkage map was constructed by integrating linkage data from two backcross populations derived from three inbred BPH strains. The consensus map consists of 474 simple sequence repeats, 43 single-nucleotide polymorphisms, and 1 sequence-tagged site, for a total of 518 markers at 472 unique positions in 17 linkage groups. The linkage groups cover 1093.9 cM, with an average distance of 2.3 cM between loci. The average number of marker loci per linkage group was 27.8. The sex-linkage group was identified by exploiting X-linked and Y-specific markers. Our linkage map and the newly developed markers used to create it constitute an essential resource and a useful framework for future genetic analyses in BPH.
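
    The summary figures in this record can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the abstract; the variable names are hypothetical. It assumes the reported average distance is the map length divided by the number of unique positions, which reproduces the stated 2.3 cM.

```python
# Marker counts as stated in the abstract.
markers = {"SSR": 474, "SNP": 43, "STS": 1}
total_markers = sum(markers.values())

unique_positions = 472
linkage_groups = 17
map_length_cm = 1093.9

# Assumed definition: map length divided by the number of unique positions.
avg_spacing_cm = map_length_cm / unique_positions
avg_loci_per_group = unique_positions / linkage_groups

print(total_markers)                 # total number of markers
print(round(avg_spacing_cm, 1))      # average spacing in cM
print(round(avg_loci_per_group, 1))  # average loci per linkage group
```

    Dividing by the number of intervals instead (positions minus linkage groups, 455) would give about 2.4 cM, so the published 2.3 cM appears to follow the per-position definition.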

  11. Complete nucleotide sequence of the Hsd plasmid pECO29 and identification of its functional regions.

    PubMed

    Zakharova, M V; Pertzev, A V; Kravetz, A N; Beletskaya, I V; Shlyapnikov, M G; Solonin, A S

    1998-06-16

    The complete nucleotide sequence of the Hsd plasmid pECO29 has been determined. The plasmid DNA consists of 3895 base pairs. These include 4 genes and 5 sites. Two genes encoding the proteins (restriction endonuclease and DNA methyltransferase) have been fully characterized. pECO29 comprises a ColE1-type replication system coding for untranslated genes RNAI and RNAII, the emr recombination site containing palindromic sequences and involved in stable maintenance of the plasmid, two pseudo oriT sites homologous to the oriT site of R64 and F plasmids, as well as the bom locus of a ColE1-like plasmid. There are no genes involved in the mobilization of the pECO29 plasmid.

  12. SNP@Domain: a web resource of single nucleotide polymorphisms (SNPs) within protein domain structures and sequences

    PubMed Central

    Han, Areum; Kang, Hyo Jin; Cho, Yoobok; Lee, Sunghoon; Kim, Young Joo; Gong, Sungsam

    2006-01-01

    The single nucleotide polymorphisms (SNPs) in conserved protein regions have been thought to be strong candidates that alter protein functions. Thus, we have developed SNP@Domain, a web resource, to identify SNPs within human protein domains. We annotated SNPs from dbSNP with protein structure-based as well as sequence-based domains: (i) structure-based using SCOP and (ii) sequence-based using Pfam to avoid conflicts from two domain assignment methodologies. Users can investigate SNPs within protein domains with 2D and 3D maps. We expect this visual annotation of SNPs within protein domains will help scientists select and interpret SNPs associated with diseases. A web interface for the SNP@Domain is freely available at and from . PMID:16845090

  13. Development of single-nucleotide polymorphism markers for Bromus tectorum (Poaceae) from a partially sequenced transcriptome

    PubMed Central

    Merrill, Keith R.; Coleman, Craig E.; Meyer, Susan E.; Leger, Elizabeth A.; Collins, Katherine A.

    2016-01-01

    Premise of the study: Bromus tectorum (Poaceae) is an annual grass species that is invasive in many areas of the world but most especially in the U.S. Intermountain West. Single-nucleotide polymorphism (SNP) markers were developed for use in investigating the geospatial and ecological diversity of B. tectorum in the Intermountain West to better understand the mechanisms behind its successful invasion. Methods and Results: Normalized cDNA libraries from six diverse B. tectorum individuals were pooled and sequenced using 454 sequencing. Ninety-five SNP assays were developed for use on 96.96 arrays with the Fluidigm EP1 genotyping platform. Verification of the 95 SNPs by genotyping 251 individuals from 12 populations is reported, along with amplification data from four related Bromus species. Conclusions: These SNP markers are polymorphic across populations of B. tectorum, are optimized for high-throughput applications, and may be applicable to other, related Bromus species. PMID:27843723

  14. Nucleotide sequence of the pnd gene in plasmid R483 and role of the pnd gene product in plasmolysis.

    PubMed

    Ono, K; Akimoto, S; Ohnishi, Y

    1987-01-01

    The pnd gene of R plasmid R483, like the srnB gene of the F plasmid, increases the degradation of stable RNA in Escherichia coli. The nucleotide sequence of the pnd locus was determined and compared with that of the srnB locus. The genes have open reading frames that are 54% homologous, and both have an upstream inverted repeat sequence. The pnd gene expression seems to decrease the osmotic barrier of the cytoplasmic membrane, since no plasmolytic vacuoles were formed in the cells carrying the gene when the cells were exposed to hypertonic sucrose solution. This result suggests that RNase I in the periplasm passes through the altered membrane to degrade stable RNA in the cytoplasm.

  15. Mitochondrial DNA in the sea urchin Arbacia lixula: nucleotide sequence differences between two polymorphic molecules indicate asymmetry of mutations.

    PubMed

    De Giorgi, C; De Luca, F; Saccone, C

    1991-07-22

    Two polymorphic forms of mitochondrial DNA (mtDNA) extracted from Arbacia lixula eggs were cloned and the nucleotide sequences of specific regions determined. A comparison of the sequences of the sense strand of the two molecules demonstrates that all the differences are transitions and only of the A→G type. A change such as G→A (or A→G) on the sense mtDNA strand results from either a direct G→A (or A→G) mutation on that strand or a C→T (or T→C) on the complementary strand. None of the C→T (or T→C) changes were detected on the sense strand, which implies that the A→G mutation bias on the sense strand is not reversed for the other strand. Our observation indicates the existence of mechanisms acting asymmetrically on the two mtDNA strands, possibly during mtDNA replication.
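
    The comparison described in this record amounts to classifying each difference between two aligned sense-strand sequences. A minimal sketch follows, assuming the two sequences are already aligned, ungapped and of equal length; the function name, category labels and toy sequences are hypothetical and not taken from the paper.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_differences(seq_a, seq_b):
    """Count purine transitions, pyrimidine transitions and transversions."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == y:
            continue
        if {x, y} <= PURINES:
            counts["A<->G transition"] += 1
        elif {x, y} <= PYRIMIDINES:
            counts["C<->T transition"] += 1
        else:
            counts["transversion"] += 1
    return counts

# Toy pair in which every difference is a purine transition.
counts = classify_differences("ACGTAG", "GCATAA")
print(dict(counts))
```

    A strand asymmetry like the one reported would appear as purine transitions on the sense strand with no matching pyrimidine transitions.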

  16. Complete nucleotide sequence and taxonomy of Sugarcane streak mosaic virus, member of a novel genus in the family Potyviridae.

    PubMed

    Xu, D-L; Zhou, G-H; Xie, Y-J; Mock, R; Li, R

    2010-06-01

    The complete genomic sequence of a Pakistani isolate of Sugarcane streak mosaic virus (SCSMV-PAK) is determined to be 9782 nucleotides in length, excluding the 3' poly(A) tail, and it comprises a large open reading frame encoding a polyprotein of 3130 amino acid residues. The deduced polyprotein is likely to be cleaved at nine putative protease sites by three viral proteases to ten mature proteins. Conserved motifs of orthologous proteins of other potyviruses are identified in corresponding positions of SCSMV-PAK. The genomic organization is virtually identical to the genera Ipomovirus, Potyvirus, Rymovirus, and Tritimovirus in the family Potyviridae. Sequence analyses indicate that the SCSMV-PAK genomic sequence is different from those of Sugarcane mosaic virus and Sorghum mosaic virus, two viruses with very similar symptoms and host range to SCSMV-PAK. SCSMV-PAK shares 52.7% identity with Triticum mosaic virus (TriMV) and 26.4-31.5% identities with species of the existing genera and unassigned viruses in the Potyviridae at the polyprotein sequence level. Phylogenetic analyses of the polyprotein and deduced mature protein amino acid sequences reveal that SCSMV, together with TriMV, forms a distinct group in the family at the genus level. Therefore, SCSMV should represent a new genus, Susmovirus, in the Potyviridae.

  17. Efficient detection of chromosome imbalances and exome single nucleotide variants using targeted sequencing in the clinical setting.

    PubMed

    Villela, Darine; da Costa, Silvia Souza; Vianna-Morgante, Angela M; Krepischi, Ana C V; Rosenberg, Carla

    2017-09-04

    We evaluated an approach to detect copy number variants (CNVs) and single nucleotide changes (SNVs), using a clinically focused exome panel complemented with a backbone and SNP probes that allows for genome-wide copy number changes and copy-neutral absence of heterozygosity (AOH) calls; this approach potentially combines chromosomal microarray testing and sequencing into a single test. A panel of 16 DNA samples with known alterations ranging from megabase-scale CNVs to single base modifications was used as positive controls for sequencing data analysis. The DNA panel included CNVs (n = 13) of variable sizes (23 Kb to 27 Mb), uniparental disomy (UPD; n = 1), and single point mutations (n = 2). All DNA sequence changes were identified by the current platform, showing that CNVs of at least 23 Kb can be properly detected. The estimated sizes of genomic imbalances detected by microarrays and next generation sequencing are virtually the same, indicating that the resolution and sensitivity of this approach are at least similar to those provided by DNA microarrays. Accordingly, our data show that the combination of a sequencing platform comprising focused exome and whole genome backbone, with appropriate algorithms, enables a cost-effective and efficient solution for the simultaneous detection of CNVs and SNVs. Copyright © 2017. Published by Elsevier Masson SAS.

  18. Prioritization Of Nonsynonymous Single Nucleotide Variants For Exome Sequencing Studies Via Integrative Learning On Multiple Genomic Data

    PubMed Central

    Wu, Mengmeng; Wu, Jiaxin; Chen, Ting; Jiang, Rui

    2015-01-01

    The rapid advancement of next generation sequencing technology has greatly accelerated the progress for understanding human inherited diseases via such innovations as exome sequencing. Nevertheless, the identification of causative variants from sequencing data remains a great challenge. Traditional statistical genetics approaches such as linkage analysis and association studies have limited power in analyzing exome sequencing data, while relying on simple filtration strategies and predicted functional implications of mutations to pinpoint pathogenic variants is prone to produce false positives. To overcome these limitations, we herein propose a supervised learning approach, termed snvForest, to prioritize candidate nonsynonymous single nucleotide variants for a specific type of disease by integrating 11 functional scores at the variant level and 8 association scores at the gene level. We conduct a series of large-scale in silico validation experiments, demonstrating the effectiveness of snvForest across 2,511 diseases of different inheritance styles and the superiority of our approach over two state-of-the-art methods. We further apply snvForest to three real exome sequencing data sets of epileptic encephalopathies and intellectual disability to show the ability of our approach to identify causative de novo mutations for these complex diseases. The online service and standalone software of snvForest are found at http://bioinfo.au.tsinghua.edu.cn/jianglab/snvforest. PMID:26459872

  19. Prioritization Of Nonsynonymous Single Nucleotide Variants For Exome Sequencing Studies Via Integrative Learning On Multiple Genomic Data.

    PubMed

    Wu, Mengmeng; Wu, Jiaxin; Chen, Ting; Jiang, Rui

    2015-10-13

    The rapid advancement of next generation sequencing technology has greatly accelerated the progress for understanding human inherited diseases via such innovations as exome sequencing. Nevertheless, the identification of causative variants from sequencing data remains a great challenge. Traditional statistical genetics approaches such as linkage analysis and association studies have limited power in analyzing exome sequencing data, while relying on simple filtration strategies and predicted functional implications of mutations to pinpoint pathogenic variants is prone to producing false positives. To overcome these limitations, we herein propose a supervised learning approach, termed snvForest, to prioritize candidate nonsynonymous single nucleotide variants for a specific type of disease by integrating 11 functional scores at the variant level and 8 association scores at the gene level. We conduct a series of large-scale in silico validation experiments, demonstrating the effectiveness of snvForest across 2,511 diseases of different inheritance styles and the superiority of our approach over two state-of-the-art methods. We further apply snvForest to three real exome sequencing data sets of epileptic encephalopathies and intellectual disability to show the ability of our approach to identify causative de novo mutations for these complex diseases. The online service and standalone software of snvForest are found at http://bioinfo.au.tsinghua.edu.cn/jianglab/snvforest.

  20. Complete nucleotide sequence of Tulip virus X (TVX-J): the border between species and strains within the genus Potexvirus.

    PubMed

    Yamaji, Y; Kagiwada, S; Nakabayashi, H; Ugaki, M; Namba, S

    2001-12-01

    The complete nucleotide sequence of the genomic RNA of Tulip virus X Japanese isolate (TVX-J) has been determined. The sequence is 6056 nucleotides in length, excluding the poly(A) tail at the 3' terminus, and contains five open reading frames (ORFs) coding for proteins of Mr 153, 25, 12, 10, and 22 kDa (ORFs 1 through 5, respectively). The genome organization of TVX-J is similar to that of potexviruses, and the encoded proteins share a high degree of homology to the corresponding proteins of other potexviruses. Phylogenetic analyses based on the RNA-dependent RNA polymerase (RdRp) protein (the methyltransferase, helicase, and polymerase domains) encoded by ORF1 and the capsid protein (CP) encoded by ORF5, revealed a close relationship of TVX-J to Plantago asiatica mosaic virus (PlAMV). Pairwise comparison analyses revealed that the relationship between TVX and PlAMV is intermediate between that of strains and species, though previously they have not been considered related. Due to the relatively distant relationships of their replication apparatus and triple gene blocks, we conclude that TVX and PlAMV should be classified as distinct viruses. In addition, the borderline between species and strains of potexviruses is discussed.

  1. The genomic nucleotide sequences of two differentially expressed actin-coding genes from the sea star Pisaster ochraceus.

    PubMed

    Kowbel, D J; Smith, M J

    1989-04-30

    The genomic sequences of two differentially expressed actin genes from the sea star Pisaster ochraceus are reported. The cytoplasmic actin gene (Cy) is expressed in eggs and early development. The muscle actin gene (M) is expressed in tube feet and testes. Both genes contain an 1125-nucleotide coding region interrupted by three introns at codons 41, 121 and 204. Gene M contains two additional introns at codons 150 and 267. The intron position at codon 150, although present in higher vertebrate actins, has not been reported in actin genes from invertebrates. The M gene coding region has 89.5% nucleotide homology to the Cy gene, and differs from the Cy actin gene in 13 of 375 amino acids (aa), 11 of which are found in the C-terminal half of the gene. The C-terminal half of the M gene contains a significant number of muscle isotype codons. Even though there is only 1 aa change in the first 150 codons, there have been limited substitutions at many four-fold degenerate sites which may indicate selection pressure upon the secondary structure of the mRNA and/or a biased codon usage. Variant CCAAT, TATA, and poly(A)-addition signals have been identified in the 5' and 3' flanking regions. The presence of 5' and 3' splice junction sequences in the 5' flanking region of the Cy gene suggests the potential for an intron there.

  2. Image Encryption Algorithm Based on Hyperchaotic Maps and Nucleotide Sequences Database

    PubMed Central

    2017-01-01

    Image encryption technology is one of the main means to ensure the safety of image information. Using the characteristics of chaos, such as randomness, regularity, ergodicity, and initial value sensitiveness, combined with the unique space conformation of DNA molecules and their unique information storage and processing ability, an efficient method for image encryption based on the chaos theory and a DNA sequence database is proposed. In this paper, digital image encryption employs a process of transforming the image pixel gray value by using chaotic sequence scrambling image pixel location and establishing superchaotic mapping, which maps quaternary sequences and DNA sequences, and by combining with the logic of the transformation between DNA sequences. The bases are replaced under the displaced rules by using DNA coding in a certain number of iterations that are based on the enhanced quaternary hyperchaotic sequence; the sequence is generated by Chen chaos. The cipher feedback mode and chaos iteration are employed in the encryption process to enhance the confusion and diffusion properties of the algorithm. Theoretical analysis and experimental results show that the proposed scheme not only demonstrates excellent encryption but also effectively resists chosen-plaintext attack, statistical attack, and differential attack. PMID:28392799

  3. Nucleotide sequence of the gene encoding the nitrogenase iron protein of Thiobacillus ferrooxidans

    SciTech Connect

    Pretorius, I.M.; Rawlings, D.E.; O'Neill, E.G.; Jones, W.A.; Kirby, R.; Woods, D.R.

    1987-01-01

    The DNA sequence was determined for the cloned Thiobacillus ferrooxidans nifH and part of the nifD genes. The DNA chains were radiolabeled with (α-³²P)dCTP (3000 Ci/mmol) or (α-³⁵S)dCTP (400 Ci/mmol). A putative T. ferrooxidans nifH promoter was identified whose sequences showed perfect consensus with those of the Klebsiella pneumoniae nif promoter. Two putative consensus upstream activator sequences were also identified. The amino acid sequence was deduced from the DNA sequence. In a comparison of nifH DNA sequences from T. ferrooxidans and eight other nitrogen-fixing microbes, a Rhizobium sp. isolated from Parasponia andersonii showed the greatest homology (74%) and Clostridium pasteurianum (nifH1) showed the least homology (54%). In the comparison of the amino acid sequences of the Fe proteins, the Rhizobium sp. and Rhizobium japonicum showed the greatest homology (both 86%) and C. pasteurianum (nifH1 gene product) demonstrated the least homology (56%) to the T. ferrooxidans Fe protein.

  4. Image Encryption Algorithm Based on Hyperchaotic Maps and Nucleotide Sequences Database.

    PubMed

    Niu, Ying; Zhang, Xuncai; Han, Feng

    2017-01-01

    Image encryption technology is one of the main means to ensure the safety of image information. Using the characteristics of chaos, such as randomness, regularity, ergodicity, and initial value sensitiveness, combined with the unique space conformation of DNA molecules and their unique information storage and processing ability, an efficient method for image encryption based on the chaos theory and a DNA sequence database is proposed. In this paper, digital image encryption employs a process of transforming the image pixel gray value by using chaotic sequence scrambling image pixel location and establishing superchaotic mapping, which maps quaternary sequences and DNA sequences, and by combining with the logic of the transformation between DNA sequences. The bases are replaced under the displaced rules by using DNA coding in a certain number of iterations that are based on the enhanced quaternary hyperchaotic sequence; the sequence is generated by Chen chaos. The cipher feedback mode and chaos iteration are employed in the encryption process to enhance the confusion and diffusion properties of the algorithm. Theoretical analysis and experimental results show that the proposed scheme not only demonstrates excellent encryption but also effectively resists chosen-plaintext attack, statistical attack, and differential attack.
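
    The abstract mentions mapping pixel gray values to DNA sequences and combining DNA sequences by logical rules. A minimal sketch of that general idea follows, assuming a fixed 2-bit coding rule (00→A, 01→C, 10→G, 11→T) and a base-wise XOR; both choices and all function names are illustrative assumptions, not the authors' published scheme, and the chaotic key generation is not modelled.

```python
# Assumed coding rule (one of several possible): 00->A, 01->C, 10->G, 11->T.
BASES = "ACGT"


def byte_to_dna(value: int) -> str:
    """Encode one 8-bit gray value as four bases, most significant bits first."""
    if not 0 <= value <= 255:
        raise ValueError("value must be an 8-bit integer")
    return "".join(BASES[(value >> shift) & 0b11] for shift in (6, 4, 2, 0))


def dna_to_byte(seq: str) -> int:
    """Decode four bases back into the 8-bit value they represent."""
    if len(seq) != 4:
        raise ValueError("expected exactly four bases")
    value = 0
    for base in seq:
        value = (value << 2) | BASES.index(base)
    return value


def dna_xor(a: str, b: str) -> str:
    """Combine two equal-length DNA strings base by base using XOR on their 2-bit codes."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return "".join(BASES[BASES.index(x) ^ BASES.index(y)] for x, y in zip(a, b))


# Hypothetical pixel and key values, chosen only for illustration.
pixel = byte_to_dna(201)
key = byte_to_dna(90)
cipher = dna_xor(pixel, key)
recovered = dna_to_byte(dna_xor(cipher, key))
print(pixel, key, cipher, recovered)
```

    Because XOR is its own inverse, applying the same key sequence a second time restores the original pixel value.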

  5. The complete nucleotide sequence of white Amur bream (Parabramis pekinensis) mitochondrial genome.

    PubMed

    Zhang, Xiujie; Song, Wen; Wang, Yizhou; Du, Rui; Wang, Weimin

    2014-10-01

    White Amur bream, Parabramis pekinensis (Cypriniformes: Cyprinidae), a freshwater cyprinid fish, is an important economic fish in several countries, especially in China. The complete sequence of P. pekinensis mitochondrial genome has been determined. The genome is 16,622 bp in length, and consists of 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes and the noncoding control region, with the genomic organization being identical to that of typical vertebrates. Three conserved sequence blocks (CSB1 to CSB3) were identified in the control region. The complete mitochondrial genome sequence is useful for phylogenetic analysis and studies of population genetics of P. pekinensis.

  6. The nucleotide sequence of Beneckea harveyi 5S rRNA. [bioluminescent marine bacterium]

    NASA Technical Reports Server (NTRS)

    Luehrsen, K. R.; Fox, G. E.

    1981-01-01

    The primary sequence of the 5S ribosomal RNA isolated from the free-living bioluminescent marine bacterium Beneckea harveyi is reported and discussed in regard to indications of phylogenetic relationships with the bacteria Escherichia coli and Photobacterium phosphoreum. Sequences were determined for oligonucleotide products generated by digestion with ribonuclease T1, pancreatic ribonuclease and ribonuclease T2. The presence of heterogeneity is indicated for two sites. The B. harveyi sequence can be arranged into the same four-helix secondary structures as E. coli and other prokaryotic 5S rRNAs. Examination of the 5S RNA sequences of the three bacteria indicates that B. harveyi and P. phosphoreum are specifically related and share a common ancestor which diverged from an ancestor of E. coli at a somewhat earlier time, consistent with previous studies.

  7. The nucleotide sequence of Beneckea harveyi 5S rRNA. [bioluminescent marine bacterium]

    NASA Technical Reports Server (NTRS)

    Luehrsen, K. R.; Fox, G. E.

    1981-01-01

    The primary sequence of the 5S ribosomal RNA isolated from the free-living bioluminescent marine bacterium Beneckea harveyi is reported and discussed in regard to indications of phylogenetic relationships with the bacteria Escherichia coli and Photobacterium phosphoreum. Sequences were determined for oligonucleotide products generated by digestion with ribonuclease T1, pancreatic ribonuclease and ribonuclease T2. The presence of heterogeneity is indicated for two sites. The B. harveyi sequence can be arranged into the same four-helix secondary structures as E. coli and other prokaryotic 5S rRNAs. Examination of the 5S RNA sequences of the three bacteria indicates that B. harveyi and P. phosphoreum are specifically related and share a common ancestor which diverged from an ancestor of E. coli at a somewhat earlier time, consistent with previous studies.

  8. Nucleotide sequences of chloroplast 5S ribosomal ribonucleic acid in flowering plants.

    PubMed Central

    Dyer, T A; Bowman, C M

    1979-01-01

    Evidence for the sequence of duckweed (Lemna minor) chloroplast 5S rRNA was derived from the analysis of partial and complete enzymic digests of the 32P-labelled molecule. The possible sequence of the chloroplast 5S rRNA from three other flowering plants was deduced by complete digestion with T1 ribonuclease and comparison of the sequences of the oligonucleotide products with homologous sequences in the duckweed 5S rRNA. This analysis indicates that the chloroplast 5S rRNA species differ appreciably from their cytosol counterparts but bear a strong resemblance to one another and to the 5S rRNA species of prokaryotes. Structural features apparently common to all 5S rRNA molecules are also discussed. PMID:540034

  9. Sequences, annotation and single nucleotide polymorphism of the major histocompatibility complex in the domestic cat.

    PubMed

    Yuhki, Naoya; Mullikin, James C; Beck, Thomas; Stephens, Robert; O'Brien, Stephen J

    2008-07-16

    Two sequences of major histocompatibility complex (MHC) regions in the domestic cat, 2.976 and 0.362 Mbp, which were separated by an ancient chromosome break (55-80 MYA) followed by a chromosomal inversion, were annotated in detail. Gene annotation of this MHC was completed and identified 183 possible coding regions (147 human homologues, possible functional genes, and 36 pseudo/unidentified genes) by the GENSCAN, BLASTN, BLASTP, and RepeatMasker programs. The first region spans 2.976 Mbp of sequence, which encodes six classical class II antigens (three DRA and three DRB antigens) lacking the functional DP, DQ regions, nine antigen processing molecules (DOA/DOB, DMA/DMB, TAPASIN, and LMP2/LMP7, TAP1/TAP2), 52 class III genes, and nineteen class I genes/gene fragments (FLAI-A to FLAI-S). Three class I genes (FLAI-H, I-K, I-E) may encode functional classical class I antigens based on deduced amino acid sequence and promoter structure. The second region spans 0.362 Mbp of sequence encoding no class I genes and 18 cross-species conserved genes, excluding class I, II and their functionally related/associated genes, namely framework genes, including three olfactory receptor genes. One previously identified feline endogenous retrovirus, a baboon retrovirus derived sequence (ECE1), and two new endogenous retrovirus sequences similar to brown bat endogenous retrovirus (FERVmlu1, FERVmlu2) were found within a 140 Kbp interval in the middle of the class I region. MHC SNPs were examined by comparing this BAC sequence with MHC homozygous 1.9x WGS sequences, which revealed 11,654 SNPs in 2.84 Mbp (0.00411 SNP per bp), a rate 2.4 times higher than that of the average heterozygous region in the WGS (0.0017 SNP per bp genome) and slightly higher than the SNP rate observed in human MHC (0.00337 SNP per bp).
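
    The SNP-rate comparison at the end of this abstract is simple arithmetic and can be reproduced as a quick check. The sketch below uses only the figures quoted in the abstract; the function and variable names are illustrative. Note that the rounded region length of 2.84 Mbp gives roughly 0.00410 SNP per bp, marginally below the quoted 0.00411, presumably because the exact length was used in the original calculation (an assumption).

```python
def snp_density(snp_count: int, region_bp: int) -> float:
    """Return the number of SNPs per base pair for a region of the given length."""
    if region_bp <= 0:
        raise ValueError("region_bp must be positive")
    return snp_count / region_bp


# Figures quoted in the abstract.
mhc_density = snp_density(11_654, 2_840_000)  # cat MHC, 2.84 Mbp compared
wgs_density = 0.0017                          # average heterozygous WGS region
human_mhc_density = 0.00337                   # human MHC

fold_vs_wgs = mhc_density / wgs_density
fold_vs_human = mhc_density / human_mhc_density

print(f"cat MHC density: {mhc_density:.5f} SNP/bp")
print(f"relative to WGS average: {fold_vs_wgs:.1f}x")
print(f"relative to human MHC: {fold_vs_human:.2f}x")
```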

  10. Nucleotide sequence of cloned cDNA for human pancreatic kallikrein.

    PubMed

    Fukushima, D; Kitamura, N; Nakanishi, S

    1985-12-31

    Cloned cDNA sequences for human pancreatic kallikrein have been isolated and determined by molecular cloning and sequence analysis. The identity between human pancreatic and urinary kallikreins is indicated by the complete coincidence between the amino acid sequence deduced from the cloned cDNA sequence and that reported partially for urinary kallikrein. The active enzyme form of the human pancreatic kallikrein consists of 238 amino acids and is preceded by a signal peptide and a profragment of 24 amino acids. A sequence comparison of this with other mammalian kallikreins indicates that key amino acid residues required for both serine protease activity and kallikrein-like cleavage specificity are retained in the human sequence, and residues corresponding to some external loops of the kallikrein diverge from other kallikreins. Analyses by RNA blot hybridization, primer extension, and S1 nuclease mapping indicate that the pancreatic kallikrein mRNA is also expressed in the kidney and sublingual gland, suggesting the active synthesis of urinary kallikrein in these tissues. Furthermore, the tissue-specific regulation of the expression of the members of the human kallikrein gene family has been discussed.

  11. Four novel cystic fibrosis mutations in splice junction sequences affecting the CFTR nucleotide binding folds

    SciTech Connect

    Doerk, T.; Wulbrand, U.; Tuemmler, B.

    1993-03-01

    Single cases of the four novel splice site mutations 1525−1 G→A (intron 9), 3601−2 A→G (intron 18), 3850−3 T→G (intron 19), and 4374+1 G→T (intron 23) were detected in the CFTR gene of cystic fibrosis patients of Indo-Iranian, Turkish, Polish, and German descent. The nucleotide substitutions at the +1, −1, and −2 positions all destroy splice sites and lead to severe disease alleles associated with features typical of gastrointestinal and pulmonary cystic fibrosis disease. The 3850−3 T-to-G change was discovered in a very mildly affected 33-year-old ΔF508 compound heterozygote, suggesting that the T-to-G transversion at the less conserved −3 position of the acceptor splice site may retain some wildtype function. 13 refs., 1 fig., 2 tabs.
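
    The position notation used for these four mutations (nearest exonic nucleotide plus a signed intronic offset) lends itself to a small worked example. The sketch below records the four variants from the abstract and derives which splice site each affects, whether it falls on a canonical dinucleotide position, and whether the base change is a transition or a transversion. The class and its rules are illustrative assumptions: in particular, only the +1/+2 and −2/−1 positions are treated as invariant.

```python
from dataclasses import dataclass

PURINES = {"A", "G"}


@dataclass(frozen=True)
class IntronicVariant:
    """A substitution written as <exon nucleotide><signed offset> ref>alt."""
    exon_nt: int  # nearest exonic nucleotide (cDNA numbering)
    offset: int   # +n: n bases into the following intron; -n: n bases before the exon
    ref: str
    alt: str

    def site(self) -> str:
        # Positive offsets lie at the 5' (donor) end of an intron,
        # negative offsets at the 3' (acceptor) end.
        return "donor" if self.offset > 0 else "acceptor"

    def hits_invariant_dinucleotide(self) -> bool:
        # Assumption: only the canonical GT (+1, +2) and AG (-2, -1)
        # positions are treated as invariant here.
        return self.offset in (1, 2, -1, -2)

    def substitution_class(self) -> str:
        same_ring = (self.ref in PURINES) == (self.alt in PURINES)
        return "transition" if same_ring else "transversion"


# The four variants reported in the abstract.
VARIANTS = [
    IntronicVariant(1525, -1, "G", "A"),
    IntronicVariant(3601, -2, "A", "G"),
    IntronicVariant(3850, -3, "T", "G"),
    IntronicVariant(4374, +1, "G", "T"),
]

for v in VARIANTS:
    print(
        f"{v.exon_nt}{v.offset:+d} {v.ref}>{v.alt}: {v.site()} site, "
        f"invariant position: {v.hits_invariant_dinucleotide()}, "
        f"{v.substitution_class()}"
    )
```

    Under these rules the 3850−3 change is the only one outside the invariant dinucleotides, which is consistent with the mild phenotype the abstract reports for it.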

  12. Novel technologies applied to the nucleotide sequencing and comparative sequence analysis of the genomes of infectious agents in veterinary medicine.

    PubMed

    Granberg, F; Bálint, Á; Belák, S

    2016-04-01

    Next-generation sequencing (NGS), also referred to as deep, high-throughput or massively parallel sequencing, is a powerful new tool that can be used for the complex diagnosis and intensive monitoring of infectious disease in veterinary medicine. NGS technologies are also being increasingly used to study the aetiology, genomics, evolution and epidemiology of infectious disease, as well as host-pathogen interactions and other aspects of infection biology. This review briefly summarises recent progress and achievements in this field by first introducing a range of novel techniques and then presenting examples of NGS applications in veterinary infection biology. Various work steps and processes for sampling and sample preparation, sequence analysis and comparative genomics, and improving the accuracy of genomic prediction are discussed, as are bioinformatics requirements. Examples of sequencing-based applications and comparative genomics in veterinary medicine are then provided. This review is based on novel references selected from the literature and on experiences of the World Organisation for Animal Health (OIE) Collaborating Centre for the Biotechnology-based Diagnosis of Infectious Diseases in Veterinary Medicine, Uppsala, Sweden.

  13. Developmentally regulated plant genes: the nucleotide sequence of a wheat gliadin genomic clone.

    PubMed Central

    Rafalski, J A; Scheets, K; Metzler, M; Peterson, D M; Hedgcoth, C; Söll, D G

    1984-01-01

    Gliadins, the major wheat seed storage proteins, are encoded by a multigene family. Northern blot analysis shows that gliadin genes are transcribed in endosperm tissue into two classes of poly(A)+ mRNA, 1400 bases (class I) and 1600 bases (class II) in length. Using poly(A)+ RNA from developing wheat endosperm we constructed a cDNA library from which a number of clones coding for alpha/beta and gamma gliadins were identified by hybrid-selected mRNA translation and DNA sequencing. These cDNA clones were used as probes for the isolation of genomic gliadin clones from a wheat genomic library. One such genomic clone was characterized in detail and its DNA sequence determined. It contains a gene for a 33-kd alpha/beta gliadin protein (a 20 amino acid signal peptide and a 266 amino acid mature protein) which is very rich in glutamine (33.8%) and proline (15.4%). The gene sequence does not contain introns. A typical eukaryotic promoter sequence is present at -104 (relative to the translation initiation codon) and there are two normal polyadenylation signals 77 and 134 bases downstream from the translation termination codon. The coding sequence contains some internal sequence repetition, and is highly homologous to several alpha/beta gliadin cDNA clones. Homology to a gamma-gliadin cDNA clone is low, and there is no homology with known glutenin or zein cDNA sequences. PMID:6204862

  14. Identification of mitochondrial DNA sequence variation and development of single nucleotide polymorphic markers for CMS-D8 in cotton.

    PubMed

    Suzuki, Hideaki; Yu, Jiwen; Wang, Fei; Zhang, Jinfa

    2013-06-01

    Cytoplasmic male sterility (CMS), which is a maternally inherited trait and controlled by novel chimeric genes in the mitochondrial genome, plays a pivotal role in the production of hybrid seed. In cotton, no PCR-based marker has been developed to discriminate CMS-D8 (from Gossypium trilobum) from its normal Upland cotton (AD1, Gossypium hirsutum) cytoplasm. The objective of the current study was to develop PCR-based single nucleotide polymorphic (SNP) markers from mitochondrial genes for the CMS-D8 cytoplasm. DNA sequence variation in mitochondrial genes involved in the oxidative phosphorylation chain, including ATP synthase subunits 1, 4, 6, 8 and 9, and cytochrome c oxidase subunits 1, 2 and 3, was identified by comparing CMS-D8, its isogenic maintainer and restorer lines on the same nuclear genetic background. Allele-specific PCR (AS-PCR) was used for SNP typing by incorporating artificial mismatched nucleotides into the third or fourth base from the 3' terminus in both the specific and nonspecific primers. The result indicated that the method modifying allele-specific primers was successful in obtaining eight SNP markers out of eight SNPs using eight primer pairs to discriminate two alleles between AD1 and CMS-D8 cytoplasms. Two of the SNPs for atp1 and cox1 could also be used in combination to discriminate between CMS-D8 and CMS-D2 cytoplasms. Additionally, a PCR-based marker from a nine nucleotide insertion-deletion (InDel) sequence (AATTGTTTT) at the 59-67 bp positions from the start codon of atp6, which is present in the CMS and restorer lines with the D8 cytoplasm but absent in the maintainer line with the AD1 cytoplasm, was also developed. A SNP marker for two nucleotide substitutions (AA in AD1 cytoplasm to CT in CMS-D8 cytoplasm) in the intron (1,506 bp) of cox2 gene was also developed. These PCR-based SNP markers should be useful in discriminating CMS-D8 and AD1 cytoplasms, or those with CMS-D2 cytoplasm as a rapid, simple, inexpensive, and

  15. Biological characterization and complete nucleotide sequence of a Tunisian isolate of Moroccan watermelon mosaic virus.

    PubMed

    Yakoubi, S; Desbiez, C; Fakhfakh, H; Wipf-Scheibel, C; Marrakchi, M; Lecoq, H

    2008-01-01

    During a survey conducted in October 2005, cucurbit leaf samples showing virus-like symptoms were collected from the major cucurbit-growing areas in Tunisia. DAS-ELISA showed the presence of Moroccan watermelon mosaic virus (MWMV, Potyvirus), detected for the first time in Tunisia, in samples from the region of Cap Bon (Northern Tunisia). MWMV isolate TN05-76 (MWMV-Tn) was characterized biologically and its full-length genome sequence was established. MWMV-Tn was found to have biological properties similar to those reported for the MWMV type strain from Morocco. Phylogenetic analysis including the comparison of complete amino-acid sequences of 42 potyviruses confirmed that MWMV-Tn is related (65% amino-acid sequence identity) to Papaya ringspot virus (PRSV) isolates but is a member of a distinct virus species. Sequence analysis on parts of the CP gene of MWMV isolates from different geographical origins revealed some geographic structure of MWMV variability, with three different clusters: one cluster including isolates from the Mediterranean region, a second including isolates from western and central Africa, and a third one including isolates from the southern part of Africa. A significant correlation was observed between geographic and genetic distances between isolates. Isolates from countries in the Mediterranean region where MWMV has recently emerged (France, Spain, Portugal) have highly conserved sequences, suggesting that they may have a common and recent origin. MWMV from Sudan, a highly divergent variant, may be considered an evolutionary intermediate between MWMV and PRSV.

  16. Nucleotide sequence of the rrnG ribosomal RNA promoter region of Escherichia coli.

    PubMed Central

    Shen, W F; Squires, C; Squires, C L

    1982-01-01

    The primary structure of the promoter region for a ribosomal RNA transcription unit (rrnG) of Escherichia coli K12 has been determined. The sequence was obtained from a 1.5 kbp EcoRI fragment derived from the hybrid plasmid pLC23-30. This fragment contains 455 bp preceding P1 of the rrnG promoter region and 674 bp of the rrnG 16S RNA gene. The sequence before the rrnG promoter region contains an open reading frame (ORF-BG) followed by a possible hairpin structure that resembles other known transcription terminators. The sequence of the rrnG promoter region is similar but not identical to that of rrnA and rrnB. Several minor differences between the sequences of the 16S RNA genes of rrnG and rrnB were also noted. In addition, sequences were found that could generate special structures involving the promoter regions of rrn loci. Such structures are described and their possible involvement in the regulation of ribosomal RNA synthesis is discussed. PMID:6285294

  17. Improved Detection of Rhinoviruses by Nucleic Acid Sequence-Based Amplification after Nucleotide Sequence Determination of the 5′ Noncoding Regions of Additional Rhinovirus Strains

    PubMed Central

    Loens, K.; Ieven, M.; Ursi, D.; de Laat, C.; Sillekens, P.; Oudshoorn, P.; Goossens, H.

    2003-01-01

    The isothermal nucleic acid sequence-based amplification (NASBA) system was applied for the detection of rhinoviruses using primers targeted at the 5′ noncoding region (5′ NCR) of the viral genome. The nucleotide sequence of the 5′ NCRs of 34 rhinovirus isolates was determined to map the most conserved regions and design more appropriate primers and probes. The assay amplified RNA extracted from 30 rhinovirus reference strains and 88 rhinovirus isolates; it did not amplify RNA from 49 enterovirus isolates and other respiratory viruses. The assay allows one to discriminate between group A and B rhinoviruses. Sensitivities for the detection of group B and group A rhinoviruses were 20 and 200 50% tissue culture infective doses, respectively. PMID:12734236

  18. Characterisation of single nucleotide polymorphisms identified in the bovine lactoferrin gene sequences across a range of dairy cow breeds.

    PubMed

    O'Halloran, F; Bahar, B; Buckley, F; O'Sullivan, O; Sweeney, T; Giblin, L

    2009-01-01

    The lactoferrin gene sequences of 70 unrelated dairy cows representing six different dairy breeds were investigated for single nucleotide polymorphisms to establish a baseline of polymorphisms that exist within the Irish bovine population. Twenty-nine polymorphisms were identified within a 2.2kb regulatory region. Nineteen novel polymorphisms were identified and some of these were found within transcription factor binding sites, including GATA-1 and SPI transcription factor sites. Forty-seven polymorphisms were identified within exon sequences with unique polymorphisms that were associated with amino acid substitutions. These included a T/A SNP, identified in a Holstein Friesian animal, which resulted in a valine to aspartic acid substitution (Val89Asp) in the mature lactoferrin protein. Other SNPs of interest were associated with amino acid substitutions in the lactoferricin B peptide sequence and an A/G SNP, identified in a Jersey animal, was associated with a tyrosine to cysteine change (Tyr181Cys). The polymorphisms identified in the promoter region may have implications relating to lactoferrin expression levels in cows and those identified in the coding sequence indicate the existence of protein variants in the Irish bovine population. The data presented in this study emphasises the potential for lactoferrin to serve as a candidate gene to select for mastitis resistance with the aim of improving animal health.

  19. A conserved 11 nucleotide sequence contains an essential promoter element of the maize mitochondrial atp1 gene.

    PubMed Central

    Rapp, W D; Stern, D B

    1992-01-01

    To determine the structure of a functional plant mitochondrial promoter, we have partially purified an RNA polymerase activity that correctly initiates transcription at the maize mitochondrial atp1 promoter in vitro. Using a series of 5' deletion constructs, we found that essential sequences are located within -19 nucleotides (nt) of the transcription initiation site. The region surrounding the initiation site includes conserved sequence motifs previously proposed to be maize mitochondrial promoter elements. Deletion of a conserved 11 nt sequence showed that it is critical for promoter function, but deletion or alteration of conserved upstream G(A/T)3-4 repeats had no effect. When the atp1 11 nt sequence was inserted into different plasmids lacking mitochondrial promoter activity, transcription was only observed for one of these constructs. We infer from these data that the functional promoter extends beyond this motif, most likely in the 5' direction. The maize mitochondrial cox3 and atp6 promoters also direct transcription initiation in this in vitro system, suggesting that it may be widely applicable for studies of mitochondrial transcription in this species. PMID:1372246

  20. Molecular cloning and nucleotide sequence of cDNA for human glucose-6-phosphate dehydrogenase variant A(-)

    SciTech Connect

    Hirono, A.; Beutler, E.

    1988-06-01

    Glucose-6-phosphate dehydrogenase A(-) is a common variant in Blacks that causes sensitivity to drug- and infection-induced hemolytic anemia. A cDNA library was constructed from Epstein-Barr virus-transformed lymphoblastoid cells from a male who was G6PD A(-). One of four cDNA clones isolated contained a sequence not found in the other clones nor in the published cDNA sequence. Consisting of 138 bases and coding for 46 amino acids, this segment of cDNA apparently is derived from alternative splicing involving the 3′ end of intron 7. Comparison of the remaining sequences of these clones with the published sequence revealed three nucleotide substitutions: C³³→G, G²⁰²→A, and A³⁷⁶→G. Each change produces a new restriction site. Genomic DNA from five G6PD A(-) individuals was amplified by the polymerase chain reaction. The finding of the same mutation in G6PD A(-) as is found in G6PD A(+) strongly suggests that the G6PD A(-) mutation arose in an individual with G6PD A(+), adding another mutation that causes the in vivo instability of this enzyme protein.
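
    The numbers in this abstract can be related by basic codon arithmetic. The sketch below checks that a 138-base in-frame segment corresponds to 46 codons and maps each substituted nucleotide to a codon number and a position within that codon. It assumes that nucleotide positions are counted from the first base of the initiation codon; the abstract does not state the numbering origin, so the codon numbers are illustrative only, and the helper names are hypothetical.

```python
def codon_number(cdna_pos: int) -> int:
    """Return the 1-based codon containing a 1-based coding-sequence position."""
    if cdna_pos < 1:
        raise ValueError("positions are 1-based")
    return (cdna_pos - 1) // 3 + 1


def codon_position(cdna_pos: int) -> int:
    """Return 1, 2 or 3: where in its codon the nucleotide falls."""
    if cdna_pos < 1:
        raise ValueError("positions are 1-based")
    return (cdna_pos - 1) % 3 + 1


# Extra cDNA segment reported in one clone.
extra_segment_bases = 138
extra_codons, remainder = divmod(extra_segment_bases, 3)
print(f"{extra_segment_bases} bases -> {extra_codons} codons, remainder {remainder}")

# Substitutions listed in the abstract: position -> (reference, variant).
SUBSTITUTIONS = {33: ("C", "G"), 202: ("G", "A"), 376: ("A", "G")}

for pos, (ref, alt) in SUBSTITUTIONS.items():
    # Assumption: position 1 is the first base of the initiation codon.
    print(
        f"{ref}{pos}->{alt}: codon {codon_number(pos)}, "
        f"codon position {codon_position(pos)}"
    )
```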