Sample records for identical nucleotide sequences

  1. Nucleotide sequencing and identification of some wild mushrooms.

    PubMed

    Das, Sudip Kumar; Mandal, Aninda; Datta, Animesh K; Gupta, Sudha; Paul, Rita; Saha, Aditi; Sengupta, Sonali; Dubey, Priyanka Kumari

    2013-01-01

    The rDNA-ITS (Ribosomal DNA Internal Transcribed Spacers) fragment of the genomic DNA of 8 wild edible mushrooms (collected from Eastern Chota Nagpur Plateau of West Bengal, India) was amplified using ITS1 (Internal Transcribed Spacers 1) and ITS2 primers and subjected to nucleotide sequence determination for identification of the mushrooms. The sequences were aligned using the ClustalW software program. The aligned sequences revealed identity (homology percentage from the GenBank database) of Amanita hemibapha [CN (Chota Nagpur) 1, % identity 99 (JX844716.1)], Amanita sp. [CN 2, % identity 98 (JX844763.1)], Astraeus hygrometricus [CN 3, % identity 87 (FJ536664.1)], Termitomyces sp. [CN 4, % identity 90 (JF746992.1)], Termitomyces sp. [CN 5, % identity 99 (GU001667.1)], T. microcarpus [CN 6, % identity 82 (EF421077.1)], Termitomyces sp. [CN 7, % identity 76 (JF746993.1)], and Volvariella volvacea [CN 8, % identity 100 (JN086680.1)]. Although only 4 of the 8 mushrooms could be identified to species level, the nucleotide sequences of the rest may be relevant to further characterization. A phylogenetic tree was constructed using the Neighbor-Joining method to show the interrelationships among the mushrooms. The determined nucleotide sequences of the mushrooms may provide additional information enriching the GenBank database, aiding molecular taxonomy and facilitating their domestication and characterization for human benefit.
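
The "% identity" figures quoted in this record come from comparing aligned sequences. As a rough illustration only, the sketch below computes percent identity over the ungapped columns of a pairwise alignment. The function name `percent_identity`, the toy sequences, and the choice to skip gapped columns are assumptions made here; tools such as BLAST define identity relative to the full alignment length, so values reported by them may differ.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Both inputs must already be aligned (equal length, gaps marked with `gap`).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are excluded from the count
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy, made-up alignment (not data from the record above).
query = "ACGTTGCA-ACGT"
subject = "ACGTAGCATACGT"
print(round(percent_identity(query, subject), 1))  # 91.7
```

In the toy alignment, 12 columns are compared (one is gapped) and 11 match, hence 91.7 %.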

  2. Complete nucleotide sequence of a monopartite Begomovirus and associated satellites infecting Carica papaya in Nepal.

    PubMed

    Shahid, M S; Yoshida, S; Khatri-Chhetri, G B; Briddon, R W; Natsuaki, K T

    2013-06-01

    Carica papaya (papaya) is a fruit crop that is cultivated mostly in kitchen gardens throughout Nepal. Leaf samples of C. papaya plants with leaf curling, vein darkening, vein thickening, and a reduction in leaf size were collected from a garden in Darai village, Rampur, Nepal in 2010. Full-length clones of a monopartite Begomovirus, a betasatellite and an alphasatellite were isolated. The complete nucleotide sequence of the Begomovirus showed the arrangement of genes typical of Old World begomoviruses with the highest nucleotide sequence identity (>99 %) to an isolate of Ageratum yellow vein virus (AYVV), confirming it as an isolate of AYVV. The complete nucleotide sequence of the betasatellite showed greater than 89 % nucleotide sequence identity to an isolate of Tomato leaf curl Java betasatellite originating from Indonesia. The sequence of the alphasatellite displayed 92 % nucleotide sequence identity to Sida yellow vein China alphasatellite. This is the first identification of these components in Nepal and the first time they have been identified in papaya.

  3. OrthoANI: An improved algorithm and software for calculating average nucleotide identity.

    PubMed

    Lee, Imchang; Ouk Kim, Yeong; Park, Sang-Cheol; Chun, Jongsik

    2016-02-01

    Species demarcation in Bacteria and Archaea is mainly based on overall genome relatedness, which serves as a framework for modern microbiology. Current practice for obtaining these measures between two strains is shifting from experimentally determined similarity obtained by DNA-DNA hybridization (DDH) to genome-sequence-based similarity. Average nucleotide identity (ANI) is a simple algorithm that mimics DDH. Like DDH, ANI values between two genome sequences may be different from each other when reciprocal calculations are compared. We compared 63 690 pairs of genome sequences and found that the differences in reciprocal ANI values are significantly high, exceeding 1 % in some cases. To resolve this problem of not being symmetrical, a new algorithm, named OrthoANI, was developed to accommodate the concept of orthology for which both genome sequences were fragmented and only orthologous fragment pairs taken into consideration for calculating nucleotide identities. OrthoANI is highly correlated with ANI (using BLASTn) and the former showed approximately 0.1 % higher values than the latter. In conclusion, OrthoANI provides a more robust and faster means of calculating average nucleotide identity for taxonomic purposes. The standalone software tools are freely available at http://www.ezbiocloud.net/sw/oat.
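
The asymmetry described in this record can be illustrated with a toy model. The sketch below is not the OrthoANI implementation: it assumes a precomputed, made-up matrix of fragment-to-fragment identities (rows are fragments of one genome, columns fragments of the other) and omits BLASTn, fragment lengths, and any coverage or identity cutoffs. All function names are hypothetical. It only shows why averaging one-way best hits depends on the direction of comparison, while averaging over reciprocal best-hit pairs does not.

```python
from statistics import mean


def one_way_mean_best_hit(identity):
    """Mean, over query fragments (rows), of the best identity to any target fragment."""
    return mean(max(row) for row in identity)


def transpose(matrix):
    """Swap the roles of query and target genomes."""
    return [list(col) for col in zip(*matrix)]


def reciprocal_best_pairs(identity):
    """Return (row, col) pairs that are each other's best hit."""
    best_col = [max(range(len(row)), key=row.__getitem__) for row in identity]
    cols = transpose(identity)
    best_row = [max(range(len(col)), key=col.__getitem__) for col in cols]
    return [(i, j) for i, j in enumerate(best_col) if best_row[j] == i]


def reciprocal_mean(identity):
    """Mean identity over reciprocal best-hit pairs only."""
    return mean(identity[i][j] for i, j in reciprocal_best_pairs(identity))


# Made-up identities (%): rows = fragments of genome A, columns = fragments of genome B.
toy = [
    [99.0, 80.0, 70.0],
    [81.0, 97.0, 72.0],
    [98.0, 71.0, 75.0],
]

print(one_way_mean_best_hit(toy))             # A against B
print(one_way_mean_best_hit(transpose(toy)))  # B against A, a different value
print(reciprocal_mean(toy), reciprocal_mean(transpose(toy)))  # same in both directions
```

The gap between the two one-way values is exaggerated by the invented numbers; the record reports reciprocal differences exceeding 1 % only in some cases.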

  4. Conserved features of eukaryotic hsp70 genes revealed by comparison with the nucleotide sequence of human hsp70.

    PubMed Central

    Hunt, C; Morimoto, R I

    1985-01-01

    We have determined the nucleotide sequence of the human hsp70 gene and 5' flanking region. The hsp70 gene is transcribed as an uninterrupted primary transcript of 2440 nucleotides composed of a 5' noncoding leader sequence of 212 nucleotides, a 3' noncoding region of 242 nucleotides, and a continuous open reading frame of 1986 nucleotides that encodes a protein with predicted molecular mass of 69,800 daltons. Upstream of the 5' terminus are the canonical TATAAA box, the sequence ATTGG that corresponds in the inverted orientation to the CCAAT motif, and the dyad sequence CTGGAAT/ATTCCCG that shares homology in 12 of 14 positions with the consensus transcription regulatory sequence common to Drosophila heat shock genes. Comparison of the predicted amino acid sequences of human hsp70 with the published sequences of Drosophila hsp70 and Escherichia coli dnaK reveals that human hsp70 is 73% identical to Drosophila hsp70 and 47% identical to E. coli dnaK. Surprisingly, the nucleotide sequences of the human and Drosophila genes are 72% identical and human and E. coli genes are 50% identical, which is more highly conserved than necessary given the degeneracy of the genetic code. The lack of accumulated silent nucleotide substitutions leads us to propose that there may be additional information in the nucleotide sequence of the hsp70 gene or the corresponding mRNA that precludes the maximum divergence allowed in the silent codon positions. PMID:3931075
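
The region lengths quoted in this record can be checked with simple arithmetic. The short sketch below uses only the numbers stated in the abstract; the constant names are chosen here for illustration, and whether the 1986-nucleotide open reading frame count includes the stop codon is not stated in the record, so the codon count is left uninterpreted.

```python
# Region lengths quoted in the abstract (nucleotides).
LEADER_NT = 212    # 5' noncoding leader
ORF_NT = 1986      # continuous open reading frame
TRAILER_NT = 242   # 3' noncoding region
REPORTED_TRANSCRIPT_NT = 2440

# The three regions should add up to the reported transcript length.
transcript_nt = LEADER_NT + ORF_NT + TRAILER_NT

# An open reading frame should be a whole number of codons.
codons, remainder = divmod(ORF_NT, 3)

print(transcript_nt == REPORTED_TRANSCRIPT_NT)  # True
print(codons, remainder)                        # 662 0
```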

  5. Complete nucleotide sequence of Alfalfa mosaic virus isolated from alfalfa (Medicago sativa L.) in Argentina.

    PubMed

    Trucco, Verónica; de Breuil, Soledad; Bejerman, Nicolás; Lenardon, Sergio; Giolitti, Fabián

    2014-06-01

    The complete nucleotide sequence of an Alfalfa mosaic virus (AMV) isolate infecting alfalfa (Medicago sativa L.) in Argentina, AMV-Arg, was determined. The virus genome has the typical organization described for AMV, and comprises 3,643, 2,593, and 2,038 nucleotides for RNA1, 2 and 3, respectively. The whole genome sequence and each encoding region were compared with those of four other isolates that have been completely sequenced from China, Italy, Spain and USA. The nucleotide identity percentages ranged from 95.9 to 99.1 % for the three RNAs and from 93.7 to 99 % for the protein 1 (P1), protein 2 (P2), movement protein and coat protein (CP) encoding regions, whereas the amino acid identity percentages of these proteins ranged from 93.4 to 99.5 %, the lowest value corresponding to P2. CP sequences of AMV-Arg were compared with those of 25 other available isolates, and a phylogenetic analysis based on the CP gene was carried out. The highest percentage of nucleotide sequence identity of the CP gene was 98.3 % with a Chinese isolate and 98.6 % at the amino acid level with four isolates, two from Italy, one from Brazil and the remaining one from China. The phylogenetic analysis showed that AMV-Arg is closely related to subgroup I of AMV isolates. To our knowledge, this is the first report of a complete nucleotide sequence of AMV from South America and the first worldwide report of a complete nucleotide sequence of AMV isolated from alfalfa as natural host.

  6. Nucleotide sequences of Japanese isolates of citrus vein enation virus.

    PubMed

    Nakazono-Nagaoka, Eiko; Fujikawa, Takashi; Iwanami, Toru

    2017-03-01

    The genomic sequences of five Japanese isolates of citrus vein enation virus (CVEV) that induce vein enation were determined and compared with that of the Spanish isolate VE-1. The nucleotide sequences of all Japanese isolates were 5,983 nt in length. The genomic RNA of Japanese isolates had five potential open reading frames (ORF 0, ORF 1, ORF 2, ORF 3, and ORF 5) in the positive-sense strand. The nucleotide sequence identity among the Japanese isolates and Spanish isolate VE-1 ranged from 98.0% to 99.8%. Comparison of the partial amino acid sequences of ten Japanese isolates and three Spanish isolates suggested that four amino acid residues, at positions 83, 104, and 113 in ORF 2 and position 41 in ORF 5, might be unique to some Japanese isolates.

  7. A nucleotide sequence comparison of coxsackievirus B4 isolates from aquatic samples and clinical specimens.

    PubMed Central

    Hughes, M. S.; Hoey, E. M.; Coyle, P. V.

    1993-01-01

    Ten coxsackievirus B4 (CVB4) strains isolated from clinical and environmental sources in Northern Ireland in 1985-7, were compared at the nucleotide sequence level. Dideoxynucleotide sequencing of a polymerase chain reaction (PCR) amplified fragment, spanning the VP1/P2A genomic region, classified the isolates into two distinct groups or genotypes as defined by Rico-Hesse and colleagues for poliovirus type 1. Isolates within each group shared approximately 99% sequence identity at the nucleotide level whereas ≤ 86% sequence identity was shared between groups. One isolate derived from a clinical specimen in 1987 was grouped with six CVB4 isolates recovered from the aquatic environment in 1986-7. The second group comprised CVB4 isolates from clinical specimens in 1985-6. Both groups were different at the nucleotide level from the prototype strain isolated in 1950. It was concluded that the method could be used to sub-type CVB4 isolates and would be of value in epidemiological studies of CVB4. Predicted amino acid sequences revealed non-conservation of the tyrosine residue at the VP1/P2A cleavage site but were of little value in distinguishing CVB4 variants. PMID:8386098

  8. The nucleotide sequences of 5S rRNAs from a fern Dryopteris acuminata and a horsetail Equisetum arvense.

    PubMed Central

    Hori, H; Osawa, S; Takaiwa, F; Sugiura, M

    1984-01-01

    The nucleotide sequences from two Pteridophyta species, a fern Dryopteris acuminata and a horsetail Equisetum arvense have been determined. These two sequences are more related to those of the Bryophyta species (88% identity on average) than to those of seed plants (84% identity on average). PMID:6538332

  9. Nucleotide sequence and genetic organization of barley stripe mosaic virus RNA gamma.

    PubMed

    Gustafson, G; Hunter, B; Hanau, R; Armour, S L; Jackson, A O

    1987-06-01

    The complete nucleotide sequences of RNA gamma from the Type and ND18 strains of barley stripe mosaic virus (BSMV) have been determined. The sequences are 3164 (Type) and 2791 (ND18) nucleotides in length. Both sequences contain a 5'-noncoding region (87 or 88 nucleotides) which is followed by a long open reading frame (ORF1). A 42-nucleotide intercistronic region separates ORF1 from a second, shorter open reading frame (ORF2) located near the 3'-end of the RNA. There is a high degree of homology between the Type and ND18 strains in the nucleotide sequence of ORF1. However, the Type strain contains a 366 nucleotide direct tandem repeat within ORF1 which is absent in the ND18 strain. Consequently, the predicted translation product of Type RNA gamma ORF1 (mol wt 87,312) is significantly larger than that of ND18 RNA gamma ORF1 (mol wt 74,011). The amino acid sequence of the ORF1 polypeptide contains homologies with putative RNA polymerases from other RNA viruses, suggesting that this protein may function in replication of the BSMV genome. The nucleotide sequence of RNA gamma ORF2 is nearly identical in the Type and ND18 strains. ORF2 codes for a polypeptide with a predicted molecular weight of 17,209 (Type) or 17,074 (ND18) which is known to be translated from a subgenomic (sg) RNA. The initiation point of this sgRNA has been mapped to a location 27 nucleotides upstream of the ORF2 initiation codon in the intercistronic region between ORF1 and ORF2. The sgRNA is not coterminal with the 3'-end of the genomic RNA, but instead contains heterogeneous poly(A) termini up to 150 nucleotides long (J. Stanley, R. Hanau, and A. O. Jackson, 1984, Virology 139, 375-383). In the genomic RNA gamma, ORF2 is followed by a short poly(A) tract and a 238-nucleotide tRNA-like structure.

  10. Complete Nucleotide Sequence of Watermelon Chlorotic Stunt Virus Originating from Oman

    PubMed Central

    Khan, Akhtar J.; Akhtar, Sohail; Briddon, Rob W.; Ammara, Um; Al-Matrooshi, Abdulrahman M.; Mansoor, Shahid

    2012-01-01

    Watermelon chlorotic stunt virus (WmCSV) is a bipartite begomovirus (genus Begomovirus, family Geminiviridae) that causes economic losses to cucurbits, particularly watermelon, across the Middle East and North Africa. Recently squash (Cucurbita moschata) grown in an experimental field in Oman was found to display symptoms such as leaf curling, yellowing and stunting, typical of a begomovirus infection. Sequence analysis of the virus isolated from squash showed 97.6–99.9% nucleotide sequence identity to previously described WmCSV isolates for the DNA A component and 93–98% identity for the DNA B component. Agrobacterium-mediated inoculation to Nicotiana benthamiana resulted in the development of symptoms fifteen days post inoculation. This is the first bipartite begomovirus identified in Oman. Overall the Oman isolate showed the highest levels of sequence identity to a WmCSV isolate originating from Iran, which was confirmed by phylogenetic analysis. This suggests that WmCSV present in Oman has been introduced from Iran. The significance of this finding is discussed. PMID:22852046

  11. Complete nucleotide sequence of watermelon chlorotic stunt virus originating from Oman.

    PubMed

    Khan, Akhtar J; Akhtar, Sohail; Briddon, Rob W; Ammara, Um; Al-Matrooshi, Abdulrahman M; Mansoor, Shahid

    2012-07-01

    Watermelon chlorotic stunt virus (WmCSV) is a bipartite begomovirus (genus Begomovirus, family Geminiviridae) that causes economic losses to cucurbits, particularly watermelon, across the Middle East and North Africa. Recently squash (Cucurbita moschata) grown in an experimental field in Oman was found to display symptoms such as leaf curling, yellowing and stunting, typical of a begomovirus infection. Sequence analysis of the virus isolated from squash showed 97.6-99.9% nucleotide sequence identity to previously described WmCSV isolates for the DNA A component and 93-98% identity for the DNA B component. Agrobacterium-mediated inoculation to Nicotiana benthamiana resulted in the development of symptoms fifteen days post inoculation. This is the first bipartite begomovirus identified in Oman. Overall the Oman isolate showed the highest levels of sequence identity to a WmCSV isolate originating from Iran, which was confirmed by phylogenetic analysis. This suggests that WmCSV present in Oman has been introduced from Iran. The significance of this finding is discussed.

  12. The EMBL nucleotide sequence database

    PubMed Central

    Stoesser, Guenter; Baker, Wendy; van den Broek, Alexandra; Camon, Evelyn; Garcia-Pastor, Maria; Kanz, Carola; Kulikova, Tamara; Lombard, Vincent; Lopez, Rodrigo; Parkinson, Helen; Redaschi, Nicole; Sterk, Peter; Stoehr, Peter; Tuli, Mary Ann

    2001-01-01

    The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank at the NCBI (USA). Data is exchanged amongst the collaborating databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via ftp, email and World Wide Web interfaces. EBI’s Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many specialized databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. PMID:11125039

  13. Nucleotide sequences encoding a thermostable alkaline protease

    DOEpatents

    Wilson, David B.; Lao, Guifang

    1998-01-01

    Nucleotide sequences, derived from a thermophilic actinomycete microorganism, which encode a thermostable alkaline protease are disclosed. Also disclosed are variants of the nucleotide sequences which encode a polypeptide having thermostable alkaline proteolytic activity. Recombinant thermostable alkaline protease or recombinant polypeptide may be obtained by culturing in a medium a host cell genetically engineered to contain and express a nucleotide sequence according to the present invention, and recovering the recombinant thermostable alkaline protease or recombinant polypeptide from the culture medium.

  14. Terminal Duplex Stability and Nucleotide Identity Differentially Control siRNA Loading and Activity in RNA Interference

    PubMed Central

    Angart, Phillip A.; Carlson, Rebecca J.; Adu-Berchie, Kwasi

    2016-01-01

    Efficient short interfering RNA (siRNA)-mediated gene silencing requires selection of a sequence that is complementary to the intended target and possesses sequence and structural features that encourage favorable functional interactions with the RNA interference (RNAi) pathway proteins. In this study, we investigated how terminal sequence and structural characteristics of siRNAs contribute to siRNA strand loading and silencing activity and how these characteristics ultimately result in a functionally asymmetric duplex in cultured HeLa cells. Our results reiterate that the most important characteristic in determining siRNA activity is the 5′ terminal nucleotide identity. Our findings further suggest that siRNA loading is controlled principally by the hybridization stability of the 5′ terminus (Nucleotides: 1–2) of each siRNA strand, independent of the opposing terminus. Postloading, RNA-induced silencing complex (RISC)–specific activity was found to be improved by lower hybridization stability in the 5′ terminus (Nucleotides: 3–4) of the loaded siRNA strand and greater hybridization stability toward the 3′ terminus (Nucleotides: 17–18). Concomitantly, specific recognition of the 5′ terminal nucleotide sequence by human Argonaute 2 (Ago2) improves RISC half-life. These findings indicate that careful selection of siRNA sequences can maximize both the loading and the specific activity of the intended guide strand. PMID:27399870

  15. Nucleotide sequences encoding a thermostable alkaline protease

    DOEpatents

    Wilson, D.B.; Lao, G.

    1998-01-06

    Nucleotide sequences, derived from a thermophilic actinomycete microorganism, which encode a thermostable alkaline protease are disclosed. Also disclosed are variants of the nucleotide sequences which encode a polypeptide having thermostable alkaline proteolytic activity. Recombinant thermostable alkaline protease or recombinant polypeptide may be obtained by culturing in a medium a host cell genetically engineered to contain and express a nucleotide sequence according to the present invention, and recovering the recombinant thermostable alkaline protease or recombinant polypeptide from the culture medium. 3 figs.

  16. Nucleotide sequence of a resistance breaking mutant of southern bean mosaic virus.

    PubMed

    Lee, L; Anderson, E J

    1998-01-01

    SBMV-S is a resistance-breaking mutant of an Arkansas isolate of the bean strain of southern bean mosaic virus (SBMV-BARK) that is able to move systemically in Phaseolus vulgaris cvs. Pinto and Great Northern, whereas the wild-type SBMV-BARK causes local necrotic lesions and is restricted to the inoculated leaves of these hosts. Sequence analysis of the 4136 nucleotide genomes of SBMV-BARK and SBMV-S revealed seven nucleotide differences, but only four deduced amino acid changes. A single amino acid change occurred in the C-terminal region of the putative RNA-dependent RNA polymerase and three differences were identified in the N-terminal portion of the virus coat protein. SBMV-BARK and SBMV-S were compared with other sobemoviruses and were found to contain a high level of nucleotide sequence identity (91.3%) to SBMV-B. Unlike SBMV-B however, SBMV-BARK and SBMV-S contained four putative overlapping open reading frames, making them more similar in genome organization to the cowpea strain, SBMV-C. The possibility exists that mutations or even errors, that resulted in mis-identification of open reading frames, occurred in previously published information on nucleotide sequence and genomic organization for SBMV-B.

  17. Nucleotide sequence and proposed secondary structure of Columnea latent viroid: a natural mosaic of viroid sequences.

    PubMed Central

    Hammond, R; Smith, D R; Diener, T O

    1989-01-01

    The Columnea latent viroid (CLV) occurs latently in certain Columnea erythrophae plants grown commercially. In potato and tomato, CLV causes potato spindle tuber viroid (PSTV)-like symptoms. Its nucleotide sequence and proposed secondary structure reveal that CLV consists of a single-stranded circular RNA of 370 nucleotides which can assume a rod-like structure with extensive base-pairing characteristic of all known viroids. The electrophoretic mobility of circular CLV under nondenaturing conditions suggests a potential tertiary structure. CLV contains extensive sequence homologies to the PSTV group of viroids but contains a central conserved region identical to that of hop stunt viroid (HSV). CLV also shares some biological properties with each of the two types of viroids. Most probably, CLV is the result of intracellular RNA recombination between an HSV-type and one or more PSTV-type viroids replicating in the same plant. PMID:2602114

  18. The complete nucleotide sequence and genome organization of a novel betaflexivirus infecting Citrullus lanatus.

    PubMed

    Xin, Min; Zhang, Peipei; Liu, Wenwen; Ren, Yingdang; Cao, Mengji; Wang, Xifeng

    2017-10-01

    The complete nucleotide sequence of a novel positive single-stranded (+ss) RNA virus, tentatively named watermelon virus A (WVA), was determined using a combination of three methods: RNA sequencing, small RNA sequencing, and Sanger sequencing. The full genome of WVA is comprised of 8,372 nucleotides (nt), excluding the poly (A) tail, and contains four open reading frames (ORFs). The largest ORF, ORF1 encodes a putative replication-associated polyprotein (RP) with three conserved domains. ORF2 and ORF4 encode a movement protein (MP) and coat protein (CP), respectively. The putative product encoded by ORF3, of an estimated molecular mass of 25 kDa, has no significant similarity with other proteins. Identity and phylogenetic analysis indicate that WVA is a new virus, closely related to members of the family Betaflexiviridae. However, the final taxonomic allocation of WVA within the family is yet to be determined.

  19. Nucleotide sequence of a chickpea chlorotic stunt virus relative that infects pea and faba bean in China.

    PubMed

    Zhou, Cui-Ji; Xiang, Hai-Ying; Zhuo, Tao; Li, Da-Wei; Yu, Jia-Lin; Han, Cheng-Gui

    2012-07-01

    We determined the genome sequence of a new polerovirus that infects field pea and faba bean in China. Its entire nucleotide sequence (6021 nt) was most closely related (83.3% identity) to that of an Ethiopian isolate of chickpea chlorotic stunt virus (CpCSV-Eth). With the exception of the coat protein (encoded by ORF3), amino acid sequence identities of all gene products of this virus to those of CpCSV-Eth and other poleroviruses were <90%. This suggests that it is a new member of the genus Polerovirus, and the name pea mild chlorosis virus is proposed.

  20. Human ribosomal RNA gene: nucleotide sequence of the transcription initiation region and comparison of three mammalian genes.

    PubMed Central

    Financsek, I; Mizumoto, K; Mishima, Y; Muramatsu, M

    1982-01-01

    The transcription initiation site of the human ribosomal RNA gene (rDNA) was located by using the single-strand specific nuclease protection method and by determining the first nucleotide of the in vitro capped 45S preribosomal RNA. The sequence of 1,211 nucleotides surrounding the initiation site was determined. The sequenced region was found to consist of 75% G and C and to contain a number of short direct and inverted repeats and palindromes. By comparison of the corresponding initiation regions of three mammalian species, several conserved sequences were found upstream and downstream from the transcription starting point. Two short A + T-rich sequences are present on human, mouse, and rat ribosomal RNA genes between the initiation site and 40 nucleotides upstream, and a C + T cluster is located at a position around -60. At and downstream from the initiation site, a common sequence, T-AG-C-T-G-A-C-A-C-G-C-T-G-T-C-C-T-CT-T, was found in the three genes from position -1 through +18. The strong conservation of these sequences suggests their functional significance in rDNA. The S1 nuclease protection experiments with cloned rDNA fragments indicated the presence in human 45S RNA of molecules several hundred nucleotides shorter than the supposed primary transcript. The first 19 nucleotides of these molecules appear identical--except for one mismatch--to the nucleotide sequence of the 5' end of a supposed early processing product of the mouse 45S RNA. PMID:6954460

  1. The complete nucleotide sequence of RNA beta from the type strain of barley stripe mosaic virus.

    PubMed Central

    Gustafson, G; Armour, S L

    1986-01-01

    The complete nucleotide sequence of RNA beta from the type strain of barley stripe mosaic virus (BSMV) has been determined. The sequence is 3289 nucleotides in length and contains four open reading frames (ORFs) which code for proteins of Mr 22,147 (ORF1), Mr 58,098 (ORF2), Mr 17,378 (ORF3), and Mr 14,119 (ORF4). The predicted N-terminal amino acid sequence of the polypeptide encoded by the ORF nearest the 5'-end of the RNA (ORF1) is identical (after the initiator methionine) to the published N-terminal amino acid sequence of BSMV coat protein for 29 of the first 30 amino acids. ORF2 occupies the central portion of the coding region of RNA beta and ORF3 is located at the 3'-end. The ORF4 sequence overlaps the 3'-region of ORF2 and the 5'-region of ORF3 and differs in codon usage from the other three RNA beta ORFs. The coding region of RNA beta is followed by a poly(A) tract and a 238 nucleotide tRNA-like structure which are common to all three BSMV genomic RNAs. PMID:3754962

  2. DNA Nucleotide Sequence Restricted by the RI Endonuclease

    PubMed Central

    Hedgpeth, Joe; Goodman, Howard M.; Boyer, Herbert W.

    1972-01-01

    The sequence of DNA base pairs adjacent to the phosphodiester bonds cleaved by the RI restriction endonuclease in unmodified DNA from coliphage λ has been determined. The 5′-terminal nucleotide labeled with 32P and oligonucleotides up to the heptamer were analyzed from a pancreatic DNase digest. The following sequence of nucleotides adjacent to the RI break made in λ DNA was deduced from these data and from the 3′-dinucleotide sequence and nearest-neighbor analysis obtained from repair synthesis with the DNA polymerase of Rous sarcoma virus [Formula: see text] The RI endonuclease cleavage of the phosphodiester bonds (indicated by arrows) generates 5′-phosphoryls and short cohesive termini of four nucleotides, pApApTpT. The most striking feature of the sequence is its symmetry. PMID:4343974
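
The symmetry noted in this record can be checked directly on the four-nucleotide cohesive terminus it quotes (pApApTpT, written AATT below): a symmetric site reads the same on both strands, so it equals its own reverse complement. The sketch below is illustrative only; the helper names are chosen here, and the full recognition sequence is not reproduced because the record elides it.

```python
# Watson-Crick complements for the four DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


overhang = "AATT"  # the cohesive terminus quoted in the record
print(reverse_complement(overhang))  # AATT
print(is_palindromic(overhang))      # True
print(is_palindromic("AAGT"))        # False, an arbitrary non-symmetric example
```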

  3. 77 FR 65537 - Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-29

    ... DEPARTMENT OF COMMERCE Patent and Trademark Office Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence Disclosures ACTION: Proposed collection; comment request... Patent applications that contain nucleotide and/or amino acid sequence disclosures must include a copy of...

  4. The complete nucleotide sequence of the barley yellow dwarf GPV isolate from China shows that it is a new member of the genus Polerovirus.

    PubMed

    Zhang, Wenwei; Cheng, Zhuomin; Xu, Lei; Wu, Maosen; Waterhouse, Peter; Zhou, Guanghe; Li, Shifang

    2009-01-01

    The complete nucleotide sequence of the ssRNA genome of a Chinese GPV isolate of barley yellow dwarf virus (BYDV) was determined. It comprised 5673 nucleotides, and the deduced genome organization resembled that of members of the genus Polerovirus. It was most closely related to cereal yellow dwarf virus-RPV (77% nt identity over the entire genome; coat protein amino acid identity 79%). The GPV isolate also differs in vector specificity from other BYDV strains. Biological properties, phylogenetic analyses and detailed sequence comparisons suggest that GPV should be considered a member of a new species within the genus, and the name Wheat yellow dwarf virus-GPV is proposed.

  5. alpha-Amylase gene of Streptomyces limosus: nucleotide sequence, expression motifs, and amino acid sequence homology to mammalian and invertebrate alpha-amylases.

    PubMed Central

    Long, C M; Virolle, M J; Chang, S Y; Chang, S; Bibb, M J

    1987-01-01

    The nucleotide sequence of the coding and regulatory regions of the alpha-amylase gene (aml) of Streptomyces limosus was determined. High-resolution S1 mapping was used to locate the 5' end of the transcript and demonstrated that the gene is transcribed from a unique promoter. The predicted amino acid sequence has considerable identity to mammalian and invertebrate alpha-amylases, but not to those of plant, fungal, or eubacterial origin. Consistent with this is the susceptibility of the enzyme to an inhibitor of mammalian alpha-amylases. The amino-terminal sequence of the extracellular enzyme was determined, revealing the presence of a typical signal peptide preceding the mature form of the alpha-amylase. PMID:3500166

  6. Comparison of the nucleotide and amino acid sequences of the RsrI and EcoRI restriction endonucleases.

    PubMed

    Stephenson, F H; Ballard, B T; Boyer, H W; Rosenberg, J M; Greene, P J

    1989-12-21

    The RsrI endonuclease, a type-II restriction endonuclease (ENase) found in Rhodobacter sphaeroides, is an isoschizomer of the EcoRI ENase. A clone containing an 11-kb BamHI fragment was isolated from an R. sphaeroides genomic DNA library by hybridization with synthetic oligodeoxyribonucleotide probes based on the N-terminal amino acid (aa) sequence of RsrI. Extracts of E. coli containing a subclone of the 11-kb fragment display RsrI activity. Nucleotide sequence analysis reveals an 831-bp open reading frame encoding a polypeptide of 277 aa. A 50% identity exists within a 266-aa overlap between the deduced aa sequences of RsrI and EcoRI. Regions of 75-100% aa sequence identity correspond to key structural and functional regions of EcoRI. The type-II ENases have many common properties, and a common origin might have been expected. Nevertheless, this is the first demonstration of aa sequence similarity between ENases produced by different organisms.

  7. Nucleotide sequence analysis of the 3' terminal region of a wasabi strain of crucifer tobamovirus genomic RNA: subgrouping of crucifer tobamoviruses.

    PubMed

    Shimamoto, I; Sonoda, S; Vazquez, P; Minaka, N; Nishiguchi, M

    1998-01-01

    The 3' terminal 2378 nucleotides of a wasabi strain of crucifer tobamovirus (CTMV-W) infectious to crucifer plants were determined. This includes the 3' non-coding region of 235 nucleotides, coat protein (CP) gene (468 nucleotides), movement protein (MP) gene (798 nucleotides) and C-terminal partial readthrough portion of 180 K protein gene (940 nucleotides). Comparison of the sequence with homologous regions of thirteen other tobamovirus genomes showed that it had much higher identity to those of four other crucifer tobamoviruses, 85.2% to cr-TMV and turnip vein-clearing virus (TVCV), 87.4% to oilseed rape mosaic virus (ORMV) and 87.1% to TMV-Cg, than to those of other tobamoviruses. Thus CTMV-W was most similar to ORMV and TMV-Cg in sequence, but only marginally so, whereas the location and size of its MP gene were the same as those of cr-TMV and TVCV. These results, together with other analyses, show that CTMV-W is a new crucifer tobamovirus, that the five crucifer tobamoviruses can be classified into two subgroups based on MP gene organization, and that the rate of sequence change is not the same in all lineages.

  8. Nucleotide sequences specific to Brucella and methods for the detection of Brucella

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McCready, Paula M; Radnedge, Lyndsay; Andersen, Gary L

    Nucleotide sequences specific to Brucella that serve as markers or signatures for identification of this bacterium were identified. In addition, forward and reverse primers and hybridization probes derived from these nucleotide sequences that are used in nucleotide detection methods to detect the presence of the bacterium are disclosed.

  9. Nucleotide sequence of the beta-lactamase gene from Enterococcus faecalis HH22 and its similarity to staphylococcal beta-lactamase genes.

    PubMed Central

    Zscheck, K K; Murray, B E

    1991-01-01

    The nucleotide sequence of the constitutively produced beta-lactamase (Bla) gene from Enterococcus faecalis HH22 was shown to be identical to the published sequences of three of four staphylococcal type A beta-lactamase genes; more differences were seen with the genes for staphylococcal type C and D enzymes. One hundred forty nucleotides upstream of the beta-lactamase start codon were determined for an inducible staphylococcal beta-lactamase and were identical to those of the constitutively expressed enterococcal gene, indicating that the changes resulting in constitutive expression are not due to changes in the promoter or operator region. Moreover, complementation studies indicated that production of the enterococcal enzyme could be repressed. The genes for the enterococcal Bla and an inducible staphylococcal Bla were each cloned into a shuttle vector and transformed into enterococcal and staphylococcal recipients. The major difference between the backgrounds of the two hosts was that more enzyme was produced by the staphylococcal host, regardless of the source of the gene. The location of the enzyme was found to be host dependent, since each cloned gene generated extracellular (free) enzyme in the staphylococcus and cell-bound enzyme in the enterococcus. On the basis of the identities of the enterococcal Bla and several staphylococcal Bla sequences, these data suggest the recent spread of beta-lactamase to enterococci and also suggest the loss of a functional repressor. PMID:1952840

  10. Complete nucleotide sequence and genome structure of a Japanese isolate of hibiscus latent Fort Pierce virus, a unique tobamovirus that contains an internal poly(A) region in its 3' end.

    PubMed

    Yoshida, Tetsuya; Kitazawa, Yugo; Komatsu, Ken; Neriya, Yutaro; Ishikawa, Kazuya; Fujita, Naoko; Hashimoto, Masayoshi; Maejima, Kensaku; Yamaji, Yasuyuki; Namba, Shigetou

    2014-11-01

    In this study, we detected a Japanese isolate of hibiscus latent Fort Pierce virus (HLFPV-J), a member of the genus Tobamovirus, in a hibiscus plant in Japan and determined the complete sequence and organization of its genome. HLFPV-J has four open reading frames (ORFs), each of which shares more than 98 % nucleotide sequence identity with those of other HLFPV isolates. Moreover, HLFPV-J contains a unique internal poly(A) region of variable length, ranging from 44 to 78 nucleotides, in its 3'-untranslated region (UTR), as is the case with hibiscus latent Singapore virus (HLSV), another hibiscus-infecting tobamovirus. The length of the HLFPV-J genome was 6431 nucleotides, including the shortest internal poly(A) region. The sequence identities of ORFs 1, 2, 3 and 4 of HLFPV-J to other tobamoviruses were 46.6-68.7, 49.9-70.8, 31.0-70.8 and 39.4-70.1 %, respectively, at the nucleotide level and 39.8-75.0, 43.6-77.8, 19.2-70.4 and 31.2-74.2 %, respectively, at the amino acid level. The 5'- and 3'-UTRs of HLFPV-J showed 24.3-58.6 and 13.0-79.8 % identity, respectively, to other tobamoviruses. In particular, when compared to other tobamoviruses, each ORF and UTR of HLFPV-J showed the highest sequence identity to those of HLSV. Phylogenetic analysis showed that HLFPV-J, other HLFPV isolates and HLSV constitute a malvaceous-plant-infecting tobamovirus cluster. These results indicate that the genomic structure of HLFPV-J has unique features similar to those of HLSV. To our knowledge, this is the first report of the complete genome sequence of HLFPV.

  11. VarDetect: a nucleotide sequence variation exploratory tool

    PubMed Central

    Ngamphiw, Chumpol; Kulawonganunchai, Supasak; Assawamakin, Anunchai; Jenwitheesuk, Ekachai; Tongsima, Sissades

    2008-01-01

    Background: Single nucleotide polymorphisms (SNPs) are the most commonly studied units of genetic variation. The discovery of such variation may help to identify causative gene mutations in monogenic diseases and SNPs associated with predisposing genes in complex diseases. Accurate detection of SNPs requires software that can correctly interpret chromatogram signals to nucleotides. Results: We present VarDetect, a stand-alone nucleotide variation exploratory tool that automatically detects nucleotide variation from fluorescence based chromatogram traces. Accurate SNP base-calling is achieved using pre-calculated peak content ratios, and is enhanced by rules which account for common sequence reading artifacts. The proposed software tool is benchmarked against four other well-known SNP discovery software tools (PolyPhred, novoSNP, Genalys and Mutation Surveyor) using fluorescence based chromatograms from 15 human genes. These chromatograms were obtained from sequencing 16 two-pooled DNA samples; a total of 32 individual DNA samples. In this comparison of automatic SNP detection tools, VarDetect achieved the highest detection efficiency. Availability: VarDetect is compatible with most major operating systems such as Microsoft Windows, Linux, and Mac OSX. The current version of VarDetect is freely available at . PMID:19091032

  12. The nucleotide sequence and genome organization of Plasmopara halstedii virus.

    PubMed

    Heller-Dohmen, Marion; Göpfert, Jens C; Pfannstiel, Jens; Spring, Otmar

    2011-03-17

    Only very few viruses of Oomycetes have been studied in detail. Isometric virions were found in different isolates of the oomycete Plasmopara halstedii, the downy mildew pathogen of sunflower. However, complete nucleotide sequences and data on the genome organization were lacking. Viral RNA of different P. halstedii isolates was subjected to nucleotide sequencing and analysis of the viral genome. The N-terminal sequence of the viral coat protein was determined using Top-Down MALDI-TOF analysis. The complete nucleotide sequences of both single-stranded RNA segments (RNA1 and RNA2) were established. RNA1 consisted of 2793 nucleotides (nt) exclusive of its 3' poly(A) tract and a single open-reading frame (ORF1) of 2745 nt. ORF1 was framed by a 5' untranslated region (5' UTR) of 18 nt and a 3' untranslated region (3' UTR) of 30 nt. ORF1 contained motifs of RNA-dependent RNA polymerases (RdRp) and showed similarities to RdRp of Sclerophthora macrospora virus A (SmV A) and viruses within the Nodaviridae family. RNA2 consisted of 1526 nt exclusive of its 3' poly(A) tract and a second ORF (ORF2) of 1128 nt. ORF2 coded for the single viral coat protein (CP) and was framed by a 5' UTR of 164 nt and a 3' UTR of 234 nt. The deduced amino acid sequence of ORF2 was verified by nano-LC-ESI-MS/MS experiments. Top-Down MALDI-TOF analysis revealed the N-terminal sequence of the CP. The N-terminal sequence represented a region within ORF2 suggesting a proteolytic processing of the CP in vivo. The CP showed similarities to CP of SmV A and viruses within the Tombusviridae family. Fragments of RNA1 (ca. 1.9 kb) and RNA2 (ca. 1.4 kb) were used to analyze the nucleotide sequence variation of virions in different P. halstedii isolates. Viral sequence variation was 0.3% or less regardless of their host's pathotypes, the geographical origin and the sensitivity towards the fungicide metalaxyl. The results showed the presence of a single and new virus type in different P. halstedii isolates.
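    The segment lengths reported in the record above can be cross-checked with simple arithmetic: for each RNA, the 5' UTR, the ORF and the 3' UTR should add up to the stated total, and the ORF length should be a multiple of three. The following is a minimal sketch of that check; the helper name is hypothetical. The abstract does not say whether the stop codon is counted inside the reported ORF length, so the codon count is at most an upper bound on the number of residues.

```python
def check_segment(total: int, utr5: int, orf: int, utr3: int) -> int:
    """Verify that 5' UTR + ORF + 3' UTR equals the total; return the codon count."""
    if utr5 + orf + utr3 != total:
        raise ValueError("parts do not add up to the reported segment length")
    if orf % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return orf // 3


# Lengths in nucleotides as given in the abstract (poly(A) tract excluded).
rna1_codons = check_segment(total=2793, utr5=18, orf=2745, utr3=30)
rna2_codons = check_segment(total=1526, utr5=164, orf=1128, utr3=234)
print(rna1_codons, rna2_codons)
```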

  13. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2010-07-01 2010-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide and...

  14. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2011-07-01 2011-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide and...

  15. Quantum Point Contact Single-Nucleotide Conductance for DNA and RNA Sequence Identification.

    PubMed

    Afsari, Sepideh; Korshoj, Lee E; Abel, Gary R; Khan, Sajida; Chatterjee, Anushree; Nagpal, Prashant

    2017-11-28

    Several nanoscale electronic methods have been proposed for high-throughput single-molecule nucleic acid sequence identification. While many studies display a large ensemble of measurements as "electronic fingerprints" with some promise for distinguishing the DNA and RNA nucleobases (adenine, guanine, cytosine, thymine, and uracil), important metrics such as accuracy and confidence of base calling fall well below the current genomic methods. Issues such as unreliable metal-molecule junction formation, variation of nucleotide conformations, insufficient differences between the molecular orbitals responsible for single-nucleotide conduction, and lack of rigorous base calling algorithms lead to overlapping nanoelectronic measurements and poor nucleotide discrimination, especially at low coverage on single molecules. Here, we demonstrate a technique for reproducible conductance measurements on conformation-constrained single nucleotides and an advanced algorithmic approach for distinguishing the nucleobases. Our quantum point contact single-nucleotide conductance sequencing (QPICS) method uses combed and electrostatically bound single DNA and RNA nucleotides on a self-assembled monolayer of cysteamine molecules. We demonstrate that by varying the applied bias and pH conditions, molecular conductance can be switched ON and OFF, leading to reversible nucleotide perturbation for electronic recognition (NPER). We utilize NPER as a method to achieve >99.7% accuracy for DNA and RNA base calling at low molecular coverage (∼12×) using unbiased single measurements on DNA/RNA nucleotides, which represents a significant advance compared to existing sequencing methods. These results demonstrate the potential for utilizing simple surface modifications and existing biochemical moieties in individual nucleobases for a reliable, direct, single-molecule, nanoelectronic DNA and RNA nucleotide identification method for sequencing.

  16. Nucleotide sequence of a complementary DNA encoding pea cytosolic copper/zinc superoxide dismutase. [Pisum sativum L

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    White, D.A.; Zilinskas, B.A.

    1991-08-01

    The authors now report the nucleotide sequence of the cytosolic Cu/Zn SOD cloned from a λgt11 cDNA library constructed from mRNA extracted from leaves of 7- to 10-d pea seedlings (Pisum sativum L.). The clone was isolated using a 22-base synthetic oligonucleotide complementary to the amino acid sequence CGIIGLQG. This sequence, found at the protein's carboxy terminus, is highly conserved among plant cytosolic Cu/Zn SODs but not chloroplastic Cu/Zn SODs. The 738-base pair sequence contains an open reading frame specifying 152 codons and a predicted M_r of 18,024 D. The deduced amino acid sequence is highly homologous (79-82% identity) with the sequences of other known plant cytosolic Cu/Zn SODs but less highly conserved (63-65%) when compared with several chloroplastic Cu/Zn SODs including pea (10).

  17. Identification and nucleotide sequence analysis of the repetitive DNA element in the genome of fish lymphocystis disease virus.

    PubMed

    Schnitzler, P; Delius, H; Scholz, J; Touray, M; Orth, E; Darai, G

    1987-12-01

    The genome of the fish lymphocystis disease virus (FLDV) was screened for the existence of repetitive DNA sequences using a defined and complete gene library of the viral genome (98 kbp) by DNA-DNA hybridization, heteroduplex analysis, and restriction fine mapping. A repetitive DNA sequence was detected at the coordinates 0.034 to 0.057 and 0.718 to 0.736 map units (m.u.) of the FLDV genome. The first region (0.034 to 0.057 m.u.) corresponds to the 5' terminus of the EcoRI FLDV DNA fragment B (0.034 to 0.165 m.u.) and the second region (0.718 to 0.736 m.u.) is identical to the EcoRI DNA fragment M of the viral genome. The DNA nucleotide sequence of the EcoRI FLDV DNA fragment M was determined. This analysis revealed the presence of many short direct and inverted repetitions, e.g., an 18-mer direct repetition (TTTAAAATTTAATTAA) that started at nucleotide positions 812 and 942 and a 14-mer inverted repeat (TTAAATTTAAATTT) at nucleotide positions 820 and 959. Only short open reading frames were detected within this region. The DNA repetitions are discussed as sequences that play a possible regulatory role for virus replication. Furthermore, hybridization experiments revealed that the repetitive DNA sequences are conserved in the genome of different strains of fish lymphocystis disease virus isolated from two species of Pleuronectidae (flounder and dab).

  18. Interactive computer programs for the graphic analysis of nucleotide sequence data.

    PubMed Central

    Luckow, V A; Littlewood, R K; Rownd, R H

    1984-01-01

    A group of interactive computer programs has been developed which aid in the collection and graphical analysis of nucleotide and protein sequence data. The programs perform the following basic functions: a) enter, edit, list, and rearrange sequence data; b) permit automatic entry of nucleotide sequence data directly from an autoradiograph into the computer; c) search for restriction sites or other specified patterns and plot a linear or circular restriction map, or print their locations; d) plot base composition; e) analyze homology between sequences by plotting a two-dimensional graphic matrix; and f) aid in plotting predicted secondary structures of RNA molecules. PMID:6546437
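    Two of the functions listed in the record above, searching for restriction sites and computing base composition, can be illustrated in a few lines. This is a minimal modern sketch, not the programs described in the abstract: the function names are hypothetical, the test sequence is made up, and only the forward strand of a linear sequence is scanned. GAATTC is the EcoRI recognition site.

```python
from collections import Counter


def find_sites(sequence: str, site: str) -> list[int]:
    """Return 1-based start positions of every occurrence of `site` (overlaps allowed)."""
    seq = sequence.upper()
    pattern = site.upper()
    positions = []
    start = seq.find(pattern)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based position
        start = seq.find(pattern, start + 1)
    return positions


def base_composition(sequence: str) -> dict[str, float]:
    """Return the fraction of A, C, G and T in the sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {base: counts.get(base, 0) / len(seq) for base in "ACGT"}


toy = "AAGAATTCGGGAATTCTT"  # made-up sequence containing two EcoRI sites
print(find_sites(toy, "GAATTC"))
print(base_composition("AACG"))
```

    A restriction map as described in item c) would then be a plot of these positions along the sequence length.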

  19. WEB-server for search of a periodicity in amino acid and nucleotide sequences

    NASA Astrophysics Data System (ADS)

    Frenkel, F. E.; Skryabin, K. G.; Korotkov, E. V.

    2017-12-01

    A new web server (http://victoria.biengi.ac.ru/splinter/login.php) was designed and developed to search for periodicity in nucleotide and amino acid sequences. The web server operation is based upon a new mathematical method of searching for multiple alignments, which is founded on the optimization of position weight matrices, as well as on an implementation of two-dimensional dynamic programming. This approach allows the construction of multiple alignments of weakly similar amino acid and nucleotide sequences that have accumulated more than 1.5 substitutions per amino acid or nucleotide, without performing pairwise comparisons of the sequences. The article examines the principles of the web server operation and two examples of studying amino acid and nucleotide sequences, as well as information that could be obtained using the web server.

  20. Repeated sequence sets in mitochondrial DNA molecules of root knot nematodes (Meloidogyne): nucleotide sequences, genome location and potential for host-race identification.

    PubMed Central

    Okimoto, R; Chamberlin, H M; Macfarlane, J L; Wolstenholme, D R

    1991-01-01

    Within a 7 kb segment of the mtDNA molecule of the root knot nematode, Meloidogyne javanica, that lacks standard mitochondrial genes, are three sets of strictly tandemly arranged, direct repeat sequences: approximately 36 copies of a 102 ntp sequence that contains a TaqI site; 11 copies of a 63 ntp sequence, and 5 copies of an 8 ntp sequence. The 7 kb repeat-containing segment is bounded by putative tRNAasp and tRNAf-met genes and the arrangement of sequences within this segment is: the tRNAasp gene; a unique 1,528 ntp segment that contains two highly stable hairpin-forming sequences; the 102 ntp repeat set; the 8 ntp repeat set; a unique 1,068 ntp segment; the 63 ntp repeat set; and the tRNAf-met gene. The nucleotide sequences of the 102 ntp copies and the 63 ntp copies have been conserved among the species examined. Data from Southern hybridization experiments indicate that 102 ntp and 63 ntp repeats occur in the mtDNAs of three, two and two races of M.incognita, M.hapla and M.arenaria, respectively. Nucleotide sequences of the M.incognita Race-3 102 ntp repeat were found to be either identical or highly similar to those of the M.javanica 102 ntp repeat. Differences in migration distance and number of 102 ntp repeat-containing bands seen in Southern hybridization autoradiographs of restriction-digested mtDNAs of M.javanica and the different host races of M.incognita, M.hapla and M.arenaria are sufficient to distinguish the different host races of each species. PMID:2027769

  1. Nucleotide sequences specific to Yersinia pestis and methods for the detection of Yersinia pestis

    DOEpatents

    McCready, Paula M [Tracy, CA]; Radnedge, Lyndsay [San Mateo, CA]; Andersen, Gary L [Berkeley, CA]; Ott, Linda L [Livermore, CA]; Slezak, Thomas R [Livermore, CA]; Kuczmarski, Thomas A [Livermore, CA]; Motin, Vladimir L [League City, TX]

    2009-02-24

    Nucleotide sequences specific to Yersinia pestis that serve as markers or signatures for identification of this bacterium were identified. In addition, forward and reverse primers and hybridization probes derived from these nucleotide sequences that are used in nucleotide detection methods to detect the presence of the bacterium are disclosed.

  2. Information capacity of nucleotide sequences and its applications.

    PubMed

    Sadovsky, M G

    2006-05-01

    The information capacity of nucleotide sequences is defined through the specific entropy of the frequency dictionary of a sequence determined with respect to another one containing the most probable continuations of shorter strings. This measure distinguishes a sequence both from a random one and from an ordered one. A comparison of sequences based on their information capacity is studied. An order within the genetic entities is found at length scales ranging from 3 to 8. Some other applications of the developed methodology to genetics, bioinformatics, and molecular biology are discussed.

  3. Unlinking the methylome pattern from nucleotide sequence, revealed by large-scale in vivo genome engineering and methylome editing in medaka fish

    PubMed Central

    Nakamura, Ryohei; Uno, Ayako; Kumagai, Masahiko; Fukushima, Hiroto S.; Morishita, Shinichi; Takeda, Hiroyuki

    2017-01-01

    The heavily methylated vertebrate genomes are punctuated by stretches of poorly methylated DNA sequences that usually mark gene regulatory regions. It is known that the methylation state of these regions confers transcriptional control over their associated genes. Given its governance on the transcriptome, cellular functions and identity, genome-wide DNA methylation pattern is tightly regulated and evidently predefined. However, how the methylation pattern is determined in vivo remains enigmatic. Based on in silico and in vitro evidence, recent studies proposed that the regional hypomethylated state is primarily determined by local DNA sequence, e.g., high CpG density and presence of specific transcription factor binding sites. Nonetheless, the dependency of DNA methylation on nucleotide sequence has not been carefully validated in vertebrates in vivo. Herein, with the use of medaka (Oryzias latipes) as a model, the sequence dependency of DNA methylation was intensively tested in vivo. Our statistical modeling confirmed the strong statistical association between nucleotide sequence pattern and methylation state in the medaka genome. However, by manipulating the methylation state of a number of genomic sequences and reintegrating them into medaka embryos, we demonstrated that artificially conferred DNA methylation states were predominantly and robustly maintained in vivo, regardless of their sequences and endogenous states. This feature was also observed in the medaka transgene that had passed across generations. Thus, despite the observed statistical association, nucleotide sequence was unable to autonomously determine its own methylation state in medaka in vivo. Our results apparently argue against the notion of the governance on the DNA methylation by nucleotide sequence, but instead suggest the involvement of other epigenetic factors in defining and maintaining the DNA methylation landscape. Further investigation in other vertebrate models in vivo will be needed

  4. Mining of haplotype-based expressed sequence tag single nucleotide polymorphisms in citrus

    PubMed Central

    2013-01-01

    Background: Single nucleotide polymorphisms (SNPs), the most abundant variations in a genome, have been widely used in various studies. Detection and characterization of citrus haplotype-based expressed sequence tag (EST) SNPs will greatly facilitate further utilization of these gene-based resources. Results: In this paper, haplotype-based SNPs were mined out of publicly available citrus expressed sequence tags (ESTs) from different citrus cultivars (genotypes) individually and collectively for comparison. There were a total of 567,297 ESTs belonging to 27 cultivars in varying numbers and consequently yielding different numbers of haplotype-based quality SNPs. Sweet orange (SO) had the most (213,830) ESTs, generating 11,182 quality SNPs in 3,327 out of 4,228 usable contigs. Summed over all the individual mining results, a total of 25,417 quality SNPs were discovered: 15,010 (59.1%) were transitions (AG and CT), 9,114 (35.9%) were transversions (AC, GT, CG, and AT), and 1,293 (5.0%) were insertion/deletions (indels). A vast majority of SNP-containing contigs consisted of only 2 haplotypes, as expected, but the percentages of 2 haplotype contigs varied widely in these citrus cultivars. BLAST of the 25,417 25-mer SNP oligos to the Clementine reference genome scaffolds revealed 2,947 SNPs had “no hits found”, 19,943 had 1 unique hit / alignment, 1,571 had one hit and 2+ alignments per hit, and 956 had 2+ hits and 1+ alignment per hit. Of the total 24,293 scaffold hits, 23,955 (98.6%) were on the main scaffolds 1 to 9, and only 338 were on 87 minor scaffolds. Most alignments had 100% (25/25) or 96% (24/25) nucleotide identities, accounting for 93% of all the alignments. Considering almost all the nucleotide discrepancies in the 24/25 alignments were at the SNP sites, it served well as in silico validation of these SNPs, in addition to and consistent with the rate (81%) validated by sequencing and SNaPshot assay. Conclusions: High-quality EST-SNPs from different
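    The record above groups SNPs into transitions (AG and CT), transversions (AC, GT, CG, and AT) and indels. The rule behind that grouping is that a transition swaps two purines (A, G) or two pyrimidines (C, T), while a transversion swaps a purine for a pyrimidine. The sketch below illustrates the rule only; the function name and the use of "-" to mark an indel are assumptions, not the mining pipeline used in the study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(allele_a: str, allele_b: str) -> str:
    """Classify a two-allele variant as 'transition', 'transversion' or 'indel'."""
    a, b = allele_a.upper(), allele_b.upper()
    if a == "-" or b == "-":
        return "indel"  # "-" marks a missing base (hypothetical convention)
    valid = PURINES | PYRIMIDINES
    if a not in valid or b not in valid or a == b:
        raise ValueError("alleles must be two different bases from A, C, G, T")
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


for pair in ["AG", "CT", "AC", "GT", "CG", "AT"]:
    print(pair, classify_snp(pair[0], pair[1]))
```

    With this rule, the counts in the abstract add up as reported: 15,010 + 9,114 + 1,293 = 25,417.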

  5. Nucleotide sequences specific to Francisella tularensis and methods for the detection of Francisella tularensis

    DOEpatents

    McCready, Paula M [Tracy, CA; Radnedge, Lyndsay [San Mateo, CA; Andersen, Gary L [Berkeley, CA; Ott, Linda L [Livermore, CA; Slezak, Thomas R [Livermore, CA; Kuczmarski, Thomas A [Livermore, CA; Vitalis, Elizabeth A [Livermore, CA

    2007-02-06

    Described herein is the identification of nucleotide sequences specific to Francisella tularensis that serve as markers or signatures for identification of this bacterium. In addition, forward and reverse primers and hybridization probes derived from these nucleotide sequences that are used in nucleotide detection methods to detect the presence of the bacterium are disclosed.

  6. Nucleotide sequences specific to Francisella tularensis and methods for the detection of Francisella tularensis

    DOEpatents

    McCready, Paula M [Tracy, CA; Radnedge, Lyndsay [San Mateo, CA; Andersen, Gary L [Berkeley, CA; Ott, Linda L [Livermore, CA; Slezak, Thomas R [Livermore, CA; Kuczmarski, Thomas A [Livermore, CA; Vitalis, Elizabeth A [Livermore, CA

    2009-02-24

    Described herein is the identification of nucleotide sequences specific to Francisella tularensis that serve as markers or signatures for identification of this bacterium. In addition, forward and reverse primers and hybridization probes derived from these nucleotide sequences that are used in nucleotide detection methods to detect the presence of the bacterium are disclosed.

  7. Nucleotide sequence analysis establishes the role of endogenous murine leukemia virus DNA segments in formation of recombinant mink cell focus-forming murine leukemia viruses.

    PubMed Central

    Khan, A S

    1984-01-01

    The sequence of 363 nucleotides near the 3' end of the pol gene and 564 nucleotides from the 5' terminus of the env gene in an endogenous murine leukemia viral (MuLV) DNA segment, cloned from AKR/J mouse DNA and designated as A-12, was obtained. For comparison, the nucleotide sequence in an analogous portion of AKR mink cell focus-forming (MCF) 247 MuLV provirus was also determined. Sequence features unique to MCF247 MuLV DNA in the 3' pol and 5' env regions were identified by comparison with nucleotide sequences in analogous regions of NFS-Th-1 xenotropic and AKR ecotropic MuLV proviruses. These included (i) an insertion of 12 base pairs encoding four amino acids located 60 base pairs from the 3' terminus of the pol gene and immediately preceding the env gene, (ii) the deletion of 12 base pairs (encoding four amino acids) and the insertion of 3 base pairs (encoding one amino acid) in the 5' portion of the env gene, and (iii) single base substitutions resulting in 2 MCF247-specific amino acids in the 3' pol and 23 in the 5' env regions. Nucleotide sequence comparison involving the 3' pol and 5' env regions of AKR MCF247, NFS xenotropic, and AKR ecotropic MuLV proviruses with the cloned endogenous MuLV DNA indicated that MCF247 proviral DNA sequences were conserved in the cloned endogenous MuLV proviral segment. In fact, total nucleotide sequence identity existed between the endogenous MuLV DNA and the MCF247 MuLV provirus in the 3' portion of the pol gene. In the 5' env region, only 4 of 564 nucleotides were different, resulting in three amino acid changes between AKR MCF247 MuLV DNA and the endogenous MuLV DNA present in clone A-12. In addition, nucleotide sequence comparison indicated that Moloney- and Friend-MCF MuLVs were also highly related in the 3' pol and 5' env regions to the cloned endogenous MuLV DNA. These results establish the role of endogenous MuLV DNA segments in generation of recombinant MCF viruses. PMID:6328017

  8. Amino acid and nucleotide recurrence in aligned sequences: synonymous substitution patterns in association with global and local base compositions.

    PubMed

    Nishizawa, M; Nishizawa, K

    2000-10-01

    The tendency for repetitiveness of nucleotides in DNA sequences has been reported for a variety of organisms. We show that the tendency for repetitive use of amino acids is widespread and is observed even for segments conserved between human and Drosophila melanogaster at the level of >50% amino acid identity. This indicates that repetitiveness influences not only the weakly constrained segments but also those sequence segments conserved among phyla. Not only glutamine (Q) but also many of the 20 amino acids show a comparable level of repetitiveness. Repetitiveness in bases at codon position 3 is stronger for human than for D.melanogaster, whereas local repetitiveness in intron sequences is similar between the two organisms. While genes for immune system-specific proteins, but not ancient human genes (i.e. human homologs of Escherichia coli genes), have repetitiveness at codon bases 1 and 2, repetitiveness at codon base 3 for these groups is similar, suggesting that the human genome has at least two mechanisms generating local repetitiveness. Neither amino acid nor nucleotide repetitiveness is observed beyond the exon boundary, denying the possibility that such repetitiveness could mainly stem from natural selection on mRNA or protein sequences. Analyses of mammalian sequence alignments show that while the 'between gene' GC content heterogeneity, which is linked to 'isochores', is a principal factor associated with the bias in substitution patterns in human, 'within gene' heterogeneity in nucleotide composition is also associated with such bias on a more local scale. The relationship amongst the various types of repetitiveness is discussed.

  9. Amino acid and nucleotide recurrence in aligned sequences: synonymous substitution patterns in association with global and local base compositions

    PubMed Central

    Nishizawa, Manami; Nishizawa, Kazuhisa

    2000-01-01

    The tendency for repetitiveness of nucleotides in DNA sequences has been reported for a variety of organisms. We show that the tendency for repetitive use of amino acids is widespread and is observed even for segments conserved between human and Drosophila melanogaster at the level of >50% amino acid identity. This indicates that repetitiveness influences not only the weakly constrained segments but also those sequence segments conserved among phyla. Not only glutamine (Q) but also many of the 20 amino acids show a comparable level of repetitiveness. Repetitiveness in bases at codon position 3 is stronger for human than for D.melanogaster, whereas local repetitiveness in intron sequences is similar between the two organisms. While genes for immune system-specific proteins, but not ancient human genes (i.e. human homologs of Escherichia coli genes), have repetitiveness at codon bases 1 and 2, repetitiveness at codon base 3 for these groups is similar, suggesting that the human genome has at least two mechanisms generating local repetitiveness. Neither amino acid nor nucleotide repetitiveness is observed beyond the exon boundary, denying the possibility that such repetitiveness could mainly stem from natural selection on mRNA or protein sequences. Analyses of mammalian sequence alignments show that while the ‘between gene’ GC content heterogeneity, which is linked to ‘isochores’, is a principal factor associated with the bias in substitution patterns in human, ‘within gene’ heterogeneity in nucleotide composition is also associated with such bias on a more local scale. The relationship amongst the various types of repetitiveness is discussed. PMID:11000273

  10. Tidying Up International Nucleotide Sequence Databases: Ecological, Geographical and Sequence Quality Annotation of ITS Sequences of Mycorrhizal Fungi

    PubMed Central

    Tedersoo, Leho; Abarenkov, Kessy; Nilsson, R. Henrik; Schüssler, Arthur; Grelet, Gwen-Aëlle; Kohout, Petr; Oja, Jane; Bonito, Gregory M.; Veldre, Vilmar; Jairus, Teele; Ryberg, Martin; Larsson, Karl-Henrik; Kõljalg, Urmas

    2011-01-01

    Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information detailing country of collection, geographical coordinates, interacting taxon and isolation source were supplemented to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi. PMID:21949797

  11. Typing of canine parvovirus isolates using mini-sequencing based single nucleotide polymorphism analysis.

    PubMed

    Naidu, Hariprasad; Subramanian, B Mohana; Chinchkar, Shankar Ramchandra; Sriraman, Rajan; Rana, Samir Kumar; Srinivasan, V A

    2012-05-01

    The antigenic types of canine parvovirus (CPV) are defined based on differences in the amino acids of the major capsid protein VP2. Type specificity is conferred by a limited number of amino acid changes and in particular by few nucleotide substitutions. PCR based methods are not particularly suitable for typing circulating variants which differ in a few specific nucleotide substitutions. Assays for determining SNPs can detect efficiently nucleotide substitutions and can thus be adapted to identify CPV types. In the present study, CPV typing was performed by single nucleotide extension using the mini-sequencing technique. A mini-sequencing signature was established for all the four CPV types (CPV2, 2a, 2b and 2c) and feline panleukopenia virus. The CPV typing using the mini-sequencing reaction was performed for 13 CPV field isolates and the two vaccine strains available in our repository. All the isolates had been typed earlier by full-length sequencing of the VP2 gene. The typing results obtained from mini-sequencing matched completely with that of sequencing. Typing could be achieved with less than 100 copies of standard plasmid DNA constructs or ≤10¹ FAID₅₀ of virus by mini-sequencing technique. The technique was also efficient for detecting multiple types in mixed infections. Copyright © 2012 Elsevier B.V. All rights reserved.

  12. The Nucleotide Sequence and Spliced pol mRNA Levels of the Nonprimate Spumavirus Bovine Foamy Virus

    PubMed Central

    Holzschu, Donald L.; Delaney, Mari A.; Renshaw, Randall W.; Casey, James W.

    1998-01-01

    We have determined the complete nucleotide sequence of a replication-competent clone of bovine foamy virus (BFV) and have quantitated the amount of splice pol mRNA processed early in infection. The 544-amino-acid Gag protein precursor has little sequence similarity with its primate foamy virus homologs, but the putative nucleocapsid (NC) protein, like the primate NCs, contains the three glycine-arginine-rich regions that are postulated to bind genomic RNA during virion assembly. The BFV gag and pol open reading frames overlap, with pro and pol in the same translational frame. As with the human foamy virus (HFV) and feline foamy virus, we have detected a spliced pol mRNA by PCR. Quantitatively, this mRNA approximates the level of full-length genomic RNA early in infection. The integrase (IN) domain of reverse transcriptase does not contain the canonical HH-CC zinc finger motif present in all characterized retroviral INs, but it does contain a nearby histidine residue that could conceivably participate as a member of the zinc finger. The env gene encodes a protein that is over 40% identical in sequence to the HFV Env. By comparison, the Gag precursor of BFV is predicted to be only 28% identical to the HFV protein. PMID:9499074

  13. Extension of the COG and arCOG databases by amino acid and nucleotide sequences

    PubMed Central

    Meereis, Florian; Kaufmann, Michael

    2008-01-01

    Background: The current versions of the COG and arCOG databases, both excellent frameworks for studies in comparative and functional genomics, do not contain the nucleotide sequences corresponding to their protein or protein domain entries. Results: Using sequence information obtained from GenBank flat files covering the completely sequenced genomes of the COG and arCOG databases, we constructed NUCOCOG (nucleotide sequences containing COG databases) as an extended version including all nucleotide sequences and in addition the amino acid sequences originally utilized to construct the current COG and arCOG databases. We make available three comprehensive single XML files containing the complete databases including all sequence information. In addition, we provide a web interface as a utility suitable to browse the NUCOCOG database for sequence retrieval. The database is accessible at . Conclusion: NUCOCOG offers the possibility to analyze any sequence related property in the context of the COG and arCOG framework simply by using script languages such as PERL applied to a large but single XML document. PMID:19014535

  14. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data shall...

  15. Nucleotide sequence composition and method for detection of Neisseria gonorrhoeae

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lo, A.; Yang, H.L.

    1990-02-13

    This patent describes a composition of matter that is specific for Neisseria gonorrhoeae. It comprises at least one nucleotide sequence for which the ratio of the amount of the sequence that hybridizes to chromosomal DNA of Neisseria gonorrhoeae to the amount of the sequence that hybridizes to chromosomal DNA of Neisseria meningitidis is greater than about five, the ratio being obtained by a method described.

  16. Statistical analysis of nucleotide sequences of the hemagglutinin gene of human influenza A viruses.

    PubMed Central

    Ina, Y; Gojobori, T

    1994-01-01

    To examine whether positive selection operates on the hemagglutinin 1 (HA1) gene of human influenza A viruses (H1 subtype), 21 nucleotide sequences of the HA1 gene were statistically analyzed. The nucleotide sequences were divided into antigenic and nonantigenic sites. The nucleotide diversities for antigenic and nonantigenic sites of the HA1 gene were computed at synonymous and nonsynonymous sites separately. For nonantigenic sites, the nucleotide diversities were larger at synonymous sites than at nonsynonymous sites. This is consistent with the neutral theory of molecular evolution. For antigenic sites, however, the nucleotide diversities at nonsynonymous sites were larger than those at synonymous sites. These results suggest that positive selection operates on antigenic sites of the HA1 gene of human influenza A viruses (H1 subtype). PMID:8078892
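
    As a rough illustration of the quantity compared above, nucleotide diversity can be computed as the mean number of pairwise differences per site across a set of aligned sequences. The sketch below is a simplification under stated assumptions: it treats every site alike and does not separate synonymous from nonsynonymous sites as the study does. The function name and the toy alignment are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def nucleotide_diversity(aligned_seqs):
    """Mean pairwise differences per site for equal-length aligned sequences."""
    if len(aligned_seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned_seqs[0])
    if length == 0 or any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be non-empty and of equal length")

    total = 0.0
    pairs = 0
    for a, b in combinations(aligned_seqs, 2):
        # Count mismatching columns for this pair and normalise by length.
        diffs = sum(1 for x, y in zip(a, b) if x != y)
        total += diffs / length
        pairs += 1
    return total / pairs


# Hypothetical toy alignment (not data from the study).
toy_alignment = ["ATGCATGC", "ATGCATGA", "ATGTATGC"]
pi = nucleotide_diversity(toy_alignment)
print(f"{pi:.4f}")
```

    Applying the same function separately to synonymous and nonsynonymous positions would require a codon-aware classification of sites, which this sketch does not attempt.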

  17. NGSCheckMate: software for validating sample identity in next-generation sequencing studies within and across data types.

    PubMed

    Lee, Sejoon; Lee, Soohyun; Ouellette, Scott; Park, Woong-Yang; Lee, Eunjung A; Park, Peter J

    2017-06-20

    In many next-generation sequencing (NGS) studies, multiple samples or data types are profiled for each individual. An important quality control (QC) step in these studies is to ensure that datasets from the same subject are properly paired. Given the heterogeneity of data types, file types and sequencing depths in a multi-dimensional study, a robust program that provides a standardized metric for genotype comparisons would be useful. Here, we describe NGSCheckMate, a user-friendly software package for verifying sample identities from FASTQ, BAM or VCF files. This tool uses a model-based method to compare allele read fractions at known single-nucleotide polymorphisms, considering depth-dependent behavior of similarity metrics for identical and unrelated samples. Our evaluation shows that NGSCheckMate is effective for a variety of data types, including exome sequencing, whole-genome sequencing, RNA-seq, ChIP-seq, targeted sequencing and single-cell whole-genome sequencing, with a minimal requirement for sequencing depth (>0.5X). An alignment-free module can be run directly on FASTQ files for a quick initial check. We recommend using this software as a QC step in NGS studies. https://github.com/parklab/NGSCheckMate. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
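
    The idea of comparing allele read fractions at known SNPs can be sketched as follows. This is not NGSCheckMate's implementation: the use of a plain Pearson correlation, the function names, and the toy read counts are all assumptions made for illustration, and the depth-dependent model described in the abstract is not reproduced here.

```python
import math


def allele_fraction(ref_reads, alt_reads):
    """Fraction of reads supporting the alternate allele, or None without coverage."""
    depth = ref_reads + alt_reads
    return alt_reads / depth if depth > 0 else None


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def sample_similarity(counts_a, counts_b):
    """Correlate allele fractions at SNPs covered in both samples.

    Each argument maps a SNP identifier to a (ref_reads, alt_reads) tuple.
    """
    xs, ys = [], []
    for snp in sorted(counts_a.keys() & counts_b.keys()):
        fa = allele_fraction(*counts_a[snp])
        fb = allele_fraction(*counts_b[snp])
        if fa is not None and fb is not None:
            xs.append(fa)
            ys.append(fb)
    if len(xs) < 2:
        raise ValueError("need at least two shared, covered SNPs")
    return pearson(xs, ys)


# Hypothetical read counts: sample_b mimics sample_a at a different depth.
sample_a = {"snp1": (10, 0), "snp2": (5, 5), "snp3": (0, 8), "snp4": (6, 6)}
sample_b = {"snp1": (20, 0), "snp2": (9, 11), "snp3": (1, 15), "snp4": (4, 4)}
sample_c = {"snp1": (0, 10), "snp2": (10, 0), "snp3": (7, 7), "snp4": (0, 9)}

same_subject = sample_similarity(sample_a, sample_b)
different_subject = sample_similarity(sample_a, sample_c)
print(round(same_subject, 3), round(different_subject, 3))
```

    In practice a fixed correlation cutoff is unlikely to work across sequencing depths, which is presumably why the tool models depth-dependent behavior rather than relying on a single threshold.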

  18. Precursors of vertebrate peptide antibiotics dermaseptin b and adenoregulin have extensive sequence identities with precursors of opioid peptides dermorphin, dermenkephalin, and deltorphins.

    PubMed

    Amiche, M; Ducancel, F; Mor, A; Boulain, J C; Menez, A; Nicolas, P

    1994-07-08

    The dermaseptins are a family of broad spectrum antimicrobial peptides, 27-34 amino acids long, involved in the defense of the naked skin of frogs against microbial invasion. They are the first vertebrate peptides to show lethal effects against the filamentous fungi responsible for severe opportunistic infections accompanying immunodeficiency syndrome and the use of immunosuppressive agents. A cDNA library was constructed from skin poly(A+) RNA of the arboreal frog Phyllomedusa bicolor and screened with an oligonucleotide probe complementary to the COOH terminus of dermaseptin b. Several clones contained a full-length DNA copy of a 443-nucleotide mRNA that encoded a 78-residue dermaseptin b precursor protein. The deduced precursor contained a putative signal sequence at the NH2 terminus, a 20-residue spacer sequence extremely rich (60%) in glutamic and aspartic acids, and a single copy of a dermaseptin b progenitor sequence at the COOH terminus. One clone contained a complete copy of adenoregulin, a 33-residue peptide reported to enhance the binding of agonists to the A1 adenosine receptor. The mRNAs encoding adenoregulin and dermaseptin b were very similar: 70 and 75% nucleotide identities between the 5'- and 3'-untranslated regions, respectively; 91% amino acid identity between the signal peptides; 82% identity between the acidic spacer sequences; and 38% identity between adenoregulin and dermaseptin b. Because adenoregulin and dermaseptin b have similar precursor designs and antimicrobial spectra, adenoregulin should be considered as a new member of the dermaseptin family and alternatively named dermaseptin b II. Preprodermaseptin b and preproadenoregulin have considerable sequence identities to the precursors encoding the opioid heptapeptides dermorphin, dermenkephalin, and deltorphins. This similarity extended into the 5'-untranslated regions of the mRNAs. 
    These findings suggest that the genes encoding the four preproproteins are all members of the same family

  19. TIA: algorithms for development of identity-linked SNP islands for analysis by massively parallel DNA sequencing.

    PubMed

    Farris, M Heath; Scott, Andrew R; Texter, Pamela A; Bartlett, Marta; Coleman, Patricia; Masters, David

    2018-04-11

    Single nucleotide polymorphisms (SNPs) located within the human genome have been shown to have utility as markers of identity in the differentiation of DNA from individual contributors. Massively parallel DNA sequencing (MPS) technologies and human genome SNP databases allow for the design of suites of identity-linked target regions, amenable to sequencing in a multiplexed and massively parallel manner. Therefore, tools are needed for leveraging the genotypic information found within SNP databases for the discovery of genomic targets that can be evaluated on MPS platforms. The SNP island target identification algorithm (TIA) was developed as a user-tunable system to leverage SNP information within databases. Using data within the 1000 Genomes Project SNP database, human genome regions were identified that contain globally ubiquitous identity-linked SNPs and that were responsive to targeted resequencing on MPS platforms. Algorithmic filters were used to exclude target regions that did not conform to user-tunable SNP island target characteristics. To validate the accuracy of TIA for discovering these identity-linked SNP islands within the human genome, SNP island target regions were amplified from 70 contributor genomic DNA samples using the polymerase chain reaction. Multiplexed amplicons were sequenced using the Illumina MiSeq platform, and the resulting sequences were analyzed for SNP variations. 166 putative identity-linked SNPs were targeted in the identified genomic regions. Of the 309 SNPs that provided discerning power across individual SNP profiles, 74 previously undefined SNPs were identified during evaluation of targets from individual genomes. Overall, DNA samples of 70 individuals were uniquely identified using a subset of the suite of identity-linked SNP islands. 
    TIA offers a tunable genome search tool for the discovery of targeted genomic regions that are scalable in the population frequency and numbers of SNPs contained within the SNP island regions
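
    To make the notion of a tunable SNP island concrete, the sketch below scans sorted SNP positions for regions no wider than a chosen span that contain at least a chosen number of SNPs. It is not the TIA algorithm: the greedy windowing strategy, the parameter names `max_span` and `min_snps`, and the toy coordinates are all assumptions introduced here for illustration.

```python
def find_snp_islands(positions, max_span, min_snps):
    """Return (start, end, count) for non-overlapping windows rich in SNPs.

    A window begins at a SNP and is extended while the next SNP lies
    less than max_span bases from the window start.
    """
    pos = sorted(set(positions))
    islands = []
    i = 0
    while i < len(pos):
        j = i
        # Extend the window while the next SNP still fits inside max_span.
        while j + 1 < len(pos) and pos[j + 1] - pos[i] < max_span:
            j += 1
        count = j - i + 1
        if count >= min_snps:
            islands.append((pos[i], pos[j], count))
            i = j + 1  # greedy choice: reported islands do not overlap
        else:
            i += 1
    return islands


# Hypothetical SNP coordinates on one chromosome.
snp_positions = [100, 150, 180, 900, 5000, 5040, 5090, 5120]
islands = find_snp_islands(snp_positions, max_span=200, min_snps=3)
print(islands)
```

    A real design tool would presumably also filter on population allele frequency and amplicon constraints, which are omitted here.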

  20. Nucleotide-Specific Contrast for DNA Sequencing by Electron Spectroscopy.

    PubMed

    Mankos, Marian; Persson, Henrik H J; N'Diaye, Alpha T; Shadman, Khashayar; Schmid, Andreas K; Davis, Ronald W

    2016-01-01

    DNA sequencing by imaging in an electron microscope is an approach that holds promise to deliver long reads with low error rates and without the need for amplification. Earlier work using transmission electron microscopes, which use high electron energies on the order of 100 keV, has shown that low contrast and radiation damage necessitate the use of heavy atom labeling of individual nucleotides, which increases the read error rates. Other prior work using scattering electrons with much lower energy has been shown to suppress beam damage on DNA. Here we explore possibilities to increase contrast by employing two methods, X-ray photoelectron and Auger electron spectroscopy. Using bulk DNA samples with monomers of each base, both methods are shown to provide contrast mechanisms that can distinguish individual nucleotides without labels. Both spectroscopic techniques can be readily implemented in a low energy electron microscope, which may enable label-free DNA sequencing by direct imaging.

  1. Nucleotide-Specific Contrast for DNA Sequencing by Electron Spectroscopy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mankos, Marian; Persson, Henrik H. J.; N’Diaye, Alpha T.

    DNA sequencing by imaging in an electron microscope is an approach that holds promise to deliver long reads with low error rates and without the need for amplification. Earlier work using transmission electron microscopes, which use high electron energies on the order of 100 keV, has shown that low contrast and radiation damage necessitate the use of heavy atom labeling of individual nucleotides, which increases the read error rates. Other prior work using scattering electrons with much lower energy has been shown to suppress beam damage on DNA. Here we explore possibilities to increase contrast by employing two methods, X-ray photoelectron and Auger electron spectroscopy. Using bulk DNA samples with monomers of each base, both methods are shown to provide contrast mechanisms that can distinguish individual nucleotides without labels. In conclusion, both spectroscopic techniques can be readily implemented in a low energy electron microscope, which may enable label-free DNA sequencing by direct imaging.

  2. Nucleotide-Specific Contrast for DNA Sequencing by Electron Spectroscopy

    DOE PAGES

    Mankos, Marian; Persson, Henrik H. J.; N’Diaye, Alpha T.; ...

    2016-05-05

    DNA sequencing by imaging in an electron microscope is an approach that holds promise to deliver long reads with low error rates and without the need for amplification. Earlier work using transmission electron microscopes, which use high electron energies on the order of 100 keV, has shown that low contrast and radiation damage necessitate the use of heavy atom labeling of individual nucleotides, which increases the read error rates. Other prior work using scattering electrons with much lower energy has been shown to suppress beam damage on DNA. Here we explore possibilities to increase contrast by employing two methods, X-ray photoelectron and Auger electron spectroscopy. Using bulk DNA samples with monomers of each base, both methods are shown to provide contrast mechanisms that can distinguish individual nucleotides without labels. In conclusion, both spectroscopic techniques can be readily implemented in a low energy electron microscope, which may enable label-free DNA sequencing by direct imaging.

  3. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ...” means those amino acids other than “Xaa” and those nucleotide bases other than “n” defined in accordance... 37 Patents, Trademarks, and Copyrights 1 2014-07-01 2014-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences...

  4. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ...” means those amino acids other than “Xaa” and those nucleotide bases other than “n” defined in accordance... 37 Patents, Trademarks, and Copyrights 1 2013-07-01 2013-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences...

  5. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ...” means those amino acids other than “Xaa” and those nucleotide bases other than “n” defined in accordance... 37 Patents, Trademarks, and Copyrights 1 2012-07-01 2012-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences...

  6. The complete nucleotide sequence of the glnALG operon of Escherichia coli K12.

    PubMed Central

    Miranda-Ríos, J; Sánchez-Pescador, R; Urdea, M; Covarrubias, A A

    1987-01-01

    The nucleotide sequence of the E. coli glnALG operon has been determined. The glnL (ntrB) and glnG (ntrC) genes show high homology, at the nucleotide and amino acid levels, with the corresponding genes of Klebsiella pneumoniae. The predicted amino acid sequence for glutamine synthetase allowed us to locate some of the enzyme domains. The structure of this operon is discussed. PMID:2882477

  7. The nucleotide sequence of 5S ribosomal RNA from Micrococcus lysodeikticus.

    PubMed Central

    Hori, H; Osawa, S; Murao, K; Ishikura, H

    1980-01-01

    The nucleotide sequence of ribosomal 5S RNA from Micrococcus lysodeikticus is pGUUACGGCGGCUAUAGCGUGGGGGAAACGCCCGGCCGUAUAUCGAACCCGGAAGCUAAGCCCCAUAGCGCCGAUGGUUACUGUAACCGGGAGGUUGUGGGAGAGUAGGUCGCCGCCGUGA-OH. When compared to other 5S RNAs, the sequence homology is greatest with Thermus aquaticus, and these two 5S RNAs reveal several features intermediate between those of typical gram-positive bacteria and gram-negative bacteria. PMID:6780979

  8. Nucleotide sequence of Hungarian grapevine chrome mosaic nepovirus RNA1.

    PubMed Central

    Le Gall, O; Candresse, T; Brault, V; Dunez, J

    1989-01-01

    The nucleotide sequence of the RNA1 of Hungarian grapevine chrome mosaic virus, a nepovirus very closely related to tomato black ring virus, has been determined from cDNA clones. It is 7212 nucleotides in length excluding the 3' terminal poly(A) tail and contains a large open reading frame extending from nucleotides 216 to 6971. The presumably encoded polyprotein is 2252 amino acids in length with a molecular weight of 250 kDa. The primary structure of the polyprotein was compared with that of other viral polyproteins, revealing the same general genetic organization as that of other picorna-like viruses (comoviruses, potyviruses and picornaviruses), except that an additional protein is suspected to occupy the N-terminus of the polyprotein. PMID:2798128
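
    The reported coordinates and protein length can be cross-checked with simple arithmetic: an open reading frame spanning nucleotides 216 to 6971 (1-based, inclusive) covers 6756 nucleotides, which is exactly 2252 codons. The short sketch below performs that check; the helper name is hypothetical, and whether the stated coordinates include the stop codon is not given in the abstract, so the code only counts codons within the stated span.

```python
def orf_summary(start, end):
    """Length in nucleotides and codon count for a 1-based inclusive ORF span."""
    length = end - start + 1
    if length <= 0:
        raise ValueError("end must not precede start")
    if length % 3 != 0:
        raise ValueError("span is not a whole number of codons")
    return length, length // 3


# Coordinates taken from the abstract above.
length_nt, codons = orf_summary(216, 6971)
print(length_nt, codons)  # prints "6756 2252"
```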

  9. Comparative genomic sequence analysis of novel Helicoverpa armigera nucleopolyhedrovirus (NPV) isolated from Kenya and three other previously sequenced Helicoverpa spp. NPVs.

    PubMed

    Ogembo, Javier Gordon; Caoili, Barbara L; Shikata, Masamitsu; Chaeychomsri, Sudawan; Kobayashi, Michihiro; Ikeda, Motoko

    2009-10-01

    A newly cloned Helicoverpa armigera nucleopolyhedrovirus (HearNPV) from Kenya, HearNPV-NNg1, has a higher insecticidal activity than HearNPV-G4, which also exhibits lower insecticidal activity than HearNPV-C1. In the search for genes and/or nucleotide sequences that might be involved in the observed virulence differences among Helicoverpa spp. NPVs, the entire genome of NNg1 was sequenced and compared with previously sequenced genomes of G4, C1 and Helicoverpa zea single-nucleocapsid NPV (Hz). The NNg1 genome was 132,425 bp in length, with a total of 143 putative open reading frames (ORFs), and shared high levels of overall amino acid and nucleotide sequence identities with G4, C1 and Hz. Three NNg1 ORFs, ORF5, ORF100 and ORF124, which were shared with C1, were absent in G4 and Hz, while NNg1 and C1 were missing a homologue of G4/Hz ORF5. Another three ORFs, ORF60 (bro-b), ORF119 and ORF120, and one direct repeat sequence (dr) were unique to NNg1. Relative to the overall nucleotide sequence identity, lower sequence identities were observed between NNg1 hrs and the homologous hrs in the other three Helicoverpa spp. NPVs, despite containing the same number of hrs located at essentially the same positions on the genomes. Differences were also observed between NNg1 and each of the other three Helicoverpa spp. NPVs in the diversity of bro genes encoded on the genomes. These results indicate several putative genes and nucleotide sequences that may be responsible for the virulence differences observed among Helicoverpa spp., yet the specific genes and/or nucleotide sequences responsible have not been identified.

  10. Complete sequence analysis reveals two distinct poleroviruses infecting cucurbits in China.

    PubMed

    Xiang, Hai-ying; Shang, Qiao-xia; Han, Cheng-gui; Li, Da-wei; Yu, Jia-lin

    2008-01-01

    The complete RNA genomes of a Chinese isolate of cucurbit aphid-borne yellows virus (CABYV-CHN) and a new polerovirus tentatively referred to as melon aphid-borne yellows virus (MABYV) were determined. The entire genome of CABYV-CHN shared 89.0% nucleotide sequence identity with the French CABYV isolate. In contrast, nucleotide sequence identities between MABYV and CABYV and other poleroviruses were in the range of 50.7-74.2%, with amino acid sequence identities ranging from 24.8 to 82.9% for individual gene products. We propose that CABYV-CHN is a strain of CABYV and that MABYV is a member of a tentative distinct species within the genus Polerovirus.
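
    Percent identity figures such as those quoted above are computed from aligned sequences. A minimal sketch follows, assuming two pre-aligned strings of equal length and assuming that columns containing a gap in either sequence are excluded from the denominator. Alignment programs differ on that convention, so this is one possible definition rather than the one used by the authors. The function name and toy sequences are hypothetical.

```python
def percent_identity(aln_a, aln_b, gap="-"):
    """Percent of identical residues over aligned columns without gaps."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aln_a.upper(), aln_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned RNA fragments: one mismatch and one gapped column.
identity = percent_identity("AUGGCUAC-GUA", "AUGGCAACUGUA")
print(f"{identity:.1f}%")
```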

  11. Complete genome sequences of two strains of Treponema pallidum subsp. pertenue from Ghana, Africa: Identical genome sequences in samples isolated more than 7 years apart.

    PubMed

    Strouhal, Michal; Mikalová, Lenka; Havlíčková, Pavla; Tenti, Paolo; Čejková, Darina; Rychlík, Ivan; Bruisten, Sylvia; Šmajs, David

    2017-09-01

    Treponema pallidum subsp. pertenue (TPE) is the causative agent of yaws, a multi-stage disease, endemic in tropical regions of Africa, Asia, Oceania, and South America. To date, four TPE strains have been completely sequenced including three TPE strains of human origin (Samoa D, CDC-2, and Gauthier) and one TPE strain (Fribourg-Blanc) isolated from a baboon. All TPE strains are highly similar to T. pallidum subsp. pallidum (TPA) strains. The mutation rate in syphilis and related treponemes has not been experimentally determined yet. Complete genomes of two TPE strains, CDC 2575 and Ghana-051, that infected patients in Ghana and were isolated in 1980 and 1988, respectively, were sequenced and analyzed. Both strains had identical consensus genome nucleotide sequences raising the question whether TPE CDC 2575 and Ghana-051 represent two different strains. Several lines of evidence support the fact that both strains represent independent samples including regions showing intrastrain heterogeneity (13 and 5 intrastrain heterogeneous sites in TPE Ghana-051 and TPE CDC 2575, respectively). Four of these heterogeneous sites were found in both genomes but the frequency of alternative alleles differed. The identical consensus genome sequences were used to estimate the upper limit of the yaws treponeme evolution rate, which was 4.1 × 10⁻¹⁰ nucleotide changes per site per generation. The estimated upper limit for the mutation rate of TPE was slightly lower than the mutation rate of E. coli, which was determined during a long-term experiment. Given the known diversity between TPA and TPE genomes and the assumption that both TPA and TPE have a similar mutation rate, the most recent common ancestor of syphilis and yaws treponemes appears to be more than ten thousand years old and likely even older.

  12. Control of total GFP expression by alterations to the 3′ region nucleotide sequence

    PubMed Central

    2013-01-01

    Background: Previously, we distinguished the Escherichia coli type II cytoplasmic membrane translocation pathways of Tat, Yid, and Sec for unfolded and folded soluble target proteins. The translocation of folded protein to the periplasm for soluble expression via the Tat pathway was controlled by an N-terminal hydrophilic leader sequence. In this study, we investigated the effect of the hydrophilic C-terminal end and its nucleotide sequence on total and soluble protein expression. Results: The native hydrophilic C-terminal end of GFP was obtained by deleting the C-terminal peptide LeuGlu-6×His, derived from pET22b(+). The corresponding clones induced total and soluble GFP expression that was either slightly increased or dramatically reduced, apparently through reconstruction of the nucleotide sequence around the stop codon in the 3′ region. In the expression-induced clones, the hydrophilic C-terminus showed increased Tat pathway specificity for soluble expression. However, in the expression-reduced clone, after analyzing the role of the 5′ poly(A) coding sequence with a substituted synonymous codon, we proved that the longer 5′ poly(A) coding sequence interacted with the reconstructed 3′ region nucleotide sequence to create a new mRNA tertiary structure between the 5′ and 3′ regions, which resulted in reduced total GFP expression. Further, to recover the reduced expression by changing the 3′ nucleotide sequence, after replacing selected C-terminal 5′ codons and the stop codon in the ORF with synonymous codons, total GFP expression in most of the clones was recovered to the undeleted control level. The insertion of trinucleotides after the stop codon in the 3′-UTR recovered or reduced total GFP expression. RT-PCR revealed that the level of total protein expression was controlled by changes in translational or transcriptional regulation, which were induced or reduced by the substitution or insertion of 3′ region nucleotides. Conclusions: We found

  13. Erwinia carotovora subsp. carotovora extracellular protease: characterization and nucleotide sequence of the gene.

    PubMed Central

    Kyöstiö, S R; Cramer, C L; Lacy, G H

    1991-01-01

    The prt1 gene encoding extracellular protease from Erwinia carotovora subsp. carotovora EC14 in cosmid pCA7 was subcloned to create plasmid pSK1. The partial nucleotide sequence of the insert in pSK1 (1,878 bp) revealed a 1,041-bp open reading frame (ORF1) that correlated with protease activity in deletion mutants. ORF1 encodes a polypeptide of 347 amino acids with a calculated molecular mass of 38,826 Da. Escherichia coli transformed with pSK1 or pSK23, a subclone of pSK1, produces a protease (Prt1) intracellularly with a molecular mass of 38 kDa and a pI of 4.8. Prt1 activity was inhibited by phenanthroline, suggesting that it is a metalloprotease. The prt1 promoter was localized between 173 and 1,173 bp upstream of ORF1 by constructing transcriptional lacZ fusions. Primer extension identified the prt1 transcription start site 205 bp upstream of ORF1. The deduced amino acid sequence of ORF1 showed significant sequence identity to metalloproteases from Bacillus thermoproteolyticus (thermolysin), B. subtilis (neutral protease), Legionella pneumophila (metalloprotease), and Pseudomonas aeruginosa (elastase). It has less sequence similarity to metalloproteases from Serratia marcescens and Erwinia chrysanthemi. Locations for three zinc ligands and the active site for E. carotovora subsp. carotovora protease were predicted from thermolysin. PMID:1917878

  14. Nucleotide sequence of the ribosomal RNA gene of Physarum polycephalum: intron 2 and its flanking regions of the 26S rRNA gene.

    PubMed Central

    Nomiyama, H; Kuhara, S; Kukita, T; Otsuka, T; Sakaki, Y

    1981-01-01

    The 26S ribosomal RNA gene of Physarum polycephalum is interrupted by two introns, and we have previously determined the sequence of one of them (intron 1) (Nomiyama et al. Proc.Natl.Acad.Sci.USA 78, 1376-1380, 1981). In this study we sequenced the second intron (intron 2) of about 0.5 kb length and its flanking regions, and found that one nucleotide at each junction is identical in intron 1 and intron 2, though the junction regions share no other sequence homology. Comparison of the flanking exon sequences to E. coli 23S rRNA sequences shows that conserved sequences are interspersed with tracts having little homology. In particular, the region encompassing the intron 2 interruption site is highly conserved. The E. coli ribosomal protein L1 binding region is also conserved. PMID:6171776

  15. Effects of polymerization and nucleotide identity on the conformational dynamics of the bacterial actin homolog MreB

    PubMed Central

    Colavin, Alexandre; Hsin, Jen; Huang, Kerwyn Casey

    2014-01-01

    The assembly of protein filaments drives many cellular processes, from nucleoid segregation, growth, and division in single cells to muscle contraction in animals. In eukaryotes, shape and motility are regulated through cycles of polymerization and depolymerization of actin cytoskeletal networks. In bacteria, the actin homolog MreB forms filaments that coordinate the cell-wall synthesis machinery to regulate rod-shaped growth and contribute to cellular stiffness through unknown mechanisms. Like actin, MreB is an ATPase and requires ATP to polymerize, and polymerization promotes nucleotide hydrolysis. However, it is unclear whether other similarities exist between MreB and actin because the two proteins share low sequence identity and have distinct cellular roles. Here, we use all-atom molecular dynamics simulations to reveal surprising parallels between MreB and actin structural dynamics. We observe that MreB exhibits actin-like polymerization-dependent structural changes, wherein polymerization induces flattening of MreB subunits, which restructures the nucleotide-binding pocket to favor hydrolysis. MreB filaments exhibited nucleotide-dependent intersubunit bending, with hydrolyzed polymers favoring a straighter conformation. We use steered simulations to demonstrate a coupling between intersubunit bending and the degree of flattening of each subunit, suggesting cooperative bending along a filament. Taken together, our results provide molecular-scale insight into the diversity of structural states of MreB and the relationships among polymerization, hydrolysis, and filament properties, which may be applicable to other members of the broad actin family. PMID:24550504

  17. Real-time single-molecule electronic DNA sequencing by synthesis using polymer-tagged nucleotides on a nanopore array

    PubMed Central

    Fuller, Carl W.; Kumar, Shiv; Porel, Mintu; Chien, Minchen; Bibillo, Arek; Stranges, P. Benjamin; Dorwart, Michael; Tao, Chuanjuan; Li, Zengmin; Guo, Wenjing; Shi, Shundi; Korenblum, Daniel; Trans, Andrew; Aguirre, Anne; Liu, Edward; Harada, Eric T.; Pollard, James; Bhat, Ashwini; Cech, Cynthia; Yang, Alexander; Arnold, Cleoma; Palla, Mirkó; Hovis, Jennifer; Chen, Roger; Morozova, Irina; Kalachikov, Sergey; Russo, James J.; Kasianowicz, John J.; Davis, Randy; Roever, Stefan; Church, George M.; Ju, Jingyue

    2016-01-01

    DNA sequencing by synthesis (SBS) offers a robust platform to decipher nucleic acid sequences. Recently, we reported a single-molecule nanopore-based SBS strategy that accurately distinguishes four bases by electronically detecting and differentiating four different polymer tags attached to the 5′-phosphate of the nucleotides during their incorporation into a growing DNA strand catalyzed by DNA polymerase. Further developing this approach, we report here the use of nucleotides tagged at the terminal phosphate with oligonucleotide-based polymers to perform nanopore SBS on an α-hemolysin nanopore array platform. We designed and synthesized several polymer-tagged nucleotides using tags that produce different electrical current blockade levels and verified they are active substrates for DNA polymerase. A highly processive DNA polymerase was conjugated to the nanopore, and the conjugates were complexed with primer/template DNA and inserted into lipid bilayers over individually addressable electrodes of the nanopore chip. When an incoming complementary-tagged nucleotide forms a tight ternary complex with the primer/template and polymerase, the tag enters the pore, and the current blockade level is measured. The levels displayed by the four nucleotides tagged with four different polymers captured in the nanopore in such ternary complexes were clearly distinguishable and sequence-specific, enabling continuous sequence determination during the polymerase reaction. Thus, real-time single-molecule electronic DNA sequencing data with single-base resolution were obtained. The use of these polymer-tagged nucleotides, combined with polymerase tethering to nanopores and multiplexed nanopore sensors, should lead to new high-throughput sequencing methods. PMID:27091962

  18. Species composition of the genus Saprolegnia in fin fish aquaculture environments, as determined by nucleotide sequence analysis of the nuclear rDNA ITS regions.

    PubMed

    de la Bastide, Paul Y; Leung, Wai Lam; Hintz, William E

    2015-01-01

    The ITS region of the rDNA gene was compared for Saprolegnia spp. in order to improve our understanding of nucleotide sequence variability within and between species of this genus, determine species composition in Canadian fin fish aquaculture facilities, and to assess the utility of ITS sequence variability in genetic marker development. From a collection of more than 400 field isolates, ITS region nucleotide sequences were studied and it was determined that there was sufficient consistent inter-specific variation to support the designation of species identity based on ITS sequence data. This non-subjective approach to species identification does not rely upon transient morphological features. Phylogenetic analyses comparing our ITS sequences and species designations with data from previous studies generally supported the clade scheme of Diéguez-Uribeondo et al. (2007) and found agreement with the molecular taxonomic cluster system of Sandoval-Sierra et al. (2014). Our Canadian ITS sequence collection will thus contribute to the public database and assist the clarification of Saprolegnia spp. taxonomy. The analysis of ITS region sequence variability facilitated genus- and species-level identification of unknown samples from aquaculture facilities and provided useful information on species composition. A unique ITS-RFLP for the identification of S. parasitica was also described. Copyright © 2014 The British Mycological Society. Published by Elsevier Ltd. All rights reserved.

  19. Complete nucleotide sequence and genome organization of a novel allexivirus from alfalfa (Medicago sativa)

    USDA-ARS's Scientific Manuscript database

    A new species of the family Alphaflexiviridae provisionally named Alfalfa virus S (AVS) was diagnosed in alfalfa samples originating from Sudan. A complete nucleotide sequence of the viral genome consisting of 8,349 nucleotides excluding the 3’ poly(A) tail was determined by Illumina NGS technology ...

  20. The C-terminal Helix of Pseudomonas aeruginosa Elongation Factor Ts Tunes EF-Tu Dynamics to Modulate Nucleotide Exchange.

    PubMed

    De Laurentiis, Evelina Ines; Mercier, Evan; Wieden, Hans-Joachim

    2016-10-28

    Little is known about the conservation of critical kinetic parameters and the mechanistic strategies of elongation factor (EF) Ts-catalyzed nucleotide exchange in EF-Tu in bacteria and particularly in clinically relevant pathogens. EF-Tu from the clinically relevant pathogen Pseudomonas aeruginosa shares over 84% sequence identity with the corresponding elongation factor from Escherichia coli. Interestingly, the functionally closely linked EF-Ts only shares 55% sequence identity. To identify any differences in the nucleotide binding properties, as well as in the EF-Ts-mediated nucleotide exchange reaction, we performed a comparative rapid kinetics and mutagenesis analysis of the nucleotide exchange mechanism for both the E. coli and P. aeruginosa systems, identifying helix 13 of EF-Ts as a previously unnoticed regulatory element in the nucleotide exchange mechanism with species-specific elements. Our findings support the base side-first entry of the nucleotide into the binding pocket of the EF-Tu·EF-Ts binary complex, followed by displacement of helix 13 and rapid binding of the phosphate side of the nucleotide, ultimately leading to the release of EF-Ts. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.

  1. Hop stunt viroid: molecular cloning and nucleotide sequence of the complete cDNA copy.

    PubMed Central

    Ohno, T; Takamatsu, N; Meshi, T; Okada, Y

    1983-01-01

    The complete cDNA of hop stunt viroid (HSV) has been cloned by the method of Okayama and Berg (Mol. Cell. Biol. 2, 161-170 (1982)) and the complete nucleotide sequence has been established. The covalently closed circular single-stranded HSV RNA consists of 297 nucleotides. The secondary structure predicted for HSV contains 67% of its residues base-paired. The native HSV can possess an extended rod-like structure characteristic of viroids previously established. The central region of the native HSV has a similar structure to the conserved region found in all viroids sequenced so far except for avocado sunblotch viroid. The sequence homologous to the 5'-end of U1a RNA is also found in the sequence of HSV but not in the central conserved region. PMID:6312412

  2. The nucleotide sequence of the putative transcription initiation site of a cloned ribosomal RNA gene of the mouse.

    PubMed Central

    Urano, Y; Kominami, R; Mishima, Y; Muramatsu, M

    1980-01-01

    Approximately one kilobase pair surrounding and upstream of the transcription initiation site of a cloned ribosomal DNA (rDNA) of the mouse was sequenced. The putative transcription initiation site was determined by two independent methods: nuclease S1 protection mapping and reverse transcriptase elongation mapping, using isolated 45S ribosomal RNA precursor (45S RNA) and appropriate restriction fragments of rDNA. Both methods gave an identical result: 45S RNA had a structure starting from ACTCTTAG---. Characteristically, mouse rDNA had many T clusters (runs of 5 or more T's) upstream of the initiation site, the longest being 21 consecutive T's. A pentadecanucleotide, TGCCTCCCGAGTGCA, appeared twice within 260 nucleotides upstream of the putative initiation site. No such characteristic sequences were found downstream of this site. Little similarity was found in the region upstream of the transcription initiation site among the mouse, Xenopus laevis and Saccharomyces cerevisiae rDNA. PMID:6162156
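
    As an illustration of the sequence features described in this record, the following Python sketch scans a DNA string for runs of five or more consecutive T's and for repeated occurrences of a fixed motif such as the pentadecanucleotide TGCCTCCCGAGTGCA. The function names and the toy sequence are hypothetical and made up for demonstration; the actual mouse rDNA sequence is not reproduced here, and this is not the authors' method.

```python
import re


def find_t_clusters(seq, min_len=5):
    """Return (start, length) for every run of at least min_len consecutive T's."""
    pattern = re.compile("T{%d,}" % min_len)
    return [(m.start(), len(m.group())) for m in pattern.finditer(seq.upper())]


def find_motif(seq, motif):
    """Return 0-based start positions of every occurrence of motif (overlaps allowed)."""
    seq, motif = seq.upper(), motif.upper()
    positions = []
    i = seq.find(motif)
    while i != -1:
        positions.append(i)
        i = seq.find(motif, i + 1)
    return positions


# Toy sequence (an assumption for illustration only, not real mouse rDNA).
MOTIF = "TGCCTCCCGAGTGCA"
toy = "ACG" + "T" * 7 + "GCA" + MOTIF + "AATTTTG" + MOTIF + "T" * 5 + "ACTCTTAG"

clusters = find_t_clusters(toy)      # runs of >= 5 T's
repeats = find_motif(toy, MOTIF)     # start positions of the 15-mer
longest = max(length for _, length in clusters)
```

    The run of four T's inside the toy sequence is deliberately below the threshold, so it is not reported as a cluster.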

  3. Characterization of a dam Mutant of Serratia marcescens and Nucleotide Sequence of the dam Region

    PubMed Central

    Ostendorf, Tammo; Cherepanov, Peter; de Vries, Johann; Wackernagel, Wilfried

    1999-01-01

    The DNA of Serratia marcescens has N6-adenine methylation in GATC sequences. Among 2-aminopurine-sensitive mutants isolated from S. marcescens Sr41, one was identified which lacked GATC methylation. The mutant showed up to 30-fold increased spontaneous mutability and enhanced mutability after treatment with 2-aminopurine, ethyl methanesulfonate, or UV light. The gene (dam) coding for the adenine methyltransferase (Dam enzyme) of S. marcescens was identified on a gene bank plasmid which alleviated the 2-aminopurine sensitivity and the higher mutability of a dam-13::Tn9 mutant of Escherichia coli. Nucleotide sequencing revealed that the deduced amino acid sequence of Dam (270 amino acids; molecular mass, 31.3 kDa) has 72% identity to the Dam enzyme of E. coli. The dam gene is located between flanking genes which are similar to those found to the sides of the E. coli dam gene. The results of complementation studies indicated that like Dam of E. coli and unlike Dam of Vibrio cholerae, the Dam enzyme of S. marcescens plays an important role in mutation avoidance by allowing the mismatch repair enzymes to discriminate between the parental and newly synthesized strands during correction of replication errors. PMID:10383952
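
    The GATC target site mentioned in this record is its own reverse complement, so each site occurs on both strands at the same location. The Python sketch below locates the adenine of every GATC site on a strand and shows that the complementary strand carries the same number of sites. The function names and the toy sequence are hypothetical; nothing here is taken from the S. marcescens sequence.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def dam_target_adenines(seq, site="GATC"):
    """Return 0-based positions of the adenine in each GATC site on this strand."""
    seq = seq.upper()
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i + 1)  # the A is the second base of GATC
        i = seq.find(site, i + 1)
    return positions


# Toy sequence (an assumption for illustration only).
toy = "TTGATCAAGGATCC"
top = dam_target_adenines(toy)                          # positions on the given strand
bottom = dam_target_adenines(reverse_complement(toy))   # positions on the other strand
```

    Positions in `bottom` are in the coordinates of the reverse-complemented string; subtracting them from `len(toy) - 1` maps them back onto the original strand.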

  4. Analysis of Complete Nucleotide Sequences of 12 Gossypium Chloroplast Genomes: Origin and Evolution of Allotetraploids

    PubMed Central

    Xu, Qin; Xiong, Guanjun; Li, Pengbo; He, Fei; Huang, Yi; Wang, Kunbo; Li, Zhaohu; Hua, Jinping

    2012-01-01

    Background: Cotton (Gossypium spp.) is a model system for the analysis of polyploidization. Although ascertaining the donor species of allotetraploid cotton has been intensively studied, sequence comparison of Gossypium chloroplast genomes is still of interest to understand the mechanisms underlying the evolution of Gossypium allotetraploids, while it is generally accepted that the parents were A- and D-genome containing species. Here we performed a comparative analysis of 13 Gossypium chloroplast genomes, twelve of which are presented here for the first time. Methodology/Principal Findings: The size of 12 chloroplast genomes under study varied from 159,959 bp to 160,433 bp. The chromosomes were highly similar, having >98% sequence identity. They encoded the same set of 112 unique genes which occurred in a uniform order with only slightly different boundary junctions. Divergence due to indels as well as substitutions was examined separately for genome, coding and noncoding sequences. The genome divergence was estimated as 0.374% to 0.583% between allotetraploid species and A-genome, and 0.159% to 0.454% within allotetraploids. Forty protein-coding genes were completely identical at the protein level, and 20 intergenic sequences were completely conserved. The 9 allotetraploids shared 5 insertions and 9 deletions in whole genome, and 7-bp substitutions in protein-coding genes. The phylogenetic tree confirmed a close relationship between allotetraploids and the ancestor of A-genome, and the allotetraploids were divided into four separate groups. Progenitor allotetraploid cotton originated 0.43–0.68 million years ago (MYA). Conclusion: Despite the high degree of conservation between the Gossypium chloroplast genomes, sequence variations among species could still be detected. Gossypium chloroplast genomes showed a preference for 5-bp indels, and 1–3-bp indels are mainly attributed to SSR polymorphisms. This study supports that the common ancestor of diploid A-genome species in

  5. The human myelin oligodendrocyte glycoprotein (MOG) gene: Complete nucleotide sequence and structural characterization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Paule Roth, M.; Malfroy, L.; Offer, C.

    1995-07-20

    Human myelin oligodendrocyte glycoprotein (MOG), a myelin component of the central nervous system, is a candidate target antigen for autoimmune-mediated demyelination. We have isolated and sequenced part of a cosmid clone that contains the entire human MOG gene. The primary nuclear transcript, extending from the putative start of transcription to the site of poly(A) addition, is 15,561 nucleotides in length. The human MOG gene contains 8 exons, separated by 7 introns; canonical intron/exon boundary sites are observed at each junction. The introns vary in size from 242 to 6484 bp and contain numerous repetitive DNA elements, including 14 Alu sequences within 3 introns. Another Alu element is located in the 3′-untranslated region of the gene. Alu sequences were classified with respect to subfamily assignment. Seven hundred sixty-three nucleotides 5′ of the transcription start and 1214 nucleotides 3′ of the poly(A) addition sites were also sequenced. The 5′-flanking region revealed the presence of several consensus sequences that could be relevant in the transcription of the MOG gene, in particular binding sites in common with other myelin gene promoters. Two polymorphic intragenic dinucleotide (CA)n and tetranucleotide (TAAA)n repeats were identified and may provide genetic marker tools for association and linkage studies. 50 refs., 3 figs., 3 tabs.

  6. Nucleotide Sequence Diversity and Linkage Disequilibrium of Four Nuclear Loci in Foxtail Millet (Setaria italica).

    PubMed

    He, Shui-Lian; Yang, Yang; Morrell, Peter L; Yi, Ting-Shuang

    2015-01-01

    Foxtail millet (Setaria italica (L.) Beauv) is one of the earliest domesticated grains, which has been cultivated in northern China by 8,700 years before present (YBP) and across Eurasia by 4,000 YBP. Owing to a small genome and diploid nature, foxtail millet is a tractable model crop for studying functional genomics of millets and bioenergy grasses. In this study, we examined nucleotide sequence diversity, geographic structure, and levels of linkage disequilibrium at four nuclear loci (ADH1, G3PDH, IGS1 and TPI1) in representative samples of 311 landrace accessions across its cultivated range. Higher levels of nucleotide sequence and haplotype diversity were observed in samples from China relative to other sampled regions. Genetic assignment analysis classified the accessions into seven clusters based on nucleotide sequence polymorphisms. Intralocus LD decayed rapidly to half the initial value within ~1.2 kb or less.

  7. Nucleotide sequence analysis of the L gene of Newcastle disease virus: homologies with Sendai and vesicular stomatitis viruses.

    PubMed Central

    Yusoff, K; Millar, N S; Chambers, P; Emmerson, P T

    1987-01-01

    The nucleotide sequence of the L gene of the Beaudette C strain of Newcastle disease virus (NDV) has been determined. The L gene is 6704 nucleotides long and encodes a protein of 2204 amino acids with a calculated molecular weight of 248822. Mung bean nuclease mapping of the 5' terminus of the L gene mRNA indicates that the transcription of the L gene is initiated 11 nucleotides upstream of the translational start site. Comparison with the amino acid sequences of the L genes of Sendai virus and vesicular stomatitis virus (VSV) suggests that there are several regions of homology between the sequences. These data provide further evidence for an evolutionary relationship between the Paramyxoviridae and the Rhabdoviridae. A non-coding sequence of 46 nucleotides downstream of the presumed polyadenylation site of the L gene may be part of a negative strand leader RNA. PMID:3035486

  8. [Replication of Streptomyces plasmids: the DNA nucleotide sequence of plasmid pSB 24.2].

    PubMed

    Bolotin, A P; Sorokin, A V; Aleksandrov, N N; Danilenko, V N; Kozlov, Iu I

    1985-11-01

    The nucleotide sequence of plasmid pSB 24.2, a natural deletion derivative of plasmid pSB 24.1 isolated from S. cyanogenus, was studied. The plasmid is 3706 nucleotide pairs in size, with a G+C content of 73 per cent. Analysis of the DNA structure of plasmid pSB 24.2 revealed a protein-encoding sequence of more than 1300 nucleotide pairs whose continuity was significant for replication of the plasmid. The analysis also revealed two A-T-rich areas of DNA, the G+C content of which was less than 55 per cent, and a DNA area with a branched hairpin structure. The results may be of value in investigating plasmid replication in actinomycetes and in experimental cloning of DNA with this plasmid as a vector.
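
    The G+C figures quoted in this record can be illustrated with a simple sliding-window calculation. The Python sketch below computes the G+C fraction of a sequence and reports windows that fall below a threshold; the 0.55 threshold mirrors the "less than 55 per cent" figure in the abstract, while the window size, step, function names and toy sequence are arbitrary assumptions. It is not a reconstruction of the authors' analysis.

```python
def gc_fraction(seq):
    """Return the fraction of G and C bases in a non-empty DNA string."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def low_gc_windows(seq, window=20, step=10, threshold=0.55):
    """Return (start, gc) for each window whose G+C fraction is below threshold."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc = gc_fraction(seq[start:start + window])
        if gc < threshold:
            hits.append((start, gc))
    return hits


# Toy sequence (an assumption): a GC-only backbone with one AT-only island.
toy = "GC" * 20 + "AT" * 10 + "GC" * 20

overall = gc_fraction(toy)          # G+C fraction of the whole toy sequence
at_rich = low_gc_windows(toy)       # windows overlapping the AT island
```

    Overlapping windows that straddle the island boundary are reported as well, which is why a single A-T-rich area can produce several adjacent hits.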

  9. The nucleotide sequence and a first generation gene transfer vector of species B human adenovirus serotype 3.

    PubMed

    Sirena, Dominique; Ruzsics, Zsolt; Schaffner, Walter; Greber, Urs F; Hemmi, Silvio

    2005-12-20

    Human adenovirus (Ad) serotype 3 causes respiratory infections. It is considered highly virulent, accounting for about 13% of all Ad isolates. We report here the complete Ad3 DNA sequence of 35,343 base pairs (GenBank accession DQ086466). Ad3 shares 96.43% nucleotide identity with Ad7, another virulent subspecies B1 serotype, and 82.56 and 62.75% identity with the less virulent species B2 Ad11 and species C Ad5, respectively. The genomic organization of Ad3 is similar to the other human Ads comprising five early transcription units, E1A, E1B, E2, E3, and E4, two delayed early units IX and IVa2, and the major late unit, in total 39 putative and 7 hypothetical open reading frames. A recombinant E1-deleted Ad3 was generated on a bacterial artificial chromosome. This prototypic virus efficiently transduced CD46-positive rodent and human cells. Our results will help in clarifying the biology and pathology of adenoviruses and enhance therapeutic applications of viral vectors in clinical settings.
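
    Percent nucleotide identity values such as those quoted in this record are computed from a pairwise alignment. The Python sketch below shows one common convention, counting matches over aligned columns in which neither sequence has a gap; other conventions use a different denominator, and the abstract does not state which one the authors used. The function name and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical bases over columns where neither sequence has a gap.

    Both inputs must already be aligned and therefore of equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gapped columns are skipped under this convention
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment (an assumption): 9 ungapped columns, 8 of them identical.
identity = percent_identity("ACGT-ACGTA", "ACGTTACCTA")
```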

  10. Nucleotide sequence of wild-type hepatitis A virus GBM in comparison with two cell culture-adapted variants.

    PubMed Central

    Graff, J; Normann, A; Feinstone, S M; Flehmig, B

    1994-01-01

    In order to study cell tropism and attenuation of hepatitis A virus (HAV), the genome of HAV wild-type GBM and two cell culture-adapted variants, GBM/FRhK and GBM/HFS, were cloned and sequenced after amplification by reverse transcriptase-PCR. During virus cultivation, the HAV variant GBM/FRhK had a strict host range for FRhK-4 cells, in contrast to GBM/HFS, which can be grown in HFS and FRhK-4 cells. The HAV variant GBM/HFS was shown to be attenuated when inoculated into chimpanzees (B. Flehmig, R. F. Mauler, G. Noll, E. Weinmann, and J. P. Gregerson, p. 87-90, in A. Zuckerman, ed., Viral Hepatitis and Liver Disease, 1988). On the basis of this biological background, the comparison of the nucleotide sequences of these three HAV GBM variants should elucidate differences which may be of importance for cell tropism and attenuation. The comparison of the genome between the GBM wild type and HAV wild types HM175 (J. I. Cohen, J. R. Ticehurst, R. H. Purcell, A. Buckler-White, and B. M. Baroudy, J. Virol. 61:50-59, 1987) and HAV-LA (R. Najarian, O. Caput, W. Gee, S. J. Potter, A. Renard, J. Merryweather, G. Van Nest, and D. Dina, Proc. Natl. Acad. Sci. USA 82:2627-2631, 1985) showed a 92 to 96.3% identity, whereas the identity was 99.3 to 99.6% between the GBM variants. Nucleotide differences between the wild-type and the cell culture-adapted variants, which were identical in both cell culture-adapted GBM variants, were localized in the 5' noncoding region; in 2B, 3B, and 3D; and in the 3' noncoding region. Our result concerning the 2B/2C region confirms a mutation at position 3889 (C-->T, alanine to valine), which had been shown to be of importance for cell culture adaptation (S. U. Emerson, C. McRill, B. Rosenblum, S. M. Feinstone, and R. H. Purcell, J. Virol. 65:4882-4886, 1991; S. U. Emerson, Y. K. Huang, C. McRill, M. Lewis, and R. H. Purcell, J. Virol. 66:650-654, 1992), whereas other mutations differ from published HAV sequence data and may be cell specific

  11. Detection and quantitation of single nucleotide polymorphisms, DNA sequence variations, DNA mutations, DNA damage and DNA mismatches

    DOEpatents

    McCutchen-Maloney, Sandra L.

    2002-01-01

    DNA mutation binding proteins alone and as chimeric proteins with nucleases are used with solid supports to detect DNA sequence variations, DNA mutations and single nucleotide polymorphisms. The solid supports may be flow cytometry beads, DNA chips, glass slides or DNA dipsticks. DNA molecules are coupled to solid supports to form DNA-support complexes. Labeled DNA is used with unlabeled DNA mutation binding proteins such as TthMutS to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by binding, which gives an increase in signal. Unlabeled DNA is utilized with labeled chimeras to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by nuclease activity of the chimera, which gives a decrease in signal.

  12. Nucleotide sequence of the gene determining plasmid-mediated citrate utilization.

    PubMed Central

    Ishiguro, N; Sato, G

    1985-01-01

    The citrate utilization determinant from transposon Tn3411 has been cloned and sequenced, and its polypeptide products have been characterized in minicell experiments. The nucleotide sequence was determined for a 2,047-base-pair BglII restriction endonuclease fragment that includes the citrate determinant. This region contains an open reading frame that would encode a 431-amino-acid very hydrophobic polypeptide and which is preceded by a reasonable ribosomal binding site. However, the single polypeptide found in minicell experiments had an apparent molecular weight of 35,000 on sodium dodecyl sulfate-polyacrylamide gel electrophoresis. PMID:2999087

  13. Characterization of Sri Lanka rabies virus isolates using nucleotide sequence analysis of nucleoprotein gene.

    PubMed

    Arai, Y T; Takahashi, H; Kameoka, Y; Shiino, T; Wimalaratne, O; Lodmell, D L

    2001-01-01

    Thirty-four suspected rabid brain samples from 2 humans, 24 dogs, 4 cats, 2 mongooses, 1 jackal and 1 water buffalo were collected in 1995-1996 in Sri Lanka. Total RNA was extracted directly from brain suspensions and examined using a one-step reverse transcription-polymerase chain reaction (RT-PCR) for the rabies virus nucleoprotein (N) gene. Twenty-eight samples were found positive for the virus N gene by RT-PCR and also for the virus antigens by fluorescent antibody (FA) test. Rabies virus isolates obtained from different animal species in different regions of Sri Lanka were genetically homogeneous. Sequences of 203-nucleotide (nt)-long RT-PCR products obtained from 16 of 27 samples were found to be identical. Sequences of 1350 nt of N genes of 14 RT-PCR products were determined. The Sri Lanka isolates under study formed a specific cluster that also included an earlier isolate from India but did not include the known isolates from China, Thailand, Malaysia, Israel, Iran, Oman, Saudi Arabia, Russia, Nepal, Philippines, Japan and several other countries. These results suggest that one type of rabies virus is circulating among humans, dogs, cats, mongooses, jackals and water buffalo living near Colombo City and in five other remote regions in Sri Lanka.

  14. Predicted stem-loop structures and variation in nucleotide sequence of 3' noncoding regions among animal calicivirus genomes.

    PubMed

    Seal, B S; Neill, J D; Ridpath, J F

    1994-07-01

    Caliciviruses are nonenveloped with a polyadenylated genome of approximately 7.6 kb and a single capsid protein. The "RNA Fold" computer program was used to analyze 3'-terminal noncoding sequences of five feline calicivirus (FCV), rabbit hemorrhagic disease virus (RHDV), and two San Miguel sea lion virus (SMSV) isolates. The FCV 3'-terminal sequences are 40-46 nucleotides in length and 72-91% similar. The FCV sequences were predicted to contain two possible duplex structures and one stem-loop structure with free energies of -2.1 to -18.2 kcal/mole. The RHDV genomic 3'-terminal RNA sequences are 54 nucleotides in length and share 49% sequence similarity to homologous regions of the FCV genome. The RHDV sequence was predicted to form two duplex structures in the 3'-terminal noncoding region with a single stem-loop structure, resembling that of FCV. In contrast, the SMSV 1 and 4 genomic 3'-terminal noncoding sequences were 185 and 182 nucleotides in length, respectively. Ten possible duplex structures were predicted with an average structural free energy of -35 kcal/mole. Sequence similarity between the two SMSV isolates was 75%. Furthermore, extensive cloverleaflike structures are predicted in the 3' noncoding region of the SMSV genome, in contrast to the predicted single stem-loop structures of FCV or RHDV.

  15. Nucleotide sequence of an exceptionally long 5.8S ribosomal RNA from Crithidia fasciculata.

    PubMed Central

    Schnare, M N; Gray, M W

    1982-01-01

    In Crithidia fasciculata, a trypanosomatid protozoan, the large ribosomal subunit contains five small RNA species (e, f, g, i, j) in addition to 5S rRNA [Gray, M.W. (1981) Mol. Cell. Biol. 1, 347-357]. The complete primary sequence of species i is shown here to be pAACGUGUmCGCGAUGGAUGACUUGGCUUCCUAUCUCGUUGA ... AGAmACGCAGUAAAGUGCGAUAAGUGGUApsiCAAUUGmCAGAAUCAUUCAAUUACCGAAUCUUUGAACGAAACGG ... CGCAUGGGAGAAGCUCUUUUGAGUCAUCCCCGUGCAUGCCAUAUUCUCCAmGUGUCGAA(C)OH. This sequence establishes that species i is a 5.8S rRNA, despite its exceptional length (171-172 nucleotides). The extra nucleotides in C. fasciculata 5.8S rRNA are located in a region whose primary sequence and length are highly variable among 5.8S rRNAs, but which is capable of forming a stable hairpin loop structure (the "G+C-rich hairpin"). The sequence of C. fasciculata 5.8S rRNA is no more closely related to that of another protozoan, Acanthamoeba castellanii, than it is to representative 5.8S rRNA sequences from the other eukaryotic kingdoms, emphasizing the deep phylogenetic divisions that seem to exist within the Kingdom Protista. PMID:7079176

  16. Nucleotide Sequence Analysis of RNA Synthesized from Rabbit Globin Complementary DNA

    PubMed Central

    Poon, Raymond; Paddock, Gary V.; Heindell, Howard; Whitcome, Philip; Salser, Winston; Kacian, Dan; Bank, Arthur; Gambino, Roberto; Ramirez, Francesco

    1974-01-01

    Rabbit globin complementary DNA made with RNA-dependent DNA polymerase (reverse transcriptase) was used as template for in vitro synthesis of 32P-labeled RNA. The sequences of the nucleotides in most of the fragments resulting from combined ribonuclease T1 and alkaline phosphatase digestion have been determined. Several fragments were long enough to fit uniquely with the α or β globin amino-acid sequences. These data demonstrate that the cDNA was copied from globin mRNA and contained no detectable contaminants. PMID:4139714

  17. Nucleotide sequence of the gag gene and gag-pol junction of feline leukemia virus.

    PubMed Central

    Laprevotte, I; Hampe, A; Sherr, C J; Galibert, F

    1984-01-01

    The nucleotide sequence of the gag gene of feline leukemia virus and its flanking sequences were determined and compared with the corresponding sequences of two strains of feline sarcoma virus and with that of the Moloney strain of murine leukemia virus. A high degree of nucleotide sequence homology between the feline leukemia virus and murine leukemia virus gag genes was observed, suggesting that retroviruses of domestic cats and laboratory mice have a common, proximal evolutionary progenitor. The predicted structure of the complete feline leukemia virus gag gene precursor suggests that the translation of nonglycosylated and glycosylated gag gene polypeptides is initiated at two different AUG codons. These initiator codons fall in the same reading frame and are separated by a 222-base-pair segment which encodes an amino terminal signal peptide. The nucleotide sequence predicts the order of amino acids in each of the individual gag-coded proteins (p15, p12, p30, p10), all of which derive from the gag gene precursor. Stable stem-and-loop secondary structures are proposed for two regions of viral RNA. The first falls within sequences at the 5' end of the viral genome, together with adjacent palindromic sequences which may play a role in dimer linkage of RNA subunits. The second includes coding sequences at the gag-pol junction and is proposed to be involved in translation of the pol gene product. Sequence analysis of the latter region shows that the gag and pol genes are translated in different reading frames. Classical consensus splice donor and acceptor sequences could not be localized to regions which would permit synthesis of the expected gag-pol precursor protein. Alternatively, we suggest that the pol gene product (RNA-dependent DNA polymerase) could be translated by a frameshift suppressing mechanism which could involve cleavage modification of stems and loops in a manner similar to that observed in tRNA processing. PMID:6328019

  18. Detection of a divergent variant of grapevine virus F by next-generation sequencing.

    PubMed

    Molenaar, Nicholas; Burger, Johan T; Maree, Hans J

    2015-08-01

    The complete genome sequence of a South African isolate of grapevine virus F (GVF) is presented. It was first detected by metagenomic next-generation sequencing of field samples and validated through direct Sanger sequencing. The genome sequence of GVF isolate V5 consists of 7539 nucleotides and contains a poly(A) tail. It has a typical vitivirus genome arrangement that comprises five open reading frames (ORFs), which share only 88.96 % nucleotide sequence identity with the existing complete GVF genome sequence (JX105428).

  19. Iterative Correction of Reference Nucleotides (iCORN) using second generation sequencing technology.

    PubMed

    Otto, Thomas D; Sanders, Mandy; Berriman, Matthew; Newbold, Chris

    2010-07-15

    The accuracy of reference genomes is important for downstream analysis but a low error rate requires expensive manual interrogation of the sequence. Here, we describe a novel algorithm (Iterative Correction of Reference Nucleotides) that iteratively aligns deep coverage of short sequencing reads to correct errors in reference genome sequences and evaluate their accuracy. Using Plasmodium falciparum (81% A + T content) as an extreme example, we show that the algorithm is highly accurate and corrects over 2000 errors in the reference sequence. We give examples of its application to numerous other eukaryotic and prokaryotic genomes and suggest additional applications. The software is available at http://icorn.sourceforge.net

  20. Inferring epidemiological dynamics of infectious diseases using Tajima's D statistic on nucleotide sequences of pathogens.

    PubMed

    Kim, Kiyeon; Omori, Ryosuke; Ito, Kimihito

    2017-12-01

    The estimation of the basic reproduction number is essential to understand epidemic dynamics, and time series data of infected individuals are usually used for the estimation. However, such data are not always available. Methods to estimate the basic reproduction number using genealogy constructed from nucleotide sequences of pathogens have been proposed so far. Here, we propose a new method to estimate epidemiological parameters of outbreaks using the time series change of Tajima's D statistic on the nucleotide sequences of pathogens. To relate the time evolution of Tajima's D to the number of infected individuals, we constructed a parsimonious mathematical model describing both the transmission process of pathogens among hosts and the evolutionary process of the pathogens. As a case study we applied this method to the field data of nucleotide sequences of pandemic influenza A (H1N1) 2009 viruses collected in Argentina. The Tajima's D-based method estimated basic reproduction number to be 1.55 with 95% highest posterior density (HPD) between 1.31 and 2.05, and the date of epidemic peak to be 10th July with 95% HPD between 22nd June and 9th August. The estimated basic reproduction number was consistent with estimation by birth-death skyline plot and estimation using the time series of the number of infected individuals. These results suggested that Tajima's D statistic on nucleotide sequences of pathogens could be useful to estimate epidemiological parameters of outbreaks. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
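
For readers unfamiliar with the statistic, the following is a minimal sketch of the standard Tajima's D calculation (Tajima, 1989) on a set of aligned sequences. It is not the authors' epidemiological model, which additionally tracks how D changes over time; the function name `tajimas_d` and the choice to return `None` when no site varies are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Return Tajima's D for equal-length aligned sequences, or None if no site varies."""
    n = len(seqs)
    if n < 4:
        # The variance constants are exactly zero for n = 2 and n = 3.
        raise ValueError("need at least 4 sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) sites.
    s_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s_sites == 0:
        return None

    # pi: mean number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # D compares pi with the Watterson estimate S / a1, scaled by its standard deviation.
    return (pi - s_sites / a1) / sqrt(e1 * s_sites + e2 * s_sites * (s_sites - 1))
```

An excess of rare variants (as expected in an expanding pathogen population) gives a negative D, while an excess of intermediate-frequency variants gives a positive D.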

  1. The Complete Nucleotide Sequence of the Mitochondrial Genome of Bactrocera minax (Diptera: Tephritidae)

    PubMed Central

    Zhang, Bin; Nardi, Francesco; Hull-Sanders, Helen; Wan, Xuanwu; Liu, Yinghong

    2014-01-01

    The complete 16,043 bp mitochondrial genome (mitogenome) of Bactrocera minax (Diptera: Tephritidae) has been sequenced. The genome encodes 37 genes usually found in insect mitogenomes. The mitogenome information for B. minax was compared to the homologous sequences of Bactrocera oleae, Bactrocera tryoni, Bactrocera philippinensis, Bactrocera carambolae, Bactrocera papayae, Bactrocera dorsalis, Bactrocera correcta, Bactrocera cucurbitae and Ceratitis capitata. The analysis indicated the structure and organization are typical of, and similar to, the nine closely related species mentioned above, although it contains the lowest genome-wide A+T content (67.3%). Four short intergenic spacers with a high degree of conservation among the nine tephritid species mentioned above and B. minax were observed, which also have clear counterparts in the control regions (CRs). Correlation analysis among these ten tephritid species revealed close positive correlations among the A+T content of zero-fold degenerate sites (P0FD), the ratio of nucleotide substitution frequency at P0FD sites to all degenerate sites (zero-fold degenerate sites, two-fold degenerate sites and four-fold degenerate sites), and amino acid sequence distance (ASD). Further, significant positive correlation was observed between the A+T content of four-fold degenerate sites (P4FD) and the ratio of nucleotide substitution frequency at P4FD sites to all degenerate sites; however, we found significant negative correlation between ASD and the A+T content of P4FD, and the ratio of nucleotide substitution frequency at P4FD sites to all degenerate sites. A higher nucleotide substitution frequency at non-synonymous sites compared to synonymous sites was observed in nad4, the first time that has been observed in an insect mitogenome. A poly(T) stretch at the 5′ end of the CR followed by a [TA(A)]n-like stretch was also found. In addition, a highly conserved G+A-rich sequence block was observed in front of the

  2. A new single-nucleotide polymorphism database for rainbow trout generated through whole genome re-sequencing

    USDA-ARS's Scientific Manuscript database

    Single-nucleotide polymorphisms (SNPs) are highly abundant markers, which are broadly distributed in animal genomes. For rainbow trout, SNP discovery has been done through sequencing of restriction-site associated DNA (RAD) libraries, reduced representation libraries (RRL), RNA sequencing, and whole...

  3. Synthesis and evaluations of an acid-cleavable, fluorescently labeled nucleotide as a reversible terminator for DNA sequencing.

    PubMed

    Tan, Lianjiang; Liu, Yazhi; Li, Xiaowei; Wu, Xin-Yan; Gong, Bing; Shen, Yu-Mei; Shao, Zhifeng

    2016-02-11

    An acid-cleavable linker based on a dimethylketal moiety was synthesized and used to connect a nucleotide with a fluorophore to produce a 3'-OH unblocked nucleotide analogue as an excellent reversible terminator for DNA sequencing by synthesis.

  4. Complete nucleotide sequences of the coat protein messenger RNAs of brome mosaic virus and cowpea chlorotic mottle virus.

    PubMed Central

    Dasgupta, R; Kaesberg, P

    1982-01-01

    The nucleotide sequences of the subgenomic coat protein messengers (RNA4's) of two related bromoviruses, brome mosaic virus (BMV) and cowpea chlorotic mottle virus (CCMV), have been determined by direct RNA and cDNA sequencing without cloning. BMV RNA4 is 876 b long, including a 5' noncoding region of nine nucleotides and a 3' noncoding region of 300 nucleotides. CCMV RNA4 is 824 b long, including a 5' noncoding region of 10 nucleotides and a 3' noncoding region of 244 nucleotides. The encoded coat proteins are similar in length (188 amino acids for BMV and 189 amino acids for CCMV) and display about 70% homology in their amino acid sequences. Length difference between the two RNAs is due mostly to a single deletion, in CCMV with respect to BMV, of about 57 b immediately following the coding region. Allowing for this deletion the RNAs are indicate that mutations leading to divergence were constrained in the coding region primarily by the requirement of maintaining a favorable coat protein structure and in the 3' noncoding region primarily by the requirement of maintaining a favorable RNA spatial configuration. PMID:6895941

  5. PCV: An Alignment Free Method for Finding Homologous Nucleotide Sequences and its Application in Phylogenetic Study.

    PubMed

    Kumar, Rajnish; Mishra, Bharat Kumar; Lahiri, Tapobrata; Kumar, Gautam; Kumar, Nilesh; Gupta, Rahul; Pal, Manoj Kumar

    2017-06-01

    Online retrieval of homologous nucleotide sequences from a given database through existing alignment techniques is common practice. The salient point of these techniques is their dependence on local alignment and scoring matrices, the reliability of which is limited by computational complexity and accuracy. Toward this direction, this work offers a novel way for numerical representation of genes which can further help in dividing the data space into smaller partitions, helping formation of a search tree. In this context, this paper introduces a 36-dimensional Periodicity Count Value (PCV) which is representative of a particular nucleotide sequence and created through adaptation from the concept of stochastic model of Kolekar et al. (American Institute of Physics 1298:307-312, 2010. doi: 10.1063/1.3516320). The PCV construct uses information on physicochemical properties of nucleotides and their positional distribution pattern within a gene. It is observed that PCV representation of a gene reduces computational cost in the calculation of distances between a pair of genes while being consistent with the existing methods. The validity of the PCV-based method was further tested through its use in molecular phylogeny constructs in comparison with that using existing sequence alignment methods.

  6. Complete nucleotide sequences of a new bipartite begomovirus from Malvastrum sp. plants with bright yellow mosaic symptoms in South Texas.

    PubMed

    Alabi, Olufemi J; Villegas, Cecilia; Gregg, Lori; Murray, K Daniel

    2016-06-01

    Two isolates of a novel bipartite begomovirus, tentatively named malvastrum bright yellow mosaic virus (MaBYMV), were molecularly characterized from naturally infected plants of the genus Malvastrum showing bright yellow mosaic disease symptoms in South Texas. Six complete DNA-A and five DNA-B genome sequences of MaBYMV obtained from the isolates ranged in length from 2,608 to 2,609 nucleotides (nt) and 2,578 to 2,605 nt, respectively. Both genome segments shared a 178- to 180-nt common region. In pairwise comparisons, the complete DNA-A and DNA-B sequences of MaBYMV were most similar (87-88 % and 79-81 % identity, respectively) and phylogenetically related to the corresponding sequences of sida mosaic Sinaloa virus-[MX-Gua-06]. Further analysis revealed that MaBYMV is a putative recombinant virus, thus supporting the notion that malvaceous hosts may be influencing the evolution of several begomoviruses. The design of new diagnostic primers enabled the detection of MaBYMV in cohorts of Bemisia tabaci collected from symptomatic Malvastrum sp. plants, thus implicating whiteflies as potential vectors of the virus.

  7. Complete genome sequence of a new maize-associated cytorhabdovirus

    USDA-ARS's Scientific Manuscript database

    A new 11,877 nt cytorhabdovirus sequence with 6 open reading frames has been identified in a maize sample. It shares 50 and 51% genome-wide nucleotide sequence identity with northern cereal mosaic cytorhabdovirus (NCMV) and barley yellow striate mosaic cytorhabdovirus (BYSMV), respectively....

  8. Complete nucleotide sequence of the freshwater unicellular cyanobacterium Synechococcus elongatus PCC 6301 chromosome: gene content and organization.

    PubMed

    Sugita, Chieko; Ogata, Koretsugu; Shikata, Masamitsu; Jikuya, Hiroyuki; Takano, Jun; Furumichi, Miho; Kanehisa, Minoru; Omata, Tatsuo; Sugiura, Masahiro; Sugita, Mamoru

    2007-01-01

    The entire genome of the unicellular cyanobacterium Synechococcus elongatus PCC 6301 (formerly Anacystis nidulans Berkeley strain 6301) was sequenced. The genome consisted of a circular chromosome 2,696,255 bp long. A total of 2,525 potential protein-coding genes, two sets of rRNA genes, 45 tRNA genes representing 42 tRNA species, and several genes for small stable RNAs were assigned to the chromosome by similarity searches and computer predictions. The translated products of 56% of the potential protein-coding genes showed sequence similarities to experimentally identified and predicted proteins of known function, and the products of 35% of the genes showed sequence similarities to the translated products of hypothetical genes. The remaining 9% of genes lacked significant similarities to genes for predicted proteins in the public DNA databases. Some 139 genes coding for photosynthesis-related components were identified. Thirty-seven genes for two-component signal transduction systems were also identified. This is the smallest number of such genes identified in cyanobacteria, except for marine cyanobacteria, suggesting that only simple signal transduction systems are found in this strain. The gene arrangement and nucleotide sequence of Synechococcus elongatus PCC 6301 were nearly identical to those of a closely related strain Synechococcus elongatus PCC 7942, except for the presence of a 188.6 kb inversion. The sequences as well as the gene information shown in this paper are available in the Web database, CYORF (http://www.cyano.genome.jp/).

  9. Intercalation of XR5944 with the estrogen response element is modulated by the tri-nucleotide spacer sequence between half-sites

    PubMed Central

    Sidell, Neil; Mathad, Raveendra I.; Shu, Feng-jue; Zhang, Zhenjiang; Kallen, Caleb B.; Yang, Danzhou

    2011-01-01

    DNA-intercalating molecules can impair DNA replication, DNA repair, and gene transcription. We previously demonstrated that XR5944, a DNA bis-intercalator, specifically blocks binding of estrogen receptor-α (ERα) to the consensus estrogen response element (ERE). The consensus ERE sequence is AGGTCAnnnTGACCT, where nnn is known as the tri-nucleotide spacer. Recent work has shown that the tri-nucleotide spacer can modulate ERα-ERE binding affinity and ligand-mediated transcriptional responses. To further understand the mechanism by which XR5944 inhibits ERα-ERE binding, we tested its ability to interact with consensus EREs with variable tri-nucleotide spacer sequences and with natural but non-consensus ERE sequences using one dimensional nuclear magnetic resonance (1D 1H NMR) titration studies. We found that the tri-nucleotide spacer sequence significantly modulates the binding of XR5944 to EREs. Of the sequences that were tested, EREs with CGG and AGG spacers showed the best binding specificity with XR5944, while those spaced with TTT demonstrated the least specific binding. The binding stoichiometry of XR5944 with EREs was 2:1, which can explain why the spacer influences the drug-DNA interaction; each XR5944 spans four nucleotides (including portions of the spacer) when intercalating with DNA. To validate our NMR results, we conducted functional studies using reporter constructs containing consensus EREs with tri-nucleotide spacers CGG, CTG, and TTT. Results of reporter assays in MCF-7 cells indicated that XR5944 was significantly more potent in inhibiting the activity of CGG- than TTT-spaced EREs, consistent with our NMR results. Taken together, these findings predict that the anti-estrogenic effects of XR5944 will depend not only on ERE half-site composition but also on the tri-nucleotide spacer sequence of EREs located in the promoters of estrogen-responsive genes. PMID:21333738
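
The consensus pattern AGGTCAnnnTGACCT quoted above lends itself to a simple scan. The sketch below locates consensus EREs in a DNA string and reports the tri-nucleotide spacer of each; the function name `find_consensus_eres` is hypothetical, and the scan deliberately ignores the natural non-consensus EREs that the study also examined.

```python
import re

# Consensus ERE: two half-sites separated by any three bases (the spacer).
# The lookahead lets overlapping matches be reported as well.
ERE_PATTERN = re.compile(r"(?=AGGTCA([ACGT]{3})TGACCT)")


def find_consensus_eres(sequence):
    """Return (start_index, spacer) for every consensus ERE in the sequence."""
    sequence = sequence.upper()
    return [(m.start(), m.group(1)) for m in ERE_PATTERN.finditer(sequence)]
```

Because TGACCT is the reverse complement of AGGTCA, a consensus ERE on one strand is also a consensus ERE on the other, so scanning a single strand finds every site; the spacer is reported as read on the strand supplied.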

  10. How close is close: 16S rRNA sequence identity may not be sufficient to guarantee species identity

    NASA Technical Reports Server (NTRS)

    Fox, G. E.; Wisotzkey, J. D.; Jurtshuk, P. Jr

    1992-01-01

    16S rRNA (genes coding for rRNA) sequence comparisons were conducted with the following three psychrophilic strains: Bacillus globisporus W25T (T = type strain) and Bacillus psychrophilus W16AT, and W5. These strains exhibited more than 99.5% sequence identity and within experimental uncertainty could be regarded as identical. Their close taxonomic relationship was further documented by phenotypic similarities. In contrast, previously published DNA-DNA hybridization results have convincingly established that these strains do not belong to the same species if current standards are used. These results emphasize the important point that effective identity of 16S rRNA sequences is not necessarily a sufficient criterion to guarantee species identity. Thus, although 16S rRNA sequences can be used routinely to distinguish and establish relationships between genera and well-resolved species, very recently diverged species may not be recognizable.

  11. A set of tetra-nucleotide core motif SSR markers for efficient identification of potato (Solanum tuberosum) cultivars.

    PubMed

    Kishine, Masahiro; Tsutsumi, Katsuji; Kitta, Kazumi

    2017-12-01

    Simple sequence repeat (SSR) is a popular tool for individual fingerprinting. The long-core motif (e.g. tetra-, penta-, and hexa-nucleotide) simple sequence repeats (SSRs) are preferred because they make it easier to separate and distinguish neighboring alleles. In the present study, a new set of 8 tetra-nucleotide SSRs in potato (Solanum tuberosum) is reported. By using these 8 markers, 72 out of 76 cultivars obtained from Japan and the United States were clearly discriminated, while two pairs, both of which arose from natural variation, showed identical profiles. The combined probability of identity between two random cultivars for the set of 8 SSR markers was estimated to be 1.10 × 10^-8, confirming the usefulness of the proposed SSR markers for fingerprinting analyses of potato.
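
A combined probability of identity is typically obtained by multiplying per-locus values, which is why a handful of informative markers can reach very small numbers. The sketch below illustrates the idea with the commonly used diploid, Hardy-Weinberg formula for a single locus; the abstract does not say which estimator the authors used, and cultivated potato is tetraploid, so both the formula and the function names `locus_pi` and `combined_pi` are assumptions made purely for illustration.

```python
from itertools import combinations
from math import prod


def locus_pi(allele_freqs):
    """Probability that two random diploid genotypes match at one locus (assumed model)."""
    if abs(sum(allele_freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    # Matching homozygotes: sum of p_i^4.
    homozygous = sum(p**4 for p in allele_freqs)
    # Matching heterozygotes: sum over i < j of (2 * p_i * p_j)^2.
    heterozygous = sum((2 * p * q) ** 2 for p, q in combinations(allele_freqs, 2))
    return homozygous + heterozygous


def combined_pi(loci):
    """Multiply per-locus values, assuming the loci are independent."""
    return prod(locus_pi(freqs) for freqs in loci)
```

With more alleles at more even frequencies, each per-locus value drops, and the product across eight markers shrinks quickly.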

  12. Finding similar nucleotide sequences using network BLAST searches.

    PubMed

    Ladunga, Istvan

    2009-06-01

    The Basic Local Alignment Search Tool (BLAST) is a keystone of bioinformatics due to its performance and user-friendliness. Beginner and intermediate users will learn how to design and submit blastn and Megablast searches on the Web pages at the National Center for Biotechnology Information. We map nucleic acid sequences to genomes, find identical or similar mRNA, expressed sequence tag, and noncoding RNA sequences, and run Megablast searches, which are much faster than blastn. Understanding results is assisted by taxonomy reports, genomic views, and multiple alignments. We interpret expected frequency thresholds, biological significance, and statistical significance. Weak hits provide no evidence, but hints for further analyses. We find genes that may code for homologous proteins by translated BLAST. We reduce false positives by filtering out low-complexity regions. Parsed BLAST results can be integrated into analysis pipelines. Links in the output connect to Entrez, PUBMED, structural, sequence, interaction, and expression databases. This facilitates integration with a wide spectrum of biological knowledge.

  13. Nucleotide sequence of the L1 ribosomal protein gene of Xenopus laevis: remarkable sequence homology among introns.

    PubMed Central

    Loreni, F; Ruberti, I; Bozzoni, I; Pierandrei-Amaldi, P; Amaldi, F

    1985-01-01

    Ribosomal protein L1 is encoded by two genes in Xenopus laevis. The comparison of two cDNA sequences shows that the two L1 gene copies (L1a and L1b) have diverged in many silent sites and very few substitution sites; moreover a small duplication occurred at the very end of the coding region of the L1b gene which thus codes for a product five amino acids longer than that coded by L1a. Quantitatively the divergence between the two L1 genes confirms that a whole genome duplication took place in Xenopus laevis approximately 30 million years ago. A genomic fragment containing one of the two L1 gene copies (L1a), with its nine introns and flanking regions, has been completely sequenced. The 5' end of this gene has been mapped within a 20-pyrimidine stretch as already found for other vertebrate ribosomal protein genes. Four of the nine introns have a 60-nucleotide sequence with 80% homology; within this region some boxes, one of which is 16 nucleotides long, are 100% homologous among the four introns. This feature of L1a gene introns is interesting since we have previously shown that the activity of this gene is regulated at a post-transcriptional level and it involves the block of the normal splicing of some intron sequences. PMID:3841512

  14. The complete sequence of Cymbidium mosaic virus from Vanilla fragrans in Hainan, China.

    PubMed

    He, Zhen; Jiang, Dongmei; Liu, Aiqin; Sang, Liwei; Li, Wenfeng; Li, Shifang

    2011-06-01

    The complete nucleotide sequence of Cymbidium mosaic virus (CymMV) isolated from vanilla in Hainan province, China was determined for the first time. It comprised 6,224 nucleotides; sequence analysis suggested that the isolate we obtained was a member of the genus Potexvirus, and its sequence shared 86.67-96.61% identity with previously reported sequences. Phylogenetic analysis suggested that CymMV from Vanilla fragrans was clustered into subgroup A and the isolates in this subgroup displayed little regional difference.

  15. Sequence diversity of wheat mosaic virus isolates.

    PubMed

    Stewart, Lucy R

    2016-02-02

    Wheat mosaic virus (WMoV), transmitted by eriophyid wheat curl mites (Aceria tosichella) is the causal agent of High Plains disease in wheat and maize. WMoV and other members of the genus Emaravirus evaded thorough molecular characterization for many years due to the experimental challenges of mite transmission and manipulating multisegmented negative sense RNA genomes. Recently, the complete genome sequence of a Nebraska isolate of WMoV revealed eight segments, plus a variant sequence of the nucleocapsid protein-encoding segment. Here, near-complete and partial consensus sequences of five more WMoV isolates are reported and compared to the Nebraska isolate: an Ohio maize isolate (GG1), a Kansas barley isolate (KS7), and three Ohio wheat isolates (H1, K1, W1). Results show two distinct groups of WMoV isolates: Ohio wheat isolate RNA segments had 84% or lower nucleotide sequence identity to the NE isolate, whereas GG1 and KS7 had 98% or higher nucleotide sequence identity to the NE isolate. Knowledge of the sequence variability of WMoV isolates is a step toward understanding virus biology, and potentially explaining observed biological variation. Published by Elsevier B.V.

  16. SEAN: SNP prediction and display program utilizing EST sequence clusters.

    PubMed

    Huntley, Derek; Baldo, Angela; Johri, Saurabh; Sergot, Marek

    2006-02-15

    SEAN is an application that predicts single nucleotide polymorphisms (SNPs) using multiple sequence alignments produced from expressed sequence tag (EST) clusters. The algorithm uses rules of sequence identity and SNP abundance to determine the quality of the prediction. A Java viewer is provided to display the EST alignments and predicted SNPs.
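
The abstract does not spell out SEAN's rules, so the following is only a toy illustration of abundance-based SNP candidate calling on a multiple sequence alignment: a column is reported when its second most common base is seen at least `min_minor_count` times, which helps separate recurring variants from one-off sequencing errors. The function name and the threshold are hypothetical.

```python
from collections import Counter


def candidate_snp_columns(alignment, min_minor_count=2):
    """Return 0-based alignment columns whose second most common base is well supported."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    for index, column in enumerate(zip(*alignment)):
        # Count only unambiguous bases; gaps and N are ignored.
        counts = Counter(base for base in column if base in "ACGT")
        ranked = counts.most_common()
        if len(ranked) >= 2 and ranked[1][1] >= min_minor_count:
            hits.append(index)
    return hits
```

Raising `min_minor_count` trades sensitivity for fewer false positives in deep EST clusters.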

  17. Plastid: nucleotide-resolution analysis of next-generation sequencing and genomics data.

    PubMed

    Dunn, Joshua G; Weissman, Jonathan S

    2016-11-22

    Next-generation sequencing (NGS) informs many biological questions with unprecedented depth and nucleotide resolution. These assays have created a need for analytical tools that enable users to manipulate data nucleotide-by-nucleotide robustly and easily. Furthermore, because many NGS assays encode information jointly within multiple properties of read alignments - for example, in ribosome profiling, the locations of ribosomes are jointly encoded in alignment coordinates and length - analytical tools are often required to extract the biological meaning from the alignments before analysis. Many assay-specific pipelines exist for this purpose, but there remains a need for user-friendly, generalized, nucleotide-resolution tools that are not limited to specific experimental regimes or analytical workflows. Plastid is a Python library designed specifically for nucleotide-resolution analysis of genomics and NGS data. As such, Plastid is designed to extract assay-specific information from read alignments while retaining generality and extensibility to novel NGS assays. Plastid represents NGS and other biological data as arrays of values associated with genomic or transcriptomic positions, and contains configurable tools to convert data from a variety of sources to such arrays. Plastid also includes numerous tools to manipulate even discontinuous genomic features, such as spliced transcripts, with nucleotide precision. Plastid automatically handles conversion between genomic and feature-centric coordinates, accounting for splicing and strand, freeing users of burdensome accounting. Finally, Plastid's data models use consistent and familiar biological idioms, enabling even beginners to develop sophisticated analytical workflows with minimal effort. Plastid is a versatile toolkit that has been used to analyze data from multiple NGS assays, including RNA-seq, ribosome profiling, and DMS-seq. It forms the genomic engine of our ORF annotation tool, ORF-RATER, and is readily

  18. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...

  19. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...

  20. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...

  1. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...

  2. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...

  3. Complete nucleotide sequence of pig (Sus scrofa) mitochondrial genome and dating evolutionary divergence within Artiodactyla.

    PubMed

    Lin, C S; Sun, Y L; Liu, C Y; Yang, P C; Chang, L C; Cheng, I C; Mao, S J; Huang, M C

    1999-08-05

    The complete nucleotide sequence of the pig (Sus scrofa) mitochondrial genome, containing 16,613 bp, is presented in this report. The genome is not a specific length because of the presence of variable numbers of the tandem repeat 5'-CGTGCGTACA in the displacement loop (D-loop). Genes responsible for 12S and 16S rRNAs, 22 tRNAs, and 13 protein-coding regions are found. The genome carries very few intergenic nucleotides with several instances of overlap between protein-coding or tRNA genes, except in the D-loop region. For evaluating the possible evolutionary relationships between Artiodactyla and Cetacea, the nucleotide substitutions and amino acid sequences of 13 protein-coding genes were aligned by pairwise comparisons of the pig, cow, and fin whale. By comparing these sequences, we suggest that there is a closer relationship between the pig and cow than that between either of these species and fin whale. In addition, the accumulation of transversions and gaps in pig 12S and 16S rRNA genes was compared with that in other eutherian species, including cow, fin whale, human, horse, and harbor seal. The results also reveal a close phylogenetic relationship between pig and cow, as compared to fin whale and others. Thus, according to the sequence differences of mitochondrial rRNA genes in eutherian species, the evolutionary separation of pig and cow occurred about 53-60 million years ago.
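
Counting transversions and gaps between two aligned rRNA sequences, as described above, can be sketched as follows. The function name `classify_differences` is hypothetical, and the sketch says nothing about how the authors converted such counts into divergence times.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_differences(seq_a, seq_b):
    """Count transitions, transversions and gap positions between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"transitions": 0, "transversions": 0, "gaps": 0}
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == y:
            continue
        if x == "-" or y == "-":
            counts["gaps"] += 1
        elif {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            # Purine <-> purine or pyrimidine <-> pyrimidine.
            counts["transitions"] += 1
        elif {x, y} <= PURINES | PYRIMIDINES:
            # Purine <-> pyrimidine.
            counts["transversions"] += 1
        # Ambiguity codes such as N are ignored.
    return counts
```

Transversions accumulate more slowly than transitions and saturate later, which is one reason they are often preferred for comparisons among distantly related lineages.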

  4. Sequence determination and analysis of the NSs genes of two tospoviruses.

    PubMed

    Hallwass, Mariana; Leastro, Mikhail O; Lima, Mirtes F; Inoue-Nagata, Alice K; Resende, Renato O

    2012-03-01

    The tospoviruses groundnut ringspot virus (GRSV) and zucchini lethal chlorosis virus (ZLCV) cause severe losses in many crops, especially in solanaceous and cucurbit species. In this study, the non-structural NSs gene and the 5'UTRs of these two biologically distinct tospoviruses were cloned and sequenced. The NSs sequences of GRSV and ZLCV were both 1,404 nucleotides long. Pairwise comparison showed that the NSs amino acid sequence of GRSV shared 69.6% identity with that of ZLCV and 75.9% identity with that of TSWV, while the NSs sequences of ZLCV and TSWV shared 67.9% identity. Phylogenetic analysis based on NSs sequences confirmed that these viruses cluster in the American clade.
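
Percent identity figures such as those quoted above depend on how the pairwise alignment is scored. A minimal sketch, assuming the two sequences are already aligned and that gapped columns are excluded from the denominator (other tools divide by alignment length or by the shorter sequence instead); the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of ungapped aligned positions at which the two sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # assumed convention: skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * matches / compared
```

The same function works for nucleotide or amino acid alignments, since it only compares characters column by column.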

  5. Labeled nucleotide phosphate (NP) probes

    DOEpatents

    Korlach, Jonas [Ithaca, NY]; Webb, Watt W [Ithaca, NY]; Levene, Michael [Ithaca, NY]; Turner, Stephen [Ithaca, NY]; Craighead, Harold G [Ithaca, NY]; Foquet, Mathieu [Ithaca, NY]

    2009-02-03

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  6. Facilitated sequence counting and assembly by template mutagenesis

    PubMed Central

    Levy, Dan; Wigler, Michael

    2014-01-01

    Presently, inferring the long-range structure of the DNA templates is limited by short read lengths. Accurate template counts suffer from distortions occurring during PCR amplification. We explore the utility of introducing random mutations in identical or nearly identical templates to create distinguishable patterns that are inherited during subsequent copying. We simulate the applications of this process under assumptions of error-free sequencing and perfect mapping, using cytosine deamination as a model for mutation. The simulations demonstrate that within readily achievable conditions of nucleotide conversion and sequence coverage, we can accurately count the number of otherwise identical molecules as well as connect variants separated by long spans of identical sequence. We discuss many potential applications, such as transcript profiling, isoform assembly, haplotype phasing, and de novo genome assembly. PMID:25313059
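
The core idea, that random C-to-T conversion gives each otherwise identical template its own inheritable signature, can be illustrated with a small simulation. This is a toy sketch under the same idealized assumptions as the abstract (error-free sequencing, perfect mapping), not the authors' simulation code; the function names, the conversion rate and the seed are all hypothetical.

```python
import random


def deaminate(template, rate, rng):
    """Convert each C to T independently with probability `rate`."""
    return "".join(
        "T" if base == "C" and rng.random() < rate else base for base in template
    )


def count_distinct_patterns(template, n_copies, rate, seed=0):
    """Mutagenize n_copies of one template and count the distinct resulting sequences."""
    rng = random.Random(seed)
    return len({deaminate(template, rate, rng) for _ in range(n_copies)})
```

When the template has many cytosines and the conversion rate is moderate, the number of distinct patterns approaches the number of starting molecules, which is what makes counting possible; at rates of 0 or 1 every copy stays identical and the count collapses to one.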

  7. Nucleotide Sequence Database Comparison for Routine Dermatophyte Identification by Internal Transcribed Spacer 2 Genetic Region DNA Barcoding.

    PubMed

    Normand, A C; Packeu, A; Cassagne, C; Hendrickx, M; Ranque, S; Piarroux, R

    2018-05-01

    Conventional dermatophyte identification is based on morphological features. However, recent studies have proposed to use the nucleotide sequences of the rRNA internal transcribed spacer (ITS) region as an identification barcode of all fungi, including dermatophytes. Several nucleotide databases are available to compare sequences and thus identify isolates; however, these databases often contain mislabeled sequences that impair sequence-based identification. We evaluated five of these databases on a clinical isolate panel. We selected 292 clinical dermatophyte strains that were prospectively subjected to an ITS2 nucleotide sequence analysis. Sequences were analyzed against the databases, and the results were compared to clusters obtained via DNA alignment of sequence segments. The DNA tree served as the identification standard throughout the study. According to the ITS2 sequence identification, the majority of strains (255/292) belonged to the genus Trichophyton, mainly T. rubrum complex (n = 184), T. interdigitale (n = 40), T. tonsurans (n = 26), and T. benhamiae (n = 5). Other genera included Microsporum (e.g., M. canis [n = 21], M. audouinii [n = 10], Nannizzia gypsea [n = 3], and Epidermophyton [n = 3]). Species-level identification of T. rubrum complex isolates was an issue. Overall, ITS DNA sequencing is a reliable tool to identify dermatophyte species given that a comprehensive and correctly labeled database is consulted. Since many inaccurate identification results exist in the DNA databases used for this study, reference databases must be verified frequently and amended in line with the current revisions of fungal taxonomy. Before describing a new species or adding a new DNA reference to the available databases, its position in the phylogenetic tree must be verified. Copyright © 2018 American Society for Microbiology.

  8. LISTA, a comprehensive compilation of nucleotide sequences encoding proteins from the yeast Saccharomyces.

    PubMed Central

    Linder, P; Dölz, R; Mossé, M O; Lazowska, J; Slonimski, P P

    1993-01-01

    The amount of nucleotide sequence data is increasing exponentially. We therefore made an effort to make a comprehensive database (LISTA) for the yeast Saccharomyces cerevisiae. Each sequence has been attributed a single genetic name and in the case of allelic duplicated sequences, synonyms are given, if necessary. For the nomenclature we have introduced a standard principle for naming gene sequences based on priority rules. We have also applied a simple method to distinguish duplicated sequences of one and the same gene from non-allelic sequences of duplicated genes. By using these principles we have sorted out a lot of confusion in the literature and databanks. Along with the genetic name, the mnemonic from the EMBL databank, the codon bias, reference of the publication of the sequence and the EMBL accession numbers are included in each entry. PMID:8332521

  9. Open reading frames in a 4556 nucleotide sequence within MDV-1 BamHI-D DNA fragment: evidence for splicing of mRNA from a new viral glycoprotein gene.

    PubMed

    Becker, Y; Asher, Y; Tabor, E; Davidson, I; Malkinson, M

    1994-01-01

    A DNA segment of the MDV-1 BamHI-D fragment was sequenced, and the open reading frames (ORFs) present in the 4556 nucleotide fragment were analyzed by computer programs. Computer analysis identified 19 putative ORFs in the sequence ranging from a coding capacity of 37 amino acids (aa) (ORF-1a) to 684aa (ORF-1). The special properties of four ORFs (1a, 1, 2, and 3) were investigated. Two adjacent ORFs, ORF-1a and ORF-1, were found by computer analysis to have the properties of two exons encoding a glycoprotein: ORF-1a encodes an aa sequence with the properties of a signal peptide, and ORF-1 encodes a polypeptide with a membrane anchor domain and putative N-glycosylation sites in the aa sequence. ORF-1a and ORF-1 were found to be transcribed in MDV-1-infected cells. Two RNA transcripts were detected: a precursor RNA and its spliced form. Both are transcribed from a promoter located 5' to ORF-1a, and splice donor and acceptor sites are used to splice the mRNA after cleavage of a 71-nucleotide sequence. This finding suggests that ORF-1a and ORF-1 are two exons of a new MDV-1 glycoprotein gene. The DNA sequence containing ORF-1 was transiently expressed in COS-1 cells, and the viral protein produced in these cells was found to react with anti-MDV serotype-1 Antigen B-specific monoclonal antibodies. These studies indicate that the protein encoded by ORF-1 has antigenic properties resembling Antigen B of MDV-1. A gene homologous to ORF-1 was detected in the genome of both MDV-2(SB1) and MDV-3(HVT), which serve as commercial vaccine strains. Two additional ORFs were noted in the 4556 nucleotide sequence: ORF-2, which encodes a 333 aa polypeptide initiating in the UL and terminating in the TRL prior to the putative origin of replication, and ORF-3, which encodes a 155 aa polypeptide that is partly homologous to the phosphoprotein pp38 encoded by the BamHI-H sequence. The 65 N-terminal aa of the two gene products are identical, both being derived from the nucleotide

  10. Complete nucleotide sequence of a novel Hibiscus-infecting Cilevirus from Florida and its relationship with closely associated Cileviruses

    USDA-ARS's Scientific Manuscript database

    The complete nucleotide sequence of a recently discovered Florida (FL) isolate of Hibiscus infecting Cilevirus (HiCV) was determined by Sanger sequencing. The movement- and coat- protein gene sequences of the HiCV-FL isolate are more divergent than other genes of the previously sequenced HiCV-HA (Ha...

  11. Phylogenetic and nucleotide sequence analysis of influenza A (H1N1) HA and NA genes of strains isolated from Saudi Arabia.

    PubMed

    Al-Qahtani, Ahmed Ali; Mubin, Muhammad; Dela Cruz, Damian M; Althawadi, Sahar Isa; Ul Rehman, Muhammad Shah Nawaz; Bohol, Marie Fe F; Al-Ahdal, Mohammed N

    2017-01-30

    In early 2009, a novel influenza A (H1N1) virus appeared in Mexico and rapidly disseminated worldwide. Little is known about the phylogeny and evolutionary dynamics of the H1N1 strain found in Saudi Arabia. Nucleotide sequencing and bioinformatics analyses were used to study molecular variation between the virus isolates. In this report, 72 hemagglutinin (HA) and 45 neuraminidase (NA) H1N1 virus gene sequences, isolated in 2009 from various regions of Saudi Arabia, were analyzed. Genetic characterization indicated that viruses from two different clades, 6 and 7, were circulating in the region, with clade 7, the most widely circulating H1N1 clade globally in 2009, being predominant. Sequence analysis of the HA and NA genes revealed a high degree of sequence identity with the corresponding genes from viruses circulating in the South East Asia region and with the A/California/7/2009 strain. New mutations in the HA gene of pandemic H1N1 (pH1N1) viruses that could alter viral fitness were identified. Relaxed-clock and Bayesian Skyline Plot analyses, based on the isolates used in this study and closely related globally representative strains, indicated marginally higher substitution rates than the type strain (5.14×10^-3 and 4.18×10^-3 substitutions/nucleotide/year in the HA and NA genes, respectively). The Saudi isolates were antigenically homogeneous and closely related to the prototype vaccine strain A/California/7/2009. The antigenic site of the HA gene had acquired novel mutations in some isolates, making continued monitoring of these viruses vital for the identification of potentially highly virulent and drug-resistant variants.

  12. A clustering package for nucleotide sequences using Laplacian Eigenmaps and Gaussian Mixture Model.

    PubMed

    Bruneau, Marine; Mottet, Thierry; Moulin, Serge; Kerbiriou, Maël; Chouly, Franz; Chretien, Stéphane; Guyeux, Christophe

    2018-02-01

    In this article, a new Python package for nucleotide sequence clustering is proposed. This package, freely available on-line, implements a Laplacian eigenmap embedding and a Gaussian Mixture Model for DNA clustering. It takes nucleotide sequences as input, and produces the optimal number of clusters along with a relevant visualization. Although we did not optimise the computational speed, our method still performs reasonably well in practice. Our focus was mainly on data analytics and accuracy and, as a result, our approach outperforms the state of the art, even in the case of divergent sequences. Furthermore, a priori knowledge of the number of clusters is not required here. For the sake of illustration, this method is applied to a set of 100 DNA sequences taken from the mitochondrially encoded NADH dehydrogenase 3 (ND3) gene, extracted from a collection of Platyhelminthes and Nematoda species. The resulting clusters are tightly consistent with the phylogenetic tree computed using a maximum likelihood approach on gene alignment. They are also coherent with the NCBI taxonomy. Further test results based on synthesized data are then provided, showing that the proposed approach is better able to recover the clusters than the most widely used software, namely Cd-hit-est and BLASTClust. Copyright © 2017 Elsevier Ltd. All rights reserved.

  13. The nucleotide sequence of 5S rRNA from a cellular slime mold Dictyostelium discoideum.

    PubMed Central

    Hori, H; Osawa, S; Iwabuchi, M

    1980-01-01

    The nucleotide sequence of ribosomal 5S rRNA from a cellular slime mold Dictyostelium discoideum is GUAUACGGCCAUACUAGGUUGGAAACACAUCAUCCCGUUCGAUCUGAUAAGUAAAUCGACCUCAGGCCUUCCAAGUACUCUGGUUGGAGACAACAGGGGAACAUAGGGUGCUGUAUACU. A model for the secondary structure of this 5S rRNA is proposed. The sequence is more similar to those of animals (62% similarity on the average) than to those of yeasts (56%). PMID:7465421

  14. The complete nucleotide sequence of RNA 3 of a peach isolate of Prunus necrotic ringspot virus.

    PubMed

    Hammond, R W; Crosslin, J M

    1995-04-01

    The complete nucleotide sequence of RNA 3 of the PE-5 peach isolate of Prunus necrotic ringspot ilarvirus (PNRSV) was obtained from cloned cDNA. The RNA sequence is 1941 nucleotides and contains two open reading frames (ORFs). ORF 1 consisted of 284 amino acids with a calculated molecular weight of 31,729 Da and ORF 2 contained 224 amino acids with a calculated molecular weight of 25,018 Da. ORF 2 corresponds to the coat protein gene. Expression of ORF 2 engineered into a pTrcHis vector in Escherichia coli results in a fusion polypeptide of approximately 28 kDa which cross-reacts with PNRSV polyclonal antiserum. Analysis of the coat protein amino acid sequence reveals a putative "zinc-finger" domain at the amino-terminal portion of the protein. Two tetranucleotide AUGC motifs occur in the 3'-UTR of the RNA and may function in coat protein binding and genome activation. ORF 1 homologies to other ilarviruses and alfalfa mosaic virus are confined to limited regions of conserved amino acids. The translated amino acid sequence of the coat protein gene shows 92% similarity to one isolate of apple mosaic virus, a closely related member of the ilarvirus group of plant viruses, but only 66% similarity to the amino acid sequence of the coat protein gene of a second isolate. These relationships are also reflected at the nucleotide sequence level. These results in one instance confirm the close similarities observed at the biophysical and serological levels between these two viruses, but on the other hand call into question the nomenclature used to describe these viruses.

  15. Nucleotide cleaving agents and method

    DOEpatents

    Que, Jr., Lawrence; Hanson, Richard S.; Schnaith, Leah M. T.

    2000-01-01

    The present invention provides a unique series of nucleotide cleaving agents and a method for cleaving a nucleotide sequence, whether single-stranded or double-stranded DNA or RNA, using a cationic metal complex having at least one polydentate ligand to cleave the nucleotide sequence phosphate backbone to yield a hydroxyl end and a phosphate end.

  16. ANCAC: amino acid, nucleotide, and codon analysis of COGs--a tool for sequence bias analysis in microbial orthologs.

    PubMed

    Meiler, Arno; Klinger, Claudia; Kaufmann, Michael

    2012-09-08

    The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. By definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence, resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC's NUCOCOG dataset as the largest one available for that purpose thus far. Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming skills.
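
    For readers unfamiliar with the quantities named here, GC-content and codon frequencies can be computed as in the following sketch. It is a generic illustration, not ANCAC's implementation; the function names are hypothetical and the input is assumed to be an in-frame coding sequence in DNA letters.

```python
from collections import Counter


def gc_content(seq):
    """Return the fraction of G and C bases in `seq` (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def codon_frequencies(cds):
    """Return the relative frequency of each codon in an in-frame coding sequence.

    Trailing bases that do not fill a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        return {}
    counts = Counter(codons)
    return {codon: n / len(codons) for codon, n in counts.items()}


if __name__ == "__main__":
    cds = "ATGAAAAAATAA"
    print("GC content:", gc_content(cds))
    print("codon frequencies:", codon_frequencies(cds))
```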

  17. The nucleotide sequences of 5S rRNAs from a rotifer, Brachionus plicatilis, and two nematodes, Rhabditis tokai and Caenorhabditis elegans.

    PubMed Central

    Kumazaki, T; Hori, H; Osawa, S; Ishii, N; Suzuki, K

    1982-01-01

    The nucleotide sequences of 5S rRNAs from a rotifer, Brachionus plicatilis, and two nematodes, Rhabditis tokai and Caenorhabditis elegans have been determined. The rotifer has two 5S rRNA species that are composed of 120 and 121 nucleotides, respectively. The sequences of these two 5S rRNAs are the same except that the latter has an additional base at its 3'-terminus. The 5S rRNAs from the two nematode species are both 119 nucleotides long. The sequence similarity percentages are 79% (Brachionus/Rhabditis), 80% (Brachionus/Caenorhabditis), and 95% (Rhabditis/Caenorhabditis) among these three species. Brachionus revealed the highest similarity to Lingula (89%), but not to the nematodes (79%). PMID:6891053

  18. The nucleotide sequences of 5S rRNAs from a rotifer, Brachionus plicatilis, and two nematodes, Rhabditis tokai and Caenorhabditis elegans.

    PubMed

    Kumazaki, T; Hori, H; Osawa, S; Ishii, N; Suzuki, K

    1982-11-11

    The nucleotide sequences of 5S rRNAs from a rotifer, Brachionus plicatilis, and two nematodes, Rhabditis tokai and Caenorhabditis elegans have been determined. The rotifer has two 5S rRNA species that are composed of 120 and 121 nucleotides, respectively. The sequences of these two 5S rRNAs are the same except that the latter has an additional base at its 3'-terminus. The 5S rRNAs from the two nematode species are both 119 nucleotides long. The sequence similarity percentages are 79% (Brachionus/Rhabditis), 80% (Brachionus/Caenorhabditis), and 95% (Rhabditis/Caenorhabditis) among these three species. Brachionus revealed the highest similarity to Lingula (89%), but not to the nematodes (79%).

  19. Complete genome sequence of a novel genotype of squash mosaic virus

    USDA-ARS's Scientific Manuscript database

    Complete genome sequence of a novel genotype of Squash mosaic virus (SqMV) infecting squash plants in Spain was obtained using deep sequencing of small ribonucleic acids and assembly. The low nucleotide sequence identities, with 87-88% on RNA1 and 84-86% on RNA2 to known SqMV isolates, suggest a new...

  20. Complete genome sequence of a new begomovirus associated with yellow mosaic disease of Hemidesmus indicus in India.

    PubMed

    Reddy, M Sreekanth; Kanakala, S; Srinivas, K P; Hema, M; Malathi, V G; Sreenivasulu, P

    2014-05-01

    The complete DNA A genome of a virus isolate associated with yellow mosaic disease of a medicinal plant, Hemidesmus indicus, from India was cloned and sequenced. The length of DNA A was 2825 nucleotides, 35 nucleotides longer than the unit genome of monopartite begomoviruses. Comparison of the nucleotide sequence of DNA A of the virus isolate with those of other begomoviruses showed maximum sequence identity of 69 % to DNA A of ageratum yellow vein China virus (AYVCNV; AJ558120) and 68 % with tomato yellow leaf curl virus- LBa4 (TYLCV; EF185318), and it formed a distinct clade in phylogenetic analysis. The genome organization of the present virus isolate was found to be similar to that of Old World monopartite begomoviruses. The genome was considered to be monopartite, because association of DNA B and β satellite DNA components was not detected. Based on its sequence identity (<70 %) to all other begomoviruses known to date and ICTV (International Committee on Taxonomy of Viruses) species demarcating criteria (<89 % identity), it is considered a member of a novel begomovirus species, and the tentative name "Hemidesmus yellow mosaic virus" (HeYMV) is proposed.

  1. Sequencing and phylogenetic analysis of tobacco virus 2, a polerovirus from Nicotiana tabacum.

    PubMed

    Zhou, Benguo; Wang, Fang; Zhang, Xuesong; Zhang, Lina; Lin, Huafeng

    2017-07-01

    The complete genome sequence of a new virus, provisionally named tobacco virus 2 (TV2), was determined and identified from leaves of tobacco (Nicotiana tabacum) exhibiting leaf mosaic, yellowing, and deformity, in Anhui Province, China. The genome sequence of TV2 comprises 5,979 nucleotides, with 87% nucleotide sequence identity to potato leafroll virus (PLRV). Its genome organization is similar to that of PLRV, containing six open reading frames (ORFs) that potentially encode proteins with putative functions in cell-to-cell movement and suppression of RNA silencing. Phylogenetic analysis of the nucleotide sequence placed TV2 alongside members of the genus Polerovirus in the family Luteoviridae. To the best of our knowledge, this study is the first report of a complete genome sequence of a new polerovirus identified in tobacco.

  2. A first report and complete genome sequence of alfalfa enamovirus from Sudan

    USDA-ARS's Scientific Manuscript database

    A full genome sequence of a viral pathogen, provisionally named alfalfa enamovirus 2 (AEV-2), was reconstructed from short reads obtained by Illumina RNA sequencing of alfalfa sample originating from Sudan. Ambiguous nucleotides in the resultant consensus assembly and identity of the predicted virus...

  3. Nucleotide sequences of immunoglobulin eta genes of chimpanzee and orangutan: DNA molecular clock and hominoid evolution

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sakoyama, Y.; Hong, K.J.; Byun, S.M.

    To determine the phylogenetic relationships among hominoids and the dates of their divergence, the complete nucleotide sequences of the constant region of the immunoglobulin eta-chain (C_eta1) genes from chimpanzee and orangutan have been determined. These sequences were compared with the human eta-chain constant-region sequence. A molecular clock (silent molecular clock), measured by the degree of sequence divergence at the synonymous (silent) positions of protein-encoding regions, was introduced for the present study. From the comparison of nucleotide sequences of α1-antitrypsin and β- and delta-globulin genes between humans and Old World monkeys, the silent molecular clock was calibrated: the mean evolutionary rate of silent substitution was determined to be 1.56 x 10^-9 substitutions per site per year. Using the silent molecular clock, the mean divergence dates of chimpanzee and orangutan from the human lineage were estimated as 6.4 +/- 2.6 million years and 17.3 +/- 4.5 million years, respectively. It was also shown that the evolutionary rate of primate genes is considerably slower than those of other mammalian genes.
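
    The arithmetic behind a molecular-clock date can be sketched as follows. The relation T = K / (2r), where K is the silent divergence per site between two lineages and r is the rate per site per year, is the commonly used two-lineage form and is an assumption here; the abstract does not state which formula the authors applied, nor the K values. The K values printed below are back-calculated from the reported dates purely for illustration, and the function names are hypothetical.

```python
# Rate reported in the abstract: silent substitutions per site per year.
RATE = 1.56e-9


def divergence_time(k, rate=RATE):
    """Years since two lineages split, assuming T = K / (2 * rate).

    The factor 2 reflects substitutions accumulating on both lineages.
    """
    return k / (2.0 * rate)


def implied_divergence(years, rate=RATE):
    """Silent divergence per site implied by a split `years` ago (K = 2 * rate * T)."""
    return 2.0 * rate * years


if __name__ == "__main__":
    # Dates taken from the abstract; the K values are derived, not reported.
    for label, years in (("human/chimpanzee", 6.4e6), ("human/orangutan", 17.3e6)):
        k = implied_divergence(years)
        print(f"{label}: implied K = {k:.4f} substitutions per silent site")
```

    Under this assumed relation, the reported dates correspond to roughly 2% and 5.4% silent-site divergence, respectively.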

  4. Nucleotide sequences of bovine alpha S1- and kappa-casein cDNAs.

    PubMed Central

    Stewart, A F; Willis, I M; Mackinlay, A G

    1984-01-01

    The nucleotide sequences corresponding to bovine alpha S1- and kappa-casein mRNAs are presented. An unusual alpha S1-casein cDNA has been characterised whose 5' end commences upstream from its putative TATA box. The alpha S1-casein mRNA is compared to rat alpha-casein mRNA and two components of divergence are identified. Firstly, the two sequences have diverged at a high point mutation rate and the rate of amino acid replacement by this mechanism is at least as great as the rate of divergence of any other part of the mRNAs. Secondly, the protein coding sequence has been subjected to several insertion/deletion events, one of which may be an example of exon shuffling. The kappa-casein mRNA sequence verifies the proposition that it has arisen from a different ancestral gene to the other caseins. PMID:6328443

  5. DNA Sequence-Dependent Ionic Currents in Ultra-Small Solid-State Nanopores

    PubMed Central

    Comer, Jeffrey

    2016-01-01

    Measurements of ionic currents through nanopores partially blocked by DNA have emerged as a powerful method for characterization of the DNA nucleotide sequence. Although the effect of the nucleotide sequence on the nanopore blockade current has been experimentally demonstrated, prediction and interpretation of such measurements remain a formidable challenge. Using atomic resolution computational approaches, here we show how the sequence, molecular conformation, and pore geometry affect the blockade ionic current in model solid-state nanopores. We demonstrate that the blockade current from a DNA molecule is determined by the chemical identities and conformations of at least three consecutive nucleotides. We find the blockade currents produced by the nucleotide triplets to vary considerably with their nucleotide sequence despite having nearly identical molecular conformations. Encouragingly, we find blockade current differences as large as 25% for single-base substitutions in ultra small (1.6 nm × 1.1 nm cross section; 2 nm length) solid-state nanopores. Despite the complex dependence of the blockade current on the sequence and conformation of the DNA triplets, we find that, under many conditions, the number of thymine bases is positively correlated with the current, whereas the number of purine bases and the presence of both purine and pyrimidines in the triplet are negatively correlated with the current. Based on these observations, we construct a simple theoretical model that relates the ion current to the base content of a solid-state nanopore. Furthermore, we show that compact conformations of DNA in narrow pores provide the greatest signal-to-noise ratio for single base detection, whereas reduction of the nanopore length increases the ionic current noise. Thus, the sequence dependence of nanopore blockade current can be theoretically rationalized, although the predictions will likely need to be customized for each nanopore type. PMID:27103233

  6. Mapping DNA methylation by transverse current sequencing: Reduction of noise from neighboring nucleotides

    NASA Astrophysics Data System (ADS)

    Alvarez, Jose; Massey, Steven; Kalitsov, Alan; Velev, Julian

    Nanopore sequencing via transverse current has emerged as a competitive candidate for mapping DNA methylation without the need for bisulfite treatment, fluorescent tags, or PCR amplification. By eliminating the error-producing amplification step, long read lengths become feasible, which greatly simplifies the assembly process and reduces the time and the cost inherent in current technologies. However, due to the large error rates of nanopore sequencing, single base resolution has not been reached. A very important source of noise is the intrinsic structural noise in the electric signature of the nucleotide arising from the influence of neighboring nucleotides. In this work we perform calculations of the tunneling current through DNA molecules in nanopores using the non-equilibrium electron transport method within an effective multi-orbital tight-binding model derived from first-principles calculations. We develop a base-calling algorithm accounting for the correlations of the current through neighboring bases, which in principle can reduce the error rate below any desired precision. Using this method we show that we can clearly distinguish DNA methylation and other base modifications based on the reading of the tunneling current.

  7. Molecular characterization of a novel rhabdovirus infecting blackcurrant identified by high-throughput sequencing.

    PubMed

    Wu, L-P; Yang, T; Liu, H-W; Postman, J; Li, R

    2018-05-01

    A large contig with sequence similarities to several nucleorhabdoviruses was identified by high-throughput sequencing analysis from a black currant (Ribes nigrum L.) cultivar. The complete genome sequence of this new nucleorhabdovirus is 14,432 nucleotides long. Its genomic organization is very similar to those of unsegmented plant rhabdoviruses, containing six open reading frames in the order 3'-N-P-P3-M-G-L-5'. The virus, which is provisionally named "black currant-associated rhabdovirus", is 41-52% identical in its genome nucleotide sequence to other nucleorhabdoviruses and may represent a new species in the genus Nucleorhabdovirus.

  8. Nucleotide sequence and further characterization of the Synechococcus sp. strain PCC 7002 recA gene: complementation of a cyanobacterial recA mutation by the Escherichia coli recA gene.

    PubMed Central

    Murphy, R C; Gasparich, G E; Bryant, D A; Porter, R D

    1990-01-01

    The nucleotide sequence and transcript initiation site of the Synechococcus sp. strain PCC 7002 recA gene have been determined. The deduced amino acid sequence of the RecA protein of this cyanobacterium is 56% identical and 73% similar to the Escherichia coli RecA protein. Northern (RNA) blot analysis indicates that the Synechococcus strain PCC 7002 recA gene is transcribed as a monocistronic transcript 1,200 bases in length. The 5' endpoint of the recA mRNA was mapped by primer extension by using synthetic oligonucleotides of 17 and 27 nucleotides as primers. The nucleotide sequence 5' to the mapped endpoint contained sequence motifs bearing a striking resemblance to the heat shock (sigma 32-specific) promoters of E. coli but did not contain sequences similar to the E. coli SOS operator recognized by the LexA repressor. An insertion mutation introduced into the recA locus of Synechococcus strain PCC 7002 via homologous recombination resulted in the formation of diploids carrying both mutant and wild-type recA alleles. A variety of growth regimens and transformation procedures failed to produce a recA Synechococcus strain PCC 7002 mutant. However, introduction into these diploid cells of the E. coli recA gene in trans on a biphasic shuttle vector resulted in segregation of the cyanobacterial recA alleles, indicating that the E. coli recA gene was able to provide a function required for growth of recA Synechococcus strain PCC 7002 cells. This interpretation is supported by the observation that the E. coli recA gene is maintained in these cells when antibiotic selection for the shuttle vector is removed. PMID:2105307

  9. Nucleotide excision repair by dual incisions in plants.

    PubMed

    Canturk, Fazile; Karaman, Muhammet; Selby, Christopher P; Kemp, Michael G; Kulaksiz-Erkmen, Gulnihal; Hu, Jinchuan; Li, Wentao; Lindsey-Boltz, Laura A; Sancar, Aziz

    2016-04-26

    Plants use light for photosynthesis and for various signaling purposes. The UV wavelengths in sunlight also introduce DNA damage in the form of cyclobutane pyrimidine dimers (CPDs) and pyrimidine (6-4) pyrimidone photoproducts [(6-4)PPs] that must be repaired for the survival of the plant. Genome sequencing has revealed the presence of genes for both CPD and (6-4)PP photolyases, as well as genes for nucleotide excision repair in plants, such as Arabidopsis and rice. Plant photolyases have been purified, characterized, and have been shown to play an important role in plant survival. In contrast, even though nucleotide excision repair gene homologs have been found in plants, the mechanism of nucleotide excision repair has not been investigated. Here we used the in vivo excision repair assay developed in our laboratory to demonstrate that Arabidopsis removes CPDs and (6-4)PPs by a dual-incision mechanism that is essentially identical to the mechanism of dual incisions in humans and other eukaryotes, in which oligonucleotides with a mean length of 26-27 nucleotides are removed by incising ∼20 phosphodiester bonds 5' and 5 phosphodiester bonds 3' to the photoproduct.

  10. Nucleotide sequencing and serological evidence that the recently recognized deer tick virus is a genotype of Powassan virus.

    PubMed

    Beasley, D W; Suderman, M T; Holbrook, M R; Barrett, A D

    2001-11-05

    Deer tick virus (DTV) is a recently recognized North American virus isolated from Ixodes dammini ticks. Nucleotide sequencing of fragments of structural and non-structural protein genes suggested that this virus was most closely related to the tick-borne flavivirus Powassan (POW), which causes potentially fatal encephalitis in humans. To determine whether DTV represents a new and distinct member of the Flavivirus genus of the family Flaviviridae, we sequenced the structural protein genes and 5' and 3' non-coding regions of this virus. In addition, we compared the reactivity of DTV and POW in hemagglutination inhibition tests with a panel of polyclonal and monoclonal antisera, and performed cross-neutralization experiments using anti-DTV antisera. Nucleotide sequencing revealed a high degree of homology between DTV and POW at both nucleotide (>80% homology) and amino acid (>90% homology) levels, and the two viruses were indistinguishable in serological assays and mouse neuroinvasiveness. On the basis of these results, we suggest that DTV should be classified as a genotype of POW virus.

  11. Sequence analysis of the internal transcribed spacer (ITS) region reveals a novel clade of Ichthyophonus sp. from rainbow trout

    USGS Publications Warehouse

    Rasmussen, C.; Purcell, M.K.; Gregg, J.L.; LaPatra, S.E.; Winton, J.R.; Hershberger, P.K.

    2010-01-01

    The mesomycetozoean parasite Ichthyophonus hoferi is most commonly associated with marine fish hosts but also occurs in some components of the freshwater rainbow trout Oncorhynchus mykiss aquaculture industry in Idaho, USA. It is not certain how the parasite was introduced into rainbow trout culture, but it might have been associated with the historical practice of feeding raw, ground common carp Cyprinus carpio that were caught by commercial fishermen. Here, we report a major genetic division between west coast freshwater and marine isolates of Ichthyophonus hoferi. Sequence differences were not detected in 2 regions of the highly conserved small subunit (18S) rDNA gene; however, nucleotide variation was seen in internal transcribed spacer loci (ITS1 and ITS2), both within and among the isolates. Intra-isolate variation ranged from 2.4 to 7.6 nucleotides over a region consisting of ~740 bp. Majority consensus sequences from marine/anadromous hosts differed in only 0 to 3 nucleotides (99.6 to 100% nucleotide identity), while those derived from freshwater rainbow trout had no nucleotide substitutions relative to each other. However, the consensus sequences between isolates from freshwater rainbow trout and those from marine/anadromous hosts differed in 13 to 16 nucleotides (97.8 to 98.2% nucleotide identity).
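
    The identity percentages quoted in this abstract follow from the difference counts and the ~740 bp region length, as the short check below shows. It assumes a comparison length of exactly 740 positions, which the abstract gives only approximately; the function name is hypothetical.

```python
def percent_identity(differences, aligned_length):
    """Percent of aligned positions that are identical between two sequences."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 100.0 * (1.0 - differences / aligned_length)


if __name__ == "__main__":
    region = 740  # approximate ITS region length from the abstract
    for diffs in (0, 3, 13, 16):
        print(f"{diffs:2d} differences -> {percent_identity(diffs, region):.1f}% identity")
```

    With 740 positions, 0 to 3 differences give 100% to 99.6% identity and 13 to 16 differences give 98.2% to 97.8%, matching the reported ranges.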

  12. Nucleotide sequence of the gene for the Mr 32,000 thylakoid membrane protein from Spinacia oleracea and Nicotiana debneyi predicts a totally conserved primary translation product of Mr 38,950

    PubMed Central

    Zurawski, Gerard; Bohnert, Hans J.; Whitfeld, Paul R.; Bottomley, Warwick

    1982-01-01

    The gene for the so-called Mr 32,000 rapidly labeled photosystem II thylakoid membrane protein (here designated psbA) of spinach (Spinacia oleracea) chloroplasts is located on the chloroplast DNA in the large single-copy region immediately adjacent to one of the inverted repeat sequences. In this paper we show that the size of the mRNA for this protein is ≈ 1.25 kilobases and that the direction of transcription is towards the inverted repeat unit. The nucleotide sequence of the gene and its flanking regions is presented. The only large open reading frame in the sequence codes for a protein of Mr 38,950. The nucleotide sequence of psbA from Nicotiana debneyi also has been determined, and comparison of the sequences from the two species shows them to be highly conserved (>95% homology) throughout the entire reading frame. Conservation of the amino acid sequence is absolute, there being no changes in a total of 353 residues. This leads us to conclude that the primary translation product of psbA must be a protein of Mr 38,950. The protein is characterized by the complete absence of lysine residues and is relatively rich in hydrophobic amino acids, which tend to be clustered. Transcription of spinach psbA starts about 86 base pairs before the first ATG codon. Immediately upstream from this point there is a sequence typical of that found in E. coli promoters. An almost identical sequence occurs in the equivalent region of N. debneyi DNA. PMID:16593262

  13. T box transcription antitermination riboswitch: Influence of nucleotide sequence and orientation on tRNA binding by the antiterminator element

    PubMed Central

    Fauzi, Hamid; Agyeman, Akwasi; Hines, Jennifer V.

    2008-01-01

    Many bacteria utilize riboswitch transcription regulation to monitor and appropriately respond to cellular levels of important metabolites or effector molecules. The T box transcription antitermination riboswitch responds to cognate uncharged tRNA by specifically stabilizing an antiterminator element in the 5′-untranslated mRNA leader region and precluding formation of a thermodynamically more stable terminator element. Stabilization occurs when the tRNA acceptor end base pairs with the first four nucleotides in the seven nucleotide bulge of the highly conserved antiterminator element. The significance of the conservation of the antiterminator bulge nucleotides that do not base pair with the tRNA is unknown, but they are required for optimal function. In vitro selection was used to determine if the isolated antiterminator bulge context alone dictates the mode in which the tRNA acceptor end binds the bulge nucleotides. No sequence conservation beyond complementarity was observed and the location was not constrained to the first four bases of the bulge. The results indicate that formation of a structure that recognizes the tRNA acceptor end in isolation is not the determinant driving force for the high phylogenetic sequence conservation observed within the antiterminator bulge. Additional factors or T box leader features more likely influenced the phylogenetic sequence conservation. PMID:19152843

  14. The repeating nucleotide sequence in the repetitive mitochondrial DNA from a "low-density" petite mutant of yeast.

    PubMed Central

    Van Kreijl, C F; Bos, J L

    1977-01-01

    The repeating nucleotide sequence of 68 base pairs in the mtDNA from an ethidium-induced cytoplasmic petite mutant of yeast has been determined. For sequence analysis, specifically primed and terminated RNA copies, obtained by in vitro transcription of the separated strands, were used. The sequence consists of 66 consecutive AT base pairs flanked by two GC pairs and comprises nearly all of the mutant mitochondrial genome. The sequence, moreover, also represents the first part of wild-type mtDNA sequenced so far. PMID:198740

  15. Complete nucleotide sequence of spring beauty latent virus, a bromovirus infectious to Arabidopsis thaliana.

    PubMed

    Fujisaki, K; Hagihara, F; Kaido, M; Mise, K; Okuno, T

    2003-01-01

    Spring beauty latent virus (SBLV), a bromovirus, systemically and efficiently infected Arabidopsis thaliana, whereas the well-studied bromoviruses brome mosaic virus (BMV) and cowpea chlorotic mottle virus (CCMV) did not infect and poorly infected A. thaliana, respectively. We constructed biologically active cDNA clones of SBLV genomic RNAs and determined their complete nucleotide sequences. Interestingly, SBLV RNA3 contains both the box B motif in the intercistronic region, as does BMV, and the subgenomic promoter-like sequence in the 5' noncoding region, as does CCMV. Sequence comparisons of SBLV, BMV, CCMV, and broad bean mottle virus demonstrated that SBLV is closely related to BMV and CCMV.

  16. Nucleotide sequence of the Varkud mitochondrial plasmid of Neurospora and synthesis of a hybrid transcript with a 5' leader derived from mitochondrial RNA.

    PubMed

    Akins, R A; Grant, D M; Stohl, L L; Bottorff, D A; Nargang, F E; Lambowitz, A M

    1988-11-05

    The Mauriceville and Varkud mitochondrial plasmids of Neurospora are closely related, closed circular DNAs (3.6 and 3.7 kb, respectively; 1 kb = 10^3 bases or base-pairs), whose characteristics suggest relationships to mitochondrial DNA introns and retrotransposons. Here, we characterized the structure of the Varkud plasmid, determined its complete nucleotide sequence and mapped its major transcripts. The Mauriceville and Varkud plasmids have more than 97% positional identity. Both plasmids contain a 710 amino acid open reading frame that encodes a reverse transcriptase-like protein. The amino acid sequence of this open reading frame is strongly conserved between the two plasmids (701/710 amino acids) as expected for a functionally important protein. Both plasmids have a 0.4 kb region that contains five PstI palindromes and a direct repeat of approximately 160 base-pairs. Comparison of sequences in this region suggests that the Varkud plasmid has diverged less from a common ancestor than has the Mauriceville plasmid. Two major transcripts of the Varkud plasmid were detected by Northern hybridization experiments: a full-length linear RNA of 3.7 kb and an additional prominent transcript of 4.9 kb, 1.2 kb longer than monomer plasmid. Remarkably, we find that the 4.9 kb transcript is a hybrid RNA consisting of the full-length 3.7 kb Varkud plasmid transcript plus a 5' leader of 1.2 kb that is derived from the 5' end of the mitochondrial small rRNA. This and other findings suggest that the Varkud plasmid, like certain RNA viruses, has a mechanism for joining heterologous RNAs to the 5' end of its major transcript, and that, under some circumstances, nucleotide sequences in mitochondria may be recombined at the RNA level.

  17. Cloning and sequencing of the allophycocyanin genes from Spirulina maxima (Cyanophyta)

    NASA Astrophysics Data System (ADS)

    Qin, Song; Hiroyuki, Kojima; Yoshikazu, Kawata; Shin-Ichi, Yano; Zeng, Cheng-Kui

    1998-03-01

    The genes coding for the α- and β-subunits of allophycocyanin (apcA and apcB) from the cyanophyte Spirulina maxima were cloned and sequenced. The results revealed 44.4% nucleotide sequence similarity and 30.4% similarity of the deduced amino acid sequences between them. The amino acid sequence identities between S. maxima and S. platensis are 99.4% for the α subunit and 100% for the β subunit.

  18. Nucleotide sequence of RNA2 of Lettuce big-vein virus and evidence for a possible transcription termination/initiation strategy similar to that of rhabdoviruses.

    PubMed

    Sasaya, Takahide; Kusaba, Shinnosuke; Ishikawa, Koichi; Koganezawa, Hiroki

    2004-09-01

    Lettuce big-vein virus (LBVV) is the type species of the genus Varicosavirus and is a two-segmented negative-sense single-stranded RNA virus. The larger LBVV genome segment (RNA1) consists of 6797 nt and encodes an L polymerase that resembles that of rhabdoviruses. Here, the nucleotide sequence of the second LBVV genome segment (RNA2) is reported. LBVV RNA2 consisted of 6081 nt and contained antisense information for five major ORFs: ORF1 (nt 210-1403 on the viral RNA), ORF2 (nt 1493-2494), ORF3 (nt 2617-3489), ORF4 (nt 3843-4337) and ORF5 (nt 4530-5636), which had coding capacities of 44, 36, 32, 19 and 41 kDa, respectively. The gene at the 3' end of the viral RNA encoded a coat protein, while the other four genes encoded proteins of unknown functions. The 3'-terminal 11 nt of LBVV RNA2 were identical to those of LBVV RNA1, and the 5'-terminal regions of LBVV RNA1 and RNA2 contained a long common nucleotide stretch of about 100 nt. Northern blot analysis using probes specific to the individual ORFs revealed that LBVV transcribes monocistronic RNAs. Analysis of the terminal sequences, and primer extension and RNase H digestion analysis of LBVV mRNAs, suggested that LBVV utilizes a transcription termination/initiation strategy comparable with that of rhabdoviruses.
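
    The ORF coordinates and coding capacities listed in this abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes the quoted coordinates are inclusive and end on the stop codon, and it uses a rule-of-thumb average of about 0.110 kDa per amino acid residue, neither of which is stated in the abstract. The helper name is hypothetical.

```python
# Nucleotide coordinates on the viral RNA, as given in the abstract.
ORFS = {
    "ORF1": (210, 1403),
    "ORF2": (1493, 2494),
    "ORF3": (2617, 3489),
    "ORF4": (3843, 4337),
    "ORF5": (4530, 5636),
}

# Coding capacities (kDa) as reported in the abstract.
REPORTED_KDA = {"ORF1": 44, "ORF2": 36, "ORF3": 32, "ORF4": 19, "ORF5": 41}

# Assumption: ~0.110 kDa per residue, a common rule of thumb.
AVG_RESIDUE_KDA = 0.110


def orf_summary(start: int, end: int) -> tuple[int, int, float]:
    """Return (length in nt, residues, estimated mass in kDa) for an ORF."""
    length_nt = end - start + 1           # coordinates treated as inclusive
    codons, remainder = divmod(length_nt, 3)
    if remainder:
        raise ValueError("ORF length is not a multiple of 3")
    residues = codons - 1                 # assumption: last codon is the stop
    return length_nt, residues, residues * AVG_RESIDUE_KDA


for name, (start, end) in ORFS.items():
    length_nt, residues, est_kda = orf_summary(start, end)
    print(f"{name}: {length_nt} nt, {residues} aa, ~{est_kda:.1f} kDa "
          f"(reported {REPORTED_KDA[name]} kDa)")
```

    With these assumptions every ORF length is a whole number of codons, and each rough mass estimate falls within about 1 kDa of the reported coding capacity.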

  19. Nucleotide Sequence and Genetic Structure of a Novel Carbaryl Hydrolase Gene (cehA) from Rhizobium sp. Strain AC100

    PubMed Central

    Hashimoto, Masayuki; Fukui, Mitsuru; Hayano, Kouichi; Hayatsu, Masahito

    2002-01-01

    Rhizobium sp. strain AC100, which is capable of degrading carbaryl (1-naphthyl-N-methylcarbamate), was isolated from soil treated with carbaryl. This bacterium hydrolyzed carbaryl to 1-naphthol and methylamine. Carbaryl hydrolase from the strain was purified to homogeneity, and its N-terminal sequence, molecular mass (82 kDa), and enzymatic properties were determined. The purified enzyme hydrolyzed 1-naphthyl acetate and 4-nitrophenyl acetate, indicating that the enzyme is an esterase. We then cloned the carbaryl hydrolase gene (cehA) from the plasmid DNA of the strain and determined the nucleotide sequence of the 10-kb region containing cehA. No homologous sequences were found by a database homology search using the nucleotide and deduced amino acid sequences of the cehA gene. Six open reading frames including the cehA gene were found in the 10-kb region, and sequencing analysis shows that the cehA gene is flanked by two copies of an insertion sequence-like sequence, suggesting that it forms part of a composite transposon. PMID:11872471

  20. Isolation of a full-length CC-NBS-LRR resistance gene analog candidate from sugar pine showing low nucleotide diversity.

    Treesearch

    K.D. Jermstad; L.A. Sheppard; B.B. Kinloch; A. Delfino-Mix; E.S. Ersoz; K.V. Krutovsky; D.B Neale

    2006-01-01

    The nucleotide-binding-site and leucine-rich-repeat (NBS–LRR) class of R proteins is abundant and widely distributed in plants. By using degenerate primers designed on the NBS domain in lettuce, we amplified sequences in sugar pine that shared sequence identity with many of the NBS–LRR class resistance genes catalogued in GenBank. The polymerase chain reaction products...

  1. First Complete Genome Sequence of an Isolate of Tomato Mottle Mosaic Virus Infecting Plants of Solanum lycopersicum in South America.

    PubMed

    Nagai, Alice; Duarte, Lígia M L; Chaves, Alexandre L R; Alexandre, Maria A V; Ramos-González, Pedro L; Chabi-Jesus, Camila; Harakava, Ricardo; Dos Santos, Déborah Y A C

    2018-05-10

    The complete nucleotide sequence of an isolate of tomato mottle mosaic virus (ToMMV) was determined. The virus, originally isolated from symptomatic tomato plants found in a county near the city of São Paulo, Brazil, has a genome with 99% nucleotide sequence identity with ToMMV from Mexico, China, Spain, and the United States. Copyright © 2018 Nagai et al.

  2. ANCAC: amino acid, nucleotide, and codon analysis of COGs – a tool for sequence bias analysis in microbial orthologs

    PubMed Central

    2012-01-01

    Background The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. Per definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Results Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC’s NUCOCOG dataset as the largest one available for that purpose thus far. Conclusions Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence-bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming-skills. PMID:22958836

  3. Nucleotide sequencing and characterization of the genes encoding benzene oxidation enzymes of Pseudomonas putida.

    PubMed Central

    Irie, S; Doi, S; Yorifuji, T; Takagi, M; Yano, K

    1987-01-01

    The nucleotide sequence of the genes from Pseudomonas putida encoding oxidation of benzene to catechol was determined. Five open reading frames were found in the sequence. Four corresponding protein molecules were detected by a DNA-directed in vitro translation system. Escherichia coli cells containing the fragment with the four open reading frames transformed benzene to cis-benzene glycol, which is an intermediate of the oxidation of benzene to catechol. The relation between the product of each cistron and the components of the benzene oxidation enzyme system is discussed. PMID:3667527

  4. Overproduction and nucleotide sequence of the respiratory D-lactate dehydrogenase of Escherichia coli.

    PubMed Central

    Rule, G S; Pratt, E A; Chin, C C; Wold, F; Ho, C

    1985-01-01

    Recombinant DNA plasmids containing the gene for the membrane-bound D-lactate dehydrogenase (D-LDH) of Escherichia coli linked to the promoter PL from lambda were constructed. After induction, the levels of D-LDH were elevated 300-fold over that of the wild type and amounted to 35% of the total cellular protein. The nucleotide sequence of the D-LDH gene was determined and shown to agree with the amino acid composition and the amino-terminal sequence of the purified enzyme. Removal of the amino-terminal formyl-Met from D-LDH was not inhibited in cells which contained these high levels of D-LDH. PMID:3882663

  5. Comprehensive thermodynamic analysis of 3′ double-nucleotide overhangs neighboring Watson–Crick terminal base pairs

    PubMed Central

    O'Toole, Amanda S.; Miller, Stacy; Haines, Nathan; Zink, M. Coleen; Serra, Martin J.

    2006-01-01

    Thermodynamic parameters are reported for duplex formation of 48 self-complementary RNA duplexes containing Watson–Crick terminal base pairs (GC, AU and UA) with all 16 possible 3′ double-nucleotide overhangs, mimicking the structures of short interfering RNAs (siRNA) and microRNAs (miRNA). Based on nearest-neighbor analysis, the addition of a second dangling nucleotide to a single 3′ dangling nucleotide increases stability of duplex formation up to 0.8 kcal/mol in a sequence dependent manner. Results from this study in conjunction with data from a previous study [A. S. O'Toole, S. Miller and M. J. Serra (2005) RNA, 11, 512.] allow for the development of a refined nearest-neighbor model to predict the influence of 3′ double-nucleotide overhangs on the stability of duplex formation. The model improves the prediction of free energy and melting temperature when tested against five oligomers with various core duplex sequences. Phylogenetic analysis of naturally occurring miRNAs was performed to support our results. Selection of the effector miR strand of the mature miRNA duplex appears to be dependent upon the identity of the 3′ double-nucleotide overhang. Thermodynamic parameters for 3′ single terminal overhangs adjacent to a UA pair are also presented. PMID:16820533

  6. First complete genome sequence of an emerging cucumber green mottle mosaic virus isolate in North America

    USDA-ARS's Scientific Manuscript database

    The complete genome sequence (6,423 nt) of an emerging Cucumber green mottle mosaic virus (CGMMV) isolate on cucumber in North America was determined through deep sequencing of sRNA and rapid amplification of cDNA ends. It shares 99% nucleotide sequence identity to the Asian genotype, but only 90% t...

  7. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms

    PubMed Central

    Bertolini, Francesca; Scimone, Concetta; Geraci, Claudia; Schiavo, Giuseppina; Utzeri, Valerio Joe; Chiofalo, Vincenzo; Fontanesi, Luca

    2015-01-01

    Few studies have investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared the obtained sequence information with the available donkey draft genome (and the Illumina reads from which it originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than the number aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionarily important in equids. Comparing Y-chromosome regions, we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources. PMID:26151450

  8. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms.

    PubMed

    Bertolini, Francesca; Scimone, Concetta; Geraci, Claudia; Schiavo, Giuseppina; Utzeri, Valerio Joe; Chiofalo, Vincenzo; Fontanesi, Luca

    2015-01-01

    Few studies have investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared the obtained sequence information with the available donkey draft genome (and the Illumina reads from which it originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than the number aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionarily important in equids. Comparing Y-chromosome regions, we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources.

  9. Update on Pneumocystis carinii f. sp. hominis Typing Based on Nucleotide Sequence Variations in Internal Transcribed Spacer Regions of rRNA Genes

    PubMed Central

    Lee, Chao-Hung; Helweg-Larsen, Jannik; Tang, Xing; Jin, Shaoling; Li, Baozheng; Bartlett, Marilyn S.; Lu, Jang-Jih; Lundgren, Bettina; Lundgren, Jens D.; Olsson, Mats; Lucas, Sebastian B.; Roux, Patricia; Cargnel, Antonietta; Atzori, Chiara; Matos, Olga; Smith, James W.

    1998-01-01

    Pneumocystis carinii f. sp. hominis isolates from 207 clinical specimens from nine countries were typed based on nucleotide sequence variations in the internal transcribed spacer regions I and II (ITS1 and ITS2, respectively) of rRNA genes. The number of ITS1 nucleotides has been revised from the previously reported 157 bp to 161 bp. Likewise, the number of ITS2 nucleotides has been changed from 177 to 192 bp. The number of ITS1 sequence types has increased from 2 to 15, and that of ITS2 has increased from 3 to 14. The 15 ITS1 sequence types are designated types A through O, and the 14 ITS2 types are named types a through n. A total of 59 types of P. carinii f. sp. hominis were found in this study. PMID:9508304

  10. A weighted sampling algorithm for the design of RNA sequences with targeted secondary structure and nucleotide distribution.

    PubMed

    Reinharz, Vladimir; Ponty, Yann; Waldispühl, Jérôme

    2013-07-01

    The design of RNA sequences folding into predefined secondary structures is a milestone for many synthetic biology and gene therapy studies. Most of the current software uses similar local search strategies (i.e. a random seed is progressively adapted to acquire the desired folding properties) and more importantly do not allow the user to control explicitly the nucleotide distribution such as the GC-content in their sequences. However, the latter is an important criterion for large-scale applications as it could presumably be used to design sequences with better transcription rates and/or structural plasticity. In this article, we introduce IncaRNAtion, a novel algorithm to design RNA sequences folding into target secondary structures with a predefined nucleotide distribution. IncaRNAtion uses a global sampling approach and weighted sampling techniques. We show that our approach is fast (i.e. running time comparable or better than local search methods), seedless (we remove the bias of the seed in local search heuristics) and successfully generates high-quality sequences (i.e. thermodynamically stable) for any GC-content. To complete this study, we develop a hybrid method combining our global sampling approach with local search strategies. Remarkably, our glocal methodology overcomes both local and global approaches for sampling sequences with a specific GC-content and target structure. IncaRNAtion is available at csb.cs.mcgill.ca/incarnation/. Supplementary data are available at Bioinformatics online.

  11. Random Amplification and Pyrosequencing for Identification of Novel Viral Genome Sequences

    PubMed Central

    Hang, Jun; Forshey, Brett M.; Kochel, Tadeusz J.; Li, Tao; Solórzano, Víctor Fiestas; Halsey, Eric S.; Kuschner, Robert A.

    2012-01-01

    ssRNA viruses have high levels of genomic divergence, which can lead to difficulty in genomic characterization of new viruses using traditional PCR amplification and sequencing methods. In this study, random reverse transcription, anchored random PCR amplification, and high-throughput pyrosequencing were used to identify orthobunyavirus sequences from total RNA extracted from viral cultures of acute febrile illness specimens. Draft genome sequence for the orthobunyavirus L segment was assembled and sequentially extended using de novo assembly contigs from pyrosequencing reads and orthobunyavirus sequences in GenBank as guidance. Accuracy and continuous coverage were achieved by mapping all reads to the L segment draft sequence. Subsequently, RT-PCR and Sanger sequencing were used to complete the genome sequence. The complete L segment was found to be 6936 bases in length, encoding a 2248-aa putative RNA polymerase. The identified L segment was distinct from previously published South American orthobunyaviruses, sharing 63% and 54% identity at the nucleotide and amino acid level, respectively, with the complete Oropouche virus L segment and 73% and 81% identity at the nucleotide and amino acid level, respectively, with a partial Caraparu virus L segment. The result demonstrated the effectiveness of a sequence-independent amplification and next-generation sequencing approach for obtaining complete viral genomes from total nucleic acid extracts and its use in pathogen discovery. PMID:22468136
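
    The segment length and protein length reported in this abstract imply how much of the L segment is non-coding. A minimal sketch of that bookkeeping, assuming a single open reading frame terminated by one stop codon (an assumption not stated in the abstract); the function name is hypothetical:

```python
def coding_nt(protein_aa: int, include_stop: bool = True) -> int:
    """Nucleotides needed to encode a protein, optionally counting the stop codon."""
    codons = protein_aa + (1 if include_stop else 0)
    return 3 * codons


SEGMENT_NT = 6936   # complete L segment length from the abstract
PROTEIN_AA = 2248   # putative RNA polymerase length from the abstract

orf_nt = coding_nt(PROTEIN_AA)        # ORF length including the stop codon
noncoding_nt = SEGMENT_NT - orf_nt    # total of both untranslated regions

print(f"ORF: {orf_nt} nt; non-coding remainder: {noncoding_nt} nt")
```

    Under that assumption the open reading frame accounts for 6747 of the 6936 bases, leaving 189 nt to be split between the two terminal non-coding regions.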

  12. Nucleic acid analysis using terminal-phosphate-labeled nucleotides

    DOEpatents

    Korlach, Jonas [Ithaca, NY; Webb, Watt W [Ithaca, NY; Levene, Michael [Ithaca, NY; Turner, Stephen [Ithaca, NY; Craighead, Harold G [Ithaca, NY; Foquet, Mathieu [Ithaca, NY

    2008-04-22

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  13. First Complete Genome Sequence of Suakwa aphid-borne yellows virus from East Timor

    PubMed Central

    Maina, Solomon; Edwards, Owain R.; de Almeida, Luis; Ximenes, Abel

    2016-01-01

    We present here the first complete genomic RNA sequence of the polerovirus Suakwa aphid-borne yellows virus (SABYV), from East Timor. The isolate sequenced came from a virus-infected pumpkin plant. The East Timorese genome had a nucleotide identity of 86.5% with the only other SABYV genome available, which is from Taiwan. PMID:27469955

  14. Sequence analysis of porcine kobuvirus VP1 region detected in pigs in Japan and Thailand.

    PubMed

    Okitsu, Shoko; Khamrin, Pattara; Thongprachum, Aksara; Hidaka, Satoshi; Kongkaew, Sompreeya; Kongkaew, Apisek; Maneekarn, Niwat; Mizuguchi, Masashi; Hayakawa, Satoshi; Ushijima, Hiroshi

    2012-04-01

    Porcine kobuvirus is a new candidate species of the genus Kobuvirus in the family Picornaviridae, and information is still limited. The identification of porcine kobuvirus has been performed by the sequence analyses of the 3D region of the viruses. Therefore, the purpose of this study was to characterize the molecular properties of VP1 nucleotide sequences of the porcine kobuviruses isolated from porcine stool samples in Japan during 2009 and Thailand between 2006 and 2008. In addition, a previously identified unique porcine kobuvirus, Japanese strain H023/2009/JP, which is a bovine kobuvirus-like strain based on sequence analysis of the 3D region, was also included in this study. All of the strains were amplified by the VP1-specific primer pair; the amplicons were subjected to direct sequencing and compared with the VP1 nucleotide sequences of reference strains. The VP1 sequences of strains from the GenBank database revealed high nucleotide sequence identity at 84.3-100%. On the other hand, the nucleotide identities among the 15 porcine kobuvirus strains analyzed in this study ranged from 78.8 to 99.8%. The results revealed that the diversity of the strains in this study was higher than that of the strains in previous studies. Furthermore, it was found that the VP1 region of the bovine kobuvirus-like strain, H023/2009/JP, clustered with nine porcine kobuvirus strains that were isolated in Thailand and Japan. Since this strain was previously found to be closely related to bovine kobuviruses in the 3D gene region, it may be a natural recombinant.

  15. Nucleotide sequences of two genomic DNAs encoding peroxidase of Arabidopsis thaliana.

    PubMed

    Intapruk, C; Higashimura, N; Yamamoto, K; Okada, N; Shinmyo, A; Takano, M

    1991-02-15

    The peroxidase (EC 1.11.1.7)-encoding gene of Arabidopsis thaliana was screened from a genomic library using a cDNA encoding a neutral isozyme of horseradish, Armoracia rusticana, peroxidase (HRP) as a probe, and two positive clones were isolated. From the comparison with the sequences of the HRP-encoding genes, we concluded that two clones contained peroxidase-encoding genes, and they were named prxCa and prxEa. Both genes consisted of four exons and three introns; the introns had consensus nucleotides, GT and AG, at the 5' and 3' ends, respectively. The lengths of each putative exon of the prxEa gene were the same as those of the HRP-basic-isozyme-encoding gene, prxC3, and coded for 349 amino acids (aa) with a sequence homology of 89% to that encoded by prxC3. The prxCa gene was very close to the HRP-neutral-isozyme-encoding gene, prxC1b, and coded for 354 aa with 91% homology to that encoded by prxC1b. The aa sequence homology was 64% between the two peroxidases encoded by prxCa and prxEa.

  16. PstI repeat: a family of short interspersed nucleotide element (SINE)-like sequences in the genomes of cattle, goat, and buffalo.

    PubMed

    Sheikh, Faruk G; Mukhopadhyay, Sudit S; Gupta, Prabhakar

    2002-02-01

    The PstI family of elements are short, highly repetitive DNA sequences interspersed throughout the genome of the Bovidae. We have cloned and sequenced some members of the PstI family from cattle, goat, and buffalo. These elements are approximately 500 bp, have a copy number of 2 x 10(5) - 4 x 10(5), and comprise about 4% of the haploid genome. Studies of nucleotide sequence homology indicate that the buffalo and goat PstI repeats (type II) are similar types of short interspersed nucleotide element (SINE) sequences, but the cattle PstI repeat (type I) is considerably more divergent. Additionally, the goat PstI sequence showed significant sequence homology with bovine serine tRNA, and is therefore likely derived from serine tRNA. Interestingly, Southern hybridization suggests that both types of SINEs (I and II) are present in all the species of Bovidae. Dendrogram analysis indicates that cattle PstI SINE is similar to bovine Alu-like SINEs. Goat and buffalo SINEs formed a separate cluster, suggesting that these two types of SINEs evolved separately in the genome of the Bovidae.

  17. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2010-07-01 2010-07-01 false Form and format for... And/or Amino Acid Sequences § 1.824 Form and format for nucleotide and/or amino acid sequence... Code for Information Interchange (ASCII) text. No other formats shall be allowed. (3) The computer...

  18. Full-Genome Sequence of Infectious Laryngotracheitis Virus (Gallid Alphaherpesvirus 1) Strain VFAR-043, Isolated in Peru

    PubMed Central

    Bendezu Eguis, Jorge; Montesinos, Ricardo; Fernández-Díaz, Manolo

    2018-01-01

    We report here the first genome sequence of infectious laryngotracheitis virus isolated in Peru from tracheal tissues of layer chickens. The genome showed 99.98% identity to the J2 strain genome sequence. Single nucleotide polymorphisms were detected in five gene-coding sequences related to vaccine development, virus attachment, and viral immune evasion. PMID:29519822

  19. Developing Single Nucleotide Polymorphism (SNP) markers from transcriptome sequences for the identification of longan (Dimocarpus longan) germplasm

    USDA-ARS's Scientific Manuscript database

    Longan (Dimocarpus longan Lour.) is an important tropical fruit tree crop. Accurate varietal identification is essential for germplasm management and breeding. Using longan transcriptome sequences from public databases, we developed single nucleotide polymorphism (SNP) markers; validated 60 SNPs in...

  20. Human somatostatin I: sequence of the cDNA.

    PubMed Central

    Shen, L P; Pictet, R L; Rutter, W J

    1982-01-01

    RNA has been isolated from a human pancreatic somatostatinoma and used to prepare a cDNA library. After prescreening, clones containing somatostatin I sequences were identified by hybridization with an anglerfish somatostatin I-cloned cDNA probe. From the nucleotide sequence of two of these clones, we have deduced an essentially full-length mRNA sequence, including the preprosomatostatin coding region, 105 nucleotides from the 5' untranslated region and the complete 150-nucleotide 3' untranslated region. The coding region predicts a 116-amino acid precursor protein (Mr, 12,727) that contains somatostatin-14 and -28 at its COOH terminus. The predicted amino acid sequence of human somatostatin-28 is identical to that of somatostatin-28 isolated from the porcine and ovine species. A comparison of the amino acid sequences of human and anglerfish preprosomatostatin I indicated that the COOH-terminal region encoding somatostatin-14 and the adjacent 6 amino acids are highly conserved, whereas the remainder of the molecule, including the signal peptide region, is more divergent. However, many of the amino acid differences found in the pro region of the human and anglerfish proteins are conservative changes. This suggests that the propeptides have a similar secondary structure, which in turn may imply a biological function for this region of the molecule. PMID:6126875

  1. Deep Sequencing Reveals a Divergent Ugandan cassava brown streak virus Isolate from Malawi

    PubMed Central

    Winter, Stephan; Mukasa, Settumba; Tairo, Fred; Sseruwagi, Peter; Ndunguru, Joseph; Duffy, Siobain

    2017-01-01

    Illumina sequencing of RNA from a cassava cutting from northern Malawi produced a genome of Ugandan cassava brown streak virus (UCBSV-MW-NB7_2013). Sequence comparisons revealed stronger similarity to an isolate from nearby Tanzania (93.4% pairwise nucleotide identity) than to those previously reported from Malawi (86.9 to 87.0%). PMID:28818908
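
    Many records in this listing report a pairwise nucleotide identity percentage. As a minimal sketch of how such a figure can be computed from two already-aligned sequences, the function below counts matching columns and skips gapped ones. The function name and the gap-handling convention are assumptions made for illustration; published values depend on the alignment tool and on how gaps are treated, which the abstracts do not state.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumes the two strings come from the same pairwise alignment and
    therefore have equal length. Skipping gapped columns is one convention
    among several; other tools count gaps as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # ignore columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy sequences, not real viral data.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
```

    Identity ranges such as "86.9 to 87.0%" would then be the minimum and maximum of this value over all pairs in a set of isolates.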

  2. Nucleotide and deduced amino acid sequence of the envelope gene of the Vasilchenko strain of TBE virus; comparison with other flaviviruses.

    PubMed

    Gritsun, T S; Frolova, T V; Pogodina, V V; Lashkevich, V A; Venugopal, K; Gould, E A

    1993-02-01

    A strain of tick-borne encephalitis virus known as Vasilchenko (Vs) exhibits relatively low virulence characteristics in monkeys, Syrian hamsters and humans. The gene encoding the envelope glycoprotein of this virus was cloned and sequenced. Alignment of the sequence with those of other known tick-borne flaviviruses and identification of the recognised amino acid genetic marker EHLPTA confirmed its identity as a member of the TBE complex. However, Vs virus was distinguishable from eastern and western tick-borne serotypes by the presence of the sequence AQQ at amino acid positions 232-234 and also by the presence of other specific amino acid substitutions which may be genetic markers for these viruses and could determine their pathogenetic characteristics. When compared with other tick-borne flaviviruses, Vs virus had 12 unique amino acid substitutions including an additional potential glycosylation site at position (315-317). The Vs virus strain shared closest nucleotide and amino acid homology (84.5% and 95.5% respectively) with western and far eastern strains of tick-borne encephalitis virus. Comparison with the far eastern serotype of tick-borne encephalitis virus, by cross-immunoelectrophoresis of Vs virions and PAGE analysis of the extracted virion proteins, revealed differences in surface charge and virus stability that may account for the different virulence characteristics of Vs virus. These results support and enlarge upon previous data obtained from molecular and serological analysis.

  3. Prunus necrotic ringspot ilarvirus: nucleotide sequence of RNA3 and the relationship to other ilarviruses based on coat protein comparison.

    PubMed

    Guo, D; Maiss, E; Adam, G; Casper, R

    1995-05-01

    The RNA3 of prunus necrotic ringspot ilarvirus (PNRSV) has been cloned and its entire sequence determined. The RNA3 consists of 1943 nucleotides (nt) and possesses two large open reading frames (ORFs) separated by an intergenic region of 74 nt. The 5' proximal ORF is 855 nt in length and codes for a protein of molecular mass 31.4 kDa which has homologies with the putative movement protein of other members of the Bromoviridae. The 3' proximal ORF of 675 nt is the cistron for the coat protein (CP) and has a predicted molecular mass of 24.9 kDa. The sequence of the 3' non-coding region (NCR) of PNRSV RNA3 showed a high degree of similarity with those of tobacco streak virus (TSV), prune dwarf virus (PDV), apple mosaic virus (ApMV) and also alfalfa mosaic virus (AIMV). In addition it contained potential stem-loop structures with interspersed AUGC motifs characteristic for ilar- and alfamoviruses. This conserved primary and secondary structure in all 3' NCRs may be responsible for the interaction with homologous and heterologous CPs and subsequent activation of genome replication. The CP gene of an ApMV isolate (ApMV-G) of 657 nt has also been cloned and sequenced. Although ApMV and PNRSV have a distant serological relationship, the deduced amino acid sequences of their CPs have an identity of only 51.8%. The N termini of PNRSV and ApMV CPs have in common a zinc-finger motif and the potential to form an amphipathic helix.
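
    The lengths quoted in this abstract can be cross-checked with simple arithmetic. The sketch below assumes that each reported ORF length includes the stop codon and uses a rough average residue mass of 110 Da; both are illustrative assumptions, not statements from the paper, and the helper names are hypothetical.

```python
AVG_RESIDUE_MASS_DA = 110.0  # rough average amino acid residue mass (assumption)


def orf_to_protein_length(orf_nt: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of orf_nt nucleotides."""
    if orf_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_nt // 3
    return codons - 1 if includes_stop else codons


def approx_mass_kda(n_residues: int) -> float:
    """Crude molecular mass estimate in kDa from residue count."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


if __name__ == "__main__":
    for name, nt, reported_kda in [("5' ORF", 855, 31.4), ("CP ORF", 675, 24.9)]:
        aa = orf_to_protein_length(nt)
        print(name, aa, "aa, approx", round(approx_mass_kda(aa), 1),
              "kDa; reported", reported_kda, "kDa")
    # Nucleotides left for the 5' and 3' non-coding regions combined.
    print("non-coding remainder:", 1943 - (855 + 74 + 675), "nt")
```

    Under these assumptions the estimates land within about 1 kDa of the reported 31.4 and 24.9 kDa, and the two ORFs plus the intergenic region leave 339 nt for the terminal non-coding regions.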

  4. Complete genome sequence of maize yellow striate virus, a new cytorhabdovirus infecting maize and wheat crops in Argentina.

    PubMed

    Maurino, Fernanda; Dumón, Analía D; Llauger, Gabriela; Alemandri, Vanina; de Haro, Luis A; Mattio, M Fernanda; Del Vas, Mariana; Laguna, Irma Graciela; Giménez Pecci, María de la Paz

    2018-01-01

    A rhabdovirus infecting maize and wheat crops in Argentina was molecularly characterized. Through next-generation sequencing (NGS) of symptomatic leaf samples, the complete genome was obtained of two isolates of maize yellow striate virus (MYSV), a putative new rhabdovirus, differing by only 0.4% at the nucleotide level. The MYSV genome consists of 12,654 nucleotides for maize and wheat virus isolates, and shares 71% nucleotide sequence identity with the complete genome of barley yellow striate mosaic virus (BYSMV, NC028244). Ten open reading frames (ORFs) were predicted in the MYSV genome from the antigenomic strand and were compared with their BYSMV counterparts. The highest amino acid sequence identity of the MYSV and BYSMV proteins was 80% between the L proteins, and the lowest was 37% between the proteins 4. Phylogenetic analysis suggested that the MYSV isolates are new members of the genus Cytorhabdovirus, family Rhabdoviridae. Yellow striate, affecting maize and wheat crops in Argentina, is an emergent disease that presents a potential economic risk for these widely distributed crops.

  5. Unique nucleotide sequence-guided assembly of repetitive DNA parts for synthetic biology applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Torella, JP; Lienert, F; Boehm, CR

    2014-08-07

    Recombination-based DNA construction methods, such as Gibson assembly, have made it possible to easily and simultaneously assemble multiple DNA parts, and they hold promise for the development and optimization of metabolic pathways and functional genetic circuits. Over time, however, these pathways and circuits have become more complex, and the increasing need for standardization and insulation of genetic parts has resulted in sequence redundancies (for example, repeated terminator and insulator sequences) that complicate recombination-based assembly. We and others have recently developed DNA assembly methods, which we refer to collectively as unique nucleotide sequence (UNS)-guided assembly, in which individual DNA parts are flanked with UNSs to facilitate the ordered, recombination-based assembly of repetitive sequences. Here we present a detailed protocol for UNS-guided assembly that enables researchers to convert multiple DNA parts into sequenced, correctly assembled constructs, or into high-quality combinatorial libraries in only 2-3 d. If the DNA parts must be generated from scratch, an additional 2-5 d are necessary. This protocol requires no specialized equipment and can easily be implemented by a student with experience in basic cloning techniques.

  6. Nucleotide sequence and phylogenetic analysis of Cucurbit yellow stunting disorder virus RNA 2.

    PubMed

    Livieratos, Ioannis C; Coutts, Robert H A

    2002-06-01

    The complete nucleotide sequence of Cucurbit yellow stunting disorder virus (CYSDV) RNA 2, a whitefly (Bemisia tabaci)-transmitted closterovirus with a bi-partite genome, is reported. CYSDV RNA 2 is 7,281 nucleotides long and contains the closterovirus hallmark gene array with a similar arrangement to the prototype member of the genus Crinivirus, Lettuce infectious yellows virus (LIYV). CYSDV RNA 2 contains open reading frames (ORFs) potentially encoding in a 5' to 3' direction for proteins of 5 kDa (ORF 1; hydrophobic protein), 62 kDa (ORF 2; heat shock protein 70 homolog, HSP70h), 59 kDa (ORF 3; protein of unknown function), 9 kDa (ORF 4; protein of unknown function), 28.5 kDa (ORF 5; coat protein, CP), 53 kDa (ORF 6; coat protein minor, CPm), and 26.5 kDa (ORF 7; protein of unknown function). Pairwise comparisons of CYSDV RNA 2-encoded proteins (HSP70h, p59 and CPm) among the closteroviruses showed that CYSDV is closely related to LIYV. Phylogenetic analysis based on the amino acid sequence of the HSP70h, indicated that CYSDV clusters with other members of the genus Crinivirus, and it is related to Little cherry virus-1 (LChV-1), but is distinct from the aphid- or mealybug-transmitted closteroviruses.

  7. miBLAST: scalable evaluation of a batch of nucleotide sequence queries with BLAST

    PubMed Central

    Kim, You Jung; Boyd, Andrew; Athey, Brian D.; Patel, Jignesh M.

    2005-01-01

    A common task in many modern bioinformatics applications is to match a set of nucleotide query sequences against a large sequence dataset. Existing tools, such as BLAST, are designed to evaluate a single query at a time and can be unacceptably slow when the number of sequences in the query set is large. In this paper, we present a new algorithm, called miBLAST, that evaluates such batch workloads efficiently. At the core, miBLAST employs a q-gram filtering and an index join for efficiently detecting similarity between the query sequences and database sequences. This set-oriented technique, which indexes both the query and the database sets, results in substantial performance improvements over existing methods. Our results show that miBLAST is significantly faster than BLAST in many cases. For example, miBLAST aligned 247 965 oligonucleotide sequences in the Affymetrix probe set against the Human UniGene in 1.26 days, compared with 27.27 days with BLAST (an improvement by a factor of 22). The relative performance of miBLAST increases for larger word sizes; however, it decreases for longer queries. miBLAST employs the familiar BLAST statistical model and output format, guaranteeing the same accuracy as BLAST and facilitating a seamless transition for existing BLAST users. PMID:16061938
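
    The general idea of q-gram filtering combined with an index join can be illustrated with a small sketch. This is not miBLAST's implementation, which the abstract does not detail; it only shows the set-oriented pattern of indexing both the query set and the database set by q-gram and joining the two indexes on shared q-grams to obtain candidate pairs. All function names and the `min_shared` threshold are hypothetical.

```python
from collections import defaultdict


def qgrams(seq: str, q: int) -> set:
    """All distinct substrings of length q in seq."""
    return {seq[i:i + q] for i in range(len(seq) - q + 1)}


def build_index(seqs: dict, q: int) -> dict:
    """Map each q-gram to the set of sequence ids that contain it."""
    index = defaultdict(set)
    for seq_id, seq in seqs.items():
        for gram in qgrams(seq, q):
            index[gram].add(seq_id)
    return index


def candidate_pairs(queries: dict, database: dict, q: int, min_shared: int = 1) -> set:
    """(query id, database id) pairs sharing at least min_shared distinct q-grams."""
    q_index = build_index(queries, q)
    d_index = build_index(database, q)
    shared = defaultdict(int)
    # Index join: visit only q-grams present in both indexes.
    for gram in q_index.keys() & d_index.keys():
        for qid in q_index[gram]:
            for did in d_index[gram]:
                shared[(qid, did)] += 1
    return {pair for pair, n in shared.items() if n >= min_shared}


if __name__ == "__main__":
    queries = {"q1": "ACGTACGT", "q2": "TTTTTTTT"}   # toy data
    database = {"d1": "GGACGTACGG", "d2": "CCCCCCCC"}
    print(candidate_pairs(queries, database, q=4, min_shared=2))
```

    Only the surviving candidate pairs would be passed on to a full alignment step, which is where the savings over evaluating every query against the whole database one at a time would come from.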

  8. Nucleotide sequence analysis of the recA gene and discrimination of the three isolates of urease-positive thermophilic Campylobacter (UPTC) isolated from seagulls (Larus spp.) in Northern Ireland.

    PubMed

    Matsuda, M; Tai, K; Moore, J E; Millar, B C; Murayama, O

    2004-01-01

    Nucleotide sequencing after TA cloning of the amplicon of the almost-full length recA gene from three strains of UPTC (A1, A2, and A3) isolated from seagulls in Northern Ireland, the phenotypical and genotypical characteristics of which have been demonstrated to be indistinguishable, clarified nucleotide differences at three nucleotide positions among the three strains. In conclusion, the nucleotide sequences of the recA gene were found to discriminate among the three strains of UPTC, A1, A2, and A3, which are indistinguishable phenotypically and genotypically. Thus, the present study strongly suggests that nucleotide sequence data of the amplicon of a suitable gene or region could aid in discriminating among isolates of the UPTC group, which are indistinguishable phenotypically and genotypically. Copyright 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

  9. Distinctive acceptor-end structure and other determinants of Escherichia coli tRNAPro identity.

    PubMed Central

    McClain, W H; Schneider, J; Gabriel, K

    1994-01-01

    The previously uncharacterized determinants of the specificity of tRNAPro for aminoacylation (tRNAPro identity) were defined by a computer comparison of all Escherichia coli tRNA sequences and tested by a functional analysis of amber suppressor tRNAs in vivo. We determined the amino acid specificity of tRNA by sequencing a suppressed protein and the aminoacylation efficiency of tRNA by examining the steady-state level of aminoacyl-tRNA. On substituting nucleotides derived from the acceptor end and variable pocket of tRNAPro for the corresponding nucleotides in a tRNAPhe gene, the identity of the resulting tRNA changed substantially but incompletely to that of tRNAPro. The redesigned tRNAPhe was weakly active and aminoacyl-tRNA was not detected. Ethyl methanesulfonate mutagenesis of the redesigned tRNAPhe gene produced a mutant with a wobble pair in place of a base pair in the end of the acceptor-stem helix of the transcribed tRNA. This mutant exhibited both a tRNAPro identity and substantial aminoacyl-tRNA. The results speak for the importance of a distinctive conformation in the acceptor-stem helix of tRNAPro for aminoacylation by the prolyl-tRNA synthetase. The anticodon also contributes to tRNAPro identity but is not necessary in vivo. PMID:8127693

  10. The complete genomic sequence of a tentative new polerovirus identified in barley in South Korea.

    PubMed

    Zhao, Fumei; Lim, Seungmo; Yoo, Ran Hee; Igori, Davaajargal; Kim, Sang-Min; Kwak, Do Yeon; Kim, Sun Lim; Lee, Bong Choon; Moon, Jae Sun

    2016-07-01

    The complete nucleotide sequence of a new barley polerovirus, tentatively named barley virus G (BVG), which was isolated in Gimje, South Korea, has been determined using an RNA sequencing technique combined with polymerase chain reaction methods. The viral genomic RNA of BVG is 5,620 nucleotides long and contains six typical open reading frames commonly observed in other poleroviruses. Sequence comparisons revealed that BVG is most closely related to maize yellow dwarf virus-RMV, with the highest amino acid identities being less than 90 % for all of the corresponding proteins. These results suggested that BVG is a member of a new species in the genus Polerovirus.

  11. Nucleotide sequence of the Kaposi sarcoma-associated herpesvirus (HHV8)

    PubMed Central

    Russo, James J.; Bohenzky, Roy A.; Chien, Ming-Cheng; Chen, Jing; Yan, Ming; Maddalena, Dawn; Parry, J. Preston; Peruzzi, Daniela; Edelman, Isidore S.; Chang, Yuan; Moore, Patrick S.

    1996-01-01

    The genome of the Kaposi sarcoma-associated herpesvirus (KSHV or HHV8) was mapped with cosmid and phage genomic libraries from the BC-1 cell line. Its nucleotide sequence was determined except for a 3-kb region at the right end of the genome that was refractory to cloning. The BC-1 KSHV genome consists of a 140.5-kb-long unique coding region flanked by multiple G+C-rich 801-bp terminal repeat sequences. A genomic duplication that apparently arose in the parental tumor is present in this cell culture-derived strain. At least 81 ORFs, including 66 with homology to herpesvirus saimiri ORFs, and 5 internal repeat regions are present in the long unique region. The virus encodes homologs to complement-binding proteins, three cytokines (two macrophage inflammatory proteins and interleukin 6), dihydrofolate reductase, bcl-2, interferon regulatory factors, interleukin 8 receptor, neural cell adhesion molecule-like adhesin, and a D-type cyclin, as well as viral structural and metabolic proteins. Terminal repeat analysis of virus DNA from a KS lesion suggests a monoclonal expansion of KSHV in the KS tumor. PMID:8962146

  12. Genome Sequences of Ilzat and Eleri, Two Phages Isolated Using Microbacterium foliorum NRRL B-24224

    PubMed Central

    Ali, Ilzat; Jones, Acacia Eleri; Mohamed, Aleem

    2018-01-01

    Bacteriophages Ilzat and Eleri are newly isolated Siphoviridae infecting Microbacterium foliorum NRRL B-24224. The phage genomes are similar in length, G+C content, and architecture and share 62.9% nucleotide sequence identity. PMID:29650566

  13. Sequence variation and phylogenetic analysis of envelope glycoprotein of hepatitis G virus.

    PubMed

    Lim, M Y; Fry, K; Yun, A; Chong, S; Linnen, J; Fung, K; Kim, J P

    1997-11-01

    A transfusion-transmissible agent provisionally designated hepatitis G virus (HGV) was recently identified. In this study, we examined the variability of the HGV genome by analysing sequences in the putative envelope region from 72 isolates obtained from diverse geographical sources. The 1561 nucleotide sequence of the E1/E2/NS2a region of HGV was determined from 12 isolates, and compared with three published sequences. The most variability was observed in 400 nucleotides at the N terminus of E2. We next analysed this 400 nucleotide envelope variable region (EV) from an additional 60 HGV isolates. This sequence varied considerably among the 75 isolates, with overall identity ranging from 79.3% to 99.5% at the nucleotide level, and from 83.5% to 100% at the amino acid level. However, hypervariable regions were not identified. Phylogenetic analyses indicated that the 75 HGV isolates belong to a single genotype. A single-tier distribution of evolutionary distances was observed among the 15 E1/E2/NS2a sequences and the 75 EV sequences. In contrast, 11 isolates of HCV were analysed and showed a three-tiered distribution, representing genotypes, subtypes, and isolates. The 75 isolates of HGV fell into four clusters on the phylogenetic tree. Tight geographical clustering was observed among the HGV isolates from Japan and Korea.

  14. Complete sequence and diversity of a maize-associated Polerovirus in East Africa.

    PubMed

    Massawe, Deogracious P; Stewart, Lucy R; Kamatenesi, Jovia; Asiimwe, Theodore; Redinbaugh, Margaret G

    2018-06-01

    Since 2011-2012, Maize lethal necrosis (MLN) has emerged in East Africa, causing massive yield loss and propelling research to identify viruses and virus populations present in maize. As expected, next generation sequencing (NGS) has revealed diverse and abundant viruses from the family Potyviridae, primarily sugarcane mosaic virus (SCMV), and maize chlorotic mottle virus (MCMV) (Tombusviridae), which are known to cause MLN by synergistic co-infection. In addition to these expected viruses, we identified a virus in the genus Polerovirus (family Luteoviridae) in 104/172 samples selected for MLN or other potential virus symptoms from Kenya, Uganda, Rwanda, and Tanzania. This polerovirus (MF974579) nucleotide sequence is 97% identical to maize-associated viruses recently reported in China, termed 'maize yellow mosaic virus' (MaYMV) and maize yellow dwarf virus (MaYMV; KU291101, KU291107, MYDV-RMV2; KT992824); and 99% identical to MaYMV (KY684356) infecting sugarcane and itch grass in Nigeria; 83% identical to a barley-associated polerovirus recently identified in Korea (BVG; KT962089); and 79% identical to the U.S. maize-infecting polerovirus maize yellow dwarf virus (MYDV-RMV; KT992824). Nucleotide sequences from ORF0 of 20 individual East African isolates collected from Kenya, Uganda, Rwanda, and Tanzania shared 98% or higher identity, and were detected in 104/172 (60.5%) of samples collected for virus-like symptoms, indicating extensive prevalence but limited diversity of this virus in East Africa. We refer to this virus as "MYDV-like polerovirus" until symptoms of the virus in maize are known.

  15. The genome sequence of pepper vein yellows virus (family Luteoviridae, genus Polerovirus).

    PubMed

    Murakami, Ritsuko; Nakashima, Nobuhiko; Hinomoto, Norihide; Kawano, Shinji; Toyosato, Tetsuya

    2011-05-01

    The complete genome of pepper vein yellows virus (PeVYV) was sequenced using random amplification of RNA samples isolated from vector insects (Aphis gossypii) that had been given access to PeVYV-infected plants. The PeVYV genome consisted of 6244 nucleotides and had a genomic organization characteristic of members of the genus Polerovirus. PeVYV had highest amino acid sequence identities in ORF0 to ORF3 (75.9 - 91.9%) with tobacco vein distorting polerovirus, with which it was only 25.1% identical in ORF5. These sequence comparisons and previously studied biological properties indicate that PeVYV is a distinctly different virus and belongs to a new species of the genus Polerovirus.

  16. The complete nucleotide sequences of the five genetically distinct plastid genomes of Oenothera, subsection Oenothera: I. sequence evaluation and plastome evolution.

    PubMed

    Greiner, Stephan; Wang, Xi; Rauwolf, Uwe; Silber, Martina V; Mayer, Klaus; Meurer, Jörg; Haberer, Georg; Herrmann, Reinhold G

    2008-04-01

    The flowering plant genus Oenothera is uniquely suited for studying molecular mechanisms of speciation. It assembles an intriguing combination of genetic features, including permanent translocation heterozygosity, biparental transmission of plastids, and a general interfertility of well-defined species. This allows an exchange of plastids and nuclei between species often resulting in plastome-genome incompatibility. For evaluation of its molecular determinants we present the complete nucleotide sequences of the five basic, genetically distinguishable plastid chromosomes of subsection Oenothera (=Euoenothera) of the genus, which are associated in distinct combinations with six basic genomes. Sizes of the chromosomes range from 163 365 bp (plastome IV) to 165 728 bp (plastome I), display between 96.3% and 98.6% sequence similarity and encode a total of 113 unique genes. Plastome diversification is caused by an abundance of nucleotide substitutions, small insertions, deletions and repetitions. The five plastomes deviate from the general ancestral design of plastid chromosomes of vascular plants by a subsection-specific 56 kb inversion within the large single-copy segment. This inversion disrupted operon structures and predates the divergence of the subsection presumably 1 My ago. Phylogenetic relationships suggest plastomes I-III in one clade, while plastome IV appears to be closest to the common ancestor.

  17. The complete nucleotide sequences of the five genetically distinct plastid genomes of Oenothera, subsection Oenothera: I. Sequence evaluation and plastome evolution†

    PubMed Central

    Greiner, Stephan; Wang, Xi; Rauwolf, Uwe; Silber, Martina V.; Mayer, Klaus; Meurer, Jörg; Haberer, Georg; Herrmann, Reinhold G.

    2008-01-01

    The flowering plant genus Oenothera is uniquely suited for studying molecular mechanisms of speciation. It assembles an intriguing combination of genetic features, including permanent translocation heterozygosity, biparental transmission of plastids, and a general interfertility of well-defined species. This allows an exchange of plastids and nuclei between species often resulting in plastome–genome incompatibility. For evaluation of its molecular determinants we present the complete nucleotide sequences of the five basic, genetically distinguishable plastid chromosomes of subsection Oenothera (=Euoenothera) of the genus, which are associated in distinct combinations with six basic genomes. Sizes of the chromosomes range from 163 365 bp (plastome IV) to 165 728 bp (plastome I), display between 96.3% and 98.6% sequence similarity and encode a total of 113 unique genes. Plastome diversification is caused by an abundance of nucleotide substitutions, small insertions, deletions and repetitions. The five plastomes deviate from the general ancestral design of plastid chromosomes of vascular plants by a subsection-specific 56 kb inversion within the large single-copy segment. This inversion disrupted operon structures and predates the divergence of the subsection presumably 1 My ago. Phylogenetic relationships suggest plastomes I–III in one clade, while plastome IV appears to be closest to the common ancestor. PMID:18299283

  18. Single nucleotide polymorphisms from Theobroma cacao expressed sequence tags associated with witches' broom disease in cacao.

    PubMed

    Lima, L S; Gramacho, K P; Carels, N; Novais, R; Gaiotto, F A; Lopes, U V; Gesteira, A S; Zaidan, H A; Cascardo, J C M; Pires, J L; Micheli, F

    2009-07-14

    In order to increase the efficiency of cacao tree resistance to witches' broom disease, which is caused by Moniliophthora perniciosa (Tricholomataceae), we looked for molecular markers that could help in the selection of resistant cacao genotypes. Among the different markers useful for developing marker-assisted selection, single nucleotide polymorphisms (SNPs) constitute the most common type of sequence difference between alleles and can be easily detected by in silico analysis from expressed sequence tag libraries. We report the first detection and analysis of SNPs from cacao-M. perniciosa interaction expressed sequence tags, using bioinformatics. Selection based on analysis of these SNPs should be useful for developing cacao varieties resistant to this devastating disease.

  19. Systematic and stochastic influences on the performance of the MinION nanopore sequencer across a range of nucleotide bias

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Krishnakumar, Raga; Sinha, Anupama; Bird, Sara W.

    Emerging sequencing technologies are allowing us to characterize environmental, clinical and laboratory samples with increasing speed and detail, including real-time analysis and interpretation of data. One example of this is being able to rapidly and accurately detect a wide range of pathogenic organisms, both in the clinic and the field. Genomes can have radically different GC content, however, such that accurate sequence analysis can be challenging depending upon the technology used. Here, we have characterized the performance of the Oxford MinION nanopore sequencer for detection and evaluation of organisms with a range of genomic nucleotide bias. We have diagnosed the quality of base-calling across individual reads and discovered that the position within the read affects base-calling and quality scores. Finally, we have evaluated the performance of the current state-of-the-art neural network-based MinION basecaller, characterizing its behavior with respect to systemic errors as well as context- and sequence-specific errors. Overall, we present a detailed characterization of the capabilities of the MinION in terms of generating high-accuracy sequence data from genomes with a wide range of nucleotide content. This study provides a framework for designing the appropriate experiments that are likely to lead to accurate and rapid field-forward diagnostics.

  20. Systematic and stochastic influences on the performance of the MinION nanopore sequencer across a range of nucleotide bias

    DOE PAGES

    Krishnakumar, Raga; Sinha, Anupama; Bird, Sara W.; ...

    2018-02-16

    Emerging sequencing technologies are allowing us to characterize environmental, clinical and laboratory samples with increasing speed and detail, including real-time analysis and interpretation of data. One example of this is being able to rapidly and accurately detect a wide range of pathogenic organisms, both in the clinic and the field. Genomes can have radically different GC content, however, such that accurate sequence analysis can be challenging depending upon the technology used. Here, we have characterized the performance of the Oxford MinION nanopore sequencer for detection and evaluation of organisms with a range of genomic nucleotide bias. We have diagnosed the quality of base-calling across individual reads and discovered that the position within the read affects base-calling and quality scores. Finally, we have evaluated the performance of the current state-of-the-art neural network-based MinION basecaller, characterizing its behavior with respect to systemic errors as well as context- and sequence-specific errors. Overall, we present a detailed characterization of the capabilities of the MinION in terms of generating high-accuracy sequence data from genomes with a wide range of nucleotide content. This study provides a framework for designing the appropriate experiments that are likely to lead to accurate and rapid field-forward diagnostics.
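
    Two quantities central to this abstract, GC content and quality score as a function of position within a read, can be sketched as follows. The code assumes reads are available as plain sequence strings and FASTQ-style quality strings with the common Phred+33 encoding; that input format, and both function names, are assumptions for illustration rather than the authors' pipeline.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (seq.count("G") + seq.count("C")) / acgt


def mean_quality_by_position(quality_strings, offset: int = 33) -> list:
    """Mean Phred score at each read position across reads of varying length.

    Each position is averaged only over the reads long enough to cover it.
    """
    totals = []
    counts = []
    for qs in quality_strings:
        for i, ch in enumerate(qs):
            if i == len(totals):  # first read to reach this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


if __name__ == "__main__":
    print(gc_content("GGCCAATT"))                  # toy sequence
    print(mean_quality_by_position(["II", "5"]))   # toy quality strings
```

    Plotting the per-position means for genomes of differing GC content would be one simple way to look for the position-dependent effects on quality scores that the abstract describes.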

  1. Draft Genome Sequences of Three Novel Low-Abundance Species Strains Isolated from Kefir Grain.

    PubMed

    Kim, Yongkyu; Blasche, Sonja; Patil, Kiran R

    2017-09-28

    We report here the genome sequences of three novel bacterial species strains, Bacillus kefirresidentii Opo, Rothia kefirresidentii KRP, and Streptococcus kefirresidentii YK, isolated from kefir grains collected in Germany. The draft genomes of these isolates were remarkably dissimilar (average nucleotide identities, 77.80%, 89.01%, and 92.10%, respectively) to those of the previously sequenced strains. Copyright © 2017 Kim et al.

  2. Molecular cloning and nucleotide sequence of the alpha and beta subunits of allophycocyanin from the cyanelle genome of Cyanophora paradoxa.

    PubMed Central

    Bryant, D A; de Lorimier, R; Lambert, D H; Dubbs, J M; Stirewalt, V L; Stevens, S E; Porter, R D; Tam, J; Jay, E

    1985-01-01

    The genes for the alpha- and beta-subunit apoproteins of allophycocyanin (AP) were isolated from the cyanelle genome of Cyanophora paradoxa and subjected to nucleotide sequence analysis. The AP beta-subunit apoprotein gene was localized to a 7.8-kilobase-pair Pst I restriction fragment from cyanelle DNA by hybridization with a tetradecameric oligonucleotide probe. Sequence analysis using that oligonucleotide and its complement as primers for the dideoxy chain-termination sequencing method confirmed the presence of both AP alpha- and beta-subunit genes on this restriction fragment. Additional oligonucleotide primers were synthesized as sequencing progressed and were used to determine rapidly the nucleotide sequence of a 1336-base-pair region of this cloned fragment. This strategy allowed the sequencing to be completed without a detailed restriction map and without extensive and time-consuming subcloning. The sequenced region contains two open reading frames whose deduced amino acid sequences are 81-85% homologous to cyanobacterial and red algal AP subunits whose amino acid sequences have been determined. The two open reading frames are in the same orientation and are separated by 39 base pairs. AP alpha is 5' to AP beta and both coding sequences are preceded by a polypurine, Shine-Dalgarno-type sequence. Sequences upstream from AP alpha closely resemble the Escherichia coli consensus promoter sequences and also show considerable homology to promoter sequences for several chloroplast-encoded psbA genes. A 56-base-pair palindromic sequence downstream from the AP beta gene could play a role in the termination of transcription or translation. The allophycocyanin apoprotein subunit genes are located on the large single-copy region of the cyanelle genome. PMID:2987916

  3. Unique nucleotide sequence (UNS)-guided assembly of repetitive DNA parts for synthetic biology applications

    PubMed Central

    Torella, Joseph P.; Lienert, Florian; Boehm, Christian R.; Chen, Jan-Hung; Way, Jeffrey C.; Silver, Pamela A.

    2016-01-01

    Recombination-based DNA construction methods, such as Gibson assembly, have made it possible to easily and simultaneously assemble multiple DNA parts and hold promise for the development and optimization of metabolic pathways and functional genetic circuits. Over time, however, these pathways and circuits have become more complex, and the increasing need for standardization and insulation of genetic parts has resulted in sequence redundancies — for example repeated terminator and insulator sequences — that complicate recombination-based assembly. We and others have recently developed DNA assembly methods that we refer to collectively as unique nucleotide sequence (UNS)-guided assembly, in which individual DNA parts are flanked with UNSs to facilitate the ordered, recombination-based assembly of repetitive sequences. Here we present a detailed protocol for UNS-guided assembly that enables researchers to convert multiple DNA parts into sequenced, correctly-assembled constructs, or into high-quality combinatorial libraries in only 2–3 days. If the DNA parts must be generated from scratch, an additional 2–5 days are necessary. This protocol requires no specialized equipment and can easily be implemented by a student with experience in basic cloning techniques. PMID:25101822

  4. A statistical model for investigating binding probabilities of DNA nucleotide sequences using microarrays.

    PubMed

    Lee, Mei-Ling Ting; Bulyk, Martha L; Whitmore, G A; Church, George M

    2002-12-01

    There is considerable scientific interest in knowing the probability that a site-specific transcription factor will bind to a given DNA sequence. Microarray methods provide an effective means for assessing the binding affinities of a large number of DNA sequences as demonstrated by Bulyk et al. (2001, Proceedings of the National Academy of Sciences, USA 98, 7158-7163) in their study of the DNA-binding specificities of Zif268 zinc fingers using microarray technology. In a follow-up investigation, Bulyk, Johnson, and Church (2002, Nucleic Acid Research 30, 1255-1261) studied the interdependence of nucleotides on the binding affinities of transcription proteins. Our article is motivated by this pair of studies. We present a general statistical methodology for analyzing microarray intensity measurements reflecting DNA-protein interactions. The log probability of a protein binding to a DNA sequence on an array is modeled using a linear ANOVA model. This model is convenient because it employs familiar statistical concepts and procedures and also because it is effective for investigating the probability structure of the binding mechanism.

  5. Complete genome sequence of the first human parechovirus type 3 isolated in Taiwan.

    PubMed

    Chang, Jenn-Tzong; Yang, Chih-Shiang; Chen, Bao-Chen; Chen, Yao-Shen; Chang, Tsung-Hsien

    2017-11-01

    The first human parechovirus 3 (HPeV3 VGHKS-2007) in Taiwan was identified from a clinical specimen from a male infant. The entire genome of the HPeV3 isolate was sequenced and compared to known HPeV3 sequences. Genome alignment data showed that HPeV3 VGHKS-2007 shares the highest nucleotide identity, 99%, with the Japanese strain of HPeV3 1361K-162589-Yamagata-2008. All HPeV3 isolates possess at least 97% amino acid identity. The analysis of the genome sequence of HPeV3 VGHKS-2007 will facilitate future investigations of the epidemiology and pathogenicity of HPeV3 infection. Copyright © 2017. Published by Elsevier Taiwan LLC.
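
    Nucleotide identity figures such as the 99% reported in this record are derived from a pairwise alignment. A minimal sketch of one common definition follows, assuming two already-aligned, equal-length sequences with `-` as the gap symbol and with gapped columns excluded from the denominator; alignment tools differ on how gaps are counted, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap.

    Both inputs must come from the same pairwise alignment, so they have
    equal length. Excluding gapped columns is an assumption of this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```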

  6. Direct bisulfite sequencing for examination of DNA methylation with gene and nucleotide resolution from brain tissues.

    PubMed

    Parrish, R Ryley; Day, Jeremy J; Lubin, Farah D

    2012-07-01

    DNA methylation is an epigenetic modification that is essential for the development and mature function of the central nervous system. Due to the relevance of this modification to the transcriptional control of gene expression, it is often necessary to examine changes in DNA methylation patterns with both gene and single-nucleotide resolution. Here, we describe an in-depth basic protocol for direct bisulfite sequencing of DNA isolated from brain tissue, which will permit direct assessment of methylation status at individual genes as well as individual cytosine molecules/nucleotides within a genomic region. This method yields analysis of DNA methylation patterns that is robust, accurate, and reproducible, thereby allowing insights into the role of alterations in DNA methylation in brain tissue.
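
    Bisulfite sequencing rests on the fact that bisulfite treatment converts unmethylated cytosine to uracil, which is read as thymine after amplification, while methylated cytosine is protected. The sketch below simulates that conversion on a single strand and then calls methylation by comparing the read with the reference. It is an illustration of the principle only, not the protocol described in this record; the function names are hypothetical, and only the top strand is considered.

```python
def bisulfite_convert(seq: str, methylated_positions: set) -> str:
    """Simulate bisulfite treatment followed by PCR on one strand.

    Unmethylated C is read as T; C at a 0-based position listed in
    methylated_positions is protected and stays C.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def call_methylation(reference: str, converted_read: str) -> dict:
    """Map each reference cytosine position to True (methylated) or False.

    Positions where the read shows neither C nor T are left uncalled.
    """
    calls = {}
    for i, (ref, obs) in enumerate(zip(reference.upper(), converted_read.upper())):
        if ref == "C":
            if obs == "C":
                calls[i] = True
            elif obs == "T":
                calls[i] = False
    return calls
```

    With single-nucleotide calls of this kind, per-site methylation levels can be estimated as the fraction of reads calling each cytosine methylated.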

  7. A comparative genomics strategy for targeted discovery of single-nucleotide polymorphisms and conserved-noncoding sequences in orphan crops.

    PubMed

    Feltus, F A; Singh, H P; Lohithaswa, H C; Schulze, S R; Silva, T D; Paterson, A H

    2006-04-01

    Completed genome sequences provide templates for the design of genome analysis tools in orphan species lacking sequence information. To demonstrate this principle, we designed 384 PCR primer pairs to conserved exonic regions flanking introns, using Sorghum/Pennisetum expressed sequence tag alignments to the Oryza genome. Conserved-intron scanning primers (CISPs) amplified single-copy loci at 37% to 80% success rates in taxa that sample much of the approximately 50-million years of Poaceae divergence. While the conserved nature of exons fostered cross-taxon amplification, the lesser evolutionary constraints on introns enhanced single-nucleotide polymorphism detection. For example, in eight rice (Oryza sativa) genotypes, polymorphism averaged 12.1 per kb in introns but only 3.6 per kb in exons. Curiously, among 124 CISPs evaluated across Oryza, Sorghum, Pennisetum, Cynodon, Eragrostis, Zea, Triticum, and Hordeum, 23 (18.5%) seemed to be subject to rigid intron size constraints that were independent of per-nucleotide DNA sequence variation. Furthermore, we identified 487 conserved-noncoding sequence motifs in 129 CISP loci. A large CISP set (6,062 primer pairs, amplifying introns from 1,676 genes) designed using an automated pipeline showed generally higher abundance in recombinogenic than in nonrecombinogenic regions of the rice genome, thus providing relatively even distribution along genetic maps. CISPs are an effective means to explore poorly characterized genomes for both DNA polymorphism and noncoding sequence conservation on a genome-wide or candidate gene basis, and also provide anchor points for comparative genomics across a diverse range of species.
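
    Polymorphism rates such as 12.1 per kb in introns versus 3.6 per kb in exons are counts of variable sites normalized by the length compared. A minimal sketch of that normalization follows, assuming a list of equal-length aligned sequences and counting only columns in which every sequence has an unambiguous base; the function name `snps_per_kb` and these counting rules are assumptions, not the authors' pipeline.

```python
def snps_per_kb(aligned_seqs: list) -> float:
    """Polymorphic sites per 1,000 compared alignment columns.

    A column is compared only if every sequence shows A, C, G or T there;
    it is polymorphic if more than one distinct base is present.
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    polymorphic = 0
    for column in zip(*(s.upper() for s in aligned_seqs)):
        if any(base not in "ACGT" for base in column):
            continue  # skip gaps and ambiguity codes
        compared += 1
        if len(set(column)) > 1:
            polymorphic += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 1000.0 * polymorphic / compared
```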

  8. Molecular characterisation and nucleotide sequence analysis of canine parvovirus strains in vaccines in India.

    PubMed

    Nandi, Sukdeb; Anbazhagan, Rajendra; Kumar, Manoj

    2010-01-01

    Canine parvovirus 2 (CPV-2) is one of the most important viruses that causes haemorrhagic gastroenteritis and myocarditis of dogs worldwide. The picture has been complicated further due to the emergence of new mutants of CPV, namely: CPV-2a, CPV-2b and CPV-2c. In this study, the molecular characterisation of strains present in the CPV vaccines available on the Indian market was performed using polymerase chain reaction and DNA sequencing. The VP1/VP2 genes of two vaccine strains and a field strain (Bhopal) were sequenced and the nucleotide and the deduced amino acid sequences were compared. The results indicated that the isolate belonged to CPV type 2b and the strains in the vaccines belonged to type CPV-2. From the study, it is inferred that the CPV strain used in commercially available vaccine preparation differed from the strains present in CPV infection in dogs in India.

  9. Complete genome sequence of keunjorong mosaic virus, a potyvirus from Cynanchum wilfordii.

    PubMed

    Nam, Moon; Lee, Joo-Hee; Choi, Hong Soo; Lim, Hyoun-Sub; Moon, Jae Sun; Lee, Su-Heon

    2013-08-01

    We have determined the complete genome sequence of keunjorong mosaic virus (KjMV). The KjMV genome is composed of 9,611 nucleotides, excluding the 3'-terminal poly(A) tail. It contains two open reading frames (ORFs), with the large one encoding a polyprotein of 3,070 amino acids and the small overlapping ORF encoding a PIPO protein of 81 amino acids. The KjMV genome shared the highest nucleotide sequence identity (57.5%) with pepper mottle virus and freesia mosaic virus, two members of the genus Potyvirus. Based on the phylogenetic relatedness to known potyviruses, KjMV appears to be a member of a new species in the genus Potyvirus.

  10. Single nucleotide polymorphism analysis reveals heterogeneity within a seedling tree population of a polyembryonic mango cultivar.

    PubMed

    Winterhagen, Patrick; Wünsche, Jens-Norbert

    2016-05-01

    Within a polyembryonic mango seedling tree population, the genetic background of individuals should be identical because vigorous plants for cultivation are expected to develop from nucellar embryos representing maternal clones. Because the mango cultivar 'Hôi' is assigned to the polyembryonic ecotype, an intra-cultivar variability of ethylene receptor genes was unexpected. Ethylene receptors in plants are conserved, but the number of receptors or receptor isoforms varies between plant species. However, it is shown here that the ethylene receptor MiETR1 is present in various isoforms within the mango cultivar 'Hôi'. The investigation of single nucleotide polymorphisms revealed that different MiETR1 isoforms cannot be discriminated simply by individual single nucleotide exchanges but by the specific arrangement of single nucleotide polymorphisms at certain positions in the exons of MiETR1. Furthermore, an MiETR1 isoform devoid of introns in the genomic sequence was identified. The investigation demonstrates some limitations of high resolution melting and ScreenClust analysis and points out the necessity of sequencing to identify individual isoforms and to determine the variability within the tree population.

  11. An Engineered Kinetic Amplification Mechanism for Single Nucleotide Variant Discrimination by DNA Hybridization Probes.

    PubMed

    Chen, Sherry Xi; Seelig, Georg

    2016-04-20

    Even a single-nucleotide difference between the sequences of two otherwise identical biological nucleic acids can have dramatic functional consequences. Here, we use model-guided reaction pathway engineering to quantitatively improve the performance of selective hybridization probes in recognizing single nucleotide variants (SNVs). Specifically, we build a detection system that combines discrimination by competition with DNA strand displacement-based catalytic amplification. We show, both mathematically and experimentally, that the single nucleotide selectivity of such a system in binding to single-stranded DNA and RNA is quadratically better than discrimination due to competitive hybridization alone. As an additional benefit, the integrated circuit inherits the property of amplification and provides at least 10-fold better sensitivity than standard hybridization probes. Moreover, we demonstrate how the detection mechanism can be tuned such that the detection reaction is agnostic to the position of the SNV within the target sequence. In contrast, prior strand displacement-based probes designed for kinetic discrimination are highly sensitive to position effects. We apply our system to reliably discriminate between different members of the let-7 microRNA family that differ in only a single base position. Our results demonstrate the power of systematic reaction network design to quantitatively improve biotechnology.

  12. Sequence and RT-PCR expression analysis of two peroxidases from Arabidopsis thaliana belonging to a novel evolutionary branch of plant peroxidases.

    PubMed

    Kjaersgård, I V; Jespersen, H M; Rasmussen, S K; Welinder, K G

    1997-03-01

    cDNA clones encoding two new Arabidopsis thaliana peroxidases, ATP 1a and ATP 2a, have been identified by searching the Arabidopsis database of expressed sequence tags (dbEST). They represent a novel branch of hitherto uncharacterized plant peroxidases which is only 35% identical in amino acid sequence to the well characterized group of basic plant peroxidases represented by the horseradish (Armoracia rusticana) isoperoxidases HRP C, HRP E5 and the similar Arabidopsis isoperoxidases ATP Ca, ATP Cb, and ATP Ea. However, ATP 1a is 87% identical in amino acid sequence to a peroxidase encoded by an mRNA isolated from cotton (Gossypium hirsutum). As cotton and Arabidopsis belong to rather diverse families (Malvaceae and Cruciferae, respectively), in contrast with Arabidopsis and horseradish (both Cruciferae), the high degree of sequence identity indicates that this novel type of peroxidase, albeit of unknown function, is likely to be widespread in plant species. The atp 1 and atp 2 types of cDNA sequences were the most redundant among the 28 different isoperoxidases identified among about 200 peroxidase encoding ESTs. Interestingly, 8 out of a total of 38 EST sequences coding for ATP 1 showed three identical nucleotide substitutions. This variant form is designated ATP 1b. Similarly, 6 out of a total of 16 EST sequences coding for ATP 2 showed a number of deletions and nucleotide changes. This variant form is designated ATP 2b. The selected EST clones are full-length and contain coding regions of 993 nucleotides for atp 1a, and 984 nucleotides for atp 2a. These regions show 61% DNA sequence identity. The predicted mature proteins ATP 1a and ATP 2a are 57% identical in sequence and contain the structurally and functionally important residues characteristic of the plant peroxidase superfamily. However, they do show two differences of importance to peroxidase catalysis: (1) the asparagine residue linked with the active site distal histidine via hydrogen bonding is absent

  13. Automated Identification of Medically Important Bacteria by 16S rRNA Gene Sequencing Using a Novel Comprehensive Database, 16SpathDB

    PubMed Central

    Woo, Patrick C. Y.; Teng, Jade L. L.; Yeung, Juilian M. Y.; Tse, Herman; Lau, Susanna K. P.; Yuen, Kwok-Yung

    2011-01-01

    Despite the increasing use of 16S rRNA gene sequencing, interpretation of 16S rRNA gene sequence results is one of the most difficult problems faced by clinical microbiologists and technicians. To overcome the problems we encountered in the existing databases during 16S rRNA gene sequence interpretation, we built a comprehensive database, 16SpathDB (http://147.8.74.24/16SpathDB) based on the 16S rRNA gene sequences of all medically important bacteria listed in the Manual of Clinical Microbiology and evaluated its use for automated identification of these bacteria. Among 91 nonduplicated bacterial isolates collected in our clinical microbiology laboratory, 71 (78%) were reported by 16SpathDB as a single bacterial species having >98.0% nucleotide identity with the query sequence, 19 (20.9%) were reported as more than one bacterial species having >98.0% nucleotide identity with the query sequence, and 1 (1.1%) was reported as no match. For the 71 bacterial isolates reported as a single bacterial species, all results were identical to their true identities as determined by a polyphasic approach. For the 19 bacterial isolates reported as more than one bacterial species, all results contained their true identities as determined by a polyphasic approach and all of them had their true identities as the “best match in 16SpathDB.” For the isolate (Gordonibacter pamelaeae) reported as no match, the bacterium has never been reported to be associated with human disease and was not included in the Manual of Clinical Microbiology. 16SpathDB is an automated, user-friendly, efficient, accurate, and regularly updated database for 16S rRNA gene sequence interpretation in clinical microbiology laboratories. PMID:21389154

  14. [Identification and phylogenetic application of unique nucleotide sequence of nad7 intron2 in Rhodiola (Crassulaceae) species].

    PubMed

    Deng, Ke-Jun; Yang, Zu-Jun; Liu, Cheng; Zhao, Wei; Liu, Chang; Feng, Juan; Ren, Zheng-Long

    2007-03-01

    The genetic characteristics of 9 populations of Rhodiola crenulata, R. fastigiata and R. sachalinensis (Crassulaceae) from Sichuan and Jilin Provinces of China were investigated using the conserved primer of nad7 intron 2. The PCR products, all about 800 bp long, were shorter than those of other Crassulaceae plants and were used as molecular markers to identify the Rhodiola species. Sequencing of the products indicated that the exon (53 bp in total) and the intron (738 bp) exhibit only 9 nucleotide variations. BLAST searches of the nad7 sequences against GenBank and the phylogenetic analysis showed that the sequences of the Rhodiola species clustered independently and were shorter than all the registered sequences of higher plants. The result suggests that the Rhodiola species have a unique sequence in this gene region, which might be related to their special growth conditions.

  15. Nucleotide sequence analysis reveals linked N-acetyl hydrolase, thioesterase, transport, and regulatory genes encoded by the bialaphos biosynthetic gene cluster of Streptomyces hygroscopicus.

    PubMed Central

    Raibaud, A; Zalacain, M; Holt, T G; Tizard, R; Thompson, C J

    1991-01-01

    Nucleotide sequence analysis of a 5,000-bp region of the bialaphos antibiotic production (bap) gene cluster defined five open reading frames (ORFs) which predicted structural genes in the order bah, ORF1, ORF2, and ORF3 followed by the regulatory gene, brpA (H. Anzai, T. Murakami, S. Imai, A. Satoh, K. Nagaoka, and C.J. Thompson, J. Bacteriol. 169:3482-3488, 1987). The four structural genes were translationally coupled and apparently cotranscribed from an undefined promoter(s) under the positive control of the brpA gene product. S1 mapping experiments indicated that brpA was transcribed by two promoters (brpAp1 and brpAp2) which initiate transcription 150 and 157 bp upstream of brpA within an intergenic region and at least one promoter further upstream within the bap gene cluster (brpAp3). All three transcripts were present at low levels during exponential growth and increased just before the stationary phase. The levels of the brpAp3 band continued to increase at the onset of stationary phase, whereas brpAp1- and brpAp2-protected fragments showed no further change. BrpA contained a possible helix-turn-helix motif at its C terminus which was similar to the C-terminal regulatory motif found in the receiver component of a family of two-component transcriptional activator proteins. This motif was not associated with the N-terminal domain conserved in other members of the family. The structural gene cluster sequenced began with bah, encoding a bialaphos acetylhydrolase which removes the N-acetyl group from bialaphos as one of the final steps in the biosynthetic pathway. The observation that Bah was similar to a rat and to a bacterial (Acinetobacter calcoaceticus) lipase probably reflects the fact that the ester bonds of triglycerides and the amide bond linking acetate to phosphinothricin are similar and hydrolysis is catalyzed by structurally related enzymes. This was followed by two regions encoding ORF1 and ORF2 which were similar to each other (48% nucleotide

  16. Genome sequence of a distinct watermelon mosaic virus identified from ginseng (Panax ginseng) transcriptome.

    PubMed

    Park, D; Kim, H; Hahn, Y

    Watermelon mosaic virus (WMV) is a member of the genus Potyvirus, which is the largest genus of plant viruses. WMV is a significant pathogen of crop plants, including Cucurbitaceae species. A WMV strain, designated as WMV-Pg, was identified in transcriptome data collected from ginseng (Panax ginseng) root. WMV-Pg showed 84% nucleotide sequence identity and 91% amino acid sequence identity with its closest related virus, WMV-Fr. A phylogenetic analysis of WMV-Pg with other WMVs and soybean mosaic viruses (SMVs) indicated that WMV-Pg is a distinct subtype of the WMV/SMV group of the genus Potyvirus in the family Potyviridae.

  17. Nucleotide sequence determination of guinea-pig casein B mRNA reveals homology with bovine and rat alpha s1 caseins and conservation of the non-coding regions of the mRNA.

    PubMed Central

    Hall, L; Laird, J E; Craig, R K

    1984-01-01

    Nucleotide sequence analysis of cloned guinea-pig casein B cDNA sequences has identified two casein B variants related to the bovine and rat alpha s1 caseins. Amino acid homology was largely confined to the known bovine or predicted rat phosphorylation sites and within the 'signal' precursor sequence. Comparison of the deduced nucleotide sequence of the guinea-pig and rat alpha s1 casein mRNA species showed greater sequence conservation in the non-coding than in the coding regions, suggesting a functional and possibly regulatory role for the non-coding regions of casein mRNA. The results provide insight into the evolution of the casein genes, and raise questions as to the role of conserved nucleotide sequences within the non-coding regions of mRNA species. PMID:6548375

  18. Presence of a consensus DNA motif at nearby DNA sequence of the mutation susceptible CG nucleotides.

    PubMed

    Chowdhury, Kaushik; Kumar, Suresh; Sharma, Tanu; Sharma, Ankit; Bhagat, Meenakshi; Kamai, Asangla; Ford, Bridget M; Asthana, Shailendra; Mandal, Chandi C

    2018-01-10

    Complexity in tissues affected by cancer arises from somatic mutations and epigenetic modifications in the genome. The mutation susceptible hotspots present within the genome indicate a non-random nature and/or a position specific selection of mutation. An association exists between the occurrence of mutations and epigenetic DNA methylation. This study is primarily aimed at determining mutation status, and identifying a signature for predicting mutation prone zones of tumor suppressor (TS) genes. Nearby sequences from the top five positions having a higher mutation frequency in each gene of 42 TS genes were selected from the COSMIC database and were considered as mutation prone zones. The conserved motifs present in the mutation prone DNA fragments were identified. Molecular docking studies were done to determine putative interactions between the identified conserved motifs and enzyme methyltransferase DNMT1. Collective analysis of 42 TS genes found GC as the most commonly replaced and AT as the most commonly formed residues after mutation. Analysis of the top 5 mutated positions of each gene (210 DNA segments for 42 TS genes) identified that CG nucleotides of the amino acid codons (e.g., Arginine) are most susceptible to mutation, and found a consensus DNA "T/AGC/GAGGA/TG" sequence present in these mutation prone DNA segments. Similar to TS genes, analysis of 54 oncogenes not only found CG nucleotides of the amino acid Arg as the most susceptible to mutation, but also identified the presence of similar consensus DNA motifs in the mutation prone DNA fragments (270 DNA segments for 54 oncogenes) of oncogenes. Docking studies depicted that, upon binding of DNMT1 methylates to this consensus DNA motif (C residues of CpG islands), mutation was likely to occur. Thus, this study proposes that DNMT1 mediated methylation in chromosomal DNA may decrease if a foreign DNA segment containing this consensus sequence along with CG nucleotides is exogenously introduced to dividing

  19. Molecular cloning and nucleotide sequence of a transforming gene detected by transfection of chicken B-cell lymphoma DNA

    NASA Astrophysics Data System (ADS)

    Goubin, Gerard; Goldman, Debra S.; Luce, Judith; Neiman, Paul E.; Cooper, Geoffrey M.

    1983-03-01

    A transforming gene detected by transfection of chicken B-cell lymphoma DNA has been isolated by molecular cloning. It is homologous to a conserved family of sequences present in normal chicken and human DNAs but is not related to transforming genes of acutely transforming retroviruses. The nucleotide sequence of the cloned transforming gene suggests that it encodes a protein that is partially homologous to the amino terminus of transferrin and related proteins although only about one tenth the size of transferrin.

  20. Nucleotide Sequence of the blaRTG-2 (CARB-5) Gene and Phylogeny of a New Group of Carbenicillinases

    PubMed Central

    Choury, Daniele; Szajnert, Marie-France; Joly-Guillou, Marie-Laure; Azibi, Kemal; Delpech, Marc; Paul, Gérard

    2000-01-01

    We determined the nucleotide sequence of the bla gene for the Acinetobacter calcoaceticus β-lactamase previously described as CARB-5. Alignment of the deduced amino acid sequence with those of known β-lactamases revealed that CARB-5 possesses an RTG triad in box VII, as described for the Proteus mirabilis GN79 enzyme, instead of the RSG consensus characteristic of the other carbenicillinases. Phylogenetic studies showed that these RTG enzymes constitute a new, separate group, possibly ancestors of the carbenicillinase family. PMID:10722515

  1. Methylation levels of the "long interspersed nucleotide element-1" repetitive sequences predict survival of melanoma patients.

    PubMed

    Sigalotti, Luca; Fratta, Elisabetta; Bidoli, Ettore; Covre, Alessia; Parisi, Giulia; Colizzi, Francesca; Coral, Sandra; Massarut, Samuele; Kirkwood, John M; Maio, Michele

    2011-05-26

    The prognosis of cutaneous melanoma (CM) differs for patients with identical clinico-pathological stage, and no molecular markers discriminating the prognosis of stage III individuals have been established. Genome-wide alterations in DNA methylation are a common event in cancer. This study aimed to define the prognostic value of genomic DNA methylation levels in stage III CM patients. Overall level of genomic DNA methylation was measured using bisulfite pyrosequencing at three CpG sites (CpG1, CpG2, CpG3) of the Long Interspersed Nucleotide Element-1 (LINE-1) sequences in short-term CM cultures from 42 stage IIIC patients. The impact of LINE-1 methylation on overall survival (OS) was assessed using Cox regression and Kaplan-Meier analysis. Hypomethylation (i.e., methylation below median) at CpG2 and CpG3 sites was significantly associated with improved prognosis of CM, with CpG3 showing the strongest association. Patients with hypomethylated CpG3 had increased OS (P = 0.01, log-rank = 6.39) by Kaplan-Meier analysis. Median OS of patients with hypomethylated or hypermethylated CpG3 was 31.9 and 11.5 months, respectively. The 5 year OS for patients with hypomethylated CpG3 was 48% compared to 7% for patients with hypermethylated sequences. Among the variables examined by Cox regression analysis, LINE-1 methylation at CpG2 and CpG3 was the only predictor of OS (Hazard Ratio = 2.63, for hypermethylated CpG3; 95% Confidence Interval: 1.21-5.69; P = 0.01). LINE-1 methylation is identified as a molecular marker of prognosis for CM patients in stage IIIC. Evaluation of LINE-1 promises to represent a key tool for driving the most appropriate clinical management of stage III CM patients.
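
    The survival comparison in this record relies on Kaplan-Meier analysis. As a minimal sketch of the product-limit estimator, the function below multiplies the running survival probability by (1 - deaths / number at risk) at each distinct event time and treats censored patients as leaving the risk set without an event. The function name `kaplan_meier` is hypothetical, and the study's own data are not reproduced here.

```python
def kaplan_meier(times: list, events: list) -> list:
    """Kaplan-Meier product-limit estimate.

    times:  follow-up duration for each patient
    events: 1 if death was observed at that time, 0 if censored
    Returns [(t, S(t))] for each distinct time at which a death occurred.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have equal length")
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving  # deaths and censored both leave the risk set
    return curve
```

    Computing one such curve per methylation group (for example, below versus above the median) gives the kind of comparison that a log-rank test then evaluates.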

  2. Methylation levels of the "long interspersed nucleotide element-1" repetitive sequences predict survival of melanoma patients

    PubMed Central

    2011-01-01

    Background: The prognosis of cutaneous melanoma (CM) differs for patients with identical clinico-pathological stage, and no molecular markers discriminating the prognosis of stage III individuals have been established. Genome-wide alterations in DNA methylation are a common event in cancer. This study aimed to define the prognostic value of genomic DNA methylation levels in stage III CM patients. Methods: Overall level of genomic DNA methylation was measured using bisulfite pyrosequencing at three CpG sites (CpG1, CpG2, CpG3) of the Long Interspersed Nucleotide Element-1 (LINE-1) sequences in short-term CM cultures from 42 stage IIIC patients. The impact of LINE-1 methylation on overall survival (OS) was assessed using Cox regression and Kaplan-Meier analysis. Results: Hypomethylation (i.e., methylation below median) at CpG2 and CpG3 sites was significantly associated with improved prognosis of CM, with CpG3 showing the strongest association. Patients with hypomethylated CpG3 had increased OS (P = 0.01, log-rank = 6.39) by Kaplan-Meier analysis. Median OS of patients with hypomethylated or hypermethylated CpG3 was 31.9 and 11.5 months, respectively. The 5 year OS for patients with hypomethylated CpG3 was 48% compared to 7% for patients with hypermethylated sequences. Among the variables examined by Cox regression analysis, LINE-1 methylation at CpG2 and CpG3 was the only predictor of OS (Hazard Ratio = 2.63, for hypermethylated CpG3; 95% Confidence Interval: 1.21-5.69; P = 0.01). Conclusion: LINE-1 methylation is identified as a molecular marker of prognosis for CM patients in stage IIIC. Evaluation of LINE-1 promises to represent a key tool for driving the most appropriate clinical management of stage III CM patients. PMID:21615918

  3. Nucleotide sequence analysis of the gene encoding the Deinococcus radiodurans surface protein, derived amino acid sequence, and complementary protein chemical studies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Peters, J.; Peters, M.; Lottspeich, F.

    1987-11-01

    The complete nucleotide sequence of the gene encoding the surface (hexagonally packed intermediate (HPI))-layer polypeptide of Deinococcus radiodurans Sark was determined and found to encode a polypeptide of 1036 amino acids. Amino acid sequence analysis of about 30% of the residues revealed that the mature polypeptide consists of at least 978 amino acids. The N terminus was blocked to Edman degradation. The results of proteolytic modification of the HPI layer in situ and M_r estimations of the HPI polypeptide expressed in Escherichia coli indicated that there is a leader sequence. The N-terminal region contained a very high percentage (29%) of threonine and serine, including a cluster of nine consecutive serine or threonine residues, whereas a stretch near the C terminus was extremely rich in aromatic amino acids (29%). The protein contained at least two disulfide bridges, as well as tightly bound reducing sugars and fatty acids.

  4. Sequence of a second gene encoding bovine submaxillary mucin: implication for mucin heterogeneity and cloning.

    PubMed

    Jiang, W; Woitach, J T; Gupta, D; Bhavanandan, V P

    1998-10-20

    Secreted epithelial mucins are extremely large and heterogeneous glycoproteins. We report the 5 kilobase DNA sequence of a second gene, BSM2, which encodes bovine submaxillary mucin. The determined nucleotide and deduced amino acid sequences of BSM2 are 95.2% and 92.2% identical, respectively, to those of the previously described BSM1 gene isolated from the same cow. Further, the five predicted protein domains of the two genes are 100%, 94%, 93%, 77%, and 88% identical. Based on the above results, we propose that expression of multiple homologous core proteins from a single animal is a factor in generating diversity of saccharides in mucins and in providing resistance of the molecules to proteolysis. In addition, this work raises several important issues in mucin cloning such as assembling sequences from seemingly overlapping clones and deducing consensus sequences for nearly identical tandem repeats. Copyright 1998 Academic Press.

  5. A resource of single-nucleotide polymorphisms for rainbow trout generated by restriction-site associated DNA sequencing of doubled haploids

    USDA-ARS's Scientific Manuscript database

    Salmonid genomes are considered to be in a pseudo-tetraploid state as a result of an evolutionarily recent genome duplication event. This situation complicates single nucleotide polymorphism (SNP) discovery in rainbow trout as many putative SNPs are actually paralogous sequence variants (PSVs) and ...

  6. Identification of single nucleotide polymorphism in ginger using expressed sequence tags

    PubMed Central

    Chandrasekar, Arumugam; Riju, Aikkal; Sithara, Kandiyl; Anoop, Sahadevan; Eapen, Santhosh J

    2009-01-01

    Ginger (Zingiber officinale Rosc) (Family: Zingiberaceae) is a herbaceous perennial, the rhizomes of which are used as a spice. Ginger is also well known for its medicinal applications. EST-derived SNPs are a free by-product of the currently expanding EST (Expressed Sequence Tag) databases. The development of high-throughput methods for the detection of SNPs (Single Nucleotide Polymorphisms) and small indels (insertions/deletions) has led to a revolution in their use as molecular markers. The available 38139 ginger EST sequences were mined from dbEST of NCBI. The CAP3 program was used to assemble EST sequences into contigs. Candidate SNPs and indel polymorphisms were detected using the Perl script AutoSNP version 1.0, which used 31905 ESTs for detecting SNP and indel sites. We found 64026 SNP sites and 7034 indel polymorphisms, with a frequency of 0.84 SNPs per 100 bp. Among the three tissues from which the EST libraries had been generated, rhizomes had the highest frequency (1.08 SNPs/indels per 100 bp), leaves had the lowest (0.63 per 100 bp), and roots were intermediate (0.82 per 100 bp). The transition to transversion ratio is 0.90; that is, transversions outnumber transitions among the detected SNPs. These detected SNPs can be used as markers for genetic studies. Availability The results of the present study are hosted on our webserver www.spices.res.in/spicesnip PMID:20198184
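    The two summary statistics quoted in this abstract (SNPs per 100 bp and the transition to transversion ratio) are simple to compute once SNPs are available as reference/alternate base pairs. The sketch below is illustrative only: the function names and the input representation are assumptions, and it is not the AutoSNP implementation.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("ref and alt must differ")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snps):
    """Ratio of transitions to transversions for a list of (ref, alt) pairs."""
    ts = sum(1 for ref, alt in snps if is_transition(ref, alt))
    tv = len(snps) - ts
    return ts / tv if tv else float("inf")


def snps_per_100bp(n_snps, total_bp):
    """SNP density expressed per 100 bp of assembled sequence."""
    return 100.0 * n_snps / total_bp
```

    A ratio below 1, such as the 0.90 reported here, means that transversions (for example A to C) are more frequent than transitions (for example A to G) in the detected set.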

  7. Stringent control during carbon starvation of marine Vibrio sp. strain S14: molecular cloning, nucleotide sequence, and deletion of the relA gene.

    PubMed Central

    Flärdh, K; Axberg, T; Albertson, N H; Kjelleberg, S

    1994-01-01

    In order to evaluate the role of the stringent response in starvation adaptations of the marine Vibrio sp. strain S14, we have cloned the relA gene and generated relaxed mutants of this organism. The Vibrio relA gene was selected from a chromosomal DNA library by complementation of an Escherichia coli delta relA strain. The nucleotide sequence contains a 743-codon open reading frame that encodes a polypeptide that is identical in length and highly homologous to the E. coli RelA protein. The amino acid sequences are 64% identical, and they share some completely conserved regions. A delta relA::kan allele was generated by replacing 53% of the open reading frame with a kanamycin resistance gene. The Vibrio relA mutants displayed a relaxed control of RNA synthesis and failed to accumulate ppGpp during amino acid limitation. During carbon and energy starvation, a relA-dependent burst of ppGpp synthesis concomitant with carbon source depletion and growth arrest was observed. Also, in the absence of the relA gene, there was an accumulation of ppGpp during carbon starvation, but this was slower and smaller than that which occurred in the stringent strains, and it was preceded by a marked decrease in the [ATP]/[ADP] ratio. In both the wild-type and the relaxed strains, carbon source depletion caused an immediate decrease in the size of the GTP pool and a block of net RNA accumulation. The relA mutation did not affect long-term survival or the development of resistance against heat, ethanol, and oxidative stress during carbon starvation of Vibrio sp. strain S14. PMID:7928955

  8. Viral to metazoan marine plankton nucleotide sequences from the Tara Oceans expedition

    PubMed Central

    Alberti, Adriana; Poulain, Julie; Engelen, Stefan; Labadie, Karine; Romac, Sarah; Ferrera, Isabel; Albini, Guillaume; Aury, Jean-Marc; Belser, Caroline; Bertrand, Alexis; Cruaud, Corinne; Da Silva, Corinne; Dossat, Carole; Gavory, Frédérick; Gas, Shahinaz; Guy, Julie; Haquelle, Maud; Jacoby, E'krame; Jaillon, Olivier; Lemainque, Arnaud; Pelletier, Eric; Samson, Gaëlle; Wessner, Mark; Bazire, Pascal; Beluche, Odette; Bertrand, Laurie; Besnard-Gonnet, Marielle; Bordelais, Isabelle; Boutard, Magali; Dubois, Maria; Dumont, Corinne; Ettedgui, Evelyne; Fernandez, Patricia; Garcia, Espérance; Aiach, Nathalie Giordanenco; Guerin, Thomas; Hamon, Chadia; Brun, Elodie; Lebled, Sandrine; Lenoble, Patricia; Louesse, Claudine; Mahieu, Eric; Mairey, Barbara; Martins, Nathalie; Megret, Catherine; Milani, Claire; Muanga, Jacqueline; Orvain, Céline; Payen, Emilie; Perroud, Peggy; Petit, Emmanuelle; Robert, Dominique; Ronsin, Murielle; Vacherie, Benoit; Acinas, Silvia G.; Royo-Llonch, Marta; Cornejo-Castillo, Francisco M.; Logares, Ramiro; Fernández-Gómez, Beatriz; Bowler, Chris; Cochrane, Guy; Amid, Clara; Hoopen, Petra Ten; De Vargas, Colomban; Grimsley, Nigel; Desgranges, Elodie; Kandels-Lewis, Stefanie; Ogata, Hiroyuki; Poulton, Nicole; Sieracki, Michael E.; Stepanauskas, Ramunas; Sullivan, Matthew B.; Brum, Jennifer R.; Duhaime, Melissa B.; Poulos, Bonnie T.; Hurwitz, Bonnie L.; Acinas, Silvia G.; Bork, Peer; Boss, Emmanuel; Bowler, Chris; De Vargas, Colomban; Follows, Michael; Gorsky, Gabriel; Grimsley, Nigel; Hingamp, Pascal; Iudicone, Daniele; Jaillon, Olivier; Kandels-Lewis, Stefanie; Karp-Boss, Lee; Karsenti, Eric; Not, Fabrice; Ogata, Hiroyuki; Pesant, Stéphane; Raes, Jeroen; Sardet, Christian; Sieracki, Michael E.; Speich, Sabrina; Stemmann, Lars; Sullivan, Matthew B.; Sunagawa, Shinichi; Wincker, Patrick; Pesant, Stéphane; Karsenti, Eric; Wincker, Patrick

    2017-01-01

    A unique collection of oceanic samples was gathered by the Tara Oceans expeditions (2009–2013), targeting plankton organisms ranging from viruses to metazoans, and providing rich environmental context measurements. Thanks to recent advances in the field of genomics, extensive sequencing has been performed for a deep genomic analysis of this huge collection of samples. A strategy based on different approaches, such as metabarcoding, metagenomics, single-cell genomics and metatranscriptomics, has been chosen for analysis of size-fractionated plankton communities. Here, we provide detailed procedures applied for genomic data generation, from nucleic acids extraction to sequence production, and we describe registries of genomics datasets available at the European Nucleotide Archive (ENA, www.ebi.ac.uk/ena). The association of these metadata to the experimental procedures applied for their generation will help the scientific community to access these data and facilitate their analysis. This paper complements other efforts to provide a full description of experiments and open science resources generated from the Tara Oceans project, further extending their value for the study of the world’s planktonic ecosystems. PMID:28763055

  9. Viral to metazoan marine plankton nucleotide sequences from the Tara Oceans expedition.

    PubMed

    Alberti, Adriana; Poulain, Julie; Engelen, Stefan; Labadie, Karine; Romac, Sarah; Ferrera, Isabel; Albini, Guillaume; Aury, Jean-Marc; Belser, Caroline; Bertrand, Alexis; Cruaud, Corinne; Da Silva, Corinne; Dossat, Carole; Gavory, Frédérick; Gas, Shahinaz; Guy, Julie; Haquelle, Maud; Jacoby, E'krame; Jaillon, Olivier; Lemainque, Arnaud; Pelletier, Eric; Samson, Gaëlle; Wessner, Mark; Acinas, Silvia G; Royo-Llonch, Marta; Cornejo-Castillo, Francisco M; Logares, Ramiro; Fernández-Gómez, Beatriz; Bowler, Chris; Cochrane, Guy; Amid, Clara; Hoopen, Petra Ten; De Vargas, Colomban; Grimsley, Nigel; Desgranges, Elodie; Kandels-Lewis, Stefanie; Ogata, Hiroyuki; Poulton, Nicole; Sieracki, Michael E; Stepanauskas, Ramunas; Sullivan, Matthew B; Brum, Jennifer R; Duhaime, Melissa B; Poulos, Bonnie T; Hurwitz, Bonnie L; Pesant, Stéphane; Karsenti, Eric; Wincker, Patrick

    2017-08-01

    A unique collection of oceanic samples was gathered by the Tara Oceans expeditions (2009-2013), targeting plankton organisms ranging from viruses to metazoans, and providing rich environmental context measurements. Thanks to recent advances in the field of genomics, extensive sequencing has been performed for a deep genomic analysis of this huge collection of samples. A strategy based on different approaches, such as metabarcoding, metagenomics, single-cell genomics and metatranscriptomics, has been chosen for analysis of size-fractionated plankton communities. Here, we provide detailed procedures applied for genomic data generation, from nucleic acids extraction to sequence production, and we describe registries of genomics datasets available at the European Nucleotide Archive (ENA, www.ebi.ac.uk/ena). The association of these metadata to the experimental procedures applied for their generation will help the scientific community to access these data and facilitate their analysis. This paper complements other efforts to provide a full description of experiments and open science resources generated from the Tara Oceans project, further extending their value for the study of the world's planktonic ecosystems.

  10. Draft Genome Sequence of Corynebacterium kefirresidentii SB, Isolated from Kefir.

    PubMed

    Blasche, Sonja; Kim, Yongkyu; Patil, Kiran R

    2017-09-14

    The genus Corynebacterium includes Gram-positive species with a high G+C content. We report here a novel species, Corynebacterium kefirresidentii SB, isolated from kefir grains collected in Germany. Its draft genome sequence was remarkably dissimilar (average nucleotide identity, 76.54%) to those of other Corynebacterium spp., confirming that this is a unique novel species. Copyright © 2017 Blasche et al.

  11. Single nucleotide polymorphism analysis of Korean native chickens using next generation sequencing data.

    PubMed

    Seo, Dong-Won; Oh, Jae-Don; Jin, Shil; Song, Ki-Duk; Park, Hee-Bok; Heo, Kang-Nyeong; Shin, Younhee; Jung, Myunghee; Park, Junhyung; Jo, Cheorun; Lee, Hak-Kyo; Lee, Jun-Heon

    2015-02-01

    There are five native chicken lines in Korea, which are mainly classified by plumage colors (black, white, red, yellow, gray). These five lines are very important genetic resources in the Korean poultry industry. Based on a next generation sequencing technology, whole genome sequence and reference assemblies were performed using Gallus_gallus_4.0 (NCBI) with whole genome sequences from these lines to identify common and novel single nucleotide polymorphisms (SNPs). We obtained 36,660,731,136 ± 1,257,159,120 bp of raw sequence and average 26.6-fold of 25-29 billion reference assembly sequences representing 97.288 % coverage. Also, 4,006,068 ± 97,534 SNPs were observed from 29 autosomes and the Z chromosome and, of these, 752,309 SNPs are the common SNPs across lines. Among the identified SNPs, the number of novel- and known-location assigned SNPs was 1,047,951 ± 14,956 and 2,948,648 ± 81,414, respectively. The number of unassigned known SNPs was 1,181 ± 150 and unassigned novel SNPs was 8,238 ± 1,019. Synonymous SNPs, non-synonymous SNPs, and SNPs having character changes were 26,266 ± 1,456, 11,467 ± 604, 8,180 ± 458, respectively. Overall, 443,048 ± 26,389 SNPs in each bird were identified by comparing with dbSNP in NCBI. The presently obtained genome sequence and SNP information in Korean native chickens have wide applications for further genome studies such as genetic diversity studies to detect causative mutations for economic and disease related traits.

  12. Empirical Bayes Estimation of Coalescence Times from Nucleotide Sequence Data.

    PubMed

    King, Leandra; Wakeley, John

    2016-09-01

    We demonstrate the advantages of using information at many unlinked loci to better calibrate estimates of the time to the most recent common ancestor (TMRCA) at a given locus. To this end, we apply a simple empirical Bayes method to estimate the TMRCA. This method is both asymptotically optimal, in the sense that the estimator converges to the true value when the number of unlinked loci for which we have information is large, and has the advantage of not making any assumptions about demographic history. The algorithm works as follows: we first split the sample at each locus into inferred left and right clades to obtain many estimates of the TMRCA, which we can average to obtain an initial estimate of the TMRCA. We then use nucleotide sequence data from other unlinked loci to form an empirical distribution that we can use to improve this initial estimate. Copyright © 2016 by the Genetics Society of America.

  13. Complete Genomic Sequence and Comparative Analysis of the Genome Segments of Sweet Potato Chlorotic Stunt Virus in China

    PubMed Central

    Qin, Yanhong; Wang, Li; Zhang, Zhenchen; Qiao, Qi; Zhang, Desheng; Tian, Yuting; Wang, Shuang; Wang, Yongjiang; Yan, Zhaoling

    2014-01-01

    Background Sweet potato chlorotic stunt virus (family Closteroviridae, genus Crinivirus) features a large bipartite, single-stranded, positive-sense RNA genome. To date, only three complete genomic sequences of SPCSV can be accessed through GenBank. SPCSV was first detected from China in 2011, only partial genomic sequences have been determined in the country. No report on the complete genomic sequence and genome structure of Chinese SPCSV isolates or the genetic relation between isolates from China and other countries is available. Methodology/Principal Findings The complete genomic sequences of five isolates from different areas in China were characterized. This study is the first to report the complete genome sequences of SPCSV from whitefly vectors. Genome structure analysis showed that isolates of WA and EA strains from China have the same coding protein as isolates Can181-9 and m2-47, respectively. Twenty cp genes and four RNA1 partial segments were sequenced and analyzed, and the nucleotide identities of complete genomic, cp, and RNA1 partial sequences were determined. Results indicated high conservation among strains and significant differences between WA and EA strains. Genetic analysis demonstrated that, except for isolates from Guangdong Province, SPCSVs from other areas belong to the WA strain. Genome organization analysis showed that the isolates in this study lack the p22 gene. Conclusions/Significance We presented the complete genome sequences of SPCSV in China. Comparison of nucleotide identities and genome structures between these isolates and previously reported isolates showed slight differences. The nucleotide identities of different SPCSV isolates showed high conservation among strains and significant differences between strains. All nine isolates in this study lacked p22 gene. WA strains were more extensively distributed than EA strains in China. 
    These data provide important insights into the molecular variation and genomic structure of SPCSV

  14. Nucleotide sequence and transcriptional start site of the Methylobacterium organophilum XX methanol dehydrogenase structural gene

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Machlin, S.M.; Hanson, R.S.

    The nucleotide sequence of a cloned 2.5-kilobase-pair SmaI fragment containing the methanol dehydrogenase (MDH) structural gene from Methylobacterium organophilum XX was determined. A single open reading frame with a coding capacity of 626 amino acids (molecular weight, 66,000) was identified on one strand, and N-terminal sequencing of purified MDH revealed that 27 of these residues constituted a putative signal peptide. Primer extension mapping of in vivo transcripts indicated that the start of mRNA synthesis was 160 to 170 base pairs upstream of the ATG codon. Northern (RNA) blot analysis further demonstrated that the transcript was 2.1 kilobase pairs in length and therefore appeared to encode only MDH.

  15. A Bayesian hierarchical model to detect differentially methylated loci from single nucleotide resolution sequencing data

    PubMed Central

    Feng, Hao; Conneely, Karen N.; Wu, Hao

    2014-01-01

    DNA methylation is an important epigenetic modification that has essential roles in cellular processes including gene regulation, development and disease and is widely dysregulated in most types of cancer. Recent advances in sequencing technology have enabled the measurement of DNA methylation at single nucleotide resolution through methods such as whole-genome bisulfite sequencing and reduced representation bisulfite sequencing. In DNA methylation studies, a key task is to identify differences under distinct biological contexts, for example, between tumor and normal tissue. A challenge in sequencing studies is that the number of biological replicates is often limited by the costs of sequencing. The small number of replicates leads to unstable variance estimation, which can reduce accuracy to detect differentially methylated loci (DML). Here we propose a novel statistical method to detect DML when comparing two treatment groups. The sequencing counts are described by a lognormal-beta-binomial hierarchical model, which provides a basis for information sharing across different CpG sites. A Wald test is developed for hypothesis testing at each CpG site. Simulation results show that the proposed method yields improved DML detection compared to existing methods, particularly when the number of replicates is low. The proposed method is implemented in the Bioconductor package DSS. PMID:24561809

  16. A novel HLA-B allele, B*5214, detected in a Taiwanese volunteer bone marrow donor using a sequence-based typing method.

    PubMed

    Chen, M J; Chu, C C; Shyr, M H; Lin, C L; Lin, P Y; Yang, K L

    2010-02-01

    HLA-B*5214, a novel rare variant allele of HLA-B*52, was found in a Taiwanese volunteer bone marrow donor by a sequence-based typing method. The sequence of B*5214 is identical to that of B*520101 in exon 2 but differs from B*520101 in exon 3 at nucleotide positions 419 A-->T and 435 A-->G. Alteration of these two nucleotides resulted in an amino acid substitution at residue 116 Y-->F (TAC-->TTC) and a silent exchange at residue 121 K-->K (AAA-->AAG).
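    The two codon changes reported in this abstract can be checked against the standard genetic code. The snippet below is a small sketch: the function name is hypothetical, and the lookup table contains only the four codons mentioned in the abstract rather than the full code.

```python
# Subset of the standard genetic code covering the codons in the abstract.
CODON_TABLE = {"TAC": "Y", "TTC": "F", "AAA": "K", "AAG": "K"}


def describe_change(ref_codon, alt_codon):
    """Return the amino acid change and whether it is silent."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    kind = "silent" if ref_aa == alt_aa else "substitution"
    return f"{ref_aa}->{alt_aa} ({kind})"
```

    Here `describe_change("TAC", "TTC")` reports a tyrosine to phenylalanine substitution, while `describe_change("AAA", "AAG")` reports a silent change, because both codons encode lysine.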

  17. Effects of transcriptional start site sequence and position on nucleotide-sensitive selection of alternative start sites at the pyrC promoter in Escherichia coli.

    PubMed Central

    Liu, J; Turnbough, C L

    1994-01-01

    In Escherichia coli, expression of the pyrC gene is regulated primarily by a translational control mechanism based on nucleotide-sensitive selection of transcriptional start sites at the pyrC promoter. When intracellular levels of CTP are high, pyrC transcripts are initiated predominantly with CTP at a site 7 bases downstream of the Pribnow box. These transcripts form a stable hairpin at their 5' ends that blocks ribosome binding. When the CTP level is low and the GTP level is high, conditions found in pyrimidine-limited cells, transcripts are initiated primarily with GTP at a site 9 bases downstream of the Pribnow box. These shorter transcripts are unable to form a hairpin at their 5' ends and are readily translated. In this study, we examined the effects of nucleotide sequence and position on the selection of transcriptional start sites at the pyrC promoter. We characterized promoter mutations that systematically alter the sequence at position 7 or 9 downstream of the Pribnow box or vary the spacing between the Pribnow box and wild-type transcriptional initiation region. The results reveal preferences for particular initiating nucleotides (ATP > or = GTP > UTP >> CTP) and for starting positions downstream of the Pribnow box (7 >> 6 and 8 > 9 > 10). The results indicate that optimal nucleotide-sensitive start site switching at the wild-type pyrC promoter is the result of competition between the preferred start site (position 7) that uses the poorest initiating nucleotide (CTP) and a weak start site (position 9) that uses a good initiating nucleotide (GTP). The sequence of the pyrC promoter also minimizes the synthesis of untranslatable transcripts and provides for maximum stability of the regulatory transcript hairpin. In addition, the results show that the effects of the mutations on pyrC expression and regulation are consistent with the current model for translational control. Possible effects of preferences for initiating nucleotides and start sites on the

  18. Nucleotide sequence of a cluster of early and late genes in a conserved segment of the vaccinia virus genome.

    PubMed Central

    Plucienniczak, A; Schroeder, E; Zettlmeissl, G; Streeck, R E

    1985-01-01

    The nucleotide sequence of a 7.6 kb vaccinia DNA segment from a genomic region conserved among different orthopoxviruses has been determined. This segment contains a tight cluster of 12 partly overlapping open reading frames most of which can be correlated with previously identified early and late proteins and mRNAs. Regulatory signals used by vaccinia virus have been studied. Presumptive promoter regions are rich in A, T and carry the consensus sequences TATA and AATAA spaced at 20-24 base pairs. Tandem repeats of a CTATTC consensus sequence are proposed to be involved in the termination of early transcription. PMID:2987815

  19. ITS rDNA sequences of Pomphorhynchus laevis (Zoega in Müller, 1776) and P. lucyi Williams and Rogers, 1984 (Acanthocephala: Palaeacanthocephala).

    PubMed

    Král'ová-Hromadová, Iva; Tietz, David F; Shinn, Andrew P; Spakulová, Marta

    2003-10-01

    The internal transcribed spacers (ITS-1 and ITS-2) of the ribosomal RNA gene of Pomphorhynchus laevis (Zoega in Müller, 1776) (Acanthocephala) isolated from various fish species across Central and Southern Europe were compared with those of P. lucyi Williams and Rogers, 1984 collected from the largemouth bass Micropterus salmonoides Boulenger from the USA. The nucleotide sequences of ITS regions of P. laevis from minnows Phoxinus phoxinus (L.) and chub Leuciscus cephalus (L.) from two distant localities in the Slovak Republic were found to be 100% identical. The ITS-1 and ITS-2 of P. laevis from chub from the Czech Republic and Italy were also mutually identical, but significantly different from Slovak worms (88.7% identity for ITS-1, 91.3% identity for ITS-2). A fifth sample collected from Barbus tyberinus Bonaparte from Italy was very similar to the sympatric Italian isolate from chub, possessing four nucleotide substitutions in ITS-1 (98.4% identity). The ITS rDNA sequences of P. lucyi differed significantly from those of P. laevis; the values of identity were 51.8-56.1% for ITS-1 and 63.1-65.3% for ITS-2, and were significantly higher than the range of P. laevis within-species variability. The results based on the ITS sequences confirmed the occurrence of strains in P. laevis from Continental Europe which are well defined by molecules but reveal only slight differences in their morphology.
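    Percent identity figures such as those in this abstract are computed from aligned sequences as the share of matching columns. The sketch below is a minimal illustration: the function name is hypothetical, and counting gap columns as mismatches is one convention among several, not necessarily the one the authors used.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns where both sequences carry the same symbol.

    Both inputs must already be aligned (equal length). Gap characters are
    compared like any other symbol, so a gap opposite a base is a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(x == y for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)
```

    As a consistency check, four substitutions in a 250-column alignment give 100 x 246 / 250 = 98.4%, which matches the figure quoted for ITS-1. The 250-column length is an assumption chosen for the arithmetic and is not stated in the abstract.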

  20. The complete nucleotide sequence of the domestic dog (Canis familiaris) mitochondrial genome.

    PubMed

    Kim, K S; Lee, S E; Jeong, H W; Ha, J H

    1998-10-01

    The complete nucleotide sequence of the mitochondrial genome of the domestic dog, Canis familiaris, was determined. The length of the sequence was 16,728 bp; however, the length was not absolute due to the variation (heteroplasmy) caused by differing numbers of the repetitive motif, 5'-GTACACGT(A/G)C-3', in the control region. The genome organization, gene contents, and codon usage conformed to those of other mammalian mitochondrial genomes. Although its features were unknown, the "CTAGA" duplication event which followed the translational stop codon of the COII gene was not observed in other mammalian mitochondrial genomes. In order to determine the possible differences between mtDNAs in carnivores, two rRNA and 13 protein-coding genes from the cat, dog, and seal were compared. The combined molecular differences, in two rRNA genes as well as in the inferred amino acid sequences of the mitochondrial 13 protein-coding genes, suggested that there is a closer relationship between the dog and the seal than there is between either of these species and the cat. Based on the molecular differences of the mtDNA, the evolutionary divergence between the cat, the dog, and the seal was dated to approximately 50 +/- 4 million years ago. The degree of difference between carnivore mtDNAs varied according to the individual protein-coding gene applied, showing that the evolutionary relationships of distantly related species should be presented in an extended study based on ample sequence data like complete mtDNA molecules. Copyright 1998 Academic Press.
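    The length heteroplasmy described in this abstract comes from differing copy numbers of the motif 5'-GTACACGT(A/G)C-3'. One way to count tandem copies in a sequence is a regular expression, as sketched below. The function name and the toy sequence in the usage note are hypothetical; only the motif is taken from the abstract.

```python
import re

# The (A/G) position in the motif becomes the character class [AG].
MOTIF_RUN = re.compile(r"(?:GTACACGT[AG]C)+")
UNIT_LEN = 10  # length of one motif copy in bases


def longest_repeat_run(seq):
    """Return the largest number of back-to-back motif copies found in seq."""
    runs = [len(m.group(0)) // UNIT_LEN for m in MOTIF_RUN.finditer(seq.upper())]
    return max(runs, default=0)
```

    For the made-up sequence `"TT" + "GTACACGTAC" + "GTACACGTGC" + "GTACACGTAC" + "AA"`, the function reports a run of three copies. Applying it to control-region sequences from different molecules would expose the copy-number variation behind the non-absolute genome length.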

  1. Molecular cloning and nucleotide sequence of CYP6BF1 from the diamondback moth, Plutella xylostella

    PubMed Central

    Li, Hongshan; Dai, Huaguo; Wei, Hui

    2005-01-01

    A novel cDNA clone encoding a cytochrome P450 was screened from the insecticide-susceptible strain of Plutella xylostella (L.) (Lepidoptera:Yponomeutidae). The nucleotide sequence of the clone, designated CYP6BF1, was determined. This is the first full-length sequence of the CYP6 family from Plutella xylostella (L.). The cDNA is 1661bp in length and contains an open reading frame from base pairs 26 to 1570, encoding a protein of 514 amino acid residues. It is similar to the other insect P450s in gene family 6, including CYP6AE1 from Depressaria pastinacella (46%). The GenBank accession number is AY971374. PMID:17119627
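    The open reading frame coordinates and the protein length quoted in this abstract can be cross-checked with a little arithmetic. The sketch below assumes that the stated coordinates are 1-based and inclusive and that they include the stop codon; both are assumptions, although the numbers agree under them. The function name is hypothetical.

```python
def orf_summary(start, end):
    """Return (nucleotides, codons, amino_acids) for a 1-based inclusive ORF.

    Assumes the final codon is a stop codon, so the encoded protein has one
    residue fewer than the number of codons.
    """
    length = end - start + 1
    if length % 3:
        raise ValueError("ORF length is not a multiple of three")
    codons = length // 3
    return length, codons, codons - 1
```

    For positions 26 to 1570, this gives 1545 nucleotides, 515 codons, and 514 amino acids, which agrees with the 514-residue protein reported.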

  2. Cyclic nucleotide content of tobacco BY-2 cells.

    PubMed

    Richards, Helen; Das, Swadipa; Smith, Christopher J; Pereira, Louisa; Geisbrecht, Alan; Devitt, Nicola J; Games, David E; van Geyschem, Jan; Gareth Brenton, A; Newton, Russell P

    2002-11-01

    The cyclic nucleotide content of cultured tobacco bright yellow-2 (BY-2) cells was determined, after freeze-killing, perchlorate extraction and sequential chromatography, by radioimmunoassay. The identities of the putative cyclic nucleotides, adenosine 3',5'-cyclic monophosphate (cyclic AMP), guanosine 3',5'-cyclic monophosphate (cyclic GMP) and cytidine 3',5'-cyclic monophosphate (cyclic CMP) were unambiguously confirmed by tandem mass spectrometry. The potential of BY-2 cell cultures as a model system for future investigations of cyclic nucleotide function in higher plants is discussed.

  3. Correlations of nucleotide substitution rates and base composition of mammalian coding sequences with protein structure.

    PubMed

    Chiusano, M L; D'Onofrio, G; Alvarez-Valin, F; Jabbari, K; Colonna, G; Bernardi, G

    1999-09-30

    We investigated the relationships between the nucleotide substitution rates and the predicted secondary structures in the three states representation (alpha-helix, beta-sheet, and coil). The analysis was carried out on 34 alignments, each of which comprised sequences belonging to at least four different mammalian orders. The rates of synonymous substitution were found to be significantly different in regions predicted to be alpha-helix, beta-sheet, or coil. Likewise, the nonsynonymous rates also differ, although expectedly at a lower extent, in the three types of secondary structure, suggesting that different selective constraints associated with the different structures are affecting in a similar way the synonymous and nonsynonymous rates. Moreover, the base composition of the third codon positions is different in coding sequence regions corresponding to different secondary structures of proteins.

  4. Molecular variability analysis of five new complete cacao swollen shoot virus genomic sequences.

    PubMed

    Muller, E; Sackey, S

    2005-01-01

    Cacao swollen shoot virus (CSSV), a member of the family Caulimoviridae, genus Badnavirus, occurs in all the main cacao-growing areas of West Africa. We amplified, cloned and sequenced complete genomes of five new isolates, two originating from Togo and three originating from Ghana. The genomes of these five newly sequenced isolates all contain the five putative open reading frames I, II, III, X and Y described for the first sequenced CSSV isolate, Agou1, originating from Togo. Their genomes have been aligned with the genome of Agou1. The nucleotide and amino acid sequence identities between isolates have been calculated and a phylogenetic analysis has been made including other pararetroviruses. Maximum nucleotide sequence variability between complete genomes of CSSV isolates was 29.4%. Geographical differentiation between isolates appears more important than differentiation between mild and severe isolates. ORF X differs greatly in size and sequence between the Togolese isolates Nyongbo2 and Agou1 and the four other isolates; its functional role is therefore clearly questionable.

  5. Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences.

    PubMed

    Mizianty, Marcin J; Kurgan, Lukasz

    2009-12-13

    Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. The proposed MODular Approach to Structural class prediction (MODAS) method is unique as it allows for selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes including conservation of residues and arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with over two dozen modern competing predictors show that MODAS provides the best overall accuracy, which ranges between 80% and 96.7% (83.5% for the twilight-zone datasets), depending on the dataset. This translates into 19% and 8% error rate reduction when compared against the best performing competing method on the two largest datasets. The proposed predictor achieves 58% accuracy for the membrane protein class, which is not considered by the majority of existing methods, even though this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes. The improved predictions stem from the novel features that express collocation...

  6. Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences

    PubMed Central

    2009-01-01

    Background: Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. Results: The proposed MODular Approach to Structural class prediction (MODAS) method is unique as it allows for selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes including conservation of residues and arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with over two dozen modern competing predictors show that MODAS provides the best overall accuracy, which ranges between 80% and 96.7% (83.5% for the twilight-zone datasets), depending on the dataset. This translates into 19% and 8% error rate reduction when compared against the best performing competing method on the two largest datasets. The proposed predictor achieves 58% accuracy for the membrane protein class, which is not considered by the majority of existing methods, even though this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes. Conclusions: The improved predictions stem from the novel...

  7. A Comparative Genomics Strategy for Targeted Discovery of Single-Nucleotide Polymorphisms and Conserved-Noncoding Sequences in Orphan Crops

    PubMed Central

    Feltus, F.A.; Singh, H.P.; Lohithaswa, H.C.; Schulze, S.R.; Silva, T.D.; Paterson, A.H.

    2006-01-01

    Completed genome sequences provide templates for the design of genome analysis tools in orphan species lacking sequence information. To demonstrate this principle, we designed 384 PCR primer pairs to conserved exonic regions flanking introns, using Sorghum/Pennisetum expressed sequence tag alignments to the Oryza genome. Conserved-intron scanning primers (CISPs) amplified single-copy loci at 37% to 80% success rates in taxa that sample much of the approximately 50-million years of Poaceae divergence. While the conserved nature of exons fostered cross-taxon amplification, the lesser evolutionary constraints on introns enhanced single-nucleotide polymorphism detection. For example, in eight rice (Oryza sativa) genotypes, polymorphism averaged 12.1 per kb in introns but only 3.6 per kb in exons. Curiously, among 124 CISPs evaluated across Oryza, Sorghum, Pennisetum, Cynodon, Eragrostis, Zea, Triticum, and Hordeum, 23 (18.5%) seemed to be subject to rigid intron size constraints that were independent of per-nucleotide DNA sequence variation. Furthermore, we identified 487 conserved-noncoding sequence motifs in 129 CISP loci. A large CISP set (6,062 primer pairs, amplifying introns from 1,676 genes) designed using an automated pipeline showed generally higher abundance in recombinogenic than in nonrecombinogenic regions of the rice genome, thus providing relatively even distribution along genetic maps. CISPs are an effective means to explore poorly characterized genomes for both DNA polymorphism and noncoding sequence conservation on a genome-wide or candidate gene basis, and also provide anchor points for comparative genomics across a diverse range of species. PMID:16607031

  8. Single nucleotide primer extension to detect genetic diseases: Experimental application to hemophilia B (factor IX) and cystic fibrosis genes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kuppuswamy, M.N.; Hoffmann, J.W.; Spitzer, S.G.

    1991-02-15

    In this report, the authors describe an approach to detect the presence of abnormal alleles in those genetic diseases in which frequency of occurrence of the same mutation is high (e.g., hemophilia B). Initially, from each subject, the DNA fragment containing the putative mutation site is amplified by the polymerase chain reaction. For each fragment two reaction mixtures are then prepared. Each contains the amplified fragment, a primer (18-mer or longer) whose sequence is identical to the coding sequence of the normal gene immediately flanking the 5′ end of the mutation site, and either an α-³²P-labeled nucleotide corresponding to the normal coding sequence at the mutation site or an α-³²P-labeled nucleotide corresponding to the mutant sequence. An essential feature of the present methodology is that the base immediately 3′ to the template-bound primer is one of those altered in the mutant, since in this way an extension of the primer by a single base will give an extended molecule characteristic of either the mutant or the wild type. The method is rapid and should be useful in carrier detection and prenatal diagnosis of every genetic disease with a known sequence variation.

  9. Molecular identification and partial sequence analysis of an aryl hydrocarbon receptor from beluga (Delphinapterus leucas)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jensen, B.A.; Hahn, M.E.

    1995-12-31

    The aryl hydrocarbon receptor (AhR) mediates the effects of many common and potentially toxic organic hydrocarbons, including some polychlorinated biphenyls and dioxins. Since small cetaceans often inhabit industrially polluted coastal waters, comparison of the molecular structure and function of this protein in cetaceans with other marine and mammalian species is important for evaluating the sensitivity of cetaceans to these pollutants. An AhR protein has been identified in beluga liver by photoaffinity labeling. In the present study, the authors sought to clone and sequence an AhR cDNA from beluga as a prelude to studying its structure and function. Using reverse-transcription polymerase chain reaction (RT-PCR) and degenerate primers, a 515 base pair fragment was amplified, cloned and sequenced, revealing homology to the PAS domain (ligand binding and dimerization region) of AhRs from terrestrial mammals. This portion of the putative beluga AhR has 82% amino acid and 81% nucleotide sequence identity to the mouse AhR, and 63% amino acid and 64% nucleotide sequence identity to an AhR from the marine fish Fundulus heteroclitus. A beluga cDNA library was synthesized and is currently being screened with the PCR-generated fragment to obtain the complete coding sequence. This is the first molecular evidence of AhR presence in cetaceans.

  10. The Coding of Biological Information: From Nucleotide Sequence to Protein Recognition

    NASA Astrophysics Data System (ADS)

    Štambuk, Nikola

    The paper reviews the classic results of Swanson, Dayhoff, Grantham, Blalock and Root-Bernstein, which link genetic code nucleotide patterns to the protein structure, evolution and molecular recognition. Symbolic representation of the binary addresses defining particular nucleotide and amino acid properties is discussed, with consideration of: structure and metric of the code, direct correspondence between amino acid and nucleotide information, and molecular recognition of the interacting protein motifs coded by the complementary DNA and RNA strands.

  11. Massively parallel sequencing of 68 insertion/deletion markers identifies novel microhaplotypes for utility in human identity testing.

    PubMed

    Wendt, Frank R; Warshauer, David H; Zeng, Xiangpei; Churchill, Jennifer D; Novroski, Nicole M M; Song, Bing; King, Jonathan L; LaRue, Bobby L; Budowle, Bruce

    2016-11-01

    Short tandem repeat (STR) loci are the traditional markers used for kinship, missing persons, and direct comparison human identity testing. These markers hold considerable value due to their highly polymorphic nature, amplicon size, and ability to be multiplexed. However, many STRs are still too large for use in analysis of highly degraded DNA. Small bi-allelic polymorphisms, such as insertions/deletions (INDELs), may be better suited for analyzing compromised samples, and their allele size differences are amenable to analysis by capillary electrophoresis. The INDEL marker allelic states range in size from 2 to 6 base pairs, enabling small amplicon size. In addition, heterozygote balance may be increased by minimizing preferential amplification of the smaller allele, as is more common with STR markers. Multiplexing a large number of INDELs allows for generating panels with high discrimination power. The Nextera™ Rapid Capture Custom Enrichment Kit (Illumina, Inc., San Diego, CA) and massively parallel sequencing (MPS) on the Illumina MiSeq were used to sequence 68 well-characterized INDELs in four major US population groups. In addition, the STR Allele Identification Tool: Razor (STRait Razor) was used in a novel way to analyze INDEL sequences and detect adjacent single nucleotide polymorphisms (SNPs) and other polymorphisms. This application enabled the discovery of unique allelic variants, which increased the discrimination power and decreased the single-locus random match probabilities (RMPs) of 22 of these well-characterized INDELs which can be considered as microhaplotypes. These findings suggest that additional microhaplotypes containing human identification (HID) INDELs may exist elsewhere in the genome. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  12. Identification of protein-interacting nucleotides in a RNA sequence using composition profile of tri-nucleotides.

    PubMed

    Panwar, Bharat; Raghava, Gajendra P S

    2015-04-01

    RNA-protein interactions play diverse roles in cells; identification of the RNA-protein interface is therefore essential for understanding their function. In the past, several methods have been developed for predicting RNA-interacting residues in proteins, but limited efforts have been made to identify protein-interacting nucleotides in RNAs. In order to discriminate protein-interacting and non-interacting nucleotides, we used various classifiers (NaiveBayes, NaiveBayesMultinomial, BayesNet, ComplementNaiveBayes, MultilayerPerceptron, J48, SMO, RandomForest, SMO and SVM(light)) for prediction model development using various features, and achieved the highest performance (83.92% sensitivity, 84.82% specificity, 84.62% accuracy and a Matthews correlation coefficient of 0.62) with SVM(light)-based models. We observed that certain tri-nucleotides such as ACA, ACC, AGA, CAC, CCA, GAG, UGA, and UUU are preferred in protein interaction. All the models have been developed using a non-redundant dataset and are evaluated using a five-fold cross-validation technique. A web server called RNApin has been developed for the scientific community (http://crdd.osdd.net/raghava/rnapin/). Copyright © 2015 Elsevier Inc. All rights reserved.

  13. Discovery, genotyping and characterization of structural variation and novel sequence at single nucleotide resolution from de novo genome assemblies on a population scale.

    PubMed

    Liu, Siyang; Huang, Shujia; Rao, Junhua; Ye, Weijian; Krogh, Anders; Wang, Jun

    2015-01-01

    Comprehensive recognition of genomic variation in one individual is important for understanding disease and developing personalized medication and treatment. Many tools based on DNA re-sequencing exist for identification of single nucleotide polymorphisms, small insertions and deletions (indels) as well as large deletions. However, these approaches consistently display a substantial bias against the recovery of complex structural variants and novel sequence in individual genomes and do not provide interpretation information such as the annotation of ancestral state and formation mechanism. We present a novel approach implemented in a single software package, AsmVar, to discover, genotype and characterize different forms of structural variation and novel sequence from population-scale de novo genome assemblies up to nucleotide resolution. Application of AsmVar to several human de novo genome assemblies captures a wide spectrum of structural variants and novel sequences present in the human population in high sensitivity and specificity. Our method provides a direct solution for investigating structural variants and novel sequences from de novo genome assemblies, facilitating the construction of population-scale pan-genomes. Our study also highlights the usefulness of the de novo assembly strategy for definition of genome structure.

  14. Evaluation of atpB nucleotide sequences for phylogenetic studies of ferns and other pteridophytes.

    PubMed

    Wolf, P

    1997-10-01

    Inferring basal relationships among vascular plants poses a major challenge to plant systematists. The divergence events that describe these relationships occurred long ago and considerable homoplasy has since accrued for both molecular and morphological characters. A potential solution is to examine phylogenetic analyses from multiple data sets. Here I present a new source of phylogenetic data for ferns and other pteridophytes. I sequenced the chloroplast gene atpB from 23 pteridophyte taxa and used maximum parsimony to infer relationships. A 588-bp region of the gene appeared to contain a statistically significant amount of phylogenetic signal and the resulting trees were largely congruent with similar analyses of nucleotide sequences from rbcL. However, a combined analysis of atpB plus rbcL produced a better resolved tree than did either data set alone. In the shortest trees, leptosporangiate ferns formed a monophyletic group. Also, I detected a well-supported clade of Psilotaceae (Psilotum and Tmesipteris) plus Ophioglossaceae (Ophioglossum and Botrychium). The demonstrated utility of atpB suggests that sequences from this gene should play a role in phylogenetic analyses that incorporate data from chloroplast genes, nuclear genes, morphology, and fossil data.

  15. Complex structural behavior of oligopurine-oligopyrimidine sequence cloned within the supercoiled plasmid.

    PubMed Central

    Parniewski, P; Galazka, G; Wilk, A; Klysik, J

    1989-01-01

    The synthetic sequence GATCC(AG)7ATCG(AT)4CG(AG)7 was cloned into a plasmid, and its structural behavior under the influence of supercoiling was analysed by chemical modification under a variety of experimental conditions. It was found that this sequence adopts at least two different non-B conformations depending on -delta and pH values. Moreover, the 12-nucleotide-long non-pur.pyr spacer region separating the two identical (AG)7 blocks does not provide a significant energy barrier against the formation of unusual structures. PMID:2644622

  16. PCR amplification and sequence analysis of the major capsid protein gene of megalocytiviruses isolated in Taiwan.

    PubMed

    Wang, C S; Chao, S Y; Ku, C C; Wen, C M; Shih, H H

    2009-06-01

    Viruses belonging to the genus Megalocytivirus in the family Iridoviridae are one of the major agents causing mass mortalities in marine and freshwater fish in Asian countries. Outbreaks of iridovirus disease have been reported among various fish species in Taiwan. However, the genotypes of these iridoviruses have not yet been determined. In this study, seven megalocytivirus isolates from four fish species: king grouper, Epinephelus lanceolatus (Bloch), barramundi perch, Lates calcarifer (Bloch), silver sea bream, Rhabdosargus sarba (Forsskal), and common ponyfish, Leiognathus equulus (Forsskal), cultured in three different regions of Taiwan were collected. The full open reading frame encoding the viral major capsid protein gene was amplified using PCR. The PCR products of approximately 1581 bp were cloned and the nucleotide sequences were phylogenetically analysed. Results showed that all seven PCR products contained a unique open reading frame with 1362 nucleotides and encoded a structural protein with 453 amino acids. Even though the nucleotide sequences were not identical, these seven megalocytiviruses were classified into one cluster and showed very high homology with red sea bream iridovirus (RSIV) with more than 97% identity. Thus, the seven iridovirus strains isolated from cultured marine fish in Taiwan were closer to the RSIV genotype than the infectious spleen and kidney necrosis virus genotype.

  17. Energy efficiency trade-offs drive nucleotide usage in transcribed regions

    PubMed Central

    Chen, Wei-Hua; Lu, Guanting; Bork, Peer; Hu, Songnian; Lercher, Martin J.

    2016-01-01

    Efficient nutrient usage is a trait under universal selection. A substantial part of cellular resources is spent on making nucleotides. We thus expect preferential use of cheaper nucleotides especially in transcribed sequences, which are often amplified thousand-fold compared with genomic sequences. To test this hypothesis, we derive a mutation-selection-drift equilibrium model for nucleotide skews (strand-specific usage of 'A' versus 'T' and 'G' versus 'C'), which explains nucleotide skews across 1,550 prokaryotic genomes as a consequence of selection on efficient resource usage. Transcription-related selection generally favours the cheaper nucleotides 'U' and 'C' at synonymous sites. However, the information encoded in mRNA is further amplified through translation. Due to unexpected trade-offs in the codon table, cheaper nucleotides encode on average energetically more expensive amino acids. These trade-offs apply to both strand-specific nucleotide usage and GC content, causing a universal bias towards the more expensive nucleotides 'A' and 'G' at non-synonymous coding sites. PMID:27098217

  18. Detecting authorized and unauthorized genetically modified organisms containing vip3A by real-time PCR and next-generation sequencing.

    PubMed

    Liang, Chanjuan; van Dijk, Jeroen P; Scholtens, Ingrid M J; Staats, Martijn; Prins, Theo W; Voorhuijzen, Marleen M; da Silva, Andrea M; Arisi, Ana Carolina Maisonnave; den Dunnen, Johan T; Kok, Esther J

    2014-04-01

    The growing number of biotech crops with novel genetic elements increasingly complicates the detection of genetically modified organisms (GMOs) in food and feed samples using conventional screening methods. Unauthorized GMOs (UGMOs) in food and feed are currently identified through combining GMO element screening with sequencing the DNA flanking these elements. In this study, a specific and sensitive qPCR assay was developed for vip3A element detection based on the vip3Aa20 coding sequences of the recently marketed MIR162 maize and COT102 cotton. Furthermore, SiteFinding-PCR in combination with Sanger, Illumina or Pacific BioSciences (PacBio) sequencing was performed targeting the flanking DNA of the vip3Aa20 element in MIR162. De novo assembly and Basic Local Alignment Search Tool searches were used to mimic UGMO identification. PacBio data resulted in relatively long contigs in the upstream (1,326 nucleotides (nt); 95 % identity) and downstream (1,135 nt; 92 % identity) regions, whereas Illumina data resulted in two smaller contigs of 858 and 1,038 nt with higher sequence identity (>99 % identity). Both approaches outperformed Sanger sequencing, underlining the potential for next-generation sequencing in UGMO identification.

  19. Reading biological processes from nucleotide sequences

    NASA Astrophysics Data System (ADS)

    Murugan, Anand

    Cellular processes have traditionally been investigated by techniques of imaging and biochemical analysis of the molecules involved. The recent rapid progress in our ability to manipulate and read nucleic acid sequences gives us direct access to the genetic information that directs and constrains biological processes. While sequence data is being used widely to investigate genotype-phenotype relationships and population structure, here we use sequencing to understand biophysical mechanisms. We present work on two different systems. First, in chapter 2, we characterize the stochastic genetic editing mechanism that produces diverse T-cell receptors in the human immune system. We do this by inferring statistical distributions of the underlying biochemical events that generate T-cell receptor coding sequences from the statistics of the observed sequences. This inferred model quantitatively describes the potential repertoire of T-cell receptors that can be produced by an individual, providing insight into its potential diversity and the probability of generation of any specific T-cell receptor. Then in chapter 3, we present work on understanding the functioning of regulatory DNA sequences in both prokaryotes and eukaryotes. Here we use experiments that measure the transcriptional activity of large libraries of mutagenized promoters and enhancers and infer models of the sequence-function relationship from this data. For the bacterial promoter, we infer a physically motivated 'thermodynamic' model of the interaction of DNA-binding proteins and RNA polymerase determining the transcription rate of the downstream gene. For the eukaryotic enhancers, we infer heuristic models of the sequence-function relationship and use these models to find synthetic enhancer sequences that optimize inducibility of expression. Both projects demonstrate the utility of sequence information in conjunction with sophisticated statistical inference techniques for dissecting underlying biophysical...

  20. Protecting genomic sequence anonymity with generalization lattices.

    PubMed

    Malin, B A

    2005-01-01

    Current genomic privacy technologies assume the identity of genomic sequence data is protected if personal information, such as demographics, is obscured, removed, or encrypted. While demographic features can directly compromise an individual's identity, recent research demonstrates such protections are insufficient because sequence data itself is susceptible to re-identification. To counteract this problem, we introduce an algorithm for anonymizing a collection of person-specific DNA sequences. The technique is termed DNA lattice anonymization (DNALA), and is based upon the formal privacy protection schema of k-anonymity. Under this model, it is impossible to observe or learn features that distinguish one genetic sequence from k-1 other entries in a collection. To maximize information retained in protected sequences, we incorporate a concept generalization lattice to learn the distance between two residues in a single nucleotide region. The lattice provides the most similar generalized concept for two residues (e.g. adenine and guanine are both purines). The method is tested and evaluated with several publicly available human population datasets ranging in size from 30 to 400 sequences. Our findings imply the anonymization schema is feasible for the protection of sequence privacy. The DNALA method is the first computational disclosure control technique for general DNA sequences. Given the computational nature of the method, guarantees of anonymity can be formally proven. There is room for improvement and validation, though this research provides the groundwork from which future researchers can construct genomics anonymization schemas tailored to specific data-sharing scenarios.

  1. Nucleotide sequence of the phosphoglycerate kinase gene from the extreme thermophile Thermus thermophilus. Comparison of the deduced amino acid sequence with that of the mesophilic yeast phosphoglycerate kinase.

    PubMed Central

    Bowen, D; Littlechild, J A; Fothergill, J E; Watson, H C; Hall, L

    1988-01-01

    Using oligonucleotide probes derived from amino acid sequencing information, the structural gene for phosphoglycerate kinase from the extreme thermophile, Thermus thermophilus, was cloned in Escherichia coli and its complete nucleotide sequence determined. The gene consists of an open reading frame corresponding to a protein of 390 amino acid residues (calculated Mr 41,791) with an extreme bias for G or C (93.1%) in the codon third base position. Comparison of the deduced amino acid sequence with that of the corresponding mesophilic yeast enzyme indicated a number of significant differences. These are discussed in terms of the unusual codon bias and their possible role in enhanced protein thermal stability. PMID:3052437

  2. Screening for single nucleotide variants, small indels and exon deletions with a next-generation sequencing based gene panel approach for Usher syndrome

    PubMed Central

    Krawitz, Peter M; Schiska, Daniela; Krüger, Ulrike; Appelt, Sandra; Heinrich, Verena; Parkhomchuk, Dmitri; Timmermann, Bernd; Millan, Jose M; Robinson, Peter N; Mundlos, Stefan; Hecht, Jochen; Gross, Manfred

    2014-01-01

    Usher syndrome is an autosomal recessive disorder characterized by both deafness and blindness. For the three clinical subtypes of Usher syndrome, causal mutations in a total of 12 genes and a modifier gene have been identified. Due to the genetic heterogeneity of Usher syndrome, molecular analysis is well suited to a comprehensive and parallelized analysis of all known genes by next-generation sequencing (NGS) approaches. We describe here the targeted enrichment and deep sequencing for exons of Usher genes and compare the costs and workload of this approach with those of Sanger sequencing. We also present a bioinformatics analysis pipeline that allows us to detect single-nucleotide variants, short insertions and deletions, as well as copy number variations of one or more exons on the same sequence data. Additionally, we present a flexible in silico gene panel for the analysis of sequence variants, in which newly identified genes can easily be included. We applied this approach to a cohort of 44 Usher patients and detected biallelic pathogenic mutations in 35 individuals and monoallelic mutations in eight individuals of our cohort. Thirty-nine of the sequence variants, including two heterozygous deletions comprising several exons of USH2A, have not been reported so far. Our NGS-based approach allowed us to assess single-nucleotide variants, small indels, and whole exon deletions in a single test. The described diagnostic approach is fast and cost-effective with a high molecular diagnostic yield. PMID:25333064

  3. Screening for single nucleotide variants, small indels and exon deletions with a next-generation sequencing based gene panel approach for Usher syndrome.

    PubMed

    Krawitz, Peter M; Schiska, Daniela; Krüger, Ulrike; Appelt, Sandra; Heinrich, Verena; Parkhomchuk, Dmitri; Timmermann, Bernd; Millan, Jose M; Robinson, Peter N; Mundlos, Stefan; Hecht, Jochen; Gross, Manfred

    2014-09-01

    Usher syndrome is an autosomal recessive disorder characterized by both deafness and blindness. For the three clinical subtypes of Usher syndrome, causal mutations in a total of 12 genes and a modifier gene have been identified. Due to the genetic heterogeneity of Usher syndrome, molecular analysis is well suited to a comprehensive and parallelized analysis of all known genes by next-generation sequencing (NGS) approaches. We describe here the targeted enrichment and deep sequencing for exons of Usher genes and compare the costs and workload of this approach with those of Sanger sequencing. We also present a bioinformatics analysis pipeline that allows us to detect single-nucleotide variants, short insertions and deletions, as well as copy number variations of one or more exons on the same sequence data. Additionally, we present a flexible in silico gene panel for the analysis of sequence variants, in which newly identified genes can easily be included. We applied this approach to a cohort of 44 Usher patients and detected biallelic pathogenic mutations in 35 individuals and monoallelic mutations in eight individuals of our cohort. Thirty-nine of the sequence variants, including two heterozygous deletions comprising several exons of USH2A, have not been reported so far. Our NGS-based approach allowed us to assess single-nucleotide variants, small indels, and whole exon deletions in a single test. The described diagnostic approach is fast and cost-effective with a high molecular diagnostic yield.

  4. Sequence analysis of Jembrana disease virus strains reveals a genetically stable lentivirus.

    PubMed

    Desport, Moira; Stewart, Meredith E; Mikosza, Andrew S; Sheridan, Carol A; Peterson, Shane E; Chavand, Olivier; Hartaningsih, Nining; Wilcox, Graham E

    2007-06-01

    Jembrana disease virus (JDV) is a lentivirus associated with an acute disease syndrome with a 20% case fatality rate in Bos javanicus (Bali cattle) in Indonesia, occurring after a short incubation period and with no recurrence of the disease after recovery. Partial regions of gag and pol and the entire env were examined for sequence variation in DNA samples from cases of Jembrana disease obtained from Bali, Sumatra and South Kalimantan in Indonesian Borneo. A high level of nucleotide conservation (97-100%) was observed in gag sequences from samples taken in Bali and Sumatra, indicating that the source of JDV in Sumatra was most likely to have originated from Bali. The pol sequences and, unexpectedly, the env sequences from Bali samples were also well conserved with low nucleotide (96-99%) and amino acid substitutions (95-99%). However, the sample from South Kalimantan (JDV(KAL/01)) contained more divergent sequences, particularly in env (88% identity). Phylogenetic analysis revealed that the JDV(KAL/01)env sequences clustered with the sequence from the Pulukan sample (Bali) from 2001. JDV appears to be remarkably stable genetically and has undergone minor genetic changes over a period of nearly 20 years in Bali despite becoming endemic in the cattle population of the island.

  5. Analysing grouping of nucleotides in DNA sequences using lumped processes constructed from Markov chains.

    PubMed

    Guédon, Yann; d'Aubenton-Carafa, Yves; Thermes, Claude

    2006-03-01

    The most commonly used models for analysing local dependencies in DNA sequences are (high-order) Markov chains. Incorporating knowledge about the possible grouping of the nucleotides makes it possible to define dedicated sub-classes of Markov chains. The problem of formulating lumpability hypotheses for a Markov chain is therefore addressed. In the classical approach to lumpability, this problem can be formulated as the determination of an appropriate state space (smaller than the original state space) such that the lumped chain defined on this state space retains the Markov property. We propose a different perspective on lumpability where the state space is fixed and the partitioning of this state space is represented by a one-to-many probabilistic function within a two-level stochastic process. Three nested classes of lumped processes can be defined in this way as sub-classes of first-order Markov chains. These lumped processes enable parsimonious reparameterizations of Markov chains that help to reveal relevant partitions of the state space. Characterizations of the lumped processes on the original transition probability matrix are derived. Different model selection methods relying either on hypothesis testing or on penalized log-likelihood criteria are presented as well as extensions to lumped processes constructed from high-order Markov chains. The relevance of the proposed approach to lumpability is illustrated by the analysis of DNA sequences. In particular, the use of lumped processes highlights differences between intronic sequences and gene untranslated region sequences.

  6. An integrated genetic linkage map of watermelon and genetic diversity based on single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers

    USDA-ARS's Scientific Manuscript database

    Watermelon (Citrullus lanatus var. lanatus) is an important vegetable fruit throughout the world. A high number of single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers should provide large coverage of the watermelon genome and high phylogenetic resolution of germplasm acces...

  7. Regions of conservation and divergence in the 3' untranslated sequences of genomic RNA from Ross River virus isolates.

    PubMed

    Faragher, S G; Dalgarno, L

    1986-07-20

    The 3' untranslated (UT) sequences of the genomic RNAs of five geographic variants of the alphavirus Ross River virus (RRV) were determined and compared with the 3' UT sequence of RRV T48, the prototype strain. Part of the 3' UT region of Getah virus, a close serological relative of RRV, was also sequenced. The RRV 3' UT region varies markedly in length between variants. Large deletions or insertions, sequence rearrangements and single nucleotide substitutions are observed. A sequence tract of 49 to 58 nucleotides, which is repeated as four blocks in the RRV T48 3' UT region, occurs only once in the 3' UT region of one RRV strain (NB5092), indicating that the existence of repeat sequence blocks is not essential for RRV replication. However, the precise sequence of the 3' proximal copy of the repeat block and its position relative to the poly(A) tail were identical in all RRV isolates examined, suggesting that it has an important role in RRV replication. Nucleotide substitutions between RRV variants are distributed non-randomly along the length of the 3' UT region. The sequence of 120 to 130 nucleotides adjacent to the poly(A) tail is strongly conserved. Getah virus RNA contains three repeat sequence blocks in the 3' UT region. These are similar in sequence to those in RRV RNA but differ in their arrangement. Homology between the RRV and Getah 3' UT sequences is greatest in the 3' proximal repeat sequence block that shows three differences in 49 nucleotides. The 3' proximal repeat in Getah RNA occurs at the same position, relative to the poly(A) tail, as in all RRV variants. The RRV and Getah virus 3' UT sequences show extensive homology in the region between the 3' proximal repeat and the poly(A) tail but, apart from the repeat blocks themselves, they show no significant homology elsewhere.
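    The figure of "three differences in 49 nucleotides" can be turned into a percent identity with a short helper. This is a minimal sketch that assumes the simplest convention (mismatches counted over the aligned length, no special treatment of gaps); the abstract does not say which convention the authors used, and the function names are illustrative.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity of two pre-aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

def identity_from_differences(length, differences):
    """Percent identity when only the length and mismatch count are known."""
    return 100.0 * (length - differences) / length

# Three differences in a 49-nucleotide repeat block: 46/49 identical positions.
repeat_block_identity = identity_from_differences(49, 3)  # about 93.9
```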

  8. Sequence-Based Prioritization of Nonsynonymous Single-Nucleotide Polymorphisms for the Study of Disease Mutations

    PubMed Central

    Jiang, Rui ; Yang, Hua ; Zhou, Linqi ; Kuo, C.-C. Jay ; Sun, Fengzhu ; Chen, Ting 

    2007-01-01

    The increasing demand for the identification of genetic variation responsible for common diseases has translated into a need for sophisticated methods for effectively prioritizing mutations occurring in disease-associated genetic regions. In this article, we prioritize candidate nonsynonymous single-nucleotide polymorphisms (nsSNPs) through a bioinformatics approach that takes advantage of a set of improved numeric features derived from protein-sequence information and a new statistical learning model called “multiple selection rule voting” (MSRV). The sequence-based features can maximize the scope of applications of our approach, and the MSRV model can capture subtle characteristics of individual mutations. Systematic validation of the approach demonstrates that this approach is capable of prioritizing causal mutations for both simple monogenic diseases and complex polygenic diseases. Further studies of familial Alzheimer diseases and diabetes show that the approach can enrich mutations underlying these polygenic diseases among the top candidate mutations. Application of this approach to unclassified mutations suggests that there are 10 suspicious mutations likely to cause diseases, and there is strong support for this in the literature. PMID:17668383

  9. Ab initio electron propagator calculations of transverse conduction through DNA nucleotide bases in 1-nm nanopore corroborate third generation sequencing.

    PubMed

    Kletsov, Aleksey A; Glukhovskoy, Evgeny G; Chumakov, Aleksey S; Ortiz, Joseph V

    2016-01-01

    The conduction properties of the DNA molecule, particularly its transverse conductance (electron transfer through nucleotide bridges), are a point of interest for the DNA chemistry community, especially for DNA sequencing. However, there is no fully developed first-principles theory for molecular conductance and current that allows one to analyze the transverse flow of electrical charge through a nucleotide base. We theoretically investigate the transverse electron transport through all four DNA nucleotide bases by implementing an unbiased ab initio theoretical approach, namely, the electron propagator theory. The electrical conductance and current through DNA nucleobases (guanine [G], cytosine [C], adenine [A] and thymine [T]) inserted into a model 1-nm Ag-Ag nanogap are calculated. The magnitudes of the calculated conductance and current are ordered in the following hierarchies: gA>gG>gC>gT and IG>IA>IT>IC, respectively. A new distinguishing parameter for nucleobase identification is proposed, namely, the onset bias magnitude. Nucleobases exhibit the following hierarchy with respect to this parameter: Vonset(A) [...] sequencing techniques as well as in the field of DNA chemistry.

  10. Nucleotide sequences, genetic organization, and distribution of pEU30 and pEL60 from Erwinia amylovora.

    PubMed

    Foster, Gayle C; McGhee, Gayle C; Jones, Alan L; Sundin, George W

    2004-12-01

    The nucleotide sequences, genetic organization, and distribution of plasmids pEU30 (30,314 bp) and pEL60 (60,145 bp) from the plant pathogen Erwinia amylovora are described. The newly characterized pEU30 and pEL60 plasmids inhabited strains isolated in the western United States and Lebanon, respectively. The gene content of pEU30 resembled plasmids found in plant-associated bacteria, while that of pEL60 was most similar to IncL/M plasmids inhabiting enteric bacteria.

  11. Whole Genome Sequences of Three Treponema pallidum ssp. pertenue Strains: Yaws and Syphilis Treponemes Differ in Less than 0.2% of the Genome Sequence

    PubMed Central

    Chen, Lei; Pospíšilová, Petra; Strouhal, Michal; Qin, Xiang; Mikalová, Lenka; Norris, Steven J.; Muzny, Donna M.; Gibbs, Richard A.; Fulton, Lucinda L.; Sodergren, Erica; Weinstock, George M.; Šmajs, David

    2012-01-01

    Background: The yaws treponemes, Treponema pallidum ssp. pertenue (TPE) strains, are closely related to syphilis-causing strains of Treponema pallidum ssp. pallidum (TPA). Both yaws and syphilis are distinguished on the basis of epidemiological characteristics, clinical symptoms, and several genetic signatures of the corresponding causative agents. Methodology/Principal Findings: To precisely define genetic differences between TPA and TPE, high-quality whole genome sequences of three TPE strains (Samoa D, CDC-2, Gauthier) were determined using next-generation sequencing techniques. TPE genome sequences were compared to four genomes of TPA strains (Nichols, DAL-1, SS14, Chicago). The genome structure was identical in all three TPE strains with similar length ranging between 1,139,330 bp and 1,139,744 bp. No major genome rearrangements were found when compared to the four TPA genomes. The whole genome nucleotide divergence (dA) between TPA and TPE subspecies was 4.7 and 4.8 times higher than the observed nucleotide diversity (π) among TPA and TPE strains, respectively, corresponding to 99.8% identity between TPA and TPE genomes. A set of 97 (9.9%) TPE genes encoded proteins containing two or more amino acid replacements or other major sequence changes. The TPE divergent genes were mostly from the group encoding potential virulence factors and genes encoding proteins with unknown function. Conclusions/Significance: Hypothetical genes, with genetic differences, consistently found between TPE and TPA strains are candidates for syphilitic treponemes virulence factors. Seventeen TPE genes were predicted under positive selection, and eleven of them coded either for predicted exported proteins or membrane proteins suggesting their possible association with the cell surface. Sequence changes between TPE and TPA strains and changes specific to individual strains represent suitable targets for subspecies- and strain-specific molecular diagnostics. PMID:22292095

  12. Composition for nucleic acid sequencing

    DOEpatents

    Korlach, Jonas [Ithaca, NY]; Webb, Watt W [Ithaca, NY]; Levene, Michael [Ithaca, NY]; Turner, Stephen [Ithaca, NY]; Craighead, Harold G [Ithaca, NY]; Foquet, Mathieu [Ithaca, NY]

    2008-08-26

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  13. Molecular Properties of Poliovirus Isolates: Nucleotide Sequence Analysis, Typing by PCR and Real-Time RT-PCR.

    PubMed

    Burns, Cara C; Kilpatrick, David R; Iber, Jane C; Chen, Qi; Kew, Olen M

    2016-01-01

    Virologic surveillance is essential to the success of the World Health Organization initiative to eradicate poliomyelitis. Molecular methods have been used to detect polioviruses in tissue culture isolates derived from stool samples obtained through surveillance for acute flaccid paralysis. This chapter describes the use of real-time PCR assays to identify and serotype polioviruses. In particular, a degenerate, inosine-containing, panpoliovirus (panPV) PCR primer set is used to distinguish polioviruses from nonpolio enteroviruses (NPEVs). The high degree of nucleotide sequence diversity among polioviruses presents a challenge to the systematic design of nucleic acid-based reagents. To accommodate the wide variability and rapid evolution of poliovirus genomes, degenerate codon positions on the template were matched to mixed-base or deoxyinosine residues on both the primers and the TaqMan™ probes. Additional assays distinguish between Sabin vaccine strains and non-Sabin strains. This chapter also describes the use of generic poliovirus-specific primers, along with degenerate and inosine-containing primers, for routine VP1 sequencing of poliovirus isolates. These primers, along with nondegenerate serotype-specific Sabin primers, can also be used to sequence individual polioviruses in mixtures.

  14. Deep Sequencing Analysis of Apple Infecting Viruses in Korea

    PubMed Central

    Cho, In-Sook; Igori, Davaajargal; Lim, Seungmo; Choi, Gug-Seoun; Hammond, John; Lim, Hyoun-Sub; Moon, Jae Sun

    2016-01-01

    Deep sequencing generated 52 contigs derived from five viruses: Apple chlorotic leaf spot virus (ACLSV), Apple stem grooving virus (ASGV), Apple stem pitting virus (ASPV), Apple green crinkle associated virus (AGCaV), and Apricot latent virus (ApLV), which were identified from eight apple samples showing small leaves and/or growth retardation. Nucleotide (nt) sequence identity of the assembled contigs ranged from 68% to 99% compared to the reference sequences of the five respective viral genomes. Sequences of ASPV and ASGV were the most abundantly represented among the 52 assembled contigs. The presence of the five viruses in the samples was confirmed by RT-PCR using specific primers based on the sequences of each assembled contig. All five viruses were detected in three of the samples, whereas all samples had mixed infections with at least two viruses. The most frequently detected virus was ASPV, followed by ASGV, ApLV, ACLSV, and AGCaV, all of which were found in mixed infections in the tested samples. AGCaV was identified in assembled contigs ID 1012480 and 93549, which showed 82% and 78% nt sequence identity with ORF1 of AGCaV isolate Aurora-1. ApLV was identified in three assembled contigs, ID 65587, 1802365, and 116777, which showed 77%, 78%, and 76% nt sequence identity respectively with ORF1 of ApLV isolate LA2. Deep sequencing assay was shown to be a valuable and powerful tool for detection and identification of known and unknown virome in infected apple trees, here identifying ApLV and AGCaV in commercial orchards in Korea for the first time. PMID:27721694

  15. Molecular cloning and sequencing analysis of the interferon receptor (IFNAR-1) from Columba livia.

    PubMed

    Li, Chao; Chang, Wei Shan

    2014-01-01

    The objective was partial sequence cloning of the interferon receptor (IFNAR-1) of Columba livia. In order to obtain a certain length (630 bp) of the gene, a pair of primers was designed according to the conserved nucleotide sequence of the Gallus (EU477527.1) and Taeniopygia guttata (XM_002189232.1) IFNAR-1 gene fragments published in GenBank. Special primers were designed by the RACE method to amplify the 3'-terminal cDNA. The Columba livia IFNAR-1 displayed 88.5%, 80.5% and 73.8% nucleotide identity to Falco peregrinus, Gallus and Taeniopygia guttata, respectively. Phylogenetic analysis of the IFNAR1 gene showed high homology among Columba livia, Falco peregrinus and chicken. We successfully obtained a Columba livia IFNAR-1 gene partial sequence. Analysis of the genetic tree showed that Columba livia and Falco peregrinus IFNAR-1 had high homology. This result can be used as a reference for further research and practical application.

  16. Molecular cloning and sequencing analysis of the interferon receptor (IFNAR-1) from Columba livia

    PubMed Central

    Chang, Wei Shan

    2014-01-01

    Objective: Partial sequence cloning of the interferon receptor (IFNAR-1) of Columba livia. Material and methods: In order to obtain a certain length (630 bp) of the gene, a pair of primers was designed according to the conserved nucleotide sequence of the Gallus (EU477527.1) and Taeniopygia guttata (XM_002189232.1) IFNAR-1 gene fragments published in GenBank. Special primers were designed by the RACE method to amplify the 3'-terminal cDNA. Results: The Columba livia IFNAR-1 displayed 88.5%, 80.5% and 73.8% nucleotide identity to Falco peregrinus, Gallus and Taeniopygia guttata, respectively. Phylogenetic analysis of the IFNAR1 gene showed high homology among Columba livia, Falco peregrinus and chicken. Conclusions: We successfully obtained a Columba livia IFNAR-1 gene partial sequence. Analysis of the genetic tree showed that Columba livia and Falco peregrinus IFNAR-1 had high homology. This result can be used as a reference for further research and practical application. PMID:26155117

  17. Probing genomic diversity and evolution of Escherichia coli O157 by single nucleotide polymorphisms.

    PubMed

    Zhang, Wei; Qi, Weihong; Albert, Thomas J; Motiwala, Alifiya S; Alland, David; Hyytia-Trees, Eija K; Ribot, Efrain M; Fields, Patricia I; Whittam, Thomas S; Swaminathan, Bala

    2006-06-01

    Infections by Shiga toxin-producing Escherichia coli O157:H7 (STEC O157) are the predominant cause of bloody diarrhea and hemolytic uremic syndrome in the United States. In silico comparison of the two complete STEC O157 genomes (Sakai and EDL933) revealed a strikingly high level of sequence identity in orthologous protein-coding genes, limiting the use of nucleotide sequences to study the evolution and epidemiology of this bacterial pathogen. To systematically examine single nucleotide polymorphisms (SNPs) at a genome scale, we designed comparative genome sequencing microarrays and analyzed 1199 chromosomal genes (a total of 1,167,948 bp) and 92,721 bp of the large virulence plasmid (pO157) of eleven outbreak-associated STEC O157 strains. We discovered 906 SNPs in 523 chromosomal genes and observed a high level of DNA polymorphisms among the pO157 plasmids. Based on a uniform rate of synonymous substitution for Escherichia coli and Salmonella enterica (4.7 × 10⁻⁹ per site per year), we estimate that the most recent common ancestor of the contemporary beta-glucuronidase-negative, non-sorbitol-fermenting STEC O157 strains existed ca. 40 thousand years ago. The phylogeny of the STEC O157 strains based on the informative synonymous SNPs was compared to the maximum parsimony trees inferred from pulsed-field gel electrophoresis and multilocus variable numbers of tandem repeats analysis. The topological discrepancies indicate that, in contrast to the synonymous mutations, parts of STEC O157 genomes have evolved through different mechanisms with highly variable divergence rates. The SNP loci reported here will provide useful genetic markers for developing high-throughput methods for fine-resolution genotyping of STEC O157. Functional characterization of nucleotide polymorphisms should provide new insights into the evolution, epidemiology, and pathogenesis of STEC O157 and related pathogens.
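    The link between a substitution rate and an ancestor age can be illustrated with a back-of-the-envelope molecular-clock calculation. The sketch below assumes the simplest strict-clock relation, in which two lineages accumulate substitutions independently so that pairwise divergence is d = 2 × rate × time; the abstract does not state that the authors used exactly this formula, and the function names are illustrative.

```python
SYNONYMOUS_RATE = 4.7e-9  # substitutions per site per year, as quoted in the abstract

def divergence_time_years(pairwise_divergence, rate=SYNONYMOUS_RATE):
    """Years since the common ancestor under a strict clock: t = d / (2 * rate)."""
    return pairwise_divergence / (2.0 * rate)

def expected_divergence(years, rate=SYNONYMOUS_RATE):
    """Expected pairwise synonymous divergence per site after `years`."""
    return 2.0 * rate * years

# Under these assumptions, 40,000 years corresponds to a pairwise synonymous
# divergence of roughly 3.76e-4 substitutions per site.
d_40k = expected_divergence(40_000)
t_back = divergence_time_years(d_40k)
```

    The tiny expected divergence is consistent with the abstract's remark that very high sequence identity limits what ordinary gene sequencing can resolve in this lineage.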

  18. Probing genomic diversity and evolution of Escherichia coli O157 by single nucleotide polymorphisms

    PubMed Central

    Zhang, Wei; Qi, Weihong; Albert, Thomas J.; Motiwala, Alifiya S.; Alland, David; Hyytia-Trees, Eija K.; Ribot, Efrain M.; Fields, Patricia I.; Whittam, Thomas S.; Swaminathan, Bala

    2006-01-01

    Infections by Shiga toxin-producing Escherichia coli O157:H7 (STEC O157) are the predominant cause of bloody diarrhea and hemolytic uremic syndrome in the United States. In silico comparison of the two complete STEC O157 genomes (Sakai and EDL933) revealed a strikingly high level of sequence identity in orthologous protein-coding genes, limiting the use of nucleotide sequences to study the evolution and epidemiology of this bacterial pathogen. To systematically examine single nucleotide polymorphisms (SNPs) at a genome scale, we designed comparative genome sequencing microarrays and analyzed 1199 chromosomal genes (a total of 1,167,948 bp) and 92,721 bp of the large virulence plasmid (pO157) of eleven outbreak-associated STEC O157 strains. We discovered 906 SNPs in 523 chromosomal genes and observed a high level of DNA polymorphisms among the pO157 plasmids. Based on a uniform rate of synonymous substitution for Escherichia coli and Salmonella enterica (4.7 × 10⁻⁹ per site per year), we estimate that the most recent common ancestor of the contemporary β-glucuronidase-negative, non-sorbitol-fermenting STEC O157 strains existed ca. 40 thousand years ago. The phylogeny of the STEC O157 strains based on the informative synonymous SNPs was compared to the maximum parsimony trees inferred from pulsed-field gel electrophoresis and multilocus variable numbers of tandem repeats analysis. The topological discrepancies indicate that, in contrast to the synonymous mutations, parts of STEC O157 genomes have evolved through different mechanisms with highly variable divergence rates. The SNP loci reported here will provide useful genetic markers for developing high-throughput methods for fine-resolution genotyping of STEC O157. Functional characterization of nucleotide polymorphisms should provide new insights into the evolution, epidemiology, and pathogenesis of STEC O157 and related pathogens. PMID:16606700

  19. Sequence and features of the tryptophan operon of Vibrio parahemolyticus.

    PubMed

    Crawford, I P; Han, C Y; Silverman, M

    1991-01-01

    The nucleotide sequence of the trp operon of the marine enteric bacterium Vibrio parahemolyticus is presented. The gene order E, G, D, C(F), B, A is identical to that of other enterics. The structural genes of the operon are preceded by a long leader region encoding a 41-residue peptide containing five tryptophan residues. The organization of the leader region suggests that transcription of the operon is subject to attenuation control. The promoter-operator region of the V. parahemolyticus trp operon is almost identical to the corresponding promoter-operator of E. coli. The similarities suggest that promoter strength and operator function are identical in the two species, and that transcription initiation is regulated by repression. The operon appears to lack the internal promoter within trpD that is common in terrestrial enteric species.

  20. Molecular cloning and nucleotide sequences of the genes for two essential proteins constituting a novel enzyme system for heptaprenyl diphosphate synthesis.

    PubMed

    Koike-Takeshita, A; Koyama, T; Obata, S; Ogura, K

    1995-08-04

    The genes encoding two dissociable components essential for Bacillus stearothermophilus heptaprenyl diphosphate synthase (all-trans-hexaprenyl-diphosphate:isopentenyl-diphosphate hexaprenyl-trans-transferase, EC 2.5.1.30) were cloned, and their nucleotide sequences were determined. Sequence analyses revealed the presence of three open reading frames within 2,350 base pairs, designated as ORF-1, ORF-2, and ORF-3 in order of nucleotide sequence, which encode proteins of 220, 234, and 323 amino acids, respectively. Deletion experiments have shown that expression of the enzymatic activity requires the presence of ORF-1 and ORF-3, but ORF-2 is not essential. As a result, this enzyme was proved genetically to consist of two different protein compounds with molecular masses of 25 kDa (Component I) and 36 kDa (Component II), encoded by two of the three tandem genes. The protein encoded by ORF-1 has no similarity to any protein so far registered. However, the protein encoded by ORF-3 shows a 32% similarity to the farnesyl diphosphate synthase of the same bacterium and has seven highly conserved regions that have been shown typical in prenyltransferases (Koyama, T., Obata, S., Osabe, M., Takeshita, A., Yokoyama, K., Uchida, M., Nishino, T., and Ogura, K. (1993) J. Biochem. (Tokyo) 113, 355-363).

  1. The nucleotide sequence of a major glycine transfer RNA from the posterior silk gland of Bombyx mori L.

    PubMed Central

    Zúñiga, M C; Steitz, J A

    1977-01-01

    The nucleotide sequence of tRNA1Gly isolated from the posterior silk gland of Bombyx mori has been determined. This transfer RNA is present in high amounts in the posterior silk gland during the fifth larval instar. It has a GCC anticodon, capable of decoding a major glycine codon in the fibroin messenger RNA, GGU. Structural features of Bombyx tRNA1Gly and its homology to other eukaryotic glycine tRNAs are discussed. PMID:414206

  2. Identification of Critical Residues for the Tight Binding of Both Correct and Incorrect Nucleotides to Human DNA Polymerase λ

    PubMed Central

    Brown, Jessica A.; Pack, Lindsey R.; Sherrer, Shanen M.; Kshetry, Ajay K.; Newmister, Sean A.; Fowler, Jason D.; Taylor, John-Stephen; Suo, Zucai

    2010-01-01

    DNA polymerase λ (Pol λ) is a novel X-family DNA polymerase that shares 34% sequence identity with DNA polymerase β (Pol β). Pre-steady state kinetic studies have shown that the Pol λ•DNA complex binds both correct and incorrect nucleotides 130-fold tighter on average than the Pol β•DNA complex, although the base substitution fidelity of both polymerases is 10⁻⁴ to 10⁻⁵. To better understand Pol λ’s tight nucleotide binding affinity, we created single- and double-substitution mutants of Pol λ to disrupt interactions between active site residues and an incoming nucleotide or a template base. Single-turnover kinetic assays showed that Pol λ binds to an incoming nucleotide via cooperative interactions with active site residues (R386, R420, K422, Y505, F506, A510, and R514). Disrupting protein interactions with an incoming correct or incorrect nucleotide impacted binding with each of the common structural moieties in the following order: triphosphate ≫ base > ribose. In addition, the loss of Watson-Crick hydrogen bonding between the nucleotide and template base led to a moderate increase in the Kd. The fidelity of Pol λ was maintained predominantly by a single residue, R517, which has minor groove interactions with the DNA template. PMID:20851705
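    A fold difference in binding affinity can be restated as a free-energy difference using the standard relation ΔΔG = RT ln(Kd ratio). The sketch below applies it to the 130-fold figure quoted above. The temperature of 37 °C is an assumption made only for illustration, since the abstract does not give the assay temperature, and the function name is hypothetical.

```python
import math

GAS_CONSTANT = 8.314462618  # J / (mol K)

def binding_energy_difference_kj(fold_tighter, temperature_kelvin):
    """Free-energy difference (kJ/mol) implied by a fold change in Kd."""
    return GAS_CONSTANT * temperature_kelvin * math.log(fold_tighter) / 1000.0

# 130-fold tighter binding at an assumed 310.15 K (37 degrees C).
ddg_kj = binding_energy_difference_kj(130, 310.15)
```

    Under that assumed temperature the value comes out at roughly 12 to 13 kJ/mol, the scale of a few hydrogen bonds or salt bridges.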

  3. JNSViewer—A JavaScript-based Nucleotide Sequence Viewer for DNA/RNA secondary structures

    PubMed Central

    Dong, Min; Graham, Mitchell; Yadav, Nehul

    2017-01-01

    Many tools are available for visualizing RNA or DNA secondary structures, but few are implemented in JavaScript, which would allow seamless integration with the increasingly popular web computational platforms. We have developed JNSViewer, a highly interactive web service, which is bundled with several popular tools for DNA/RNA secondary structure prediction and can provide precise and interactive correspondence among nucleotides, dot-bracket data, secondary structure graphs, and genic annotations. In JNSViewer, users can perform RNA secondary structure predictions with different programs and settings, add customized genic annotations in GFF format to structure graphs, search for specific linear motifs, and extract relevant structure graphs of sub-sequences. JNSViewer also allows users to choose a transcript or specific segment of Arabidopsis thaliana genome sequences and predict the corresponding secondary structure. Popular genome browsers (i.e., JBrowse and BrowserGenome) were integrated into JNSViewer to provide powerful visualizations of chromosomal locations, genic annotations, and secondary structures. In addition, we used StructureFold with default settings to predict some RNA structures for Arabidopsis by incorporating in vivo high-throughput RNA structure profiling data and stored the results in our web server, which might be a useful resource for RNA secondary structure studies in plants. JNSViewer is available at http://bioinfolab.miamioh.edu/jnsviewer/index.html. PMID:28582416
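    The correspondence between nucleotides and dot-bracket data rests on a simple stack-based parse. The sketch below is not JNSViewer code (which is JavaScript); it is a minimal Python illustration of how a plain dot-bracket string, without pseudoknot brackets, maps to base-pair indices. The function name is hypothetical.

```python
def dot_bracket_to_pairs(structure):
    """Return sorted 0-based (i, j) base pairs encoded by a dot-bracket string."""
    stack = []
    pairs = []
    for index, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(index)            # opening partner waits on the stack
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {index}")
            pairs.append((stack.pop(), index))
        elif symbol != ".":
            raise ValueError(f"unsupported symbol {symbol!r} at position {index}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)

# A six-nucleotide hairpin: positions 0-5 and 1-4 are paired, 2 and 3 are unpaired.
hairpin_pairs = dot_bracket_to_pairs("((..))")  # [(0, 5), (1, 4)]
```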

  4. The Complete Nucleotide Sequence of the Human Immunoglobulin Heavy Chain Variable Region Locus

    PubMed Central

    Matsuda, Fumihiko; Ishii, Kazuo; Bourvagnet, Patrice; Kuma, Kei-ichi; Hayashida, Hidenori; Miyata, Takashi; Honjo, Tasuku

    1998-01-01

    The complete nucleotide sequence of the 957-kb DNA of the human immunoglobulin heavy chain variable (VH) region locus was determined and 43 novel VH segments were identified. The region contains 123 VH segments classifiable into seven different families, of which 79 are pseudogenes. Of the 44 VH segments with an open reading frame, 39 are expressed as heavy chain proteins and 1 as mRNA, while the remaining 4 are not found in immunoglobulin cDNAs. Combinatorial diversity of VH region was calculated to be ∼6,000. Conservation of the promoter and recombination signal sequences was observed to be higher in functional VH segments than in pseudogenes. Phylogenetic analysis of 114 VH segments clearly showed clustering of the VH segments of each family. However, an independent branch in the tree contained a single VH, V4-44.1P, sharing similar levels of homology to human VH families and to those of other vertebrates. Comparison between different copies of homologous units that appear repeatedly across the locus clearly demonstrates that dynamic DNA reorganization of the locus took place at least eight times between 133 and 10 million years ago. One nonimmunoglobulin gene of unknown function was identified in the intergenic region. PMID:9841928

  5. High-throughput nucleotide sequence analysis of diverse bacterial communities in leachates of decomposing pig carcasses

    PubMed Central

    Yang, Seung Hak; Lim, Joung Soo; Khan, Modabber Ahmed; Kim, Bong Soo; Choi, Dong Yoon; Lee, Eun Young; Ahn, Hee Kwon

    2015-01-01

    The leachate generated by the decomposition of animal carcass has been implicated as an environmental contaminant surrounding the burial site. High-throughput nucleotide sequencing was conducted to investigate the bacterial communities in leachates from the decomposition of pig carcasses. We acquired 51,230 reads from six different samples (1, 2, 3, 4, 6 and 14 week-old carcasses) and found that sequences representing the phylum Firmicutes predominated. The diversity of bacterial 16S rRNA gene sequences in the leachate was the highest at 6 weeks, in contrast to those at 2 and 14 weeks. The relative abundance of Firmicutes was reduced, while the proportion of Bacteroidetes and Proteobacteria increased from 3–6 weeks. The representation of phyla was restored after 14 weeks. However, the community structures between the samples taken at 1–2 and 14 weeks differed at the bacterial classification level. The trend in pH was similar to the changes seen in bacterial communities, indicating that the pH of the leachate could be related to the shift in the microbial community. The results indicate that the composition of bacterial communities in leachates of decomposing pig carcasses shifted continuously during the study period and might be influenced by the burial site. PMID:26500442

  6. Analysis of the genome sequence of the pathogenic Muscovy duck parvovirus strain YY reveals a 14-nucleotide-pair deletion in the inverted terminal repeats.

    PubMed

    Wang, Jianye; Huang, Yu; Zhou, Mingxu; Zhu, Guoqiang

    2016-09-01

    Genomic information about Muscovy duck parvovirus is still limited. In this study, the genome of the pathogenic MDPV strain YY was sequenced. The full-length genome of YY is 5075 nucleotides (nt) long, 57 nt shorter than that of strain FM. Sequence alignment indicates that the 5' and 3' inverted terminal repeats (ITR) of strain YY contain a 14-nucleotide-pair deletion in the stem of the palindromic hairpin structure in comparison to strain FM and FZ91-30. The deleted region contains one "E-box" site and one repeated motif with the sequence "TTCCGGT" or "ACCGGAA". Phylogenetic trees constructed based on the protein-coding genes concordantly showed that YY, together with nine other MDPV isolates from various places, clustered in a separate branch, distinct from the branch formed by goose parvovirus (GPV) strains. These results demonstrate that, despite the distinctive deletion, the YY strain still belongs to the classical MDPV group. Moreover, the deletion of ITR may contribute to the genome evolution of MDPV under immunization pressure.
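    The two motif spellings quoted in this abstract, "TTCCGGT" and "ACCGGAA", are reverse complements of each other, which is what one would expect for the two arms of a hairpin stem. A minimal sketch of that check, with an illustrative function name:

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(sequence):
    """Reverse complement of a DNA sequence (plain A/C/G/T only)."""
    return sequence.translate(_COMPLEMENT)[::-1]

motif_rc = reverse_complement("TTCCGGT")  # "ACCGGAA"
```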

  7. Evolutionary relationships in the ilarviruses: nucleotide sequence of prunus necrotic ringspot virus RNA 3.

    PubMed

    Sánchez-Navarro, J A; Pallás, V

    1997-01-01

    The complete nucleotide sequence of an isolate of prunus necrotic ringspot virus (PNRSV) RNA 3 has been determined. Elucidation of the amino acid sequence of the proteins encoded by the two large open reading frames (ORFs) allowed us to carry out comparative and phylogenetic studies on the movement (MP) and coat (CP) proteins in the ilarvirus group. Amino acid sequence comparison of the MP revealed a highly conserved basic sequence motif with an amphipathic alpha-helical structure preceding the conserved motif of the '30K superfamily' proposed by Mushegian and Koonin [26] for MPs. Within this '30K' motif a strictly conserved transmembrane domain is present in all ilarviruses sequenced so far. At the amino-terminal end, prune dwarf virus (PDV) has an extension not present in other ilarviruses but which is observed in all bromo- and cucumoviruses, suggesting a common ancestor or a recombinational event in the Bromoviridae family. Examination of the N-terminus of the CPs of all ilarviruses revealed a highly basic region, part of which resembles the Arg-rich motif that has been characterized in the RNA-binding protein family. This motif has also been found in the other members of the Bromoviridae family, suggesting its involvement in a structural function. Furthermore, this region is required for infectivity in ilarviruses. The similarities found in this Arg-rich motif are discussed in terms of the process known as genome activation. Finally, phylogenetic analysis of both the MP and CP proteins revealed a closer relationship of AlMV to PNRSV, apple mosaic virus (ApMV) and PDV than any other member of the ilarvirus group. In that sense, AlMV should be considered as a true ilarvirus instead of forming a distinct group of viruses.

  8. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-06-06

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  9. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-05-30

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  10. Molecular characterization of a novel luteovirus infecting apple by next-generation sequencing.

    PubMed

    Shen, Pan; Tian, Xin; Zhang, Song; Ren, Fang; Li, Ping; Yu, Yun-Qi; Li, Ruhui; Zhou, Changyong; Cao, Mengji

    2018-03-01

    A new single-stranded positive-sense RNA virus, which shares the highest nucleotide (nt) sequence identity of 53.4% with the genome sequence of cherry-associated luteovirus South Korean isolate (ChALV-SK, genus Luteovirus), was discovered in this work. It is provisionally named apple-associated luteovirus (AaLV). The complete genome sequence of AaLV comprises 5,890 nt and contains eight open reading frames (ORFs), in an arrangement very similar to that typical of members of the genus Luteovirus. When compared with other members of the family Luteoviridae, ORF1 of AaLV was found to encompass another ORF, ORF1a, which encodes a putative 32.9-kDa protein. The ORF1-ORF2 region (RNA-dependent RNA polymerase, RdRP) showed the greatest amino acid (aa) sequence identity (59.7%) to that of cherry-associated luteovirus Czech Republic isolate (ChALV-CZ, genus Luteovirus). The results of genome sequence comparisons and phylogenetic analysis suggest that AaLV should be a member of a novel species in the genus Luteovirus. To our knowledge, it is the sixth member of the genus Luteovirus reported to naturally infect rosaceous plants.

  11. Integrating multiple genomic data to predict disease-causing nonsynonymous single nucleotide variants in exome sequencing studies.

    PubMed

    Wu, Jiaxin; Li, Yanda; Jiang, Rui

    2014-03-01

    Exome sequencing has been widely used in detecting pathogenic nonsynonymous single nucleotide variants (SNVs) for human inherited diseases. However, traditional statistical genetics methods are ineffective in analyzing exome sequencing data, owing to the large number of sequenced variants, the presence of a non-negligible fraction of pathogenic rare variants or de novo mutations, and the limited size of affected and normal populations. Indeed, prevalent applications of exome sequencing call for an effective computational method for identifying causative nonsynonymous SNVs from a large number of sequenced variants. Here, we propose a bioinformatics approach called SPRING (Snv PRioritization via the INtegration of Genomic data) for identifying pathogenic nonsynonymous SNVs for a given query disease. Based on six functional effect scores calculated by existing methods (SIFT, PolyPhen2, LRT, MutationTaster, GERP and PhyloP) and five association scores derived from a variety of genomic data sources (gene ontology, protein-protein interactions, protein sequences, protein domain annotations and gene pathway annotations), SPRING calculates the statistical significance that an SNV is causative for a query disease and hence provides a means of prioritizing candidate SNVs. With a series of comprehensive validation experiments, we demonstrate that SPRING is valid for diseases whose genetic bases are either partly known or completely unknown and effective for diseases with a variety of inheritance styles. In applications of our method to real exome sequencing data sets, we show the capability of SPRING in detecting causative de novo mutations for autism, epileptic encephalopathies and intellectual disability. We further provide an online service, the standalone software and genome-wide predictions of causative SNVs for 5,080 diseases at http://bioinfo.au.tsinghua.edu.cn/spring.

  12. Enabling multiplexed testing of pooled donor cells through whole-genome sequencing.

    PubMed

    Chan, Yingleong; Chan, Ying Kai; Goodman, Daniel B; Guo, Xiaoge; Chavez, Alejandro; Lim, Elaine T; Church, George M

    2018-04-19

    We describe a method that enables the multiplex screening of a pool of many different donor cell lines. Our method accurately predicts each donor proportion from the pool without requiring the use of unique DNA barcodes as markers of donor identity. Instead, we take advantage of common single nucleotide polymorphisms, whole-genome sequencing, and an algorithm to calculate the proportions from the sequencing data. By testing using simulated and real data, we showed that our method robustly predicts the individual proportions from a mixed-pool of numerous donors, thus enabling the multiplexed testing of diverse donor cells en masse. More information is available at https://pgpresearch.med.harvard.edu/poolseq/.
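
    The record does not say which algorithm the authors use to turn SNP read counts into donor proportions. Purely as an illustration of the idea, the Python sketch below fits mixture proportions with a simple expectation-maximization loop; this is an assumed stand-in technique, and the function name, the input layout, and the small error rate `eps` are all hypothetical.

```python
def estimate_proportions(genotypes, alt_counts, ref_counts, n_iter=500, eps=1e-3):
    """Estimate donor proportions in a pool by expectation-maximization.

    genotypes[s][d] -- alt-allele fraction of donor d at SNP s (0.0, 0.5 or 1.0)
    alt_counts[s]   -- pooled reads supporting the alt allele at SNP s
    ref_counts[s]   -- pooled reads supporting the ref allele at SNP s
    eps             -- assumed per-read error rate; keeps probabilities non-zero
    """
    n_donors = len(genotypes[0])
    p = [1.0 / n_donors] * n_donors  # start from equal proportions
    for _ in range(n_iter):
        expected_reads = [0.0] * n_donors
        for g_row, alt, ref in zip(genotypes, alt_counts, ref_counts):
            g = [min(max(x, eps), 1.0 - eps) for x in g_row]
            # E-step: share each read among donors in proportion to how
            # likely that donor is to have produced the observed allele.
            alt_w = [p[d] * g[d] for d in range(n_donors)]
            ref_w = [p[d] * (1.0 - g[d]) for d in range(n_donors)]
            alt_total, ref_total = sum(alt_w), sum(ref_w)
            for d in range(n_donors):
                expected_reads[d] += alt * alt_w[d] / alt_total
                expected_reads[d] += ref * ref_w[d] / ref_total
        # M-step: proportions are the normalized expected read counts.
        total = sum(expected_reads)
        p = [x / total for x in expected_reads]
    return p


# Toy pool of three donors mixed at 0.5 / 0.3 / 0.2, 1000 reads per SNP.
genotypes = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0.5, 0, 1], [0, 0.5, 0.5]]
alt_counts = [500, 300, 200, 800, 450, 250]
ref_counts = [500, 700, 800, 200, 550, 750]
print([round(x, 2) for x in estimate_proportions(genotypes, alt_counts, ref_counts)])
```

    Real data would add sequencing noise and many more SNPs; the toy counts here are noise-free expectations chosen only so the estimate can be compared with the known mixture.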

  13. Quantum-Sequencing: Fast electronic single DNA molecule sequencing

    NASA Astrophysics Data System (ADS)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free, high-throughput and cost-effective, single-molecule sequencing method. Here, we present the first demonstration of a unique "electronic fingerprint" of all nucleotides (A, G, T, C), with single-molecule DNA sequencing, using Quantum-tunneling Sequencing (Q-Seq) at room temperature. We show that the electronic state of the nucleobases shifts depending on the pH, with the most distinct states identified at acidic pH. We also demonstrate identification of single nucleotide modifications (methylation here). Using these unique electronic fingerprints (or tunneling data), we report a partial sequence of the beta lactamase (bla) gene, which encodes resistance to beta-lactam antibiotics, with over 95% success rate. These results highlight the potential of Q-Seq as a robust technique for next-generation sequencing.

  14. Controllability of Deterministic Networks with the Identical Degree Sequence

    PubMed Central

    Ma, Xiujuan; Zhao, Haixing; Wang, Binghong

    2015-01-01

    Controlling complex networks is an essential problem in network science and engineering. Recent advances indicate that the controllability of a complex network depends on the network's topology. Liu, Barabási, et al. speculated that the degree distribution was one of the most important factors affecting controllability for arbitrary complex directed networks with random link weights. In this paper, we analysed the effect of degree distribution on the controllability of unweighted and undirected deterministic networks. We introduce a class of deterministic networks with identical degree sequence, called (x,y)-flower. We analysed the controllability of two deterministic networks ((1, 3)-flower and (2, 2)-flower) in detail by exact controllability theory and give accurate results for the minimum number of driver nodes for the two networks. In simulation, we compare the controllability of (x,y)-flower networks. Our results show that the family of (x,y)-flower networks have the same degree sequence, but their controllability is totally different. So the degree distribution itself is not sufficient to characterize the controllability of unweighted and undirected deterministic networks. PMID:26020920

  15. Mosaic organization of DNA nucleotides

    NASA Technical Reports Server (NTRS)

    Peng, C. K.; Buldyrev, S. V.; Havlin, S.; Simons, M.; Stanley, H. E.; Goldberger, A. L.

    1994-01-01

    Long-range power-law correlations have been reported recently for DNA sequences containing noncoding regions. We address the question of whether such correlations may be a trivial consequence of the known mosaic structure ("patchiness") of DNA. We analyze two classes of controls consisting of patchy nucleotide sequences generated by different algorithms--one without and one with long-range power-law correlations. Although both types of sequences are highly heterogenous, they are quantitatively distinguishable by an alternative fluctuation analysis method that differentiates local patchiness from long-range correlations. Application of this analysis to selected DNA sequences demonstrates that patchiness is not sufficient to account for long-range correlation properties.
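
    The abstract does not spell out its fluctuation analysis. As background only, the Python sketch below shows the basic "DNA walk" on which such analyses are commonly built: each pyrimidine is a step up, each purine a step down, and the fluctuation at a given window length is the standard deviation of the net displacement over all windows of that length. The purine/pyrimidine mapping and the function names are assumptions, and this is the plain root-mean-square fluctuation, not the alternative method the record refers to.

```python
import math

PYRIMIDINES = set("CT")


def dna_walk(seq):
    """Cumulative walk: +1 for a pyrimidine (C/T), -1 for any other base."""
    y = [0]
    for base in seq.upper():
        y.append(y[-1] + (1 if base in PYRIMIDINES else -1))
    return y


def fluctuation(seq, window):
    """Standard deviation of the walk displacement over all windows of this length."""
    y = dna_walk(seq)
    deltas = [y[i + window] - y[i] for i in range(len(y) - window)]
    mean = sum(deltas) / len(deltas)
    mean_sq = sum(d * d for d in deltas) / len(deltas)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(mean_sq - mean * mean, 0.0))


# A strictly alternating pyrimidine/purine sequence has no net drift over even windows.
seq = "CA" * 10
print(fluctuation(seq, 1), fluctuation(seq, 2))  # 1.0 0.0
```

    Plotting the fluctuation against window length on log-log axes is the usual way such walks are inspected; a slope near 0.5 is what an uncorrelated sequence would be expected to give.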

  16. Sequence identity and antigenic cross-reactivity of white face hornet venom allergen, also a hyaluronidase, with other proteins.

    PubMed

    Lu, G; Kochoumian, L; King, T P

    1995-03-03

    White face hornet (Dolichovespula maculata) venom has three known protein allergens which induce IgE response in susceptible people. They are antigen 5, phospholipase A1, and hyaluronidase, also known as Dol m 5, 1, and 2, respectively. We have cloned Dol m 2, a protein of 331 residues. When expressed in bacteria, a mixture of recombinant Dol m 2 and its fragments was obtained. The fragments were apparently generated by proteolysis of a Met-Met bond at residue 122, as they were not observed for a Dol m 2 mutant with a Leu-Met bond. Dol m 2 has 56% sequence identity with the honey bee venom allergen hyaluronidase and 27% identity with PH-20, a human sperm protein with hyaluronidase activity. A common feature of hornet venom allergens is their sequence identity with other proteins in our environment. We showed previously the sequence identity of Dol m 5 with a plant protein and a mammalian testis protein and of Dol m 1 with mammalian lipases. In BALB/c mice, Dol m 2 and bee hyaluronidase showed cross-reactivity at both antibody and T cell levels. These findings are relevant to some patients' multiple sensitivity to hornet and bee stings.

  17. Nucleotide sequence variation at two genes of the phenylpropanoid pathway, the FAH1 and F3H genes, in Arabidopsis thaliana.

    PubMed

    Aguadé, M

    2001-01-01

    The FAH1 and F3H genes encode ferulate-5-hydroxylase and flavanone-3-hydroxylase, which are enzymes in the pathways leading to the synthesis of sinapic acid esters and flavonoids, respectively. Nucleotide variation at these genes was surveyed by sequencing a sample of 20 worldwide Arabidopsis thaliana ecotypes and one Arabidopsis lyrata spp. petraea stock. In contrast with most previously studied genes, the percentage of singletons was rather low in both the FAH1 and the F3H gene regions. There was, therefore, no footprint of a recent species expansion in the pattern of nucleotide variation in these regions. In both FAH1 and F3H, nucleotide variation was structured into two major highly differentiated haplotypes. In both genes, there was a peak of silent polymorphism in the 5' part of the coding region without a parallel increase in silent divergence. In FAH1, the peak was centered at the beginning of the second exon. In F3H, nucleotide diversity was highest at the beginning of the gene. The observed pattern of variation in both FAH1 and F3H, although suggestive of balancing selection, was compatible with a neutral model with no recombination.

  18. Cloning and sequence analysis of the invertase gene INV 1 from the yeast Pichia anomala.

    PubMed

    Pérez, J A; Rodríguez, J; Rodríguez, L; Ruiz, T

    1996-02-01

    A genomic library from the yeast Pichia anomala has been constructed and employed to clone the gene encoding the sucrose-hydrolysing enzyme invertase by complementation of a sucrose non-fermenting mutant of Saccharomyces cerevisiae. The cloned gene, INV1, was sequenced and found to encode a polypeptide of 550 amino acids which contained a 22 amino-acid signal sequence and ten potential glycosylation sites. The amino-acid sequence shows significant identity with other yeast invertases and also with Kluyveromyces marxianus inulinase, a yeast beta-fructofuranosidase which has a different substrate specificity. The nucleotide sequences of the 5' and 3' non-coding regions were found to contain several consensus motifs probably involved in the initiation and termination of gene transcription.

  19. Implication of the cause of differences in 3D structures of proteins with high sequence identity based on analyses of amino acid sequences and 3D structures.

    PubMed

    Matsuoka, Masanari; Sugita, Masatake; Kikuchi, Takeshi

    2014-09-18

    Proteins that share a high sequence homology while exhibiting drastically different 3D structures are investigated in this study. Recently, artificial proteins related to the sequences of the GA and IgG binding GB domains of human serum albumin have been designed. These artificial proteins, referred to as GA and GB, share 98% amino acid sequence identity but exhibit different 3D structures, namely, a 3α bundle versus a 4β + α structure. Discriminating between their 3D structures based on their amino acid sequences is a very difficult problem. In the present work, in addition to using bioinformatics techniques, an analysis based on inter-residue average distance statistics is used to address this problem. It was hard to distinguish which structure a given sequence would take only with the results of ordinary analyses like BLAST and conservation analyses. However, in addition to these analyses, with the analysis based on the inter-residue average distance statistics and our sequence tendency analysis, we could infer which part would play an important role in its structural formation. The results suggest possible determinants of the different 3D structures for sequences with high sequence identity. The possibility of discriminating between the 3D structures based on the given sequences is also discussed.

  20. Identification of mitochondrial DNA sequence variation and development of single nucleotide polymorphic markers for CMS-D8 in cotton.

    PubMed

    Suzuki, Hideaki; Yu, Jiwen; Wang, Fei; Zhang, Jinfa

    2013-06-01

    Cytoplasmic male sterility (CMS), which is a maternally inherited trait and controlled by novel chimeric genes in the mitochondrial genome, plays a pivotal role in the production of hybrid seed. In cotton, no PCR-based marker has been developed to discriminate CMS-D8 (from Gossypium trilobum) from its normal Upland cotton (AD1, Gossypium hirsutum) cytoplasm. The objective of the current study was to develop PCR-based single nucleotide polymorphic (SNP) markers from mitochondrial genes for the CMS-D8 cytoplasm. DNA sequence variation in mitochondrial genes involved in the oxidative phosphorylation chain including ATP synthase subunit 1, 4, 6, 8 and 9, and cytochrome c oxidase 1, 2 and 3 subunits were identified by comparing CMS-D8, its isogenic maintainer and restorer lines on the same nuclear genetic background. An allelic specific PCR (AS-PCR) was utilized for SNP typing by incorporating artificial mismatched nucleotides into the third or fourth base from the 3' terminus in both the specific and nonspecific primers. The result indicated that the method modifying allele-specific primers was successful in obtaining eight SNP markers out of eight SNPs using eight primer pairs to discriminate two alleles between AD1 and CMS-D8 cytoplasms. Two of the SNPs for atp1 and cox1 could also be used in combination to discriminate between CMS-D8 and CMS-D2 cytoplasms. Additionally, a PCR-based marker from a nine nucleotide insertion-deletion (InDel) sequence (AATTGTTTT) at the 59-67 bp positions from the start codon of atp6, which is present in the CMS and restorer lines with the D8 cytoplasm but absent in the maintainer line with the AD1 cytoplasm, was also developed. A SNP marker for two nucleotide substitutions (AA in AD1 cytoplasm to CT in CMS-D8 cytoplasm) in the intron (1,506 bp) of cox2 gene was also developed. These PCR-based SNP markers should be useful in discriminating CMS-D8 and AD1 cytoplasms, or those with CMS-D2 cytoplasm as a rapid, simple, inexpensive, and

  1. Choice of Reference Sequence and Assembler for Alignment of Listeria monocytogenes Short-Read Sequence Data Greatly Influences Rates of Error in SNP Analyses

    PubMed Central

    Pightling, Arthur W.; Petronella, Nicholas; Pagotto, Franco

    2014-01-01

    The wide availability of whole-genome sequencing (WGS) and an abundance of open-source software have made detection of single-nucleotide polymorphisms (SNPs) in bacterial genomes an increasingly accessible and effective tool for comparative analyses. Thus, ensuring that real nucleotide differences between genomes (i.e., true SNPs) are detected at high rates and that the influences of errors (such as false positive SNPs, ambiguously called sites, and gaps) are mitigated is of utmost importance. The choices researchers make regarding the generation and analysis of WGS data can greatly influence the accuracy of short-read sequence alignments and, therefore, the efficacy of such experiments. We studied the effects of some of these choices, including: i) depth of sequencing coverage, ii) choice of reference-guided short-read sequence assembler, iii) choice of reference genome, and iv) whether to perform read-quality filtering and trimming, on our ability to detect true SNPs and on the frequencies of errors. We performed benchmarking experiments, during which we assembled simulated and real Listeria monocytogenes strain 08-5578 short-read sequence datasets of varying quality with four commonly used assemblers (BWA, MOSAIK, Novoalign, and SMALT), using reference genomes of varying genetic distances, and with or without read pre-processing (i.e., quality filtering and trimming). We found that assemblies of at least 50-fold coverage provided the most accurate results. In addition, MOSAIK yielded the fewest errors when reads were aligned to a nearly identical reference genome, while using SMALT to align reads against a reference sequence that is ∼0.82% distant from 08-5578 at the nucleotide level resulted in the detection of the greatest numbers of true SNPs and the fewest errors. Finally, we show that whether read pre-processing improves SNP detection depends upon the choice of reference sequence and assembler. In total, this study demonstrates that researchers should

  2. The nucleotide sequence of RNA1 of Lettuce big-vein virus, genus Varicosavirus, reveals its relation to nonsegmented negative-strand RNA viruses.

    PubMed

    Sasaya, Takahide; Ishikawa, Koichi; Koganezawa, Hiroki

    2002-06-05

    The complete nucleotide sequence of RNA1 from Lettuce big-vein virus (LBVV), the type member of the genus Varicosavirus, was determined. LBVV RNA1 consists of 6797 nucleotides and contains one large ORF that encodes a large (L) protein of 2040 amino acids with a predicted M(r) of 232,092. Northern blot hybridization analysis indicated that the LBVV RNA1 is a negative-sense RNA. Database searches showed that the amino acid sequence of L protein is homologous to those of L polymerases of nonsegmented negative-strand RNA viruses. A cluster dendrogram derived from alignments of the LBVV L protein and the L polymerases indicated that the L protein is most closely related to the L polymerases of plant rhabdoviruses. Transcription termination/polyadenylation signal-like poly(U) tracts that resemble those in rhabdovirus and paramyxovirus RNAs were present upstream and downstream of the coding region. Although LBVV is related to rhabdoviruses, a key distinguishing feature is that the genome of LBVV is segmented. The results reemphasize the need to reconsider the taxonomic position of varicosaviruses.

  3. Nucleotide Interdependency in Transcription Factor Binding Sites in the Drosophila Genome.

    PubMed

    Dresch, Jacqueline M; Zellers, Rowan G; Bork, Daniel K; Drewell, Robert A

    2016-01-01

    A long-standing objective in modern biology is to characterize the molecular components that drive the development of an organism. At the heart of eukaryotic development lies gene regulation. On the molecular level, much of the research in this field has focused on the binding of transcription factors (TFs) to regulatory regions in the genome known as cis-regulatory modules (CRMs). However, relatively little is known about the sequence-specific binding preferences of many TFs, especially with respect to the possible interdependencies between the nucleotides that make up binding sites. A particular limitation of many existing algorithms that aim to predict binding site sequences is that they do not allow for dependencies between nonadjacent nucleotides. In this study, we use a recently developed computational algorithm, MARZ, to compare binding site sequences using 32 distinct models in a systematic and unbiased approach to explore nucleotide dependencies within binding sites for 15 distinct TFs known to be critical to Drosophila development. Our results indicate that many of these proteins have varying levels of nucleotide interdependencies within their DNA recognition sequences, and that, in some cases, models that account for these dependencies greatly outperform traditional models that are used to predict binding sites. We also directly compare the ability of different models to identify the known KRUPPEL TF binding sites in CRMs and demonstrate that a more complex model that accounts for nucleotide interdependencies performs better when compared with simple models. This ability to identify TFs with critical nucleotide interdependencies in their binding sites will lead to a deeper understanding of how these molecular characteristics contribute to the architecture of CRMs and the precise regulation of transcription during organismal development.

  4. Nucleotide Interdependency in Transcription Factor Binding Sites in the Drosophila Genome

    PubMed Central

    Dresch, Jacqueline M.; Zellers, Rowan G.; Bork, Daniel K.; Drewell, Robert A.

    2016-01-01

    A long-standing objective in modern biology is to characterize the molecular components that drive the development of an organism. At the heart of eukaryotic development lies gene regulation. On the molecular level, much of the research in this field has focused on the binding of transcription factors (TFs) to regulatory regions in the genome known as cis-regulatory modules (CRMs). However, relatively little is known about the sequence-specific binding preferences of many TFs, especially with respect to the possible interdependencies between the nucleotides that make up binding sites. A particular limitation of many existing algorithms that aim to predict binding site sequences is that they do not allow for dependencies between nonadjacent nucleotides. In this study, we use a recently developed computational algorithm, MARZ, to compare binding site sequences using 32 distinct models in a systematic and unbiased approach to explore nucleotide dependencies within binding sites for 15 distinct TFs known to be critical to Drosophila development. Our results indicate that many of these proteins have varying levels of nucleotide interdependencies within their DNA recognition sequences, and that, in some cases, models that account for these dependencies greatly outperform traditional models that are used to predict binding sites. We also directly compare the ability of different models to identify the known KRUPPEL TF binding sites in CRMs and demonstrate that a more complex model that accounts for nucleotide interdependencies performs better when compared with simple models. This ability to identify TFs with critical nucleotide interdependencies in their binding sites will lead to a deeper understanding of how these molecular characteristics contribute to the architecture of CRMs and the precise regulation of transcription during organismal development. PMID:27330274

  5. Complete genome sequence analysis of novel human bocavirus reveals genetic recombination between human bocavirus 2 and human bocavirus 4.

    PubMed

    Khamrin, Pattara; Okitsu, Shoko; Ushijima, Hiroshi; Maneekarn, Niwat

    2013-07-01

    Epidemiological surveillance of human bocavirus (HBoV) was conducted on fecal specimens collected from hospitalized children with diarrhea in Chiang Mai, Thailand in 2011. By partial sequence analysis of the VP1 gene, an unusual strain of HBoV (CMH-S011-11) was initially identified as HBoV4. The complete genome of CMH-S011-11 was sequenced and analyzed further to clarify whether it was a recombinant strain or a new HBoV variant. Analysis of the complete genome sequence revealed that the coding sequence spanning NS1, NP1 and VP1/VP2 was 4795 nucleotides long. Interestingly, the nucleotide sequence of the NS1 gene of CMH-S011-11 was most closely related to the HBoV2 reference strains detected in Pakistan, which contradicted the initial genotyping result for the partial VP1 region in the previous study. In addition, comparison of the NP1 nucleotide sequence of CMH-S011-11 with those of other HBoV1-4 reference strains also revealed a high level of sequence identity with HBoV2. On the other hand, the nucleotide sequence of the VP1/VP2 gene of CMH-S011-11 was most closely related to those of HBoV4 reference strains detected in Nigeria. The overall full-length sequence analysis revealed that CMH-S011-11 grouped within the HBoV4 species, but was located in a separate branch from other HBoV4 prototype strains. Recombination analysis revealed that CMH-S011-11 was the result of recombination between HBoV2 and HBoV4 strains, with the break point located near the start codon of VP2. Copyright © 2013 Elsevier B.V. All rights reserved.

  6. MIG-seq: an effective PCR-based method for genome-wide single-nucleotide polymorphism genotyping using the next-generation sequencing platform

    PubMed Central

    Suyama, Yoshihisa; Matsuki, Yu

    2015-01-01

    Restriction-enzyme (RE)-based next-generation sequencing methods have revolutionized marker-assisted genetic studies; however, the use of REs has limited their widespread adoption, especially in field samples with low-quality DNA and/or small quantities of DNA. Here, we developed a PCR-based procedure to construct reduced representation libraries without RE digestion steps, representing de novo single-nucleotide polymorphism discovery, and its genotyping using next-generation sequencing. Using multiplexed inter-simple sequence repeat (ISSR) primers, thousands of genome-wide regions were amplified effectively from a wide variety of genomes, without prior genetic information. We demonstrated: 1) Mendelian gametic segregation of the discovered variants; 2) reproducibility of genotyping by checking its applicability for individual identification; and 3) applicability in a wide variety of species by checking standard population genetic analysis. This approach, called multiplexed ISSR genotyping by sequencing, should be applicable to many marker-assisted genetic studies with a wide range of DNA qualities and quantities. PMID:26593239

  7. Complete genome sequences of two highly divergent Japanese isolates of Plantago asiatica mosaic virus.

    PubMed

    Komatsu, Ken; Yamashita, Kazuo; Sugawara, Kota; Verbeek, Martin; Fujita, Naoko; Hanada, Kaoru; Uehara-Ichiki, Tamaki; Fuji, Shin-Ichi

    2017-02-01

    Plantago asiatica mosaic virus (PlAMV) is a member of the genus Potexvirus and has an exceptionally wide host range. It causes severe damage to lilies. Here we report on the complete nucleotide sequences of two new Japanese PlAMV isolates, one from the eudicot weed Viola grypoceras (PlAMV-Vi), and the other from the eudicot shrub Nandina domestica Thunb. (PlAMV-NJ). Their genomes contain five open reading frames (ORFs), which is characteristic of potexviruses. Surprisingly, the isolates showed only 76.0-78.0 % sequence identity with each other and with other PlAMV isolates, including isolates from Japanese lily and American nandina. Amino acid alignments of the replicase coding region encoded by ORF1 showed that the regions between the methyltransferase and helicase domains were less conserved than other regions, with several insertions and/or deletions. Phylogenetic analyses of the full-length nucleotide sequences revealed a moderate correlation between phylogenetic clustering and the original host plants of the PlAMV isolates. This study revealed the presence of two highly divergent PlAMV isolates in Japan.

  8. DNA Sequence Polymorphism of the Lactate Dehydrogenase Gene from Iranian Plasmodium vivax and Plasmodium falciparum Isolates.

    PubMed

    Getacher Feleke, Daniel; Nateghpour, Mehdi; Motevalli Haghi, Afsaneh; Hajjaran, Homa; Farivar, Leila; Mohebali, Mehdi; Raoofian, Reza

    2015-01-01

    Parasite lactate dehydrogenase (pLDH) is extensively employed in malaria rapid diagnostic tests (RDTs). Moreover, it is a well-known drug target candidate. However, the genetic diversity of this gene might influence the performance of RDT kits and its drug target candidacy. This study aimed to determine the polymorphism of the pLDH gene from Iranian isolates of P. vivax and P. falciparum. Genomic DNA was extracted from whole blood of microscopically confirmed P. vivax and P. falciparum infected patients. The pLDH gene of P. falciparum and P. vivax was amplified using conventional PCR from 43 symptomatic malaria patients from Sistan and Baluchistan Province, Southeast Iran from 2012 to 2013. Sequence analysis of 15 P. vivax LDH sequences showed that fourteen had 100% identity with P. vivax Sal-1 and Belem strains. Two nucleotide substitutions were detected, only one of which resulted in an amino acid change. Analysis of P. falciparum LDH sequences showed that six of the seven sequences had 100% homology with P. falciparum 3D7 and Mzr-1. Moreover, PfLDH displayed three nucleotide changes that resulted in changing only one amino acid. PvLDH and PfLDH showed 75%-76% nucleotide and 90.4%-90.76% amino acid homology. The pLDH gene from Iranian P. falciparum and P. vivax isolates displayed 98.8-100% homology with 1-3 nucleotide substitutions. This indicated that the gene was relatively conserved. Additional studies are needed to determine whether this genetic variation can influence the performance of pLDH-based RDTs.
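
    Percent identity figures like those above are normally computed over an existing pairwise alignment. The record does not state which tool or gap convention was used, so the Python sketch below is only one common convention, assumed here for illustration: count matching columns, skip columns where both sequences carry a gap, and treat a gap opposite a base as a mismatch. The function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    '-' marks a gap. Columns gapped in both sequences are ignored; a gap
    opposite a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # column carries no information
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


print(percent_identity("ACGTACGT", "ACGTACGA"))  # 87.5
```

    Other conventions (for example, dividing by the length of the shorter sequence) give different percentages for the same alignment, which is one reason identity values from different studies are not always directly comparable.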

  9. Characterization and Nucleotide Sequence of CARB-6, a New Carbenicillin-Hydrolyzing β-Lactamase from Vibrio cholerae

    PubMed Central

    Choury, Danièle; Aubert, Gérald; Szajnert, Marie-France; Azibi, Kemal; Delpech, Marc; Paul, Gérard

    1999-01-01

    A clinical strain of Vibrio cholerae non-O1 non-O139 isolated in France produced a new β-lactamase with a pI of 5.35. The purified enzyme, with a molecular mass of 33,000 Da, was characterized. Its kinetic constants show it to be a carbenicillin-hydrolyzing enzyme comparable to the five previously reported CARB β-lactamases and to SAR-1, another carbenicillin-hydrolyzing β-lactamase that has a pI of 4.9 and that is produced by a V. cholerae strain from Tanzania. This β-lactamase is designated CARB-6, and the gene for CARB-6 could not be transferred to Escherichia coli K-12 by conjugation. The nucleotide sequence of the structural gene was determined by direct sequencing of PCR-generated fragments from plasmid DNA with four pairs of primers covering the whole sequence of the reference CARB-3 gene. The gene encodes a 288-amino-acid protein that shares 94% homology with the CARB-1, CARB-2, and CARB-3 enzymes, 93% homology with the Proteus mirabilis N29 enzyme, and 86.5% homology with the CARB-4 enzyme. The sequence of CARB-6 differs from those of CARB-3, CARB-2, CARB-1, N29, and CARB-4 at 15, 16, 17, 19, and 37 amino acid positions, respectively. All these mutations are located in the C-terminal region of the sequence and at the surface of the molecule, according to the crystal structure of the Staphylococcus aureus PC-1 β-lactamase. PMID:9925522

  10. Molecular characterization of Giardia psittaci by multilocus sequence analysis.

    PubMed

    Abe, Niichiro; Makino, Ikuko; Kojima, Atsushi

    2012-12-01

    Multilocus sequence analyses targeting small subunit ribosomal DNA (SSU rDNA), elongation factor 1 alpha (ef1α), glutamate dehydrogenase (gdh), and beta giardin (β-giardin) were performed on Giardia psittaci isolates from three Budgerigars (Melopsittacus undulatus) and four Barred parakeets (Bolborhynchus lineola) kept in individual households or imported from overseas. Nucleotide differences and phylogenetic analyses at four loci indicate the distinction of G. psittaci from the other known Giardia species: Giardia muris, Giardia microti, Giardia ardeae, and Giardia duodenalis assemblages. Furthermore, G. psittaci was related more closely to G. duodenalis than to the other known Giardia species, except for G. microti. Conflicting signals regarded as "double peaks" were found at the same nucleotide positions of the ef1α in all isolates. However, the sequences of the other three loci, including gdh and β-giardin, which are known to be highly variable, from all isolates were also mutually identical at every locus. They showed no double peaks. These results suggest that double peaks found in the ef1α sequences are caused not by mixed infection with genetically different G. psittaci isolates but by allelic sequence heterogeneity (ASH), which is observed in diplomonad lineages including G. duodenalis. No sequence difference was found in any G. psittaci isolates at the gdh and β-giardin loci, suggesting that G. psittaci is indeed not more diverse genetically than other Giardia species. This report is the first to provide evidence related to the genetic characteristics of G. psittaci obtained using multilocus sequence analysis. Copyright © 2012 Elsevier B.V. All rights reserved.

  11. Genome Sequence of a Bombyx mori Nucleopolyhedrovirus Strain with Cubic Occlusion Bodies

    PubMed Central

    Cheng, Ruo-Lin; Xu, Yi-Peng

    2012-01-01

    Bombyx mori nucleopolyhedrovirus (BmNPV) is a typical species of Baculoviridae. The complete genome sequence of a BmNPV strain with cubic occlusion bodies is reported here. The genome of this strain consists of 127,465 nucleotides with a G+C content of 40.36% and is 97.3% and 97.5% identical to those of BmNPV strain T3 and Bombyx mandarina NPV S1, respectively. Despite the abnormal polyhedra it forms, the polyhedrin gene of the BmNPV cubic strain is 100% identical to those of the other two strains. Baculovirus repeated ORFs and homologous repeat regions cause the major differences in genome size of these BmNPV isolates. PMID:22923803
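
    The G+C content and percent identity figures quoted in records like the one above can be illustrated with a minimal sketch. It assumes plain strings as input and, for identity, two sequences that have already been aligned to equal length; the function names gc_content and percent_identity are hypothetical, and published tools may use a different denominator (for example, excluding gap columns), so values can differ slightly from those reported.

```python
def gc_content(seq: str) -> float:
    """Return the G+C percentage over unambiguous bases (A, C, G, T, U)."""
    bases = [b for b in seq.upper() if b in "ACGTU"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


def percent_identity(a: str, b: str) -> float:
    """Percent of identical columns between two pre-aligned sequences.

    Assumption: both inputs have the same length and use '-' for gaps.
    Columns where both sequences have a gap are skipped; a gap opposite
    a base counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper())
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    print(gc_content("ATGC"))                # 50.0
    print(percent_identity("ACGT", "ACGA"))  # 75.0
```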

  12. [Complete nucleotide sequences and genome structure of two Chinese tobacco mosaic virus isolates deduced from full-length infectious cDNA clones].

    PubMed

    Yang, G; Liu, X G; Qiu, B S

    2000-07-01

    The complete nucleotide sequences of two Chinese tobacco mosaic virus (TMV) isolates, TMV-Cv (vulgare strain) and TMV-N14 (an attenuated virus originating from a tomato strain), were determined from their respective full-length infectious cDNA clones and compared with published TMV sequences. The genome of TMV-Cv contained 6395 nucleotides, in which four functional open reading frames (ORFs), coding for replicase (126 kD/183 kD), movement protein (MP, 30 kD) and coat protein (CP, 17.6 kD), respectively, could be recognized. TMV-N14 contained 6384 nucleotides in its genome. In contrast to TMV-Cv, five functional ORFs encoding the replicase 98.5 kD/126 kD/183 kD, MP (27 kD) and CP (17.6 kD), respectively, were detected in the TMV-N14 genome. TMV-Cv is 99% homologous to a Korean TMV isolate belonging to the vulgare strain at the nucleotide level. TMV-N14 is 99% homologous to a highly virulent Japanese isolate TMV-L (tomato strain) at the nucleotide level. In TMV-N14, one opal mutation (UGA) occurred in the replicase gene and one ochre mutation (UAA) in the MP gene. The former mutation created a potential, additional ORF within the replicase gene; the latter reduced the size of the MP to 27 kD. In addition, there were also 13 amino acid substitutions in the replicase gene of TMV-N14 when compared to that of TMV-L. Collectively, these changes may have significant implications in the attenuation of the virulence of TMV-N14.

  13. Uncommon nucleotide excision repair phenotypes revealed by targeted high-throughput sequencing.

    PubMed

    Calmels, Nadège; Greff, Géraldine; Obringer, Cathy; Kempf, Nadine; Gasnier, Claire; Tarabeux, Julien; Miguet, Marguerite; Baujat, Geneviève; Bessis, Didier; Bretones, Patricia; Cavau, Anne; Digeon, Béatrice; Doco-Fenzy, Martine; Doray, Bérénice; Feillet, François; Gardeazabal, Jesus; Gener, Blanca; Julia, Sophie; Llano-Rivas, Isabel; Mazur, Artur; Michot, Caroline; Renaldo-Robin, Florence; Rossi, Massimiliano; Sabouraud, Pascal; Keren, Boris; Depienne, Christel; Muller, Jean; Mandel, Jean-Louis; Laugel, Vincent

    2016-03-22

    Deficient nucleotide excision repair (NER) activity causes a variety of autosomal recessive diseases including xeroderma pigmentosum (XP) a disorder which pre-disposes to skin cancer, and the severe multisystem condition known as Cockayne syndrome (CS). In view of the clinical overlap between NER-related disorders, as well as the existence of multiple phenotypes and the numerous genes involved, we developed a new diagnostic approach based on the enrichment of 16 NER-related genes by multiplex amplification coupled with next-generation sequencing (NGS). Our test cohort consisted of 11 DNA samples, all with known mutations and/or non pathogenic SNPs in two of the tested genes. We then used the same technique to analyse samples from a prospective cohort of 40 patients. Multiplex amplification and sequencing were performed using AmpliSeq protocol on the Ion Torrent PGM (Life Technologies). We identified causative mutations in 17 out of the 40 patients (43%). Four patients showed biallelic mutations in the ERCC6(CSB) gene, five in the ERCC8(CSA) gene: most of them had classical CS features but some had very mild and incomplete phenotypes. A small cohort of 4 unrelated classic XP patients from the Basque country (Northern Spain) revealed a common splicing mutation in POLH (XP-variant), demonstrating a new founder effect in this population. Interestingly, our results also found ERCC2(XPD), ERCC3(XPB) or ERCC5(XPG) mutations in two cases of UV-sensitive syndrome and in two cases with mixed XP/CS phenotypes. Our study confirms that NGS is an efficient technique for the analysis of NER-related disorders on a molecular level. It is particularly useful for phenotypes with combined features or unusually mild symptoms. Targeted NGS used in conjunction with DNA repair functional tests and precise clinical evaluation permits rapid and cost-effective diagnosis in patients with NER-defects.

  14. Exploring single nucleotide polymorphism (SNP), microsatellite (SSR) and differentially expressed genes in the jellyfish (Rhopilema esculentum) by transcriptome sequencing.

    PubMed

    Li, Yunfeng; Zhou, Zunchun; Tian, Meilin; Tian, Yi; Dong, Ying; Li, Shilei; Liu, Weidong; He, Chongbo

    2017-08-01

    In this study, single nucleotide polymorphism (SNP), microsatellite (SSR) and differentially expressed genes (DEGs) in the oral parts, gonads, and umbrella parts of the jellyfish Rhopilema esculentum were analyzed by RNA-Seq technology. A total of 76.4 million raw reads and 72.1 million clean reads were generated from deep sequencing. Approximately 119,874 tentative unigenes and 149,239 transcripts were obtained. A total of 1,034,708 SNP markers were detected in the three tissues. For microsatellite mining, 5088 SSRs were identified from the unigene sequences. The most frequent repeat motifs were mononucleotide repeats, which accounted for 61.93%. Transcriptome comparison of the three tissues yielded a total of 8841 DEGs, of which 3560 were up-regulated and 5281 were down-regulated. This study represents the greatest sequencing effort carried out for a jellyfish and provides the first high-throughput transcriptomic resource for jellyfish. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. Nucleotide Selectivity in Abiotic RNA Polymerization Reactions.

    PubMed

    Coari, Kristin M; Martin, Rebecca C; Jain, Kopal; McGown, Linda B

    2017-09-01

    In order to establish an RNA world on early Earth, the nucleotides must form polymers through chemical rather than biochemical reactions. The polymerization products must be long enough to perform catalytic functions, including self-replication, and to preserve genetic information. These functions depend not only on the length of the polymers, but also on their sequences. To date, studies of abiotic RNA polymerization generally have focused on routes to polymerization of a single nucleotide and lengths of the homopolymer products. Less work has been done on the selectivity of the reaction toward incorporation of some nucleotides over others in nucleotide mixtures. Such information is an essential step toward understanding the chemical evolution of RNA. To address this question, in the present work RNA polymerization reactions were performed in the presence of montmorillonite clay catalyst. The nucleotides included the monophosphates of adenosine, cytosine, guanosine, uridine and inosine. Experiments included reactions of mixtures of an imidazole-activated nucleotide (ImpX) with one or more unactivated nucleotides (XMP), of two or more ImpX, and of XMP that were activated in situ in the polymerization reaction itself. The reaction products were analyzed using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) to identify the lengths and nucleotide compositions of the polymerization products. The results show that the extent of polymerization, the degree of heteropolymerization vs. homopolymerization, and the composition of the polymeric products all vary among the different nucleotides and depend upon which nucleotides and how many different nucleotides are present in the mixture.

  16. Nucleotide Selectivity in Abiotic RNA Polymerization Reactions

    NASA Astrophysics Data System (ADS)

    Coari, Kristin M.; Martin, Rebecca C.; Jain, Kopal; McGown, Linda B.

    2017-09-01

    In order to establish an RNA world on early Earth, the nucleotides must form polymers through chemical rather than biochemical reactions. The polymerization products must be long enough to perform catalytic functions, including self-replication, and to preserve genetic information. These functions depend not only on the length of the polymers, but also on their sequences. To date, studies of abiotic RNA polymerization generally have focused on routes to polymerization of a single nucleotide and lengths of the homopolymer products. Less work has been done on the selectivity of the reaction toward incorporation of some nucleotides over others in nucleotide mixtures. Such information is an essential step toward understanding the chemical evolution of RNA. To address this question, in the present work RNA polymerization reactions were performed in the presence of montmorillonite clay catalyst. The nucleotides included the monophosphates of adenosine, cytosine, guanosine, uridine and inosine. Experiments included reactions of mixtures of an imidazole-activated nucleotide (ImpX) with one or more unactivated nucleotides (XMP), of two or more ImpX, and of XMP that were activated in situ in the polymerization reaction itself. The reaction products were analyzed using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) to identify the lengths and nucleotide compositions of the polymerization products. The results show that the extent of polymerization, the degree of heteropolymerization vs. homopolymerization, and the composition of the polymeric products all vary among the different nucleotides and depend upon which nucleotides and how many different nucleotides are present in the mixture.

  17. Inferring Multiple Refugia and Phylogeographical Patterns in Pinus massoniana Based on Nucleotide Sequence Variation and DNA Fingerprinting

    PubMed Central

    Lin, Chung-Jian; Huang, Chi-Chung; Huang, Chao-Ching; Chiang, Yu-Chung; Chiang, Tzen-Yuh

    2012-01-01

    Background Pinus massoniana, an ecologically and economically important conifer, is widespread across central and southern mainland China and Taiwan. In this study, we tested the central–marginal paradigm that predicts that the marginal populations tend to be less polymorphic than the central ones in their genetic composition, and examined a founders' effect in the island population. Methodology/Principal Findings We examined the phylogeography and population structuring of the P. massoniana based on nucleotide sequences of cpDNA atpB-rbcL intergenic spacer, intron regions of the AdhC2 locus, and microsatellite fingerprints. SAMOVA analysis of nucleotide sequences indicated that most genetic variants resided among geographical regions. High levels of genetic diversity in the marginal populations in the south region, a pattern seemingly contradicting the central–marginal paradigm, and the fixation of private haplotypes in most populations indicate that multiple refugia may have existed over the glacial maxima. STRUCTURE analyses on microsatellites revealed that genetic structure of mainland populations was mediated with recent genetic exchanges mostly via pollen flow, and that the genetic composition in east region was intermixed between south and west regions, a pattern likely shaped by gene introgression and maintenance of ancestral polymorphisms. As expected, the small island population in Taiwan was genetically differentiated from mainland populations. Conclusions/Significance The marginal populations in south region possessed divergent gene pools, suggesting that the past glaciations might have low impacts on these populations at low latitudes. Estimates of ancestral population sizes interestingly reflect a recent expansion in mainland from a rather smaller population, a pattern that seemingly agrees with the pollen record. PMID:22952747

  18. A novel representation of the conformational structure of transfer RNAs. Correlation of the folding patterns of the polynucleotide chain with the base sequence and the nucleotide backbone torsions.

    PubMed Central

    Srinivasan, A R; Yathindra, N

    1977-01-01

    A novel description of the conformational characteristics of all the individual nucleotides and the phosphodiesters in tRNAs is presented in the form of a circular plot. This representation furnishes information of the base sequence with the folding patterns of the polynucleotide chain as one traverses along the circumference and with the individual nucleotide and phosphodiester linkage torsions along the radii. The circular plot obtained for yeast tRNAPhe strikingly distinguishes the helical and the loop regions. The variation of the different nucleotide torsions along the entire chain length and their effect on the secondary helical and tertiary loop regions become readily apparent. PMID:339206

  19. Variation in the Nucleotide Sequence of Cottontail Rabbit Papillomavirus a and b Subtypes Affects Wart Regression and Malignant Transformation and Level of Viral Replication in Domestic Rabbits

    PubMed Central

    Salmon, Jérôme; Nonnenmacher, Mathieu; Cazé, Sandrine; Flamant, Patricia; Croissant, Odile; Orth, Gérard; Breitburd, Françoise

    2000-01-01

    We previously reported the partial characterization of two cottontail rabbit papillomavirus (CRPV) subtypes with strikingly divergent E6 and E7 oncoproteins. We report now the complete nucleotide sequences of these subtypes, referred to as CRPVa4 (7,868 nucleotides) and CRPVb (7,867 nucleotides). The CRPVa4 and CRPVb genomes differed at 238 (3%) nucleotide positions, whereas CRPVa4 and the prototype CRPV differed by only 5 nucleotides. The most variable region (7% nucleotide divergence) included the long regulatory region (LRR) and the E6 and E7 genes. A mutation in the stop codon resulted in an 8-amino-acid-longer CRPVb E4 protein, and a nucleotide deletion reduced the coding capacity of the E5 gene from 101 to 25 amino acids. In domestic rabbits homozygous for a specific haplotype of the DRA and DQA genes of the major histocompatibility complex, warts induced by CRPVb DNA or a chimeric genome containing the CRPVb LRR/E6/E7 region showed an early regression, whereas warts induced by CRPVa4 or a chimeric genome containing the CRPVa4 LRR/E6/E7 region persisted and evolved into carcinomas. In contrast, most CRPVa, CRPVb, and chimeric CRPV DNA-induced warts showed no early regression in rabbits homozygous for another DRA-DQA haplotype. Little, if any, viral replication is usually observed in domestic rabbit warts. When warts induced by CRPVa and CRPVb virions and DNA were compared, the number of cells positive for viral DNA or capsid antigens was found to be greater by 1 order of magnitude for specimens induced by CRPVb. Thus, both sequence variation in the LRR/E6/E7 region and the genetic constitution of the host influence the expression of the oncogenic potential of CRPV. Furthermore, intratype variation may overcome to some extent the host restriction of CRPV replication in domestic rabbits. PMID:11044121

  20. Fluorogenic DNA Sequencing in PDMS Microreactors

    PubMed Central

    Sims, Peter A.; Greenleaf, William J.; Duan, Haifeng; Xie, X. Sunney

    2012-01-01

    We have developed a multiplex sequencing-by-synthesis method combining terminal-phosphate labeled fluorogenic nucleotides (TPLFNs) and resealable microreactors. In the presence of phosphatase, the incorporation of a non-fluorescent TPLFN into a DNA primer by DNA polymerase results in a fluorophore. We immobilize DNA templates within polydimethylsiloxane (PDMS) microreactors, sequentially introduce one of the four identically labeled TPLFNs, seal the microreactors, allow template-directed TPLFN incorporation, and measure the signal from the fluorophores trapped in the microreactors. This workflow allows sequencing in a manner akin to pyrosequencing but without constant monitoring of each microreactor. With cycle times of <10 minutes, we demonstrate 30 base reads with ∼99% raw accuracy. “Fluorogenic pyrosequencing” combines benefits of pyrosequencing, such as rapid turn-around, native DNA generation, and single-color detection, with benefits of fluorescence-based approaches, such as highly sensitive detection and simple parallelization. PMID:21666670

  1. Fixed-Gap Tunnel Junction for Reading DNA Nucleotides

    PubMed Central

    2015-01-01

    Previous measurements of the electronic conductance of DNA nucleotides or amino acids have used tunnel junctions in which the gap is mechanically adjusted, such as scanning tunneling microscopes or mechanically controllable break junctions. Fixed-junction devices have, at best, detected the passage of whole DNA molecules without yielding chemical information. Here, we report on a layered tunnel junction in which the tunnel gap is defined by a dielectric layer, deposited by atomic layer deposition. Reactive ion etching is used to drill a hole through the layers so that the tunnel junction can be exposed to molecules in solution. When the metal electrodes are functionalized with recognition molecules that capture DNA nucleotides via hydrogen bonds, the identities of the individual nucleotides are revealed by characteristic features of the fluctuating tunnel current associated with single-molecule binding events. PMID:25380505

  2. Nucleotide sequence of the Saccharomyces cerevisiae PUT4 proline-permease-encoding gene: similarities between CAN1, HIP1 and PUT4 permeases.

    PubMed

    Vandenbol, M; Jauniaux, J C; Grenson, M

    1989-11-15

    The complete nucleotide (nt) sequence of the PUT4 gene, whose product is required for high-affinity proline active transport in the yeast Saccharomyces cerevisiae, is presented. The sequence contains a single long open reading frame of 1881 nt, encoding a polypeptide with a calculated Mr of 68,795. The predicted protein is strongly hydrophobic and exhibits six potential glycosylation sites. Its hydropathy profile suggests the presence of twelve membrane-spanning regions flanked by hydrophilic N- and C-terminal domains. The N terminus does not resemble signal sequences found in secreted proteins. These features are characteristic of integral membrane proteins catalyzing translocation of ligands across cellular membranes. Protein sequence comparisons indicate strong resemblance to the arginine and histidine permeases of S. cerevisiae, but no marked sequence similarity to the proline permease of Escherichia coli or to other known prokaryotic or eukaryotic transport proteins. The strong similarity between the three yeast amino acid permeases suggests a common ancestor for the three proteins.

  3. FASH: A web application for nucleotides sequence search.

    PubMed

    Veksler-Lublinksy, Isana; Barash, Danny; Avisar, Chai; Troim, Einav; Chew, Paul; Kedem, Klara

    2008-05-27

    FASH (Fourier Alignment Sequence Heuristics) is a web application, based on the Fast Fourier Transform, for finding remote homologs within a long nucleic acid sequence. Given a query sequence and a long text-sequence (e.g., the human genome), FASH detects subsequences within the text that are remotely similar to the query. FASH offers an alternative approach to Blast/Fasta for querying long RNA/DNA sequences. FASH differs from these other approaches in that it does not depend on the existence of contiguous seed-sequences in its initial detection phase. The FASH web server is user friendly and very easy to operate. FASH can be accessed at https://fash.bgu.ac.il:8443/fash/default.jsp (secured website).

  4. Mitochondrial control-region sequence variation in aboriginal Australians.

    PubMed Central

    van Holst Pellekaan, S; Frommer, M; Sved, J; Boettcher, B

    1998-01-01

    The mitochondrial D-loop hypervariable segment 1 (mt HVS1) between nucleotides 15997 and 16377 has been examined in aboriginal Australian people from the Darling River region of New South Wales (riverine) and from Yuendumu in central Australia (desert). Forty-seven unique HVS1 types were identified, varying at 49 nucleotide positions. Pairwise analysis by calculation of BEPPI (between population proportion index) reveals statistically significant structure in the populations, although some identical HVS1 types are seen in the two contrasting regions. mt HVS1 types may reflect more-ancient distributions than do linguistic diversity and other culturally distinguishing attributes. Comparison with sequences from five published global studies reveals that these Australians demonstrate greatest divergence from some Africans, least from Papua New Guinea highlanders, and only slightly more from some Pacific groups (Indonesian, Asian, Samoan, and coastal Papua New Guinea), although the HVS1 types vary at different nucleotide sites. Construction of a median network, displaying three main groups, suggests that several hypervariable nucleotide sites within the HVS1 are likely to have undergone mutation independently, making phylogenetic comparison with global samples by conventional methods difficult. Specific nucleotide-site variants are major separators in median networks constructed from Australian HVS1 types alone and for one global selection. The distribution of these, requiring extended study, suggests that they may be signatures of different groups of prehistoric colonizers into Australia, for which the time of colonization remains elusive. PMID:9463317

  5. Next-Generation Sequencing of Coccidioides immitis Isolated during Cluster Investigation

    PubMed Central

    Engelthaler, David M.; Chiller, Tom; Schupp, James A.; Colvin, Joshua; Beckstrom-Sternberg, Stephen M.; Driebe, Elizabeth M.; Moses, Tracy; Tembe, Waibhav; Sinari, Shripad; Beckstrom-Sternberg, James S.; Christoforides, Alexis; Pearson, John V.; Carpten, John; Keim, Paul; Peterson, Ashley; Terashita, Dawn

    2011-01-01

    Next-generation sequencing enables use of whole-genome sequence typing (WGST) as a viable and discriminatory tool for genotyping and molecular epidemiologic analysis. We used WGST to confirm the linkage of a cluster of Coccidioides immitis isolates from 3 patients who received organ transplants from a single donor who later had positive test results for coccidioidomycosis. Isolates from the 3 patients were nearly genetically identical (a total of 3 single-nucleotide polymorphisms identified among them), thereby demonstrating direct descent of the 3 isolates from an original isolate. We used WGST to demonstrate the genotypic relatedness of C. immitis isolates that were also epidemiologically linked. Thus, WGST offers unique benefits to public health for investigation of clusters considered to be linked to a single source. PMID:21291593

  6. A Simple Sequence Repeat- and Single-Nucleotide Polymorphism-Based Genetic Linkage Map of the Brown Planthopper, Nilaparvata lugens

    PubMed Central

    Jairin, Jirapong; Kobayashi, Tetsuya; Yamagata, Yoshiyuki; Sanada-Morimura, Sachiyo; Mori, Kazuki; Tashiro, Kosuke; Kuhara, Satoru; Kuwazaki, Seigo; Urio, Masahiro; Suetsugu, Yoshitaka; Yamamoto, Kimiko; Matsumura, Masaya; Yasui, Hideshi

    2013-01-01

    In this study, we developed the first genetic linkage map for the major rice insect pest, the brown planthopper (BPH, Nilaparvata lugens). The linkage map was constructed by integrating linkage data from two backcross populations derived from three inbred BPH strains. The consensus map consists of 474 simple sequence repeats, 43 single-nucleotide polymorphisms, and 1 sequence-tagged site, for a total of 518 markers at 472 unique positions in 17 linkage groups. The linkage groups cover 1093.9 cM, with an average distance of 2.3 cM between loci. The average number of marker loci per linkage group was 27.8. The sex-linkage group was identified by exploiting X-linked and Y-specific markers. Our linkage map and the newly developed markers used to create it constitute an essential resource and a useful framework for future genetic analyses in BPH. PMID:23204257
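
    The summary statistics in the record above can be cross-checked with a short arithmetic sketch. The abstract does not state how the averages were computed; the assumption here is that both were taken over the 472 unique positions, which reproduces the reported 2.3 cM and 27.8 figures after rounding. Variable names are illustrative only.

```python
# Marker counts reported in the abstract.
ssr_markers = 474
snp_markers = 43
sts_markers = 1
total_markers = ssr_markers + snp_markers + sts_markers

# Map dimensions reported in the abstract.
unique_positions = 472
linkage_groups = 17
map_length_cm = 1093.9

# Assumption: averages are taken over unique positions (loci),
# not over the total number of markers.
avg_spacing_cm = map_length_cm / unique_positions
avg_loci_per_group = unique_positions / linkage_groups

if __name__ == "__main__":
    print(total_markers)                 # 518
    print(round(avg_spacing_cm, 1))      # 2.3
    print(round(avg_loci_per_group, 1))  # 27.8
```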

  7. Complete genome sequence of a novel Plum pox virus strain W isolate determined by 454 pyrosequencing.

    PubMed

    Sheveleva, Anna; Kudryavtseva, Anna; Speranskaya, Anna; Belenikin, Maxim; Melnikova, Natalia; Chirkov, Sergei

    2013-10-01

    The near-complete (99.7 %) genome sequence of a novel Russian Plum pox virus (PPV) isolate Pk, belonging to the strain Winona (W), has been determined by 454 pyrosequencing with the exception of the thirty-one 5'-terminal nucleotides. This region was amplified using 5'RACE kit and sequenced by the Sanger method. Genomic RNA released from immunocaptured PPV particles was employed for generation of cDNA library using TransPlex Whole transcriptome amplification kit (WTA2, Sigma-Aldrich). The entire Pk genome has identity level of 92.8-94.5 % when compared to the complete nucleotide sequences of other PPV-W isolates (W3174, LV-141pl, LV-145bt, and UKR 44189), confirming a high degree of variability within the PPV-W strain. The isolates Pk and LV-141pl are most closely related. The Pk has been found in a wild plum (Prunus domestica) in a new region of Russia indicating widespread dissemination of the PPV-W strain in the European part of the former USSR.

  8. Comparative analysis of the prion protein gene sequences in African lion.

    PubMed

    Wu, Chang-De; Pang, Wan-Yong; Zhao, De-Ming

    2006-10-01

    The prion protein gene of the African lion (Panthera leo) was first cloned and screened for polymorphisms. The results suggest that the prion protein gene of eight African lions is highly homogenous. The amino acid sequences of the prion protein (PrP) of all samples tested were identical. Four single nucleotide polymorphisms (C42T, C81A, C420T, T600C) in the prion protein gene (Prnp) of the African lion were found, but no amino acid substitutions. Sequence analysis showed that the highest homology was observed with Felis catus (GenBank accession AF003087, 96.7%) and with sheep (GenBank accession M31313.1, 96.2%). With respect to all the mammalian prion protein sequences compared, the African lion prion protein sequence has three amino acid substitutions. The homology might in turn affect the potential intermolecular interactions critical for cross-species transmission of prion disease.

  9. The sequence specificity of UV-induced DNA damage in a systematically altered DNA sequence.

    PubMed

    Khoe, Clairine V; Chung, Long H; Murray, Vincent

    2018-06-01

    The sequence specificity of UV-induced DNA damage was investigated in a specifically designed DNA plasmid using two procedures: end-labelling and linear amplification. Absorption of UV photons by DNA leads to dimerisation of pyrimidine bases and produces two major photoproducts, cyclobutane pyrimidine dimers (CPDs) and pyrimidine(6-4)pyrimidone photoproducts (6-4PPs). A previous study had determined that two hexanucleotide sequences, 5'-GCTC*AC and 5'-TATT*AA, were high intensity UV-induced DNA damage sites. The UV clone plasmid was constructed by systematically altering each nucleotide of these two hexanucleotide sequences. One of the main goals of this study was to determine the influence of single nucleotide alterations on the intensity of UV-induced DNA damage. The sequence 5'-GCTC*AC was designed to examine the sequence specificity of 6-4PPs and the highest intensity 6-4PP damage sites were found at 5'-GTTC*CC nucleotides. The sequence 5'-TATT*AA was devised to investigate the sequence specificity of CPDs and the highest intensity CPD damage sites were found at 5'-TTTT*CG nucleotides. It was proposed that the tetranucleotide DNA sequence, 5'-YTC*Y (where Y is T or C), was the consensus sequence for the highest intensity UV-induced 6-4PP adduct sites; while it was 5'-YTT*C for the highest intensity UV-induced CPD damage sites. These consensus tetranucleotides are composed entirely of consecutive pyrimidines and must have a DNA conformation that is highly productive for the absorption of UV photons. Crown Copyright © 2018. Published by Elsevier B.V. All rights reserved.
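
    The consensus tetranucleotides proposed in the record above, 5'-YTC*Y and 5'-YTT*C (Y is T or C), can be located in a sequence with a small pattern-matching sketch. This is only an illustration of scanning for the motifs, not the authors' analysis method; the asterisk marking the damaged position is dropped, and the helper names motif_to_regex and find_sites are hypothetical.

```python
import re

# Subset of IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {"Y": "[CT]", "R": "[AG]", "N": "[ACGT]"}


def motif_to_regex(motif: str) -> str:
    """Translate a motif such as 'YTCY' into a regular expression."""
    return "".join(IUPAC.get(ch, ch) for ch in motif.upper())


def find_sites(seq: str, motif: str) -> list:
    """Return 0-based start positions of all (overlapping) motif matches."""
    # A lookahead makes the scan report overlapping occurrences.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [m.start() for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    # The two highest-intensity sites named in the abstract.
    print(find_sites("GTTCCC", "YTCY"))  # [1]
    print(find_sites("TTTTCG", "YTTC"))  # [1]
```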

  10. Parameters of proteome evolution from histograms of amino-acid sequence identities of paralogous proteins

    PubMed Central

    Axelsen, Jacob Bock; Yan, Koon-Kiu; Maslov, Sergei

    2007-01-01

    Background: The evolution of the full repertoire of proteins encoded in a given genome is mostly driven by gene duplications, deletions, and sequence modifications of existing proteins. Indirect information about relative rates and other intrinsic parameters of these three basic processes is contained in the proteome-wide distribution of sequence identities of pairs of paralogous proteins. Results: We introduce a simple mathematical framework based on a stochastic birth-and-death model that allows one to extract some of this information and apply it to the set of all pairs of paralogous proteins in H. pylori, E. coli, S. cerevisiae, C. elegans, D. melanogaster, and H. sapiens. It was found that the histogram of sequence identities p generated by an all-to-all alignment of all protein sequences encoded in a genome is well fitted with a power-law form ~ p^(-γ) with the value of the exponent γ around 4 for the majority of organisms used in this study. This implies that the intra-protein variability of substitution rates is best described by the Gamma-distribution with the exponent α ≈ 0.33. Different features of the shape of such histograms allow us to quantify the ratio between the genome-wide average deletion/duplication rates and the amino-acid substitution rate. Conclusion: We separately measure the short-term ("raw") duplication and deletion rates r*_dup, r*_del, which include gene copies that will be removed soon after the duplication event, and their dramatically reduced long-term counterparts r_dup, r_del. High deletion rate among recently duplicated proteins is consistent with a scenario in which they didn't have enough time to significantly change their functional roles and thus are to a large degree disposable. Systematic trends of each of the four duplication/deletion rates with the total number of genes in the genome were analyzed. All but the deletion rate of recent duplicates r*_del were shown to systematically increase with N_genes. Abnormally flat shapes
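As an illustration of the power-law form with exponent γ mentioned in this abstract, the exponent of a histogram can be estimated from the slope of a straight-line fit in log-log space. This is a sketch under stated assumptions, not the fitting procedure used in the paper: the function name is hypothetical, all bins are assumed non-empty, and an unweighted least-squares fit is used.

```python
import math


def fit_power_law_exponent(identities, counts):
    """Estimate gamma in counts ~ p**(-gamma) by least squares on log-log values.

    identities: bin centres p (each > 0); counts: histogram heights (each > 0).
    """
    xs = [math.log(p) for p in identities]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # log(counts) = const - gamma * log(p), so gamma is minus the slope.
    return -slope


if __name__ == "__main__":
    # Synthetic histogram that follows p**-4 exactly (illustrative data only).
    bins = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
    heights = [p ** -4 for p in bins]
    print(fit_power_law_exponent(bins, heights))
```

With real histograms, empty bins and counting noise in the sparse high-identity tail would need separate handling, for example by restricting the fitted range.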

  11. Isolation and sequence of partial cDNA clones of human L1: homology of human and rodent L1 in the cytoplasmic region.

    PubMed

    Harper, J R; Prince, J T; Healy, P A; Stuart, J K; Nauman, S J; Stallcup, W B

    1991-03-01

    We have isolated cDNA clones coding for the human homologue of the neuronal cell adhesion molecule L1. The nucleotide sequence of the cDNA clones and the deduced primary amino acid sequence of the carboxy terminal portion of the human L1 are homologous to the corresponding sequences of mouse L1 and rat NILE glycoprotein, with an especially high sequence identity in the cytoplasmic regions of the proteins. There is also protein sequence homology with the cytoplasmic region of the Drosophila cell adhesion molecule, neuroglian. The conservation of the cytoplasmic domain argues for an important functional role for this portion of the molecule.

  12. Nucleotide Sequences of Genes Coding for Fimbrial Proteins in a Cryptic Genospecies of Haemophilus spp. Isolated from Neonatal and Genital Tract Infections

    PubMed Central

    Gousset, Nathalie; Rosenau, Agnes; Sizaret, Pierre-Yves; Quentin, Roland

    1999-01-01

    Nineteen isolates belonging to a cryptic genospecies of Haemophilus (referred to here as genital strains) isolated from genital tract infections (6 strains) and from neonatal infections (13 strains) were studied for fimbrial genes. Sixteen strains exhibit peritrichous fimbriae observed by electron microscopy. By PCR with primers corresponding to the extreme ends of the Haemophilus influenzae type b (Hib) hifA and hifD genes and Southern blotting, a hifA-like gene (named ghfA) and a hifD-like gene (named ghfD) were identified in 6 of the 19 strains. Five of these six strains were from the genital tracts of adults, and one was from a neonate. For each gene, the nucleotide sequence was identical for the six strains. A hifE-like gene (named ghfE) was amplified from only one of the 19 genital strains of Haemophilus, but the ghfE probe gave a signal in Southern hybridization with the five other strains positive for ghfA and ghfD. Therefore, these strains may carry a ghfE-like gene. The Hib fimbrial gene cluster is located between the purE and pepN genes as previously described. For the 13 genital Haemophilus strains that lack fimbrial genes, this region corresponds to a noncoding sequence. Another major fimbrial gene designated the fimbrin gene was previously identified in a nontypeable H. influenzae strain. A fimbrin-like gene was identified for all of our 19 genital strains. This gene is similar to the ompP5 gene of many Haemophilus strains. Therefore, other, unidentified genes may explain the piliation observed in electron microscopy on genital Haemophilus strains which do not possess LKP-like fimbrial genes. Fimbrial genes were significantly associated with strains isolated from the genital tract. They may confer on the strain the ability to survive in the genital tract. PMID:9864189

  13. Nucleotide sequence of the COX1 gene in Kluyveromyces lactis mitochondrial DNA: evidence for recent horizontal transfer of a group II intron.

    PubMed

    Hardy, C M; Clark-Walker, G D

    1991-07-01

    The cytochrome oxidase subunit 1 gene (COX1) in K. lactis K8 mtDNA spans 8,826 bp and contains five exons (termed E1-E5) totalling 1,602 bp that show 88% nucleotide base matching and 91% amino acid homology to the equivalent gene in S. cerevisiae. The four introns (termed K1 cox1.1-1.4) contain open reading frames encoding proteins of 786, 333, 319 and 395 amino acids respectively that potentially encode maturase enzymes. The first intron belongs to group II whereas the remaining three are group I type B. Introns K1 cox1.1, 1.3, and 1.4 are found at identical locations to introns Sc cox1.2, 1.5 a, and 1.5 b respectively from S. cerevisiae. Horizontal transfer of an intron between recent progenitors of K. lactis and S. cerevisiae is suggested by the observation that K1 cox1.1 and Sc cox1.2 show 96% base matching. Sequence comparisons between K1 cox1.3/Sc cox1.5 a and K1 cox1.4/Sc cox1.5 b suggest that these introns are likely to have been present in the ancestral COX1 gene of these yeasts. Intron K1 cox1.2 is not found in S. cerevisiae and appears at a unique location in K. lactis. A feature of the DNA sequences of the group I introns K1 cox1.2, 1.3, and 1.4 is the presence of 11 GC-rich clusters inserted into both coding and noncoding regions. Immediately downstream of the COX1 gene is the ATPase subunit 8 gene (A8) that shows 82.6% base matching to its counterpart in S. cerevisiae mtDNA.
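Base-matching percentages such as those quoted above are usually computed over a pairwise alignment. The sketch below shows one common convention, which is an assumption here since the abstract does not state the one used: matches are divided by the number of alignment columns, skipping columns where both sequences have a gap. The function name is hypothetical.

```python
def percent_identity(aln_a, aln_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where both sequences carry a gap are ignored; a gap opposite a
    base counts as a mismatch. Other conventions exist and give other values.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aln_a.upper(), aln_b.upper()):
        if x == gap and y == gap:
            continue  # column carries no information for this pair
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


if __name__ == "__main__":
    print(percent_identity("ACGTACGT", "ACGTTCGA"))
    print(percent_identity("AC-T", "ACGT"))
```

Because the denominator differs between conventions (alignment length, shorter sequence length, ungapped columns only), identity values from different studies are not always directly comparable.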

  14. The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences

    PubMed Central

    2010-01-01

    Background In today's age of genomic discovery, no attempt has been made to comprehensively sequence a gymnosperm genome. The largest genus in the coniferous family Pinaceae is Pinus, whose 110-120 species have extremely large genomes (c. 20-40 Gb, 2N = 24). The size and complexity of these genomes have prompted much speculation as to the feasibility of completing a conifer genome sequence. Conifer genomes are reputed to be highly repetitive, but there is little information available on the nature and identity of repetitive units in gymnosperms. The pines have extensive genetic resources, with approximately 329000 ESTs from eleven species and genetic maps in eight species, including a dense genetic map of the twelve linkage groups in Pinus taeda. Results We present here the Sanger sequence and annotation of ten P. taeda BAC clones and Genome Analyzer II whole genome shotgun (WGS) sequences representing 7.5% of the genome. Computational annotation of ten BACs predicts three putative protein-coding genes and at least fifteen likely pseudogenes in nearly one megabase of sequence. We found three conifer-specific LTR retroelements in the BACs, and tentatively identified at least 15 others based on evidence from the distantly related angiosperms. Alignment of WGS sequences to the BACs indicates that 80% of BAC sequences have similar copies (≥ 75% nucleotide identity) elsewhere in the genome, but only 23% have identical copies (99% identity). The three most common repetitive elements in the genome were identified and, when combined, represent less than 5% of the genome. Conclusions This study indicates that the majority of repeats in the P. taeda genome are 'novel' and will therefore require additional BAC or genomic sequencing for accurate characterization. The pine genome contains a very large number of diverged and probably defunct repetitive elements. This study also provides new evidence that sequencing a pine genome using a WGS approach is a feasible goal. PMID

  15. The bioinformatics of nucleotide sequence coding for proteins requiring metal coenzymes and proteins embedded with metals

    NASA Astrophysics Data System (ADS)

    Tremberger, G.; Dehipawala, Sunil; Cheung, E.; Holden, T.; Sullivan, R.; Nguyen, A.; Lieberman, D.; Cheung, T.

    2015-09-01

    All metallo-proteins need post-translation metal incorporation. In fact, the isotope ratio of Fe, Cu, and Zn in physiology and oncology have emerged as an important tool. The nickel containing F430 is the prosthetic group of the enzyme methyl coenzyme M reductase which catalyzes the release of methane in the final step of methano-genesis, a prime energy metabolism candidate for life exploration space mission in the solar system. The 3.5 Gyr early life sulfite reductase as a life switch energy metabolism had Fe-Mo clusters. The nitrogenase for nitrogen fixation 3 billion years ago had Mo. The early life arsenite oxidase needed for anoxygenic photosynthesis energy metabolism 2.8 billion years ago had Mo and Fe. The selection pressure in metal incorporation inside a protein would be quantifiable in terms of the related nucleotide sequence complexity with fractal dimension and entropy values. Simulation model showed that the studied metal-required energy metabolism sequences had at least ten times more selection pressure relatively in comparison to the horizontal transferred sequences in Mealybug, guided by the outcome histogram of the correlation R-sq values. The metal energy metabolism sequence group was compared to the circadian clock KaiC sequence group using magnesium atomic level bond shifting mechanism in the protein, and the simulation model would suggest a much higher selection pressure for the energy life switch sequence group. The possibility of using Kepler 444 as an example of ancient life in Galaxy with the associated exoplanets has been proposed and is further discussed in this report. Examples of arsenic metal bonding shift probed by Synchrotron-based X-ray spectroscopy data and Zn controlled FOXP2 regulated pathways in human and chimp brain studied tissue samples are studied in relationship to the sequence bioinformatics. The analysis results suggest that relatively large metal bonding shift amount is associated with low probability correlation R

  16. Determination and analysis of the complete genome sequence of Paralichthys olivaceus rhabdovirus (PORV).

    PubMed

    Zhu, Ruo-Lin; Zhang, Qi-Ya

    2014-04-01

    Paralichthys olivaceus rhabdovirus (PORV), which is associated with high mortality rates in flounder, was isolated in China in 2005. Here, we provide an annotated sequence record of PORV, the genome of which comprises 11,182 nucleotides and contains six genes in the order 3'-N-P-M-G-NV-L-5'. Phylogenetic analysis based on glycoprotein sequences of PORV and other rhabdoviruses showed that PORV clusters with viral haemorrhagic septicemia virus (VHSV), genus Novirhabdovirus, family Rhabdoviridae. Further phylogenetic analysis of the combined amino acid sequences of six proteins of PORV and VHSV strains showed that PORV clusters with Korean strains and is closely related to Asian strains, all of which were isolated from flounder. In a comparison in which the sequences of the six proteins were combined, PORV shared the highest identity (98.3 %) with VHSV strain KJ2008 from Korea.

  17. Characterization of apple stem grooving virus and apple chlorotic leaf spot virus identified in a crab apple tree.

    PubMed

    Li, Yongqiang; Deng, Congliang; Bian, Yong; Zhao, Xiaoli; Zhou, Qi

    2017-04-01

    Apple stem grooving virus (ASGV), apple chlorotic leaf spot virus (ACLSV), and prunus necrotic ringspot virus (PNRSV) were identified in a crab apple tree by small RNA deep sequencing. The complete genome sequence of ACLSV isolate BJ (ACLSV-BJ) was 7554 nucleotides and shared 67.0%-83.0% nucleotide sequence identity with other ACLSV isolates. A phylogenetic tree based on the complete genome sequence of all available ACLSV isolates showed that ACLSV-BJ clustered with the isolates SY01 from hawthorn, MO5 from apple, and JB, KMS and YH from pear. The complete nucleotide sequence of ASGV-BJ was 6509 nucleotides (nt) long and shared 78.2%-80.7% nucleotide sequence identity with other isolates. ASGV-BJ and the isolate ASGV_kfp clustered together in the phylogenetic tree as an independent clade. Recombination analysis showed that isolate ASGV-BJ was a naturally occurring recombinant.

  18. Next generation sequencing of SNPs using the HID-Ion AmpliSeq™ Identity Panel on the Ion Torrent PGM™ platform.

    PubMed

    Guo, Fei; Zhou, Yishu; Song, He; Zhao, Jinling; Shen, Hongying; Zhao, Bin; Liu, Feng; Jiang, Xianhua

    2016-11-01

    The HID-Ion AmpliSeq™ Identity Panel (the HID Identity Panel) is designed to detect 124-plex single nucleotide polymorphisms (SNPs) with next generation sequencing (NGS) technology on the Ion Torrent PGM™ platform, including 90 individual identification SNPs (IISNPs) on autosomal chromosomes and 34 lineage informative SNPs (LISNPs) on Y chromosome. In this study, we evaluated performance for the HID Identity Panel to provide a reference for NGS-SNP application, focusing on locus strand balance, locus coverage balance, heterozygote balance, and background signals. Besides, several experiments were carried out to find out improvements and limitations of this panel, including studies of species specificity, repeatability and concordance, sensitivity, mixtures, case-type samples and degraded samples, population genetics and pedigrees following the Scientific Working Group on DNA Analysis Methods (SWGDAM) guidelines. In addition, Southern and Northern Chinese Han were investigated to assess applicability of this panel. Results showed this panel led to cross-reactivity with primates to some extent but rarely with non-primate animals. Repeatable and concordant genotypes could be obtained in triplicate with one exception at rs7520386. Full profiles could be obtained from 100 pg input DNA, but the optimal input DNA would be 1 ng-200 pg with 21 initial PCR cycles. A sample with ≥20% minor contributor could be considered as a mixture by the number of homozygotes, and full profiles belonging to minor contributors could be detected between 9:1 and 1:9 mixtures with known reference profiles. Also, this assay could be used for case-type samples and degraded samples. For autosomal SNPs (A-SNPs), F_ST across all 90 loci was not significantly different between Southern and Northern Chinese Han or between male and female samples. All A-SNP loci were independent in Chinese Han population. Except for 18 loci with H_e < 0.4, most of the A-SNPs in the HID Identity Panel presented high

  19. Identification of a novel bovine enterovirus possessing highly divergent amino acid sequences in capsid protein.

    PubMed

    Tsuchiaka, Shinobu; Rahpaya, Sayed Samim; Otomaru, Konosuke; Aoki, Hiroshi; Kishimoto, Mai; Naoi, Yuki; Omatsu, Tsutomu; Sano, Kaori; Okazaki-Terashima, Sachiko; Katayama, Yukie; Oba, Mami; Nagai, Makoto; Mizutani, Tetsuya

    2017-01-17

    Bovine enterovirus (BEV) belongs to the species Enterovirus E or F, genus Enterovirus and family Picornaviridae. Although numerous studies have identified BEVs in the feces of cattle with diarrhea, the pathogenicity of BEVs remains unclear. Previously, we reported the detection of novel kobu-like virus in calf feces, by metagenomics analysis. In the present study, we identified a novel BEV in diarrheal feces collected for that survey. Complete genome sequences were determined by deep sequencing in feces. Secondary RNA structure analysis of the 5' untranslated region (UTR), phylogenetic tree construction and pairwise identity analysis were conducted. The complete genome sequences of BEV were genetically distant from other EVs and the VP1 coding region contained novel and unique amino acid sequences. We named this strain as BEV AN12/Bos taurus/JPN/2014 (referred to as BEV-AN12). According to genome analysis, the genome length of this virus is 7414 nucleotides excluding the poly (A) tail and its genome consists of a 5'UTR, open reading frame encoding a single polyprotein, and 3'UTR. The results of secondary RNA structure analysis showed that in the 5'UTR, BEV-AN12 had an additional clover leaf structure and small stem loop structure, similarly to other BEVs. In pairwise identity analysis, BEV-AN12 showed high amino acid (aa) identities to Enterovirus F in the polyprotein, P2 and P3 regions (aa identity ≥82.4%). Therefore, BEV-AN12 is closely related to Enterovirus F. However, aa sequences in the capsid protein regions, particularly the VP1 encoding region, showed significantly low aa identity to other viruses in genus Enterovirus (VP1 aa identity ≤58.6%). In addition, BEV-AN12 branched separately from Enterovirus E and F in phylogenetic trees based on the aa sequences of P1 and VP1, although it clustered with Enterovirus F in trees based on sequences in the P2 and P3 genome region. We identified novel BEV possessing highly divergent aa sequences in the VP1 coding

  20. Porcine MYF6 gene: sequence, homology analysis, and variation in the promoter region.

    PubMed

    Wyszyńska-Koko, J; Kurył, J

    2004-01-01

    The MYF6 gene codes for a bHLH transcription factor belonging to the MyoD family. Its expression accompanies the processes of differentiation and maturation of myotubes during embryogenesis and continues at a relatively high level after birth, affecting the muscle phenotype. The porcine MYF6 gene was amplified, sequenced, and compared with MYF6 gene sequences of other species. The amino acid sequence was deduced and an interspecies homology analysis was performed. Myf-6 protein shows high conservation among species, with 99 and 97% identity when comparing pig with cow and human, respectively, and 93% when comparing pig with mouse and rat. A single nucleotide polymorphism (SNP) was found within the promoter region; it is a T --> C transition recognized by the MspI restriction enzyme.

  1. Complete genome sequence of Menghai rhabdovirus, a novel mosquito-borne rhabdovirus from China.

    PubMed

    Sun, Qiang; Zhao, Qiumin; An, Xiaoping; Guo, Xiaofang; Zuo, Shuqing; Zhang, Xianglilan; Pei, Guangqian; Liu, Wenli; Cheng, Shi; Wang, Yunfei; Shu, Peng; Mi, Zhiqiang; Huang, Yong; Zhang, Zhiyi; Tong, Yigang; Zhou, Hongning; Zhang, Jiusong

    2017-04-01

    Menghai rhabdovirus (MRV) was isolated from Aedes albopictus in Menghai county of Yunnan Province, China, in August 2010. Whole-genome sequencing of MRV was performed using an Ion PGM™ Sequencer. We found that MRV is a single-stranded, negative-sense RNA virus. The complete genome of MRV has 10,744 nt, with short inverted repeat termini, encoding five typical rhabdovirus proteins (N, P, M, G, and L) and an additional small hypothetical protein. Nucleotide BLAST analysis using the BLASTn method showed that the genome sequence most similar to that of MRV is that of Arboretum virus (NC_025393.1), with a Max score of 322, query coverage of 14%, and 66% identity. Genomic and phylogenetic analyses both demonstrated that MRV should be considered a member of a novel species of the family Rhabdoviridae.

  2. Whole genome sequencing options for bacterial strain typing and epidemiologic analysis based on single nucleotide polymorphism versus gene-by-gene-based approaches.

    PubMed

    Schürch, A C; Arredondo-Alonso, S; Willems, R J L; Goering, R V

    2018-04-01

    Whole genome sequence (WGS)-based strain typing finds increasing use in the epidemiologic analysis of bacterial pathogens in both public health as well as more localized infection control settings. This minireview describes methodologic approaches that have been explored for WGS-based epidemiologic analysis and considers the challenges and pitfalls of data interpretation. Personal collection of relevant publications. When applying WGS to study the molecular epidemiology of bacterial pathogens, genomic variability between strains is translated into measures of distance by determining single nucleotide polymorphisms in core genome alignments or by indexing allelic variation in hundreds to thousands of core genes, assigning types to unique allelic profiles. Interpreting isolate relatedness from these distances is highly organism specific, and attempts to establish species-specific cutoffs are unlikely to be generally applicable. In cases where single nucleotide polymorphism or core gene typing do not provide the resolution necessary for accurate assessment of the epidemiology of bacterial pathogens, inclusion of accessory gene or plasmid sequences may provide the additional required discrimination. As with all epidemiologic analysis, realizing the full potential of the revolutionary advances in WGS-based approaches requires understanding and dealing with issues related to the fundamental steps of data generation and interpretation. Copyright © 2018 The Authors. Published by Elsevier Ltd.. All rights reserved.
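The two distance measures described in this minireview can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the SNP distance assumes sequences already placed in a core genome alignment of equal length, and the gene-by-gene distance assumes each isolate is represented as a mapping from locus name to allele number.

```python
def snp_distance(aln_a, aln_b, ignore="N-"):
    """Count differing positions between two aligned sequences.

    Positions where either sequence has an ambiguous base or gap are skipped.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    return sum(
        1
        for x, y in zip(aln_a.upper(), aln_b.upper())
        if x != y and x not in ignore and y not in ignore
    )


def allele_distance(profile_a, profile_b):
    """Count loci with different allele numbers, over loci typed in both isolates."""
    shared = profile_a.keys() & profile_b.keys()
    return sum(1 for locus in shared if profile_a[locus] != profile_b[locus])


if __name__ == "__main__":
    print(snp_distance("ACGTACGT", "ACGTTCGA"))
    # Hypothetical locus names and allele numbers, for illustration only.
    print(allele_distance({"g1": 1, "g2": 5, "g3": 2}, {"g1": 1, "g2": 7, "g3": 2}))
```

As the abstract stresses, how many differences separate related from unrelated isolates is organism specific, so no cutoff is built into either function.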

  3. Complete nucleotide sequence, genome organization, and biological properties of human immunodeficiency virus type 1 in vivo: evidence for limited defectiveness and complementation.

    PubMed Central

    Li, Y; Hui, H; Burgess, C J; Price, R W; Sharp, P M; Hahn, B H; Shaw, G M

    1992-01-01

    Previous studies of the genetic and biologic characteristics of human immunodeficiency virus type 1 (HIV-1) have by necessity used tissue culture-derived virus. We recently reported the molecular cloning of four full-length HIV-1 genomes directly from uncultured human brain tissue (Y. Li, J. C. Kappes, J. A. Conway, R. W. Price, G. M. Shaw, and B. H. Hahn, J. Virol. 65:3973-3985, 1991). In this report, we describe the biologic properties of these four clones and the complete nucleotide sequences and genome organization of two of them. Clones HIV-1YU-2 and HIV-1YU-10 were 9,174 and 9,176 nucleotides in length, differed by 0.26% in nucleotide sequence, and except for a frameshift mutation in the pol gene in HIV-1YU-10, contained open reading frames corresponding to 5'-gag-pol-vif-vpr-tat-rev-vpu-env-nef-3' flanked by long terminal repeats. HIV-1YU-2 was fully replication competent, while HIV-1YU-10 and two other clones, HIV-1YU-21 and HIV-1YU-32, were defective. All three defective clones, however, when transfected into Cos-1 cells in any pairwise combination, yielded virions that were replication competent and transmissible by cell-free passage. The cellular host range of HIV-1YU-2 was strictly limited to primary T lymphocytes and monocyte-macrophages, a property conferred by its external envelope glycoprotein. Phylogenetic analyses of HIV-1YU-2 gene sequences revealed this virus to be a member of the North American/European HIV-1 subgroup, with specific similarity to other monocyte-tropic viruses in its V3 envelope amino acid sequence. These results indicate that HIV-1 infection of brain is characterized by the persistence of mixtures of fully competent, minimally defective, and more substantially altered viral forms and that complementation among them is readily attainable. In addition, the limited degree of genotypic heterogeneity observed among HIV-1YU and other brain-derived viruses and their preferential tropism for monocyte-macrophages suggest that viral

  4. Complete mitochondrial genome sequences of Brassica rapa (Chinese cabbage and mizuna), and intraspecific differentiation of cytoplasm in B. rapa and Brassica juncea.

    PubMed

    Hatono, Saki; Nishimura, Kaori; Murakami, Yoko; Tsujimura, Mai; Yamagishi, Hiroshi

    2017-09-01

    The complete sequence of the mitochondrial genome was determined for two cultivars of Brassica rapa. After determining the sequence of a Chinese cabbage variety, 'Oushou hakusai', the sequence of a mizuna variety, 'Chusei shiroguki sensuji kyomizuna', was mapped against the sequence of Chinese cabbage. The precise sequences where the two varieties demonstrated variation were ascertained by direct sequencing. It was found that the mitochondrial genomes of the two varieties are identical over 219,775 bp, with a single nucleotide polymorphism (SNP) between the genomes. Because B. rapa is the maternal species of an amphidiploid crop species, Brassica juncea, the distribution of the SNP was observed both in B. rapa and B. juncea. While the mizuna type SNP was restricted mainly to cultivars of mizuna (japonica group) in B. rapa, the mizuna type was widely distributed in B. juncea. The finding that the two Brassica species have these SNP types in common suggests that the nucleotide substitution occurred in wild B. rapa before both mitotypes were domesticated. It was further inferred that the interspecific hybridization between B. rapa and B. nigra took place twice and resulted in the two mitotypes of cultivated B. juncea.

  5. The annotation-enriched non-redundant patent sequence databases.

    PubMed

    Li, Weizhong; Kondratowicz, Bartosz; McWilliam, Hamish; Nauche, Stephane; Lopez, Rodrigo

    2013-01-01

    The EMBL-European Bioinformatics Institute (EMBL-EBI) offers public access to patent sequence data, providing a valuable service to the intellectual property and scientific communities. The non-redundant (NR) patent sequence databases comprise two-level nucleotide and protein sequence clusters (NRNL1, NRNL2, NRPL1 and NRPL2) based on sequence identity (level-1) and patent family (level-2). Annotation from the source entries in these databases is merged and enhanced with additional information from the patent literature and biological context. Corrections in patent publication numbers, kind-codes and patent equivalents significantly improve the data quality. Data are available through various user interfaces including web browser, downloads via FTP, SRS, Dbfetch and EBI-Search. Sequence similarity/homology searches against the databases are available using BLAST, FASTA and PSI-Search. In this article, we describe the data collection and annotation and also outline major changes and improvements introduced since 2009. Apart from data growth, these changes include additional annotation for singleton clusters, the identifier versioning for tracking entry change and the entry mappings between the two-level databases. Database URL: http://www.ebi.ac.uk/patentdata/nr/

  6. The Annotation-enriched non-redundant patent sequence databases

    PubMed Central

    Li, Weizhong; Kondratowicz, Bartosz; McWilliam, Hamish; Nauche, Stephane; Lopez, Rodrigo

    2013-01-01

    The EMBL-European Bioinformatics Institute (EMBL-EBI) offers public access to patent sequence data, providing a valuable service to the intellectual property and scientific communities. The non-redundant (NR) patent sequence databases comprise two-level nucleotide and protein sequence clusters (NRNL1, NRNL2, NRPL1 and NRPL2) based on sequence identity (level-1) and patent family (level-2). Annotation from the source entries in these databases is merged and enhanced with additional information from the patent literature and biological context. Corrections in patent publication numbers, kind-codes and patent equivalents significantly improve the data quality. Data are available through various user interfaces including web browser, downloads via FTP, SRS, Dbfetch and EBI-Search. Sequence similarity/homology searches against the databases are available using BLAST, FASTA and PSI-Search. In this article, we describe the data collection and annotation and also outline major changes and improvements introduced since 2009. Apart from data growth, these changes include additional annotation for singleton clusters, the identifier versioning for tracking entry change and the entry mappings between the two-level databases. Database URL: http://www.ebi.ac.uk/patentdata/nr/ PMID:23396323

  7. Distant neighbor base sequence context effects in human nucleotide excision repair of a benzo[a]pyrene-derived DNA lesion

    PubMed Central

    Cai, Yuqin; Kropachev, Konstantin; Xu, Rong; Tang, Yijin; Kolbanovskii, Marina; Kolbanovskii, Alexander; Amin, Shantu; Patel, Dinshaw J.; Broyde, Suse; Geacintov, Nicholas E.

    2010-01-01

    Summary The effects of non-nearest base sequences, beyond the nucleotides flanking a DNA lesion on either side, on nucleotide excision repair (NER) in extracts from human cells were investigated. We constructed two duplexes containing the same minor groove-aligned 10S (+)-trans-anti-B[a]P-N2-dG (G*) DNA adduct, derived from the environmental carcinogen benzo[a]pyrene (B[a]P): 5′-C-C-A-T-C-G*-C-T-A-C-C-3′ (CG*C-I), and 5′-C-A-C3-A4-C5-G*-C-A-C-A-C-3′ (CG*C-II). We utilized gel electrophoresis to compare the extent of DNA bending, and molecular dynamics (MD) simulations to analyze the structural characteristics of these two DNA duplexes. The NER efficiencies are 1.6 ± 0.2 times greater in the case of the CG*C-II than the CG*C-I sequence context in 135-mer duplexes. Gel electrophoresis and self-ligation circularization experiments revealed that the CG*C-II duplex is more bent than the CG*C-I duplex, while MD simulations showed that the unique -C3-A4-C5- segment in the CG*C-II duplex plays a key role. The presence of a minor groove-positioned guanine amino group, namely, the Watson-Crick partner to C3, acts as a wedge; facilitated by a highly deformable local -C3-A4- base step, this amino group allows the B[a]P ring system to produce a more enlarged minor groove in CG*C-II than in CG*C-I, as well as a local untwisting and enlarged and flexible Roll only in the CG*C-II sequence. These structural properties fit well with our prior findings that in the case of the family of minor groove 10S (+)-trans-anti-B[a]P-N2-dG lesions, flexible bends and enlarged minor groove widths (Cai et al. (2009) J. Mol. Biol., 385: 30–44) constitute NER recognition signals, and extend our understanding of sequence context effects on NER to the neighbors that are distant to the lesion. PMID:20399214

  8. Genetic analysis of Fasciola isolates from cattle in Korea based on second internal transcribed spacer (ITS-2) sequence of nuclear ribosomal DNA.

    PubMed

    Choe, Se-Eun; Nguyen, Thuy Thi-Dieu; Kang, Tae-Gyu; Kweon, Chang-Hee; Kang, Seung-Won

    2011-09-01

    Nuclear ribosomal DNA sequence of the second internal transcribed spacer (ITS-2) has been used efficiently to identify the liver fluke species collected from different hosts and various geographic regions. ITS-2 sequences of 19 Fasciola samples collected from Korean native cattle were determined and compared. Sequence comparison including ITS-2 sequences of isolates from this study and reference sequences from Fasciola hepatica and Fasciola gigantica and intermediate Fasciola in Genbank revealed seven identical variable sites of investigated isolates. Among 19 samples, 12 individuals had ITS-2 sequences completely identical to that of pure F. hepatica, five possessed the sequences identical to F. gigantica type, whereas two shared the sequence of both F. hepatica and F. gigantica. No variations in length and nucleotide composition of ITS-2 sequence were observed within isolates that belonged to F. hepatica or F. gigantica. At the position of 218, five Fasciola containing a single-base substitution (C>T) formed a distinct branch inside the F. gigantica-type group which was similar to those of Asian-origin isolates. The phylogenetic tree of the Fasciola spp. based on complete ITS-2 sequences from this study and other representative isolates in different locations clearly showed that pure F. hepatica, F. gigantica type and intermediate Fasciola were observed. The result also provided additional genetic evidence for the existence of three forms of Fasciola isolated from native cattle in Korea by genetic approach using ITS-2 sequence.

  9. Detection of de novo single nucleotide variants in offspring of atomic-bomb survivors close to the hypocenter by whole-genome sequencing.

    PubMed

    Horai, Makiko; Mishima, Hiroyuki; Hayashida, Chisa; Kinoshita, Akira; Nakane, Yoshibumi; Matsuo, Tatsuki; Tsuruda, Kazuto; Yanagihara, Katsunori; Sato, Shinya; Imanishi, Daisuke; Imaizumi, Yoshitaka; Hata, Tomoko; Miyazaki, Yasushi; Yoshiura, Koh-Ichiro

    2018-03-01

    Ionizing radiation released by the atomic bombs at Hiroshima and Nagasaki, Japan, in 1945 caused many long-term illnesses, including increased risks of malignancies such as leukemia and solid tumours. Radiation has demonstrated genetic effects in animal models, leading to concerns over the potential hereditary effects of atomic bomb-related radiation. However, no direct analyses of whole DNA have yet been reported. We therefore investigated de novo variants in offspring of atomic-bomb survivors by whole-genome sequencing (WGS). We collected peripheral blood from three trios, each comprising a father (atomic-bomb survivor with acute radiation symptoms), a non-exposed mother, and their child, none of whom had any past history of haematological disorders. One trio of non-exposed individuals was included as a control. DNA was extracted and the numbers of de novo single nucleotide variants in the children were counted by WGS with sequencing confirmation. Gross structural variants were also analysed. Written informed consent was obtained from all participants prior to the study. There were 62, 81, and 42 de novo single nucleotide variants in the children of atomic-bomb survivors, compared with 48 in the control trio. There were no gross structural variants in any trio. These findings are in accord with previously published results that also showed no significant genetic effects of atomic-bomb radiation on second-generation survivors.
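
    The trio comparison described above can be sketched as follows. This is a deliberately naive illustration, not the authors' pipeline: the function names and the genotypes are hypothetical, a site is flagged only when the child carries an allele that neither parent carries, and no quality filtering or sequencing confirmation (both of which the study relied on) is modelled.

```python
def is_de_novo(child, father, mother):
    """True if the child carries an allele that neither parent carries."""
    parental_alleles = set(father) | set(mother)
    return any(allele not in parental_alleles for allele in child)


def count_de_novo(sites):
    """Count sites whose (child, father, mother) genotypes look de novo."""
    return sum(is_de_novo(child, father, mother) for child, father, mother in sites)


# Hypothetical genotypes at three sites, ordered (child, father, mother).
example_sites = [
    (("A", "A"), ("A", "A"), ("A", "A")),  # child matches parents
    (("A", "G"), ("A", "A"), ("A", "A")),  # G is absent from both parents
    (("C", "T"), ("C", "C"), ("T", "T")),  # both alleles are inherited
]
print(count_de_novo(example_sites))
```

    A check this simple misses Mendelian violations in which every child allele is present somewhere in the parents, so it should be read only as an outline of the counting step.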

  10. RY-Coding and Non-Homogeneous Models Can Ameliorate the Maximum-Likelihood Inferences From Nucleotide Sequence Data with Parallel Compositional Heterogeneity.

    PubMed

    Ishikawa, Sohta A; Inagaki, Yuji; Hashimoto, Tetsuo

    2012-01-01

    In phylogenetic analyses of nucleotide sequences, 'homogeneous' substitution models, which assume the stationarity of base composition across a tree, are widely used, albeit individual sequences may bear distinctive base frequencies. In the worst-case scenario, a homogeneous model-based analysis can yield an artifactual union of two distantly related sequences that achieved similar base frequencies in parallel. Such potential difficulty can be countered by two approaches, 'RY-coding' and 'non-homogeneous' models. The former approach converts four bases into purine and pyrimidine to normalize base frequencies across a tree, while the heterogeneity in base frequency is explicitly incorporated in the latter approach. The two approaches have been applied to real-world sequence data; however, their basic properties have not been fully examined by pioneering simulation studies. Here, we assessed the performances of the maximum-likelihood analyses incorporating RY-coding and a non-homogeneous model (RY-coding and non-homogeneous analyses) on simulated data with parallel convergence to similar base composition. Both RY-coding and non-homogeneous analyses showed superior performances compared with homogeneous model-based analyses. Curiously, the performance of RY-coding analysis appeared to be significantly affected by a setting of the substitution process for sequence simulation relative to that of non-homogeneous analysis. The performance of a non-homogeneous analysis was also validated by analyzing a real-world sequence data set with significant base heterogeneity.
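
    The RY-coding step described above can be illustrated with a short sketch. The function name and the toy sequences are assumptions made for illustration; the mapping itself (A and G to purine R, C and T/U to pyrimidine Y) is the standard one, and symbols such as gaps or N are passed through unchanged.

```python
# Purines (A, G) become R; pyrimidines (C, T, U) become Y.
RY_MAP = {"A": "R", "G": "R", "C": "Y", "T": "Y", "U": "Y"}


def ry_code(sequence):
    """Recode a nucleotide sequence as R/Y; other symbols pass through."""
    return "".join(RY_MAP.get(base, base) for base in sequence.upper())


# Two toy sequences with opposite GC content share one RY-coded form.
gc_rich = "GGCCGGCC"
at_rich = "AATTAATT"
print(ry_code(gc_rich), ry_code(at_rich))
```

    Both toy sequences recode to the same string, which is the sense in which RY-coding removes differences in base frequency while keeping the purine/pyrimidine signal.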

  11. Complete genome sequence of Fer-de-Lance Virus reveals a novel gene in reptilian Paramyxoviruses

    USGS Publications Warehouse

    Kurath, G.; Batts, W.N.; Ahne, W.; Winton, J.R.

    2004-01-01

    The complete RNA genome sequence of the archetype reptilian paramyxovirus, Fer-de-Lance virus (FDLV), has been determined. The genome is 15,378 nucleotides in length and consists of seven nonoverlapping genes in the order 3′ N-U-P-M-F-HN-L 5′, coding for the nucleocapsid, unknown, phospho-, matrix, fusion, hemagglutinin-neuraminidase, and large polymerase proteins, respectively. The gene junctions contain highly conserved transcription start and stop signal sequences and tri-nucleotide intergenic regions similar to those of other Paramyxoviridae. The FDLV P gene expression strategy is like that of rubulaviruses, which express the accessory V protein from the primary transcript and edit a portion of the mRNA to encode P and I proteins. There is also an overlapping open reading frame potentially encoding a small basic protein in the P gene. The gene designated U (unknown), encodes a deduced protein of 19.4 kDa that has no counterpart in other paramyxoviruses and has no similarity with sequences in the National Center for Biotechnology Information database. Active transcription of the U gene in infected cells was demonstrated by Northern blot analysis, and bicistronic N-U mRNA was also evident. The genomes of two other snake paramyxovirus genotypes were also found to have U genes, with 11 to 16% nucleotide divergence from the FDLV U gene. Pairwise comparisons of amino acid identities and phylogenetic analyses of all deduced FDLV protein sequences with homologous sequences from other Paramyxoviridae indicate that FDLV represents a new genus within the subfamily Paramyxovirinae. We suggest the name Ferlavirus for the new genus, with FDLV as the type species.

  12. Molecular cloning of two human liver 3 alpha-hydroxysteroid/dihydrodiol dehydrogenase isoenzymes that are identical with chlordecone reductase and bile-acid binder.

    PubMed Central

    Deyashiki, Y; Ogasawara, A; Nakayama, T; Nakanishi, M; Miyabe, Y; Sato, K; Hara, A

    1994-01-01

    Human liver contains two dihydrodiol dehydrogenases, DD2 and DD4, associated with 3 alpha-hydroxysteroid dehydrogenase activity. We have raised polyclonal antibodies that cross-reacted with the two enzymes and isolated two 1.2 kb cDNA clones (C9 and C11) for the two enzymes from a human liver cDNA library using the antibodies. The clones of C9 and C11 contained coding sequences corresponding to 306 and 321 amino acid residues respectively, but lacked 5'-coding regions around the initiation codon. Sequence analyses of several peptides obtained by enzymic and chemical cleavages of the two purified enzymes verified that the C9 and C11 clones encoded DD2 and DD4 respectively, and further indicated that the sequence of DD2 had at least additional 16 residues upward from the N-terminal sequence deduced from the cDNA. There was 82% amino acid sequence identity between the two enzymes, indicating that the enzymes are genetic isoenzymes. A computer-based comparison of the cDNAs of the isoenzymes with the DNA sequence database revealed that the nucleotide and amino acid sequences of DD2 and DD4 are virtually identical with those of human bile-acid binder and human chlordecone reductase cDNAs respectively. PMID:8172617

  13. Intramolecular interactions in aminoacyl nucleotides: Implications regarding the origin of genetic coding and protein synthesis

    NASA Technical Reports Server (NTRS)

    Lacey, J. C., Jr.; Mullins, D. W., Jr.; Watkins, C. L.; Hall, L. M.

    1986-01-01

    Cellular organisms store information as sequences of nucleotides in double stranded DNA. This information is useless unless it can be converted into the active molecular species, protein. This is done in contemporary creatures first by transcription of one strand to give a complementary strand of mRNA. The sequence of nucleotides is then translated into a specific sequence of amino acids in a protein. Translation is made possible by a genetic coding system in which a sequence of three nucleotides codes for a specific amino acid. The origin and evolution of any chemical system can be understood through elucidation of the properties of the chemical entities which make up the system. There is an underlying logic to the coding system revealed by a correlation of the hydrophobicities of amino acids and their anticodonic nucleotides (i.e., the complement of the codon). Its importance lies in the fact that every amino acid going into protein synthesis must first be activated. This is universally accomplished with ATP. Past studies have concentrated on the chemistry of the adenylates, but more recently we have found, through the use of NMR, that we can observe intramolecular interactions even at low concentrations, between amino acid side chains and nucleotide base rings in these adenylates. The use of this type of compound thus affords a novel way of elucidating the manner in which amino acids and nucleotides interact with each other. In aqueous solution, when a hydrophobic amino acid is attached to the most hydrophobic nucleotide, AMP, a hydrophobic interaction takes place between the amino acid side chain and the adenine ring. The studies to be reported concern these hydrophobic interactions.

  14. Implications of the dependence of the elastic properties of DNA on nucleotide sequence.

    PubMed

    Olson, Wilma K; Swigon, David; Coleman, Bernard D

    2004-07-15

    Recent advances in structural biochemistry have provided evidence that not only the geometric properties but also the elastic moduli of duplex DNA are strongly dependent on nucleotide sequence in a way that is not accounted for by classical rod models of the Kirchhoff type. A theory of sequence-dependent DNA elasticity is employed here to calculate the dependence of the equilibrium configurations of circular DNA on the binding of ligands that can induce changes in intrinsic twist at a single base-pair step. Calculations are presented of the influence on configurations of the assumed values and distribution along the DNA of intrinsic roll and twist and a modulus coupling roll to twist. Among the results obtained are the following. For minicircles formed from intrinsically straight DNA, the distribution of roll-twist coupling strongly affects the dependence of the total elastic energy Psi on the amount alpha of imposed untwisting, and that dependence can be far from quadratic. (In fact, for a periodic distribution of roll-twist coupling with a period equal to the intrinsic helical repeat length, Psi can be essentially independent of alpha for -90 degrees < alpha <90 degrees.) When the minicircle is homogeneous and without roll-twist coupling, but with uniform positive intrinsic roll, the point at which Psi attains its minimum value shifts towards negative values of alpha. It is remarked that there are cases in which one can relate graphs of Psi versus alpha to the 'effective values' of bending and twisting moduli and helical repeat length obtained from measurements of equilibrium distributions of topoisomers and probabilities of ring closure. For a minicircle formed from DNA that has an 'S' shape when stress-free, the graphs of Psi versus alpha have maxima at alpha = 0. 
As the binding of a twisting agent to such a minicircle results in a net decrease in Psi, the affinity of the twisting agent for binding to the minicircle is greater than its affinity for binding to

  15. A Chromosome 7 Pericentric Inversion Defined at Single-Nucleotide Resolution Using Diagnostic Whole Genome Sequencing in a Patient with Hand-Foot-Genital Syndrome.

    PubMed

    Watson, Christopher M; Crinnion, Laura A; Harrison, Sally M; Lascelles, Carolina; Antanaviciute, Agne; Carr, Ian M; Bonthron, David T; Sheridan, Eamonn

    2016-01-01

    Next generation sequencing methodologies are facilitating the rapid characterisation of novel structural variants at nucleotide resolution. These approaches are particularly applicable to variants initially identified using alternative molecular methods. We report a child born with bilateral postaxial syndactyly of the feet and bilateral fifth finger clinodactyly. This was presumed to be an autosomal recessive syndrome, due to the family history of consanguinity. Karyotype analysis revealed a homozygous pericentric inversion of chromosome 7 (46,XX,inv(7)(p15q21)x2) which was confirmed to be heterozygous in both unaffected parents. Since the resolution of the karyotype was insufficient to identify any putatively causative gene, we undertook medium-coverage whole genome sequencing using paired-end reads, in order to elucidate the molecular breakpoints. In a two-step analysis, we first narrowed down the region by identifying discordant read-pairs, and then determined the precise molecular breakpoint by analysing the mapping locations of "soft-clipped" breakpoint-spanning reads. PCR and Sanger sequencing confirmed the identified breakpoints, both of which were located in intergenic regions. Significantly, the 7p15 breakpoint was located 523 kb upstream of HOXA13, the locus for hand-foot-genital syndrome. By inference from studies of HOXA locus control in the mouse, we suggest that the inversion has delocalised a HOXA13 enhancer to produce the phenotype observed in our patient. This study demonstrates how a modern genetic diagnostic approach can characterise structural variants at nucleotide resolution and provide potential insights into functional regulation.
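
    The second step of the analysis, locating a breakpoint from soft-clipped reads, might look roughly like the sketch below. It is an assumption-laden illustration rather than the authors' method: the function name, the `min_clip` threshold and the example reads are hypothetical, and the only thing taken as given is the standard CIGAR notation of aligned reads, in which `S` marks soft-clipped bases and `M`, `D`, `N`, `=` and `X` consume reference positions.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clip_breakpoints(pos, cigar, min_clip=10):
    """Return reference positions where a read's alignment ends in a soft clip.

    pos is the 1-based leftmost aligned reference position of the read.
    Hard clips (H) are ignored; clips shorter than min_clip are skipped.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    points = []
    if ops and ops[0][1] == "S" and ops[0][0] >= min_clip:
        points.append(pos)  # clip on the left: first aligned base
    if len(ops) > 1 and ops[-1][1] == "S" and ops[-1][0] >= min_clip:
        points.append(pos + ref_len - 1)  # clip on the right: last aligned base
    return points


# Hypothetical reads: (leftmost position, CIGAR string).
for read_pos, read_cigar in [(1000, "60M40S"), (500, "30S70M"), (1000, "100M")]:
    print(read_pos, read_cigar, clip_breakpoints(read_pos, read_cigar))
```

    In practice many reads clipped at the same reference position would be required before calling a breakpoint, and the clipped bases themselves would be realigned to find the partner locus.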

  16. Use of signal sequences as an in situ removable sequence element to stimulate protein synthesis in cell-free extracts

    PubMed Central

    Ahn, Jin-Ho; Hwang, Mi-Yeon; Lee, Kyung-Ho; Choi, Cha-Yong; Kim, Dong-Myung

    2007-01-01

    This study developed a method to boost the expression of recombinant proteins in a cell-free protein synthesis system without leaving additional amino acid residues. It was found that the nucleotide sequences of the signal peptides serve as an efficient downstream box to stimulate protein synthesis when they were fused upstream of the target genes. The extent of stimulation was critically affected by the identity of the second codons of the signal sequences. Moreover, the yield of the synthesized protein was enhanced by as much as 10 times in the presence of an optimal second codon. The signal peptides were in situ cleaved and the target proteins were produced in their native sizes by carrying out the cell-free synthesis reactions in the presence of Triton X-100, most likely through the activation of signal peptidase in the S30 extract. The amplification of the template DNA and the addition of the signal sequences were accomplished by PCR. Hence, elevated levels of recombinant proteins were generated within several hours. PMID:17185295

  17. Biological nanopore MspA for DNA sequencing

    NASA Astrophysics Data System (ADS)

    Manrao, Elizabeth A.

    Unlocking the information hidden in the human genome provides insight into the inner workings of complex biological systems and can be used to greatly improve health-care. In order to allow for widespread sequencing, new technologies are required that provide fast and inexpensive readings of DNA. Nanopore sequencing is a third generation DNA sequencing technology that is currently being developed to fulfill this need. In nanopore sequencing, a voltage is applied across a small pore in an electrolyte solution and the resulting ionic current is recorded. When DNA passes through the channel, the ionic current is partially blocked. If the DNA bases uniquely modulate the ionic current flowing through the channel, the time trace of the current can be related to the sequence of DNA passing through the pore. There are two main challenges to realizing nanopore sequencing: identifying a pore with sensitivity to single nucleotides and controlling the translocation of DNA through the pore so that the small single nucleotide current signatures are distinguishable from background noise. In this dissertation, I explore the use of Mycobacterium smegmatis porin A (MspA) for nanopore sequencing. In order to determine MspA's sensitivity to single nucleotides, DNA strands of various compositions are held in the pore as the resulting ionic current is measured. DNA is immobilized in MspA by attaching it to a large molecule which acts as an anchor. This technique confirms the single nucleotide resolution of the pore and additionally shows that MspA is sensitive to epigenetic modifications and single nucleotide polymorphisms. The forces from the electric field within MspA, the effective charge of nucleotides, and elasticity of DNA are estimated using a Freely Jointed Chain model of single stranded DNA. These results offer insight into the interactions of DNA within the pore. 
With the nucleotide sensitivity of MspA confirmed, a method is introduced to controllably pass DNA through the pore

  18. Insertion Sequences

    PubMed Central

    Mahillon, Jacques; Chandler, Michael

    1998-01-01

    Insertion sequences (ISs) constitute an important component of most bacterial genomes. Over 500 individual ISs have been described in the literature to date, and many more are being discovered in the ongoing prokaryotic and eukaryotic genome-sequencing projects. The last 10 years have also seen some striking advances in our understanding of the transposition process itself. Not least of these has been the development of various in vitro transposition systems for both prokaryotic and eukaryotic elements and, for several of these, a detailed understanding of the transposition process at the chemical level. This review presents a general overview of the organization and function of insertion sequences of eubacterial, archaebacterial, and eukaryotic origins with particular emphasis on bacterial elements and on different aspects of the transposition mechanism. It also attempts to provide a framework for classification of these elements by assigning them to various families or groups. A total of 443 members of the collection have been grouped in 17 families based on combinations of the following criteria: (i) similarities in genetic organization (arrangement of open reading frames); (ii) marked identities or similarities in the enzymes which mediate the transposition reactions, the recombinases/transposases (Tpases); (iii) similar features of their ends (terminal IRs); and (iv) fate of the nucleotide sequence of their target sites (generation of a direct target duplication of determined length). A brief description of the mechanism(s) involved in the mobility of individual ISs in each family and of the structure-function relationships of the individual Tpases is included where available. PMID:9729608

  19. Predicting protein-binding regions in RNA using nucleotide profiles and compositions.

    PubMed

    Choi, Daesik; Park, Byungkyu; Chae, Hanju; Lee, Wook; Han, Kyungsook

    2017-03-14

    Motivated by the increased amount of data on protein-RNA interactions and the availability of complete genome sequences of several organisms, many computational methods have been proposed to predict binding sites in protein-RNA interactions. However, most computational methods are limited to finding RNA-binding sites in proteins instead of protein-binding sites in RNAs. Predicting protein-binding sites in RNA is more challenging than predicting RNA-binding sites in proteins. Recent computational methods for finding protein-binding sites in RNAs have several drawbacks for practical use. We developed a new support vector machine (SVM) model for predicting protein-binding regions in mRNA sequences. The model uses sequence profiles constructed from log-odds scores of mono- and di-nucleotides and nucleotide compositions. The model was evaluated by standard 10-fold cross validation, leave-one-protein-out (LOPO) cross validation and independent testing. Since actual mRNA sequences have more non-binding regions than protein-binding regions, we tested the model on several datasets with different ratios of protein-binding regions to non-binding regions. The best performance of the model was obtained in a balanced dataset of positive and negative instances. 10-fold cross validation with a balanced dataset achieved a sensitivity of 91.6%, a specificity of 92.4%, an accuracy of 92.0%, a positive predictive value (PPV) of 91.7%, a negative predictive value (NPV) of 92.3% and a Matthews correlation coefficient (MCC) of 0.840. LOPO cross validation showed a lower performance than the 10-fold cross validation, but the performance remains high (87.6% accuracy and 0.752 MCC). In testing the model on independent datasets, it achieved an accuracy of 82.2% and an MCC of 0.656. Testing of our model and other state-of-the-art methods on a same dataset showed that our model is better than the others. 
Sequence profiles of log-odds scores of mono- and di-nucleotides were much more powerful

  20. Nucleotide sequence and structural organization of the human vasopressin pituitary receptor (V3) gene.

    PubMed

    René, P; Lenne, F; Ventura, M A; Bertagna, X; de Keyzer, Y

    2000-01-04

    In the pituitary, vasopressin triggers ACTH release through a specific receptor subtype, termed V3 or V1b. We cloned the V3 cDNA and showed that its expression was almost exclusive to pituitary corticotrophs and some corticotroph tumors. To study the determinants of this tissue specificity, we have now cloned the gene for the human (h) V3 receptor and characterized its structure. It is composed of two exons, spanning 10kb, with the coding region interrupted between transmembrane domains 6 and 7. We established that the transcription initiation site is located 498 nucleotides upstream of the initiator codon and showed that two polyadenylation sites may be used, while the most frequent is the most downstream. Sequence analysis of the promoter region showed no TATA box but identified consensus binding motifs for Sp1, CREB, and half sites of the estrogen receptor binding site. However comparison with another corticotroph-specific gene, proopiomelanocortin, did not identify common regulatory elements in the two promoters except for a short GC-rich region. Unexpectedly, hV3 gene analysis revealed that a formerly cloned 'artifactual' hV3 cDNA indeed corresponded to a spliced antisense transcript, overlapping the 5' part of the coding sequence in exon 1 and the promoter region. This transcript, hV3rev, was detected in normal pituitary and in many corticotroph tumors expressing hV3 sense mRNA and may therefore play a role in hV3 gene expression.

  1. Bellerophon: A program to detect chimeric sequences in multiple sequence alignments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huber, Thomas; Faulkner, Geoffrey; Hugenholtz, Philip

    2003-12-23

    Bellerophon is a program for detecting chimeric sequences in multiple sequence datasets by an adaption of partial treeing analysis. Bellerophon was specifically developed to detect 16S rRNA gene chimeras in PCR-clone libraries of environmental samples but can be applied to other nucleotide sequence alignments.

  2. Sequence of a cDNA encoding pancreatic preprosomatostatin-22.

    PubMed Central

    Magazin, M; Minth, C D; Funckes, C L; Deschenes, R; Tavianini, M A; Dixon, J E

    1982-01-01

    We report the nucleotide sequence of a precursor to somatostatin that upon proteolytic processing may give rise to a hormone of 22 amino acids. The nucleotide sequence of a cDNA from the channel catfish (Ictalurus punctatus) encodes a precursor to somatostatin that is 105 amino acids (Mr, 11,500). The cDNA coding for somatostatin-22 consists of 36 nucleotides in the 5' untranslated region, 315 nucleotides that code for the precursor to somatostatin-22, 269 nucleotides at the 3' untranslated region, and a variable length of poly(A). The putative preprohormone contains a sequence of hydrophobic amino acids at the amino terminus that has the properties of a "signal" peptide. A connecting sequence of approximately 57 amino acids is followed by a single Arg-Arg sequence, which immediately precedes the hormone. Somatostatin-22 is homologous to somatostatin-14 in 7 of the 14 amino acids, including the Phe-Trp-Lys sequence. Hybridization selection of mRNA, followed by its translation in a wheat germ cell-free system, resulted in the synthesis of a single polypeptide having a molecular weight of approximately 10,000 as estimated on Na-DodSO4/polyacrylamide gels. PMID:6127673

  3. The delta-subunit of murine guanine nucleotide exchange factor eIF-2B. Characterization of cDNAs predicts isoforms differing at the amino-terminal end.

    PubMed

    Henderson, R A; Krissansen, G W; Yong, R Y; Leung, E; Watson, J D; Dholakia, J N

    1994-12-02

    Protein synthesis in mammalian cells is regulated at the level of the guanine nucleotide exchange factor, eIF-2B, which catalyzes the exchange of eukaryotic initiation factor 2-bound GDP for GTP. We have isolated and sequenced cDNA clones encoding the delta-subunit of murine eIF-2B. The cDNA sequence encodes a polypeptide of 544 amino acids with molecular mass of 60 kDa. Antibodies against a synthetic polypeptide of 30 amino acids deduced from the cDNA sequence specifically react with the delta-subunit of mammalian eIF-2B. The cDNA-derived amino acid sequence shows significant homology with the yeast translational regulator Gcd2, supporting the hypothesis that Gcd2 may be the yeast homolog of the delta-subunit of mammalian eIF-2B. Primer extension studies and anchor polymerase chain reaction analysis were performed to determine the 5'-end of the transcript for the delta-subunit of eIF-2B. Results of these experiments demonstrate two different mRNAs for the delta-subunit of eIF-2B in murine cells. The isolation and characterization of two different full-length cDNAs also predicts the presence of two alternate forms of the delta-subunit of eIF-2B in murine cells. These differ at their amino-terminal end but have identical nucleotide sequences coding for amino acids 31-544.

  4. Diagnosis of local hepatic tuberculosis through next-generation sequencing: Smarter, faster and better.

    PubMed

    Ai, Jing-Wen; Li, Yang; Cheng, Qi; Cui, Peng; Wu, Hong-Long; Xu, Bin; Zhang, Wen-Hong

    2018-06-01

    A 45-year-old man who complained of continuous fever and multiple hepatic masses was admitted to our hospital. Repeated MRI manifestations were similar while each radiological report suggested contradictory diagnosis pointing to infections or malignances respectively. Pathologic examination of the liver tissue showed no direct evidence of either infections or tumor. We performed next-generation sequencing on the liver tissue and peripheral blood to further investigate the possible etiology. High throughput sequencing was performed on the liver lesion tissues using BGISEQ-100 platform, and data was mapped to the Microbial Genome Databases after filtering low quality data and human reads. We identified a total of 299 sequencing reads of Mycobacterium tuberculosis (M. tuberculosis) complex sequences from the liver tissue, including 8, 229 of 4,424,435 of the M. tuberculosis nucleotide sequences, and Mycobacterium africanum, Mycobacterium bovis, and Mycobacterium canettii were also detected due to the 99.9% identical rate among these strains. No specific Mycobacterial tuberculosis nucleotide sequence was detected in the sample of peripheral blood. Patient's symptom quickly recovered after anti-tuberculosis treatment and repeated Ziehl-Neelsen staining of the liver tissue finally identified small numbers of positive bacillus. The diagnosis of this patient was difficult to establish before the next-generation sequencing because of contradictive radiological results and negative pathological findings. More sensitive diagnostic methods are urgently needed. This is the first case reporting hepatic tuberculosis confirmed by the next-generation sequencing, and marks the promising potential of the application of the next-generation sequencing in the diagnosis of hepatic lesions with unknown etiology. Copyright © 2018 Elsevier Masson SAS. All rights reserved.

  5. Distant sequences determine 5′ end formation of cox3 transcripts in Arabidopsis thaliana ecotype C24

    PubMed Central

    Forner, Joachim; Weber, Bärbel; Wiethölter, Caterina; Meyer, Rhonda C.; Binder, Stefan

    2005-01-01

    The genomic environments and the transcripts of the mitochondrial cox3 gene are investigated in three Arabidopsis thaliana ecotypes. While the proximate 5′ sequences up to nucleotide position −584, the coding regions and the 3′ flanking regions are identical in Columbia (Col), C24 and Landsberg erecta (Ler), genomic variation is detected in regions further upstream. In the mitochondrial DNA of Col, a 1790 bp fragment flanked by a nonanucleotide direct repeat is present beyond position −584 with respect to the ATG. While in Ler only part of this insertion is conserved, this sequence is completely absent in C24, except for a single copy of the nonanucleotide direct repeat. Northern hybridization reveals identical major transcripts in the three ecotypes, but identifies an additional abundant 60 nt larger mRNA species in C24. The extremities of the most abundant mRNA species are identical in the three ecotypes. In C24, an extra major 5′ end is abundant. This terminus and the other major 5′ ends are located in identical sequence regions. Inspection of Atcox3 transcripts in C24/Col hybrids revealed a female inheritance of the mRNA species with the extra 5′ terminus. Thus, a mitochondrially encoded factor determines the generation of an extra 5′ mRNA end. PMID:16107557

  6. Comparative In silico Study of Sex-Determining Region Y (SRY) Protein Sequences Involved in Sex-Determining.

    PubMed

    Vakili Azghandi, Masoume; Nasiri, Mohammadreza; Shamsa, Ali; Jalali, Mohsen; Shariati, Mohammad Mahdi

    2016-04-01

    The SRY gene (SRY) provides instructions for making a transcription factor called the sex-determining region Y protein. The sex-determining region Y protein causes a fetus to develop as a male. In this study, SRY of 15 species (human, chimpanzee, dog, pig, rat, cattle, buffalo, goat, sheep, horse, zebra, frog, urial, dolphin and killer whale) was used to determine bioinformatic differences. Nucleotide sequences of SRY were retrieved from the NCBI databank. Bioinformatic analysis of SRY was done with CLC Main Workbench version 5.5, ClustalW (http:/www.ebi.ac.uk/clustalw/) and MEGA6 software. The multiple sequence alignment results indicated that SRY protein sequences from Orcinus orca (killer whale) and Tursiops aduncus (dolphin) have the least genetic distance, 0.33, among these 15 species and are 99.67% identical at the amino acid level. Homo sapiens and Pan troglodytes (chimpanzee) have the next lowest genetic distance, 1.35, and are 98.65% identical at the amino acid level. These findings indicate that the SRY proteins are conserved in the 15 species, and their evolutionary relationships are similar.

  7. Complete Genome Sequence of Zucchini Yellow Mosaic Virus Strain Kurdistan, Iran.

    PubMed

    Maghamnia, Hamid Reza; Hajizadeh, Mohammad; Azizi, Abdolbaset

    2018-03-01

    The complete genome sequence of Zucchini yellow mosaic virus strain Kurdistan (ZYMV-Kurdistan) infecting squash from Iran was determined from 13 overlapping fragments. Excluding the poly (A) tail, ZYMV-Kurdistan genome consisted of 9593 nucleotides (nt), with 138 and 211 nt at the 5' and 3' non-translated regions, respectively. It contained two open-reading frames (ORFs), the large ORF encoding a polyprotein of 3080 amino acids (aa) and the small overlapping ORF encoding a P3N-PIPO protein of 74 aa. This isolate had six unique aa differences compared to other ZYMV isolates and shared 79.6-98.8% identities with other ZYMV genome sequences at the nt level and 90.1-99% identities at the aa level. A phylogenetic tree of ZYMV complete genomic sequences showed that Iranian and Central European isolates are closely related and form a phylogenetically homogenous group. All values in the ratio of substitution rates at non-synonymous and synonymous sites (dN/dS) were below 1, suggestive of strong negative selection forces during ZYMV protein history. This is the first report of complete genome sequence information of the most prevalent virus in the west of Iran. This study helps our understanding of the genetic diversity of ZYMV isolates infecting cucurbit plants in Iran, virus evolution and epidemiology and can assist in designing better diagnostic tools.

  8. SAM: String-based sequence search algorithm for mitochondrial DNA database queries

    PubMed Central

    Röck, Alexander; Irwin, Jodi; Dür, Arne; Parsons, Thomas; Parson, Walther

    2011-01-01

    The analysis of the haploid mitochondrial (mt) genome has numerous applications in forensic and population genetics, as well as in disease studies. Although mtDNA haplotypes are usually determined by sequencing, they are rarely reported as a nucleotide string. Traditionally they are presented in a difference-coded position-based format relative to the corrected version of the first sequenced mtDNA. This convention requires recommendations for standardized sequence alignment that is known to vary between scientific disciplines, even between laboratories. As a consequence, database searches that are vital for the interpretation of mtDNA data can suffer from biased results when query and database haplotypes are annotated differently. In the forensic context that would usually lead to underestimation of the absolute and relative frequencies. To address this issue we introduce SAM, a string-based search algorithm that converts query and database sequences to position-free nucleotide strings and thus eliminates the possibility that identical sequences will be missed in a database query. The mere application of a BLAST algorithm would not be a sufficient remedy as it uses a heuristic approach and does not address properties specific to mtDNA, such as phylogenetically stable but also rapidly evolving insertion and deletion events. The software presented here provides additional flexibility to incorporate phylogenetic data, site-specific mutation rates, and other biologically relevant information that would refine the interpretation of mitochondrial DNA data. The manuscript is accompanied by freeware and example data sets that can be used to evaluate the new software (http://stringvalidation.org). PMID:21056022
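
    The core idea described in this abstract, converting a difference-coded haplotype into a position-free nucleotide string so that differently annotated but identical sequences compare equal, can be illustrated with a short sketch. This is not the SAM implementation; the function name, the toy reference and the accepted notation (`2T` for a substitution, `4.1C` for an insertion after position 4, `5del` for a deletion) are assumptions made for illustration only.

```python
import re

# position, optional insertion index, then a base or "del" (assumed notation)
_DIFF = re.compile(r"^(\d+)(?:\.(\d+))?([ACGT]|del)$")


def expand_haplotype(reference, differences):
    """Apply difference-coded variants to a reference and return a plain string.

    Assumption: insertions at one position are listed in order (.1, .2, ...).
    """
    # slots[i] holds the base at position i + 1 followed by any inserted bases
    slots = [[base] for base in reference]
    for diff in differences:
        match = _DIFF.match(diff)
        if match is None:
            raise ValueError("unrecognised difference: %r" % diff)
        position = int(match.group(1))
        insertion_index = match.group(2)
        value = match.group(3)
        if not 1 <= position <= len(reference):
            raise ValueError("position out of range: %r" % diff)
        slot = slots[position - 1]
        if insertion_index is not None:
            if value == "del":
                raise ValueError("an insertion needs a base: %r" % diff)
            slot.append(value)      # inserted base follows the anchor position
        elif value == "del":
            slot[0] = ""            # deleted reference base
        else:
            slot[0] = value         # substitution
    return "".join("".join(slot) for slot in slots)


toy_reference = "ACCCT"  # hypothetical reference, not a real mtDNA sequence
# The same extra C in a C-tract, annotated at two different positions:
string_a = expand_haplotype(toy_reference, ["4.1C"])
string_b = expand_haplotype(toy_reference, ["2.1C"])
```

    Both annotations expand to the same string, so a string comparison finds the match that a position-by-position comparison of the annotations would miss.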

  9. A single origin and moderate bottleneck during domestication of soybean (Glycine max): implications from microsatellites and nucleotide sequences

    PubMed Central

    Guo, Juan; Wang, Yunsheng; Song, Chi; Zhou, Jianfeng; Qiu, Lijuan; Huang, Hongwen; Wang, Ying

    2010-01-01

    Background and Aims: It is essential to illuminate the evolutionary history of crop domestication in order to understand further the origin and development of modern cultivation and agronomy; however, despite being one of the most important crops, the domestication origin and bottleneck of soybean (Glycine max) are poorly understood. In the present study, microsatellites and nucleotide sequences were employed to elucidate the domestication genetics of soybean. Methods: The genomes of 79 landrace soybeans (endemic cultivated soybeans) and 231 wild soybeans (G. soja) that represented the species-wide distribution of wild soybean in East Asia were scanned with 56 microsatellites to identify the genetic structure and domestication origin of soybean. To understand better the domestication bottleneck, four nucleotide sequences were selected to simulate the domestication bottleneck. Key Results: Model-based analysis revealed that most of the landrace genotypes were assigned to the inferred wild soybean cluster of south China, South Korea and Japan. Phylogeny for wild and landrace soybeans showed that all landrace soybeans formed a single cluster supporting a monophyletic origin of all the cultivars. The populations of the nearest branches which were basal to the cultivar lineage were wild soybeans from south China. The coalescent simulation detected a bottleneck severity of K′ = 2 during soybean domestication, which could be explained by a foundation population of 6000 individuals if domestication duration lasted 3000 years. Conclusions: As a result of integrating geographic distribution with microsatellite genotype assignment and phylogeny between landrace and wild soybeans, a single origin of soybean in south China is proposed. The coalescent simulation revealed a moderate genetic bottleneck with an effective wild soybean population used for domestication estimated to be ≈2 % of the total number of ancestral wild soybeans. Wild soybeans in Asia, especially in south

  10. Molecular characterization and phylogenetic analysis of Explanatum explanatum in India based on nucleotide sequences of ribosomal ITS2 and the mitochondrial gene nad1.

    PubMed

    Hayashi, Kei; Mohanta, Uday K; Ohari, Yuma; Neeraja, Tambireddy; Singh, T Shantikumar; Sugiyama, Hiromu; Itagaki, Tadashi

    2016-12-01

    The aim of this study was to analyze the phylogenetic relationship between Explanatum explanatum populations in India and other countries of the Indian subcontinent. Seventy liver amphistomes collected from four localities in India were identified as E. explanatum based on the nucleotide sequences of ribosomal ITS2. The flukes were then analyzed phylogenetically based on the nucleotide sequence of the mitochondrial gene nad1 in comparison with flukes from Bangladesh and Nepal. In the resulting phylogenetic tree, the nad1 haplotypes from India were divided into four clades, and the flukes showing the haplotypes of clades A and C were predominant in India. The haplotypes of the clades A and C have also been detected in Bangladesh and Nepal, and therefore, it seems they occur commonly throughout the Indian subcontinent. The results of AMOVA suggested that gene flow was likely to occur between E. explanatum populations in these countries. These countries are geographically close and have been historically and culturally connected to each other, and therefore, the movements of host ruminants among these countries might have been involved in the migration of the flukes and their gene flow.

  11. Population structure of pigs determined by single nucleotide polymorphisms observed in assembled expressed sequence tags.

    PubMed

    Matsumoto, Toshimi; Okumura, Naohiko; Uenishi, Hirohide; Hayashi, Takeshi; Hamasima, Noriyuki; Awata, Takashi

    2012-01-01

    We have collected more than 190000 porcine expressed sequence tags (ESTs) from full-length complementary DNA (cDNA) libraries and identified more than 2800 single nucleotide polymorphisms (SNPs). In this study, we tentatively chose 222 SNPs observed in assembled ESTs to study pigs of different breeds; 104 were selected by comparing the cDNA sequences of a Meishan pig and samples of three-way cross pigs (Landrace, Large White, and Duroc: LWD), and 118 were selected from LWD samples. To evaluate the genetic variation between the chosen SNPs from pig breeds, we determined the genotypes for 192 pig samples (11 pig groups) from our DNA reference panel with matrix-assisted laser desorption ionization time-of-flight mass spectrometry. Of the 222 reference SNPs, 186 were successfully genotyped. A neighbor-joining tree showed that the pig groups were classified into two large clusters, namely, Euro-American and East Asian pig populations. F-statistics and the analysis of molecular variance of Euro-American pig groups revealed that approximately 25% of the genetic variations occurred because of intergroup differences. As the F(IS) values were less than the F(ST) values, the clustering, based on the Bayesian inference, implied that there was strong genetic differentiation among pig groups and less divergence within the groups in our samples. © 2011 The Authors. Animal Science Journal © 2011 Japanese Society of Animal Science.

  12. Nucleotide sequences of the tet(M) genes from the American and Dutch type tetracycline resistance plasmids of Neisseria gonorrhoeae.

    PubMed

    Gascoyne-Binzi, D M; Heritage, J; Hawkey, P M

    1993-11-01

    High-level tetracycline-resistant Neisseria gonorrhoeae (TRNG) has been associated with the presence of a plasmid approximately 25.2 MDa in size which carries a Tet M tetracycline resistance determinant. Two different plasmid types, American and Dutch, have previously been described, based on the restriction endonuclease digestion pattern. In this study, the tet(M) genes from the two plasmid types have been amplified by the polymerase chain reaction (PCR) and then sequenced. The gene sequences from the two plasmids shared 96.8% identity, and showed similarities with different segments of the tet(M) gene sequences from Tn1545, Tn916 and Ureaplasma urealyticum. The data suggest that it is highly likely that the Tet M determinant found in the American type plasmid has a different origin from that present in the Dutch plasmid.

  13. A new begomovirus associated with alpha- and betasatellite molecules isolated from Vernonia cinerea in China.

    PubMed

    Zulfiqar, Awais; Zhang, Jie; Cui, Xiaofeng; Qian, Yajuan; Zhou, Xueping; Xie, Yan

    2012-01-01

    A begomovirus disease complex associated with Vernonia cinerea showing yellow vein symptoms was studied. The full-length genomic DNA was comprised of 2739 nucleotides (nt) and contained the typical genome structure of begomoviruses. Comparison analysis showed that it shared the highest (78.9%) nucleotide sequence identity with recently characterized Vernonia yellow vein virus (VeYVV) from India. For associated satellites, betasatellite showed the highest nucleotide sequence identity (52.1%) with Vernonia yellow vein virus betasatellite (VeYVVB) and alphasatellite shared the highest sequence identity (70.7%) with Gossypium mustelinium symptomless alphasatellite (GMusSLA). It is a member of a distinct species with cognate alpha- and betasatellites for which the name Vernonia yellow vein Fujian virus (VeYVFjV) is proposed.

  14. High speed nucleic acid sequencing

    DOEpatents

    Korlach, Jonas [Ithaca, NY]; Webb, Watt W [Ithaca, NY]; Levene, Michael [Ithaca, NY]; Turner, Stephen [Ithaca, NY]; Craighead, Harold G [Ithaca, NY]; Foquet, Mathieu [Ithaca, NY]

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid. Each type of labeled nucleotide comprises an acceptor fluorophore attached to a phosphate portion of the nucleotide such that the fluorophore is removed upon incorporation into a growing strand. Fluorescent signal is emitted via fluorescent resonance energy transfer between the donor fluorophore and the acceptor fluorophore as each nucleotide is incorporated into the growing strand. The sequence is deduced by identifying which base is being incorporated into the growing strand.
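
    The read-out principle in this abstract, deducing the sequence from the temporal order in which bases are incorporated into the growing strand, can be sketched as follows. This is only a toy illustration: the event format (a time stamp paired with an already identified base) and the function names are assumptions, and the signal processing that would identify each base from fluorescence is not modelled.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def growing_strand(events):
    """Order (time, base) incorporation events by time and join the bases.

    The result is the newly synthesised strand read 5'->3'.
    """
    return "".join(base for _, base in sorted(events))


def template_from_growing(strand):
    """Reverse-complement the growing strand to get the template, 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


# Hypothetical events, deliberately listed out of order.
toy_events = [(0.42, "C"), (0.10, "A"), (0.95, "G"), (0.77, "G")]
new_strand = growing_strand(toy_events)
target = template_from_growing(new_strand)
```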

  15. Functional analysis of regulatory single-nucleotide polymorphisms.

    PubMed

    Pampín, Sandra; Rodríguez-Rey, José C

    2007-04-01

    The identification of regulatory polymorphisms has become a key problem in human genetics. In the past few years there has been a conceptual change in the way in which regulatory single-nucleotide polymorphisms are studied. We revise the new approaches and discuss how gene expression studies can contribute to a better knowledge of the genetics of common diseases. New techniques for the association of single-nucleotide polymorphisms with changes in gene expression have been recently developed. This, together with a more comprehensive use of the old in-vitro methods, has produced a great amount of genetic information. When added to current databases, it will help to design better tools for the detection of regulatory single-nucleotide polymorphisms. The identification of functional regulatory single-nucleotide polymorphisms cannot be done by the simple inspection of DNA sequence. In-vivo techniques, based on primer-extension, and the more recently developed 'haploChIP' allow the association of gene variants to changes in gene expression. Gene expression analysis by conventional in-vitro techniques is the only way to identify the functional consequences of regulatory single-nucleotide polymorphisms. The amount of information produced in the last few years will help to refine the tools for the future analysis of regulatory gene variants.

  16. Phosphate-Modified Nucleotides for Monitoring Enzyme Activity.

    PubMed

    Ermert, Susanne; Marx, Andreas; Hacker, Stephan M

    2017-04-01

    Nucleotides modified at the terminal phosphate position have been proven to be interesting entities to study the activity of a variety of different protein classes. In this chapter, we present various types of modifications that were attached as reporter molecules to the phosphate chain of nucleotides and briefly describe the chemical reactions that are frequently used to synthesize them. Furthermore, we discuss a variety of applications of these molecules. Kinase activity, for instance, was studied by transfer of a phosphate modified with a reporter group to the target proteins. This allows not only studying the activity of kinases, but also identifying their target proteins. Moreover, kinases can also be directly labeled with a reporter at a conserved lysine using acyl-phosphate probes. Another important application for phosphate-modified nucleotides is the study of RNA and DNA polymerases. In this context, single-molecule sequencing is made possible using detection in zero-mode waveguides, nanopores or by a Förster resonance energy transfer (FRET)-based mechanism between the polymerase and a fluorophore-labeled nucleotide. Additionally, fluorogenic nucleotides that utilize an intramolecular interaction between a fluorophore and the nucleobase or an intramolecular FRET effect have been successfully developed to study a variety of different enzymes. Finally, also some novel techniques applying electron paramagnetic resonance (EPR)-based detection of nucleotide cleavage or the detection of the cleavage of fluorophosphates are discussed. Taken together, nucleotides modified at the terminal phosphate position have been applied to study the activity of a large diversity of proteins and are valuable tools to enhance the knowledge of biological systems.

  17. Single nucleotide polymorphism discovery in rainbow trout by deep sequencing of a reduced representation library.

    PubMed

    Sánchez, Cecilia Castaño; Smith, Timothy P L; Wiedmann, Ralph T; Vallejo, Roger L; Salem, Mohamed; Yao, Jianbo; Rexroad, Caird E

    2009-11-25

    To enhance capabilities for genomic analyses in rainbow trout, such as genomic selection, a large suite of polymorphic markers that are amenable to high-throughput genotyping protocols must be identified. Expressed Sequence Tags (ESTs) have been used for single nucleotide polymorphism (SNP) discovery in salmonids. In those strategies, the salmonid semi-tetraploid genomes often led to assemblies of paralogous sequences and therefore resulted in a high rate of false positive SNP identification. Sequencing genomic DNA using primers identified from ESTs proved to be an effective but time consuming methodology of SNP identification in rainbow trout, therefore not suitable for high throughput SNP discovery. In this study, we employed a high-throughput strategy that used pyrosequencing technology to generate data from a reduced representation library constructed with genomic DNA pooled from 96 unrelated rainbow trout that represent the National Center for Cool and Cold Water Aquaculture (NCCCWA) broodstock population. The reduced representation library consisted of 440 bp fragments resulting from complete digestion with the restriction enzyme HaeIII; sequencing produced 2,000,000 reads providing an average 6 fold coverage of the estimated 150,000 unique genomic restriction fragments (300,000 fragment ends). Three independent data analyses identified 22,022 to 47,128 putative SNPs on 13,140 to 24,627 independent contigs. A set of 384 putative SNPs, randomly selected from the sets produced by the three analyses were genotyped on individual fish to determine the validation rate of putative SNPs among analyses, distinguish apparent SNPs that actually represent paralogous loci in the tetraploid genome, examine Mendelian segregation, and place the validated SNPs on the rainbow trout linkage map. Approximately 48% (183) of the putative SNPs were validated; 167 markers were successfully incorporated into the rainbow trout linkage map. In addition, 2% of the sequences from the
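
    The figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below assumes that each read samples one fragment end and that reads are spread evenly over ends, which the abstract does not state; the variable names are illustrative.

```python
reads = 2_000_000          # sequencing reads reported
fragments = 150_000        # estimated unique genomic restriction fragments
fragment_ends = 2 * fragments

# Average number of reads per fragment end (reported as about 6-fold coverage)
coverage_per_end = reads / fragment_ends

putative_tested = 384      # putative SNPs genotyped on individual fish
validated = 183            # SNPs that validated
mapped = 167               # validated SNPs placed on the linkage map

validation_rate = 100.0 * validated / putative_tested
mapped_share_of_validated = 100.0 * mapped / validated

print(f"coverage per fragment end: {coverage_per_end:.2f}x")
print(f"validation rate: {validation_rate:.1f}%")
print(f"validated SNPs placed on the map: {mapped_share_of_validated:.1f}%")
```

    The computed values (about 6.7 reads per end and a validation rate just under 48%) are consistent with the rounded figures given in the abstract.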

  18. A Sabin 3-Derived Poliovirus Recombinant Contained a Sequence Homologous with Indigenous Human Enterovirus Species C in the Viral Polymerase Coding Region†

    PubMed Central

    Arita, Minetaro; Zhu, Shuang-Li; Yoshida, Hiromu; Yoneyama, Tetsuo; Miyamura, Tatsuo; Shimizu, Hiroyuki

    2005-01-01

    Outbreaks of poliomyelitis caused by circulating vaccine-derived polioviruses (cVDPVs) have been reported in areas where indigenous wild polioviruses (PVs) were eliminated by vaccination. Most of these cVDPVs contained unidentified sequences in the nonstructural protein coding region which were considered to be derived from human enterovirus species C (HEV-C) by recombination. In this study, we report isolation of a Sabin 3-derived PV recombinant (Cambodia-02) from an acute flaccid paralysis (AFP) case in Cambodia in 2002. We attempted to identify the putative recombination counterpart of Cambodia-02 by sequence analysis of nonpolio enterovirus isolates from AFP cases in Cambodia from 1999 to 2003. Based on the previously estimated evolution rates of PVs, the recombination event resulting in Cambodia-02 was estimated to have occurred within 6 months after the administration of oral PV vaccine (99.3% nucleotide identity in VP1 region). The 2BC and the 3Dpol coding regions of Cambodia-02 were grouped into the genetic cluster of indigenous coxsackie A virus type 17 (CAV17) (the highest [87.1%] nucleotide identity) and the cluster of indigenous CAV13-CAV18 (the highest [94.9%] nucleotide identity) by the phylogenic analysis of the HEV-C isolates in 2002, respectively. CAV13-CAV18 and CAV17 were the dominant HEV-C serotypes in 2002 but not in 2001 and in 2003. We found a putative recombination between CAV13-CAV18 and CAV17 in the 3CDpro coding region of a CAV17 isolate. These results suggested that a part of the 3Dpol coding region of PV3(Cambodia-02) was derived from a HEV-C strain genetically related to indigenous CAV13-CAV18 strains in 2002 in Cambodia. PMID:16188967

  19. Complete genome sequences of avian paramyxovirus type 8 strains goose/Delaware/1053/76 and pintail/Wakuya/20/78

    PubMed Central

    Paldurai, Anandan; Subbiah, Madhuri; Kumar, Sachin; Collins, Peter L.; Samal, Siba K.

    2009-01-01

    Complete consensus genome sequences were determined for avian paramyxovirus type 8 (APMV-8) strains goose/Delaware/1053/76 (prototype strain) and pintail/Wakuya/20/78. The genome of each strain is 15,342 nucleotides (nt) long, which follows the “rule of six”. The genome consists of six genes in the order of 3′-N-P/V/W-M-F-HN-L-5′. The genes are flanked on either side by conserved transcription start and stop signals, and have intergenic regions ranging from 1 to 30 nt. The genome contains a 55 nt leader region at the 3′-end and a 171 nt trailer region at the 5′-end. Comparison of sequences of strains Delaware and Wakuya showed nucleotide identity of 96.8% at the genome level and amino acid identities of 99.3%, 96.5%, 98.6%, 99.4%, 98.6% and 99.1% for the predicted N, P, M, F, HN and L proteins, respectively. Both strains grew in embryonated chicken eggs and in primary chicken embryo kidney cells, and 293T cells. Both strains contained only a single basic residue at the cleavage activation site of the F protein and their efficiency of replication in vitro depended on and was augmented by, the presence of exogenous protease in most cell lines. Sequence alignment and phylogenic analysis of the predicted amino acid sequence of APMV-8 strain Delaware proteins with the cognate proteins of other available APMV serotypes showed that APMV-8 is more closely related to APMV-2 and -6 than to APMV-1, -3 and -4. PMID:19341613
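
    The "rule of six" mentioned in this abstract means that the genome length is an exact multiple of six nucleotides. A minimal check using the lengths reported here is shown below; the function and variable names are illustrative.

```python
def follows_rule_of_six(genome_length_nt):
    """Return True when the genome length is an exact multiple of six."""
    return genome_length_nt % 6 == 0


APMV8_LENGTH = 15_342   # nt, as reported for both strains
LEADER = 55             # nt, 3' leader region
TRAILER = 171           # nt, 5' trailer region

hexamers = APMV8_LENGTH // 6
between_leader_and_trailer = APMV8_LENGTH - LEADER - TRAILER

print(follows_rule_of_six(APMV8_LENGTH), hexamers, between_leader_and_trailer)
```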

  20. The cDNA sequence of a neutral horseradish peroxidase.

    PubMed

    Bartonek-Roxå, E; Eriksson, H; Mattiasson, B

    1991-02-16

    A cDNA clone encoding a horseradish (Armoracia rusticana) peroxidase has been isolated and characterized. The cDNA contains 1378 nucleotides excluding the poly(A) tail and the deduced protein contains 327 amino acids which includes a 28 amino acid leader sequence. The predicted amino acid sequence is nine amino acids shorter than the major isoenzyme belonging to the horseradish peroxidase C group (HRP-C) and the sequence shows 53.7% identity with this isoenzyme. The described clone encodes nine cysteines of which eight correspond well with the cysteines found in HRP-C. Five potential N-glycosylation sites with the general sequence Asn-X-Thr/Ser are present in the deduced sequence. Compared to the earlier described HRP-C this is three glycosylation sites less. The shorter sequence and fewer N-glycosylation sites give the native isoenzyme a molecular weight of several thousands less than the horseradish peroxidase C isoenzymes. Comparison with the net charge value of HRP-C indicates that the described cDNA clone encodes a peroxidase which has either the same or a slightly less basic pI value, depending on whether the encoded protein is N-terminally blocked or not. This excludes the possibility that HRP-n could belong to either the HRP-A, -D or -E groups. The low sequence identity (53.7%) with HRP-C indicates that the described clone does not belong to the HRP-C isoenzyme group and comparison of the total amino acid composition with the HRP-B group does not place the described clone within this isoenzyme group. Our conclusion is that the described cDNA clone encodes a neutral horseradish peroxidase which belongs to a new, not earlier described, horseradish peroxidase group.
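
    Potential N-glycosylation sites of the general form Asn-X-Thr/Ser, as counted in this abstract, can be located with a simple pattern scan. The sketch below is a hypothetical helper run on a made-up peptide, not the horseradish peroxidase sequence. Excluding proline at the X position is a common refinement that goes beyond the wording of the abstract, so it is exposed as an option.

```python
import re


def find_sequons(protein, exclude_proline=True):
    """Return 1-based positions of Asn in Asn-X-Thr/Ser motifs.

    A lookahead is used so that overlapping motifs are all reported.
    """
    middle = "[^P]" if exclude_proline else "."
    pattern = re.compile("(?=N" + middle + "[ST])")
    return [match.start() + 1 for match in pattern.finditer(protein)]


toy_peptide = "MNGTANPSKNAS"  # hypothetical one-letter amino acid sequence
sites = find_sequons(toy_peptide)
sites_allowing_proline = find_sequons(toy_peptide, exclude_proline=False)
```

    A match only marks a potential site; whether a given site is actually glycosylated cannot be read from the sequence alone.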

  1. Detection with synthetic oligonucleotide probes of nucleotide sequence variations in the genes encoding enterotoxins of Escherichia coli.

    PubMed Central

    Nishibuchi, M; Murakami, A; Arita, M; Jikuya, H; Takano, J; Honda, T; Miwatani, T

    1989-01-01

    We examined variations in the genes encoding heat-stable enterotoxin (ST) and heat-labile enterotoxin (LT) in 88 strains of Escherichia coli isolated from individuals with traveler's diarrhea to find suitable sequences for use as oligonucleotide probes. Four oligonucleotide probes of the gene encoding ST of human origin (STIb or STh), one oligonucleotide probe of the gene encoding ST of porcine origin (STIa or STp), and three oligonucleotide probes of the gene encoding LT of human origin (LTIh) were used in DNA colony hybridization tests. In 15 of 22 strains possessing the STh gene and 28 of 42 strains producing LT, the sequences of all regions tested were identical to the published sequences. One region in the STh gene examined with an 18-mer probe was relatively well conserved and was shown to be closely associated with the enterotoxicity of the E. coli strains in suckling mice. This oligonucleotide, however, hybridized with strains of Vibrio cholerae O1, V. parahaemolyticus, and Yersinia enterocolitica that gave negative results in the suckling mouse assay. PMID:2685027

  2. Biomolecule Sequencer: Next-Generation DNA Sequencing Technology for In-Flight Environmental Monitoring, Research, and Beyond

    NASA Technical Reports Server (NTRS)

    Smith, David J.; Burton, Aaron; Castro-Wallace, Sarah; John, Kristen; Stahl, Sarah E.; Dworkin, Jason Peter; Lupisella, Mark L.

    2016-01-01

    On the International Space Station (ISS), technologies capable of rapid microbial identification and disease diagnostics are not currently available. NASA still relies upon sample return for comprehensive, molecular-based sample characterization. Next-generation DNA sequencing is a powerful approach for identifying microorganisms in air, water, and surfaces onboard spacecraft. The Biomolecule Sequencer payload, manifested to SpaceX-9 and scheduled on the Increment 4748 research plan (June 2016), will assess the functionality of a commercially available next-generation DNA sequencer in the microgravity environment of ISS. The MinION device from Oxford Nanopore Technologies (Oxford, UK) measures picoamp changes in electrical current dependent on nucleotide sequences of the DNA strand migrating through nanopores in the system. The hardware is exceptionally small (9.5 x 3.2 x 1.6 cm), lightweight (120 grams), and powered only by a USB connection. For the ISS technology demonstration, the Biomolecule Sequencer will be powered by a Microsoft Surface Pro3. Ground-prepared samples containing lambda bacteriophage, Escherichia coli, and mouse genomic DNA will be launched and stored frozen on the ISS until experiment initiation. Immediately prior to sequencing, a crew member will collect and thaw frozen DNA samples, connect the sequencer to the Surface Pro3, inject thawed samples into a MinION flow cell, and initiate sequencing. At the completion of the sequencing run, data will be downlinked for ground analysis. Identical, synchronous ground controls will be used for data comparisons to determine sequencer functionality, run-time sequence, current dynamics, and overall accuracy. We will present our latest results from the ISS flight experiment, the first time DNA has ever been sequenced in space, and discuss the many potential applications of the Biomolecule Sequencer for environmental monitoring, medical diagnostics, higher fidelity and more adaptable Space Biology Human

  3. First report of Beet western yellows virus infecting Epiphyllum spp

    USDA-ARS's Scientific Manuscript database

    Beet western yellows virus (BWYV) was identified by high-throughput sequencing in an orchid cactus (Epiphyllum spp.) hybrid without obvious symptoms. The nearly complete genomic sequence of 5,458 nucleotides of the virus was determined. The isolate has the highest nucleotide sequence identity (93%)...

  4. Discovery of a novel retrovirus sequence in an Australian native rodent (Melomys burtoni): a putative link between gibbon ape leukemia virus and koala retrovirus.

    PubMed

    Simmons, Greg; Clarke, Daniel; McKee, Jeff; Young, Paul; Meers, Joanne

    2014-01-01

    Gibbon ape leukaemia virus (GALV) and koala retrovirus (KoRV) share a remarkably close sequence identity despite the fact that they occur in distantly related mammals on different continents. It has previously been suggested that infection of their respective hosts may have occurred as a result of a species jump from another, as yet unidentified vertebrate host. To investigate possible sources of these retroviruses in the Australian context, DNA samples were obtained from 42 vertebrate species and screened using PCR in order to detect proviral sequences closely related to KoRV and GALV. Four proviral partial sequences totalling 2880 bases which share a strong similarity with KoRV and GALV were detected in DNA from a native Australian rodent, the grassland melomys, Melomys burtoni. We have designated this novel gammaretrovirus Melomys burtoni retrovirus (MbRV). The concatenated nucleotide sequence of MbRV shares 93% identity with the corresponding sequence from GALV-SEATO and 83% identity with KoRV. The geographic ranges of the grassland melomys and of the koala partially overlap. Thus a species jump by MbRV from melomys to koalas is conceivable. However the genus Melomys does not occur in mainland South East Asia and so it appears most likely that another as yet unidentified host was the source of GALV.

  5. Discovery and mapping of a new expressed sequence tag-single nucleotide polymorphism and simple sequence repeat panel for large-scale genetic studies and breeding of Theobroma cacao L.

    PubMed Central

    Allegre, Mathilde; Argout, Xavier; Boccara, Michel; Fouet, Olivier; Roguet, Yolande; Bérard, Aurélie; Thévenin, Jean Marc; Chauveau, Aurélie; Rivallan, Ronan; Clement, Didier; Courtois, Brigitte; Gramacho, Karina; Boland-Augé, Anne; Tahi, Mathias; Umaharan, Pathmanathan; Brunel, Dominique; Lanaud, Claire

    2012-01-01

    Theobroma cacao is an economically important tree of several tropical countries. Its genetic improvement is essential to provide protection against major diseases and improve chocolate quality. We discovered and mapped new expressed sequence tag-single nucleotide polymorphism (EST-SNP) and simple sequence repeat (SSR) markers and constructed a high-density genetic map. By screening 149 650 ESTs, 5246 SNPs were detected in silico, of which 1536 corresponded to genes with a putative function, while 851 had a clear polymorphic pattern across a collection of genetic resources. In addition, 409 new SSR markers were detected on the Criollo genome. Lastly, 681 new EST-SNPs and 163 new SSRs were added to the pre-existing 418 co-dominant markers to construct a large consensus genetic map. This high-density map and the set of new genetic markers identified in this study are a milestone in cocoa genomics and for marker-assisted breeding. The data are available at http://tropgenedb.cirad.fr. PMID:22210604

  6. Updating Our View of Organelle Genome Nucleotide Landscape

    PubMed Central

    Smith, David Roy

    2012-01-01

    Organelle genomes show remarkable variation in architecture and coding content, yet their nucleotide composition is relatively unvarying across the eukaryotic domain, with most having a high adenine and thymine (AT) content. Recent studies, however, have uncovered guanine and cytosine (GC)-rich mitochondrial and plastid genomes. These sequences come from a small but eclectic list of species, including certain green plants and animals. Here, I review GC-rich organelle DNAs and the insights they have provided into the evolution of nucleotide landscape. I emphasize that GC-biased mitochondrial and plastid DNAs are more widespread than once thought, sometimes occurring together in the same species, and suggest that the forces biasing their nucleotide content can differ both among and within lineages, and may be associated with specific genome architectural features and life history traits. PMID:22973299

  7. AFLP fragment isolation technique as a method to produce random sequences for single nucleotide polymorphism discovery in the green turtle, Chelonia mydas.

    PubMed

    Roden, Suzanne E; Dutton, Peter H; Morin, Phillip A

    2009-01-01

    The green sea turtle, Chelonia mydas, was used as a case study for single nucleotide polymorphism (SNP) discovery in a species that has little genetic sequence information available. As green turtles have a complex population structure, additional nuclear markers other than microsatellites could add to our understanding of their complex life history. Amplified fragment length polymorphism technique was used to generate sets of random fragments of genomic DNA, which were then electrophoretically separated with precast gels, stained with SYBR green, excised, and directly sequenced. It was possible to perform this method without the use of polyacrylamide gels, radioactive or fluorescent labeled primers, or hybridization methods, reducing the time, expense, and safety hazards of SNP discovery. Within 13 loci, 2547 base pairs were screened, resulting in the discovery of 35 SNPs. Using this method, it was possible to yield a sufficient number of loci to screen for SNP markers without the availability of prior sequence information.

  8. Evaluation of microbial community in hydrothermal field by direct DNA sequencing

    NASA Astrophysics Data System (ADS)

    Kawarabayasi, Y.; Maruyama, A.

    2002-12-01

    Many extremophiles have been discovered from terrestrial and marine hydrothermal fields. Some thermophiles can grow beyond 90°C in culture, while direct microscopic analysis occasionally indicates that microbes may survive in much hotter hydrothermal fluids. However, it is very difficult to isolate and cultivate such microbes from these environments; over 99% of all microbes remain undiscovered. Based on our experience with whole microbial genome analysis (Y.K.) and microbial community analysis (A.M.), we set out to find unique microbes/genes in hydrothermal fields through direct sequencing of environmental DNA fragments. First, shotgun plasmid libraries were constructed directly from DNA prepared from mixed microbes collected by an in situ filtration system from low-temperature fluids at RM24 in the Southern East Pacific Rise (S-EPR). Gene amplification (PCR) was not used, to prevent mutations being introduced in the process. The nucleotide sequences of 285 clones showed that none had an identical sequence in public databases. Among the 27 clones whose entire sequences were determined, no ORF was identified on 14 clones, as in eukaryotic introns. On four clones, multiple tandem repeats of tetranucleotide units were identified. This type of sequence was identified in some familiar disease in human. The result indicates that living/dead materials with eukaryotic features may exist in this low-temperature field. Second, shotgun plasmid libraries were constructed from environmental DNA prepared from Beppu hot springs. Among 143 randomly selected clones used for sequencing, no known sequence was identified. Unlike the clones in the S-EPR library, clear ORFs were identified on all nine clones whose entire sequences were determined. One clone, H4052, was found to contain the complete aspartyl-tRNA synthetase gene. Phylogenetic analysis using amino acid sequences of this gene indicated that this gene was separated from other Euryarchaea before the
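
    Tandem repeats of a four-nucleotide unit, such as those reported on four of the S-EPR clones, can be located with a back-referencing pattern. The sketch below is a hypothetical helper run on a made-up sequence; the abstract does not describe how the repeats were detected, and the minimum of three copies is an arbitrary assumption.

```python
import re


def find_tetranucleotide_repeats(dna, min_copies=3):
    """Return (0-based start, repeat unit, copy number) for each tandem run.

    Note: a run of a single base (e.g. AAAAAAAAAAAA) also matches, because
    no attempt is made to exclude units that are themselves repetitive.
    """
    # One 4-base unit followed by at least (min_copies - 1) exact copies of it
    pattern = re.compile(r"([ACGT]{4})\1{%d,}" % (min_copies - 1))
    hits = []
    for match in pattern.finditer(dna.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // 4
        hits.append((match.start(), unit, copies))
    return hits


toy_clone = "TT" + "GACA" * 4 + "CCGT"  # hypothetical insert sequence
repeats = find_tetranucleotide_repeats(toy_clone)
```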

  9. Chromosome specific repetitive DNA sequences

    DOEpatents

    Moyzis, Robert K.; Meyne, Julianne

    1991-01-01

    A method is provided for determining specific nucleotide sequences useful in forming a probe which can identify specific chromosomes, preferably through in situ hybridization within the cell itself. In one embodiment, chromosome preferential nucleotide sequences are first determined from a library of recombinant DNA clones having families of repetitive sequences. Library clones are identified with a low homology with a sequence of repetitive DNA families to which the first clones respectively belong and variant sequences are then identified by selecting clones having a pattern of hybridization with genomic DNA dissimilar to the hybridization pattern shown by the respective families. In another embodiment, variant sequences are selected from a sequence of a known repetitive DNA family. The selected variant sequence is classified as chromosome specific, chromosome preferential, or chromosome nonspecific. Sequences which are classified as chromosome preferential are further sequenced and regions are identified having a low homology with other regions of the chromosome preferential sequence or with known sequences of other family members. This invention is the result of a contract with the Department of Energy (Contract No. W-7405-ENG-36).

  10. A Laboratory Exercise for Genotyping Two Human Single Nucleotide Polymorphisms

    ERIC Educational Resources Information Center

    Fernando, James; Carlson, Bradley; LeBard, Timothy; McCarthy, Michael; Umali, Finianne; Ashton, Bryce; Rose, Ferrill F., Jr.

    2016-01-01

    The dramatic decrease in the cost of sequencing a human genome is leading to an era in which a wide range of students will benefit from having an understanding of human genetic variation. Since over 90% of sequence variation between humans is in the form of single nucleotide polymorphisms (SNPs), a laboratory exercise has been devised in order to…

  11. Novel methodologies for spectral classification of exon and intron sequences

    NASA Astrophysics Data System (ADS)

    Kwan, Hon Keung; Kwan, Benjamin Y. M.; Kwan, Jennifer Y. Y.

    2012-12-01

    Digital processing of a nucleotide sequence requires it to be mapped to a numerical sequence in which the choice of nucleotide to numeric mapping affects how well its biological properties can be preserved and reflected from nucleotide domain to numerical domain. Digital spectral analysis of nucleotide sequences unfolds a period-3 power spectral value which is more prominent in an exon sequence as compared to that of an intron sequence. The success of a period-3 based exon and intron classification depends on the choice of a threshold value. The main purposes of this article are to introduce novel codes for 1-sequence numerical representations for spectral analysis and compare them to existing codes to determine appropriate representation, and to introduce novel thresholding methods for more accurate period-3 based exon and intron classification of an unknown sequence. The main findings of this study are summarized as follows: Among sixteen 1-sequence numerical representations, the K-Quaternary Code I offers an attractive performance. A windowed 1-sequence numerical representation (with window length of 9, 15, and 24 bases) offers a possible speed gain over non-windowed 4-sequence Voss representation which increases as sequence length increases. A winner threshold value (chosen from the best among two defined threshold values and one other threshold value) offers a top precision for classifying an unknown sequence of specified fixed lengths. An interpolated winner threshold value applicable to an unknown and arbitrary length sequence can be estimated from the winner threshold values of fixed length sequences with a comparable performance. In general, precision increases as sequence length increases. The study contributes an effective spectral analysis of nucleotide sequences to better reveal embedded properties, and has potential applications in improved genome annotation.
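The period-3 idea in this abstract can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses the 4-sequence Voss representation named in the abstract (one binary indicator sequence per base), not the K-Quaternary codes or the thresholding methods of the paper, whose definitions are not given here. The function names and the toy sequences are hypothetical.

```python
import cmath

def voss_indicators(seq):
    """Voss representation: one 0/1 indicator sequence per base."""
    seq = seq.upper()
    return {base: [1 if ch == base else 0 for ch in seq] for base in "ACGT"}

def dft_power(signal, k):
    """Squared magnitude of the k-th discrete Fourier coefficient."""
    n = len(signal)
    coeff = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, x in enumerate(signal))
    return abs(coeff) ** 2

def period3_power(seq):
    """Total spectral power at frequency N/3, summed over the four indicators.

    Assumes the sequence length is a positive multiple of 3 so that the
    period-3 component falls exactly on a DFT bin.
    """
    n = len(seq)
    if n == 0 or n % 3 != 0:
        raise ValueError("sequence length must be a positive multiple of 3")
    k = n // 3
    return sum(dft_power(ind, k) for ind in voss_indicators(seq).values())

# Toy inputs (made up): a strictly 3-periodic sequence and a homopolymer.
periodic = "ATG" * 30
flat = "A" * 90
print(period3_power(periodic), period3_power(flat))
```

A classifier in the spirit of the abstract would compare this power against a threshold; the threshold value itself is the part the paper studies and is not reproduced here.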

  12. UCbase 2.0: ultraconserved sequences database (2014 update)

    PubMed Central

    Lomonaco, Vincenzo; Martoglia, Riccardo; Mandreoli, Federica; Anderlucci, Laura; Emmett, Warren; Bicciato, Silvio; Taccioli, Cristian

    2014-01-01

    UCbase 2.0 (http://ucbase.unimore.it) is an update, extension and evolution of UCbase, a Web tool dedicated to the analysis of ultraconserved sequences (UCRs). UCRs are 481 sequences >200 bases sharing 100% identity among human, mouse and rat genomes. They are frequently located in genomic regions known to be involved in cancer or differentially expressed in human leukemias and carcinomas. UCbase 2.0 is a platform-independent Web resource that includes the updated version of the human genome annotation (hg19), information linking disorders to chromosomal coordinates based on the Systematized Nomenclature of Medicine classification, a query tool to search for Single Nucleotide Polymorphisms (SNPs) and a new text box to directly interrogate the database using a MySQL interface. To facilitate the interactive visual interpretation of UCR chromosomal positioning, UCbase 2.0 now includes a graph visualization interface directly linked to UCSC genome browser. Database URL: http://ucbase.unimore.it PMID:24951797

  13. UCbase 2.0: ultraconserved sequences database (2014 update).

    PubMed

    Lomonaco, Vincenzo; Martoglia, Riccardo; Mandreoli, Federica; Anderlucci, Laura; Emmett, Warren; Bicciato, Silvio; Taccioli, Cristian

    2014-01-01

    UCbase 2.0 (http://ucbase.unimore.it) is an update, extension and evolution of UCbase, a Web tool dedicated to the analysis of ultraconserved sequences (UCRs). UCRs are 481 sequences >200 bases sharing 100% identity among human, mouse and rat genomes. They are frequently located in genomic regions known to be involved in cancer or differentially expressed in human leukemias and carcinomas. UCbase 2.0 is a platform-independent Web resource that includes the updated version of the human genome annotation (hg19), information linking disorders to chromosomal coordinates based on the Systematized Nomenclature of Medicine classification, a query tool to search for Single Nucleotide Polymorphisms (SNPs) and a new text box to directly interrogate the database using a MySQL interface. To facilitate the interactive visual interpretation of UCR chromosomal positioning, UCbase 2.0 now includes a graph visualization interface directly linked to UCSC genome browser. Database URL: http://ucbase.unimore.it. © The Author(s) 2014. Published by Oxford University Press.

  14. Prediction of Nucleotide Binding Peptides Using Star Graph Topological Indices.

    PubMed

    Liu, Yong; Munteanu, Cristian R; Fernández Blanco, Enrique; Tan, Zhiliang; Santos Del Riego, Antonino; Pazos, Alejandro

    2015-11-01

    The nucleotide binding proteins are involved in many important cellular processes, such as transmission of genetic information or energy transfer and storage. Therefore, the screening of new peptides for this biological function is an important research topic. The current study proposes a mixed methodology to obtain the first classification model that is able to predict new nucleotide binding peptides, using only the amino acid sequence. Thus, the methodology uses a Star graph molecular descriptor of the peptide sequences and the Machine Learning technique for the best classifier. The best model represents a Random Forest classifier based on two features of the embedded and non-embedded graphs. The performance of the model is excellent, considering similar models in the field, with an Area Under the Receiver Operating Characteristic Curve (AUROC) value of 0.938 and true positive rate (TPR) of 0.886 (test subset). The prediction of new nucleotide binding peptides with this model could be useful for drug target studies in drug development. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  15. Chromobacterium spp. harbour Ambler class A β-lactamases showing high identity with KPC.

    PubMed

    Gudeta, Dereje Dadi; Bortolaia, Valeria; Jayol, Aurélie; Poirel, Laurent; Nordmann, Patrice; Guardabassi, Luca

    2016-06-01

    The origin of KPC is unknown. The aim of this study was to detect progenitors of KPC in silico and to functionally verify their β-lactam hydrolysis activity. The sequence of KPC-2 was used to mine the NCBI protein sequence database. The best non-KPC hits were analysed by amino acid (aa) alignment and phylogenetic tree construction. Genes encoding KPC-2 homologues were expressed in Escherichia coli. The carbapenemase activities of the recombinant strains were characterized by the CarbaNP test and UV spectrophotometry and MICs of selected β-lactams were determined. Genes encoding the closest KPC-2 homologues were identified on the chromosome of Chromobacterium piscinae strain ND17 (CRP-1, 76% aa identity), Chromobacterium sp. C-61 (CRS-1, 70% aa identity) and Chromobacterium haemolyticum DSM19808 (CRH-1, 69% aa identity). All three Chromobacterium β-lactamases were phylogenetically more related to KPC than to other Ambler class A β-lactamases. The 27 bp region preceding the start codon of blaCRP-1 displayed high nucleotide identity to the corresponding region upstream from blaKPC (74%). Heterologous expression of blaCRP-1 and to a lesser extent of blaCRH-1 in E. coli significantly increased the MICs of meropenem and most cephalosporins. The CarbaNP test was positive for both recombinant strains, but spectrophotometric analysis confirmed higher carbapenemase activity for CRP-1-producing clones. The recovery of three class A β-lactamases with up to 76% aa identity to KPC from distinct Chromobacterium species is highly indicative of the role played by this genus in the evolution of KPC. © The Author 2016. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  16. Nucleotide sequence analysis of a DNA region involved in capsular polysaccharide biosynthesis reveals the molecular basis of the nontypeability of two Actinobacillus pleuropneumoniae isolates.

    PubMed

    Ito, Hiroya; Ogawa, Torata; Fukamizu, Dai; Morinaga, Yuiko; Kusumoto, Masahiro

    2016-11-01

    The aim of our study was to reveal the molecular basis of the serologic nontypeability of 2 Actinobacillus pleuropneumoniae field isolates. Nine field strains of A. pleuropneumoniae, the causative agent of porcine pleuropneumonia, were isolated from pigs raised on the same farm and sent to our diagnostic laboratory for serotyping. Seven of the 9 strains were identified as serovar 15 strains by immunodiffusion tests. However, 2 strains, designated FH24-2 and FH24-5, could not be serotyped with antiserum prepared against serovars 1-15. Strain FH24-5 showed positive results in 2 serovar 15-specific PCR tests, whereas strain FH24-2 was only positive in 1 of the 2 PCR tests. The nucleotide sequence analysis of gene clusters involved in capsular polysaccharide biosynthesis of the 2 nontypeable strains revealed that both had been rendered nontypeable by the action of ISApl1, a transposable element of A. pleuropneumoniae belonging to the IS30 family. The results showed that ISApl1 of A. pleuropneumoniae can interfere with both the serologic and molecular typing methods, and that nucleotide sequence analysis across the capsular gene clusters is the best means of determining the cause of serologic nontypeability in A. pleuropneumoniae. © 2016 The Author(s).

  17. HapFABIA: Identification of very short segments of identity by descent characterized by rare variants in large sequencing data

    PubMed Central

    Hochreiter, Sepp

    2013-01-01

    Identity by descent (IBD) can be reliably detected for long shared DNA segments, which are found in related individuals. However, many studies contain cohorts of unrelated individuals that share only short IBD segments. New sequencing technologies facilitate identification of short IBD segments through rare variants, which convey more information on IBD than common variants. Current IBD detection methods, however, are not designed to use rare variants for the detection of short IBD segments. Short IBD segments reveal genetic structures at high resolution. Therefore, they can help to improve imputation and phasing, to increase genotyping accuracy for low-coverage sequencing and to increase the power of association studies. Since short IBD segments are further assumed to be old, they can shed light on the evolutionary history of humans. We propose HapFABIA, a computational method that applies biclustering to identify very short IBD segments characterized by rare variants. HapFABIA is designed to detect short IBD segments in genotype data that were obtained from next-generation sequencing, but can also be applied to DNA microarray data. Especially in next-generation sequencing data, HapFABIA exploits rare variants for IBD detection. HapFABIA significantly outperformed competing algorithms at detecting short IBD segments on artificial and simulated data with rare variants. HapFABIA identified 160 588 different short IBD segments characterized by rare variants with a median length of 23 kb (mean 24 kb) in data for chromosome 1 of the 1000 Genomes Project. These short IBD segments contain 752 000 single nucleotide variants (SNVs), which account for 39% of the rare variants and 23.5% of all variants. The vast majority—152 000 IBD segments—are shared by Africans, while only 19 000 and 11 000 are shared by Europeans and Asians, respectively. IBD segments that match the Denisova or the Neandertal genome are found significantly more often in Asians and Europeans but also

  18. Pan-genome multilocus sequence typing and outbreak-specific reference-based single nucleotide polymorphism analysis to resolve two concurrent Staphylococcus aureus outbreaks in neonatal services.

    PubMed

    Roisin, S; Gaudin, C; De Mendonça, R; Bellon, J; Van Vaerenbergh, K; De Bruyne, K; Byl, B; Pouseele, H; Denis, O; Supply, P

    2016-06-01

    We used a two-step whole genome sequencing analysis for resolving two concurrent outbreaks in two neonatal services in Belgium, caused by exfoliative toxin A-encoding-gene-positive (eta+) methicillin-susceptible Staphylococcus aureus with an otherwise sporadic spa-type t209 (ST-109). Outbreak A involved 19 neonates and one healthcare worker in a Brussels hospital from May 2011 to October 2013. After a first episode interrupted by decolonization procedures applied over 7 months, the outbreak resumed concomitantly with the onset of outbreak B in a hospital in Asse, comprising 11 neonates and one healthcare worker from mid-2012 to January 2013. Pan-genome multilocus sequence typing, defined on the basis of 42 core and accessory reference genomes, and single-nucleotide polymorphisms mapped on an outbreak-specific de novo assembly were used to compare 28 available outbreak isolates and 19 eta+/spa-type t209 isolates identified by routine or nationwide surveillance. Pan-genome multilocus sequence typing showed that the outbreaks were caused by independent clones not closely related to any of the surveillance isolates. Isolates from only ten cases with overlapping stays in outbreak A, including four pairs of twins, showed no or only a single nucleotide polymorphism variation, indicating limited sequential transmission. Detection of larger genomic variation, even from the start of the outbreak, pointed to sporadic seeding from a pre-existing exogenous source, which persisted throughout the whole course of outbreak A. Whole genome sequencing analysis can provide unique fine-tuned insights into transmission pathways of complex outbreaks even at their inception, which, with timely use, could valuably guide efforts for early source identification. Copyright © 2016 European Society of Clinical Microbiology and Infectious Diseases. Published by Elsevier Ltd. All rights reserved.

  19. Methods of automatic nucleotide-sequence analysis. Multicomponent spectrophotometric analysis of mixtures of nucleic acid components by a least-squares procedure

    PubMed Central

    Lee, Sheila; McMullen, D.; Brown, G. L.; Stokes, A. R.

    1965-01-01

    1. A theoretical analysis of the errors in multicomponent spectrophotometric analysis of nucleoside mixtures, by a least-squares procedure, has been made to obtain an expression for the error coefficient, relating the error in calculated concentration to the error in extinction measurements. 2. The error coefficients, which depend only on the 'library' of spectra used to fit the experimental curves, have been computed for a number of 'libraries' containing the following nucleosides found in s-RNA: adenosine, guanosine, cytidine, uridine, 5-ribosyluracil, 7-methylguanosine, 6-dimethylaminopurine riboside, 6-methylaminopurine riboside and thymine riboside. 3. The error coefficients have been used to determine the best conditions for maximum accuracy in the determination of the compositions of nucleoside mixtures. 4. Experimental determinations of the compositions of nucleoside mixtures have been made and the errors found to be consistent with those predicted by the theoretical analysis. 5. It has been demonstrated that, with certain precautions, the multicomponent spectrophotometric method described is suitable as a basis for automatic nucleotide-composition analysis of oligonucleotides containing nine nucleotides. Used in conjunction with continuous chromatography and flow chemical techniques, this method can be applied to the study of the sequence of s-RNA. PMID:14346087
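A minimal sketch of the least-squares idea for a two-component 'library' follows. It solves the normal equations directly and reports, for each component, the square root of the corresponding diagonal entry of the inverse of AᵀA, which is the standard factor by which uncorrelated measurement noise of unit standard deviation propagates into the fitted concentration. Whether this matches the authors' definition of the error coefficient is an assumption, and the spectra below are invented numbers, not nucleoside data.

```python
def fit_two_components(spec_a, spec_b, mixture):
    """Least-squares concentrations of two components in a measured mixture.

    spec_a, spec_b: library spectra (extinction per unit concentration)
    mixture: measured extinctions at the same wavelengths
    Returns ((conc_a, conc_b), (err_a, err_b)) where err_* depend only
    on the library spectra, not on the measurement.
    """
    saa = sum(a * a for a in spec_a)
    sbb = sum(b * b for b in spec_b)
    sab = sum(a * b for a, b in zip(spec_a, spec_b))
    sam = sum(a * m for a, m in zip(spec_a, mixture))
    sbm = sum(b * m for b, m in zip(spec_b, mixture))

    det = saa * sbb - sab * sab
    if det == 0:
        raise ValueError("library spectra are linearly dependent")

    conc_a = (sam * sbb - sbm * sab) / det
    conc_b = (saa * sbm - sab * sam) / det
    # Diagonal of inverse(A^T A): noise amplification per component.
    err_a = (sbb / det) ** 0.5
    err_b = (saa / det) ** 0.5
    return (conc_a, conc_b), (err_a, err_b)

# Invented four-wavelength spectra and a noise-free mixture 2*A + 3*B.
spec_a = [1.0, 0.5, 0.1, 0.0]
spec_b = [0.0, 0.2, 0.8, 1.0]
mixture = [2 * a + 3 * b for a, b in zip(spec_a, spec_b)]
print(fit_two_components(spec_a, spec_b, mixture))
```

The more the library spectra overlap, the smaller `det` becomes and the larger both coefficients grow, which is consistent with the abstract's point that the coefficients depend only on the chosen library.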

  20. Complete nucleotide sequences of okra isolates of Cotton leaf curl Gezira virus and their associated DNA-beta from Niger.

    PubMed

    Shih, S L; Kumar, S; Tsai, W S; Lee, L M; Green, S K

    2009-01-01

    Okra (Abelmoschus esculentus) is a major crop in Niger. In the fall of 2007, okra leaf curl disease was observed in Niger and the begomovirus and DNA-beta satellite were found associated with the disease. The complete nucleotide sequences of DNA-A (FJ469626 and FJ469627) and associated DNA-beta satellites (FJ469628 and FJ469629) were determined from two samples. This is the first report of molecular characterization of okra-infecting begomovirus and their associated DNA-beta from Niger. The begomovirus and DNA-beta have been identified as Cotton leaf curl Gezira virus and Cotton leaf curl Gezira betasatellite, respectively, which are reported to also infect okra in Egypt, Mali and Sudan.

  1. Three Cases of Anaerobiospirillum succiniciproducens Bacteremia Confirmed by 16S rRNA Gene Sequencing

    PubMed Central

    Tee, Wee; Korman, Tony M.; Waters, Mary Jo; Macphee, Andrew; Jenney, Adam; Joyce, Linda; Dyall-Smith, Michael L.

    1998-01-01

    We describe three cases of Anaerobiospirillum succiniciproducens bacteremia from Australia. We believe one of these cases represents the first report of A. succiniciproducens bacteremia in a human immunodeficiency virus (HIV)-infected individual. The other two patients had an underlying disorder (one patient had bleeding esophageal varices complicating alcohol liver disease and one patient had non-Hodgkin’s lymphoma). A motile, gram-negative, spiral anaerobe was isolated by culturing blood from all patients. Electron microscopy showed a curved bacterium with bipolar tufts of flagella resembling Anaerobiospirillum spp. Sequencing of the 16S rRNA genes of the isolates revealed no close relatives (organisms likely to be in the same genus) in the sequence databases, nor were any sequence data available for A. succiniciproducens. This report presents for the first time the 16S rRNA gene sequence of the type strain of A. succiniciproducens, strain ATCC 29305. Two of the three clinical isolates have sequences identical to that of the type strain, while the sequence of the other strain differs from that of the type strain at 4 nucleotides. PMID:9574678
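Statements such as "differs from that of the type strain at 4 nucleotides" reduce to counting mismatched positions between two aligned sequences. The sketch below assumes the two sequences are already aligned to equal length and treats any differing character, including a gap symbol, as a mismatch. The function names and toy sequences are hypothetical and are not taken from the 16S rRNA data in the record.

```python
def count_differences(seq_a, seq_b):
    """Number of positions at which two pre-aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(seq_a.upper(), seq_b.upper()) if x != y)

def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions that are identical."""
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    matches = len(seq_a) - count_differences(seq_a, seq_b)
    return 100.0 * matches / len(seq_a)

# Toy aligned sequences (made up) that differ at two positions.
reference = "ACGTACGT"
isolate = "ACGTTCGA"
print(count_differences(reference, isolate), percent_identity(reference, isolate))
```

Published identity values usually come from alignment tools that also decide how gaps and ambiguous bases are scored, so figures computed this way may differ slightly from reported ones.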

  2. Complete genome sequence analysis identifies a new genotype of brassica yellows virus that infects cabbage and radish in China.

    PubMed

    Zhang, Xiao-Yan; Xiang, Hai-Ying; Zhou, Cui-Ji; Li, Da-Wei; Yu, Jia-Lin; Han, Cheng-Gui

    2014-08-01

    For brassica yellows virus (BrYV), proposed to be a member of a new polerovirus species, two clearly distinct genotypes (BrYV-A and BrYV-B) have been described. In this study, the complete nucleotide sequences of two BrYV isolates from radish and Chinese cabbage were determined. Sequence analysis suggested that these isolates represent a new genotype, referred to here as BrYV-C. The full-length sequences of the two BrYV-C isolates shared 93.4-94.8 % identity with BrYV-A and BrYV-B. Further phylogenetic analysis showed that the BrYV-C isolates formed a subgroup that was distinct from the BrYV-A and BrYV-B isolates based on all of the proteins except P5.

  3. Cloning and Sequence Analysis of Vibrio halioticoli Genes Encoding Three Types of Polyguluronate Lyase.

    PubMed

    Sugimura; Sawabe; Ezura

    2000-01-01

    The alginate lyase-coding genes of Vibrio halioticoli IAM 14596(T), which was isolated from the gut of the abalone Haliotis discus hannai, were cloned using plasmid vector pUC18, and expressed in Escherichia coli. Three alginate lyase-positive clones, pVHB, pVHC, and pVHE, were obtained, and all clones expressed the enzyme activity specific for polyguluronate. Three genes, alyVG1, alyVG2, and alyVG3, encoding polyguluronate lyase were sequenced: alyVG1 from pVHB was composed of a 1056-bp open reading frame (ORF) encoding 352 amino acid residues; alyVG2 gene from pVHC was composed of a 993-bp ORF encoding 331 amino acid residues; and alyVG3 gene from pVHE was composed of a 705-bp ORF encoding 235 amino acid residues. Comparison of nucleotide and deduced amino acid sequences among AlyVG1, AlyVG2, and AlyVG3 revealed low homologies. The identity value between AlyVG1 and AlyVG2 was 18.7%, and that between AlyVG2 and AlyVG3 was 17.0%. A higher identity value (26.0%) was observed between AlyVG1 and AlyVG3. Sequence comparison among known polyguluronate lyases including AlyVG1, AlyVG2, and AlyVG3 also did not reveal an identical region in these sequences. However, AlyVG1 showed the highest identity value (36.2%) and the highest similarity (73.3%) to AlyA from Klebsiella pneumoniae. A consensus region comprising nine amino acids (YFKAGXYXQ) in the carboxy-terminal region previously reported by Mallisard and colleagues was observed only in AlyVG1 and AlyVG2.

  4. The nucleotide sequence of HLA-B*2704 reveals a new amino acid substitution in exon 4 which is also present in HLA-B*2706

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rudwaleit, M.; Bowness, P.; Wordsworth, P.

    1996-12-31

    The HLA-B27 subtype HLA-B*2704 is virtually absent in Caucasians but common in Orientals, where it is associated with ankylosing spondylitis. The amino acid sequence of HLA-B*2704 has been established by peptide mapping and was shown to differ by two amino acids from HLA-B*2705. HLA-B*2704 is characterized by a serine for aspartic acid substitution at position 77 and glutamic acid for valine at position 152. To date, however, no nucleotide sequence confirming these changes at the DNA level has been published. 13 refs., 2 figs.

  5. Nucleotide sequencing analysis of a LEU gene of Candida maltosa which complements leuB mutation of Escherichia coli and leu2 mutation of Saccharomyces cerevisiae.

    PubMed

    Takagi, M; Kobayashi, N; Sugimoto, M; Fujii, T; Watari, J; Yano, K

    1987-01-01

    The expression of a LEU gene from Candida maltosa (designated as C-LEU2) isolated previously (Kawamura et al. 1983) was shown to be regulated, when transferred into Saccharomyces cerevisiae, by leucine and threonine in the medium, as in the case of LEU2 gene of S. cerevisiae. The coding region together with the regulatory region was subcloned and the nucleotide sequence was determined. When the sequence of the coding region was compared with that of LEU2, the homology was 72% for base pairs and 76% for deduced amino acids. Comparison of the regulatory region of C-LEU2 with those of LEU1 and LEU2 suggested a few short consensus sequences which are involved in regulation of gene expression by leucine and threonine in the medium.

  6. PUTATIVE GENE PROMOTER SEQUENCES IN THE CHLORELLA VIRUSES

    PubMed Central

    Fitzgerald, Lisa A.; Boucher, Philip T.; Yanai-Balser, Giane; Suhre, Karsten; Graves, Michael V.; Van Etten, James L.

    2008-01-01

    Three short (7 to 9 nucleotides) highly conserved nucleotide sequences were identified in the putative promoter regions (150 bp upstream and 50 bp downstream of the ATG translation start site) of three members of the genus Chlorovirus, family Phycodnaviridae. Most of these sequences occurred in similar locations within the defined promoter regions. The sequence and location of the motifs were often conserved among homologous ORFs within the Chlorovirus family. One of these conserved sequences (AATGACA) is predominately associated with genes expressed early in virus replication. PMID:18768195
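The promoter window described in this abstract (150 bp upstream and 50 bp downstream of the ATG) lends itself to a simple motif scan. The sketch below is illustrative only: it searches the forward strand for exact matches, measures the downstream 50 bp from the A of the ATG, and reports motif positions relative to that A. These conventions, the function name, and the toy genome string are assumptions, not details taken from the paper.

```python
def motif_offsets(genome, atg_index, motif="AATGACA", upstream=150, downstream=50):
    """Offsets (relative to the A of the ATG) of exact motif matches
    inside the window [atg_index - upstream, atg_index + downstream).

    Negative offsets lie upstream of the start codon.
    """
    start = max(0, atg_index - upstream)
    end = min(len(genome), atg_index + downstream)
    window = genome[start:end].upper()

    offsets = []
    pos = window.find(motif)
    while pos != -1:
        offsets.append(start + pos - atg_index)
        pos = window.find(motif, pos + 1)  # allow overlapping matches
    return offsets

# Toy sequence (made up): the motif sits 50 bp upstream of an ATG at index 150.
genome = "C" * 100 + "AATGACA" + "C" * 43 + "ATG" + "C" * 60
print(motif_offsets(genome, atg_index=150))
```

Running the same scan across homologous ORFs and comparing the returned offsets is one way to check the abstract's observation that motifs tend to occur in similar locations.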

  7. The maize stripe virus major noncapsid protein messenger RNA transcripts contain heterogeneous leader sequences at their 5' termini.

    PubMed

    Huiet, L; Feldstein, P A; Tsai, J H; Falk, B W

    1993-12-01

    Primer extension analyses and a PCR-based cloning strategy were used to identify and characterize 5' nucleotide sequences on the maize stripe virus (MStV) RNA4 mRNA transcripts encoding the major noncapsid protein (NCP). Direct RNA sequence analysis by primer extension showed that the NCP mRNA transcripts had 10-15 nucleotides beyond the 5' terminus of the MStV RNA4 nucleotide sequence. MStV genomic RNAs isolated from ribonucleoprotein particles (RNPs) lacked the additional 5' nucleotides. cDNA clones representing the 5' region of the mRNA transcripts were constructed, and the nucleotide sequences of the 5' regions were determined for 16 clones. Each was found to have a distinct 10-15 nucleotide sequence immediately 5' of the MStV RNA4 sequence. Eleven of 16 clones had the correct MStV RNA4 5' nucleotide sequence, while five showed minor variations at or near the 5' most MStV RNA4 nucleotide. These characteristics show strong similarities to other viral mRNA transcripts which are synthesized by cap snatching.

  8. Diff-seq: A high throughput sequencing-based mismatch detection assay for DNA variant enrichment and discovery

    PubMed Central

    Karas, Vlad O; Sinnott-Armstrong, Nicholas A; Varghese, Vici; Shafer, Robert W; Greenleaf, William J; Sherlock, Gavin

    2018-01-01

    Much of the within-species genetic variation is in the form of single nucleotide polymorphisms (SNPs), typically detected by whole genome sequencing (WGS) or microarray-based technologies. However, WGS produces mostly uninformative reads that perfectly match the reference, while microarrays require genome-specific reagents. We have developed Diff-seq, a sequencing-based mismatch detection assay for SNP discovery without the requirement for specialized nucleic-acid reagents. Diff-seq leverages the Surveyor endonuclease to cleave mismatched DNA molecules that are generated after cross-annealing of a complex pool of DNA fragments. Sequencing libraries enriched for Surveyor-cleaved molecules result in increased coverage at the variant sites. Diff-seq detected all mismatches present in an initial test substrate, with specific enrichment dependent on the identity and context of the variation. Application to viral sequences resulted in increased observation of variant alleles in a biologically relevant context. Diff-seq has the potential to increase the sensitivity and efficiency of high-throughput sequencing in the detection of variation. PMID:29361139

  9. HLA-B*5808, a new HLA-B allele characterized by sequence based typing.

    PubMed

    Poli, F; Crespiatico, L; Frison, S; Longhi, E; Marlianici, E; Scalamogna, M

    2003-12-01

    This brief communication describes a new HLA-B allele (HLA-B*5808) detected in an Italian white volunteer bone marrow donor. With serology, this subject was typed as HLA-B15,17, whereas with molecular biology B*15, B*51, B*52 and/or B*58 could be assigned. In order to clarify the results, direct and cloning sequencing of exons 2, 3 and 4 were carried out. This new allele is identical to HLA-B*5801 in exon 2 except for a silent point mutation at nucleotide 141 where a C is substituted by a T; exons 3 and 4 are typical of HLA-B*51, B*52 and B*78. The peculiar sequence of B*5808 could explain the discrepancy between the serological and molecular typing results.

  10. Developing single nucleotide polymorphism (SNP) markers from transcriptome sequences for identification of longan (Dimocarpus longan) germplasm

    PubMed Central

    Wang, Boyi; Tan, Hua-Wei; Fang, Wanping; Meinhardt, Lyndel W; Mischke, Sue; Matsumoto, Tracie; Zhang, Dapeng

    2015-01-01

    Longan (Dimocarpus longan Lour.) is an important tropical fruit tree crop. Accurate varietal identification is essential for germplasm management and breeding. Using longan transcriptome sequences from public databases, we developed single nucleotide polymorphism (SNP) markers; validated 60 SNPs in 50 longan germplasm accessions, including cultivated varieties and wild germplasm; and designated 25 SNP markers that unambiguously identified all tested longan varieties with high statistical rigor (P<0.0001). Multiple trees from the same clone were verified and off-type trees were identified. Diversity analysis revealed genetic relationships among analyzed accessions. Cultivated varieties differed significantly from wild populations (Fst=0.300; P<0.001), demonstrating untapped genetic diversity for germplasm conservation and utilization. Within cultivated varieties, apparent differences between varieties from China and those from Thailand and Hawaii indicated geographic patterns of genetic differentiation. These SNP markers provide a powerful tool to manage longan genetic resources and breeding, with accurate and efficient genotype identification. PMID:26504559

  11. Homology between DNA polymerases of poxviruses, herpesviruses, and adenoviruses: nucleotide sequence of the vaccinia virus DNA polymerase gene.

    PubMed Central

    Earl, P L; Jones, E V; Moss, B

    1986-01-01

    A 5400-base-pair segment of the vaccinia virus genome was sequenced and an open reading frame of 938 codons was found precisely where the DNA polymerase had been mapped by transfer of a phosphonoacetate-resistance marker. A single nucleotide substitution changing glycine at position 347 to aspartic acid accounts for the drug resistance of the mutant vaccinia virus. The 5' end of the DNA polymerase mRNA was located 80 base pairs before the methionine codon initiating the open reading frame. Correspondence between the predicted Mr 108,577 polypeptide and the 110,000 purified enzyme indicates that little or no proteolytic processing occurs. Extensive homology, extending over 435 amino acids, was found upon comparing the DNA polymerase of vaccinia virus and DNA polymerase of Epstein-Barr virus. A highly conserved sequence of 14 amino acids in the carboxyl-terminal regions of the above DNA polymerases is also present at a similar location in adenovirus DNA polymerase. This structure, which is predicted to form a turn flanked by beta-pleated sheets, may form part of an essential binding or catalytic site that accounts for its presence in DNA polymerases of poxviruses, herpesviruses, and adenoviruses. PMID:3012524

  12. Complete nucleotide sequence and genome organization of a Chinese isolate of Tobacco vein distorting virus.

    PubMed

    Mo, Xiao-han; Chen, Zheng-bin; Chen, Jian-ping

    2010-12-01

    Tobacco bushy top disease is caused by tobacco bushy top virus (TBTV, a member of the genus Umbravirus) which is dependent on tobacco vein-distorting virus (TVDV) to act as a helper virus encapsidating TBTV and enabling its transmission by aphids. Isometric virions from diseased tobacco plants were purified and disease symptoms were reproduced after experimental aphid transmission. The complete genome of TVDV was determined from cloned RT-PCR products derived from viral RNA. It was 5,920 nucleotides (nts) long and had the six major open reading frames (ORFs) typical of a member of the genus Polerovirus. Sequence comparisons showed that it differed significantly from any of the other species in the genus and this was confirmed by phylogenetic analyses of the RdRp and coat protein. SDS-PAGE analysis of purified virions gave two protein bands of about 26 and 59 kDa both of which reacted strongly in Western blots with antiserum produced to prokaryotically expressed TVDV CP showing that the two forms of the TVDV CP were the only protein components of the capsid.

  13. Amino acid sequence of a trypsin inhibitor from a Spirometra (Spirometra erinaceieuropaei).

    PubMed

    Sanda, A; Uchida, A; Itagaki, T; Kobayashi, H; Inokuchi, N; Koyama, T; Iwama, M; Ohgi, K; Irie, M

    2001-12-01

    A trypsin inhibitor that is highly homologous with bovine pancreatic trypsin inhibitor (BPTI) was co-purified along with RNase from Spirometra (Spirometra erinaceieuropaei). The amino acid sequence of this inhibitor (SETI) and the nucleotide sequence of the cDNA encoding this protein were determined by protein chemistry and gene technology. SETI contains 68 amino acid residues and has a molecular mass of 7,798 Da. SETI has 31 amino acid residues that are identical with BPTI's sequence, including 6 half-cystine and 5 aromatic amino acid residues. The active site Lys residue in BPTI is replaced by an Arg residue in SETI. SETI is an effective inhibitor of trypsin and moderately inhibits α-chymotrypsin, but inhibits elastase and subtilisin less effectively. SETI was expressed by E. coli containing a PelB vector carrying the SETI encoding cDNA; an expression yield of 0.68 mg/l was obtained. The phylogenetic relationship of SETI and the other BPTI-like trypsin inhibitors was analyzed using maximum likelihood inference methods.

  14. International interlaboratory study comparing single organism 16S rRNA gene sequencing data: Beyond consensus sequence comparisons

    PubMed Central

    Olson, Nathan D.; Lund, Steven P.; Zook, Justin M.; Rojas-Cornejo, Fabiola; Beck, Brian; Foy, Carole; Huggett, Jim; Whale, Alexandra S.; Sui, Zhiwei; Baoutina, Anna; Dobeson, Michael; Partis, Lina; Morrow, Jayne B.

    2015-01-01

    This study presents the results from an interlaboratory sequencing study for which we developed a novel high-resolution method for comparing data from different sequencing platforms for a multi-copy, paralogous gene. The combination of PCR amplification and 16S ribosomal RNA gene (16S rRNA) sequencing has revolutionized bacteriology by enabling rapid identification, frequently without the need for culture. To assess variability between laboratories in sequencing 16S rRNA, six laboratories sequenced the gene encoding the 16S rRNA from Escherichia coli O157:H7 strain EDL933 and Listeria monocytogenes serovar 4b strain NCTC11994. Participants performed sequencing methods and protocols available in their laboratories: Sanger sequencing, Roche 454 pyrosequencing®, or Ion Torrent PGM®. The sequencing data were evaluated on three levels: (1) identity of biologically conserved position, (2) ratio of 16S rRNA gene copies featuring identified variants, and (3) the collection of variant combinations in a set of 16S rRNA gene copies. The same set of biologically conserved positions was identified for each sequencing method. Analytical methods using Bayesian and maximum likelihood statistics were developed to estimate variant copy ratios, which describe the ratio of nucleotides at each identified biologically variable position, as well as the likely set of variant combinations present in 16S rRNA gene copies. Our results indicate that estimated variant copy ratios at biologically variable positions were only reproducible for high throughput sequencing methods. Furthermore, the likely variant combination set was only reproducible with increased sequencing depth and longer read lengths. We also demonstrate novel methods for evaluating variable positions when comparing multi-copy gene sequence data from multiple laboratories generated using multiple sequencing technologies. PMID:27077030

  15. Purification, characterization, gene cloning and nucleotide sequencing of D-stereospecific amino acid amidase from soil bacterium: Delftia acidovorans.

    PubMed

    Hongpattarakere, Tipparat; Komeda, Hidenobu; Asano, Yasuhisa

    2005-12-01

    The D-amino acid amidase-producing bacterium was isolated from soil samples using an enrichment culture technique in medium broth containing D-phenylalanine amide as a sole source of nitrogen. The strain exhibiting the strongest activity was identified as Delftia acidovorans strain 16. This strain produced intracellular D-amino acid amidase constitutively. The enzyme was purified about 380-fold to homogeneity and its molecular mass was estimated to be about 50 kDa, on sodium dodecyl sulfate polyacrylamide gel electrophoresis. The enzyme was active preferentially toward D-amino acid amides rather than their L-counterparts. It exhibited strong amino acid amidase activity toward aromatic amino acid amides including D-phenylalanine amide, D-tryptophan amide and D-tyrosine amide, yet it was not specifically active toward low-molecular-weight D-amino acid amides such as D-alanine amide, L-alanine amide and L-serine amide. Moreover, it was not specifically active toward oligopeptides. The enzyme showed maximum activity at 40 °C and pH 8.5 and appeared to be very stable, with 92.5% remaining activity after the reaction was performed at 45 °C for 30 min. However, it was mostly inactivated in the presence of phenylmethanesulfonyl fluoride or Cd2+, Ag+, Zn2+, Hg2+ and As3+. The NH2-terminal and internal amino acid sequences of the enzyme were determined, and the gene was cloned and sequenced. The enzyme gene damA encodes a 466-amino-acid protein (molecular mass 49,860.46 Da), and the deduced amino acid sequence exhibits homology to the D-amino acid amidase from Variovorax paradoxus (67.9% identity), the amidotransferase A subunit from Burkholderia fungorum (50% identity) and other enantioselective amidases.

  16. Porcine insulin receptor substrate 4 (IRS4) gene: cloning, polymorphism and association study

    USDA-ARS's Scientific Manuscript database

    Using PCR and IPCR techniques we obtained a 4498 bp nucleotide sequence FN424076 encompassing the complete coding sequence of the porcine IRS4 gene and its proximal promoter. The 1269-amino acid porcine protein deduced from the nucleotide sequence shares 92% identity with the human IRS4 and possesse...

  17. Molecular cloning, sequence characterization and recombinant expression of Nanog gene in goat fibroblast cells using lentiviral based expression system.

    PubMed

    Singhal, Dinesh K; Singhal, Raxita; Malik, Hruda N; Kumar, Surender; Kumar, Sudarshan; Mohanty, Ashok K; Kaushik, Jai K; Malakar, Dhruba

    2014-01-01

    Nanog is a homeodomain-containing protein which plays important roles in regulation of signaling pathways for maintenance and induction of pluripotency in stem cells. Because of its unique expression in stem cells it is also regarded as a pluripotency marker. In this study the goat Nanog (gNanog) gene has been amplified, cloned and characterized at sequence level with successful over-expression in the CHO-K1 cell line using a lentiviral based system. The gNanog ORF is 903 bp long and codes for a Nanog protein of 300 amino acids (aas). The complete nucleotide sequence shows some evolutionary mutations in goat in comparison to other species. The protein sequence of goat is highly similar to that of other species. Overall, the gNanog nucleotide sequence and predicted protein sequence showed high similarity and minimum divergence with cattle (96 % identity/4 % divergence) and buffalo (94/5 %) while low similarity and high divergence with pig (84/15 %), human (81/23 %) and mouse (69/40 %), indicating evolutionary closeness of gNanog to cattle and buffalo. A gNanog lentiviral expression construct was prepared for over-expression of the Nanog gene in adult goat fibroblast cells. The lentiviral expression construct of Nanog enabled continuous protein expression for induction and maintenance of pluripotency. Western blotting revealed the expression of the Nanog gene at protein level, which supported that the lentiviral expression system is highly promising for Nanog protein expression in differentiated goat cells.

  18. Complete sequence of two tick-borne flaviviruses isolated from Siberia and the UK: analysis and significance of the 5' and 3'-UTRs.

    PubMed

    Gritsun, T S; Venugopal, K; Zanotto, P M; Mikhailov, M V; Sall, A A; Holmes, E C; Polkinghorne, I; Frolova, T V; Pogodina, V V; Lashkevich, V A; Gould, E A

    1997-05-01

    The complete nucleotide sequences of two tick-transmitted flaviviruses, Vasilchenko (Vs) from Siberia and louping ill (LI) from the UK, have been determined. The genomes were 10928 and 10871 nucleotides (nt) in length, respectively. The coding strategy and functional protein sequence motifs of tick-borne flaviviruses are presented in both Vs and LI viruses. The phylogenies based on maximum likelihood, maximum parsimony and distance analysis of the polyproteins identified Vs virus as a member of the tick-borne encephalitis virus subgroup within the tick-borne serocomplex, genus Flavivirus, family Flaviviridae. Comparative alignment of the 3'-untranslated regions revealed deletions of different lengths essentially at the same position downstream of the stop codon for all tick-borne viruses. Two direct 27 nucleotide repeats at the 3'-end were found only for Vs and LI virus. Immediately following the deletions a region of 332-334 nt with relatively conserved primary structure (67-94% identity) was observed at the 3'-non-coding end of the virus genome. Pairwise comparisons of the nucleotide sequence data revealed similar levels of variation between the coding region and the 5' and 3'-termini of the genome, implying an equivalent strong selective control for translated and untranslated regions. Indeed the predicted folding of the 5' and 3'-untranslated regions revealed patterns of stem and loop structures conserved for all tick-borne flaviviruses, suggesting a purifying selection for preservation of essential RNA secondary structures which could be involved in translational control and replication. The possible implications of these findings are discussed.

  19. A comparison of coding sequence and cytogenetic localization of the myostatin gene in the dog, red fox, arctic fox and Chinese raccoon dog.

    PubMed

    Grzes, M; Nowacka-Woszuk, J; Szczerbal, I; Czerwinska, J; Gracz, J; Switonski, M

    2009-01-01

    The gene encoding myostatin (MSTN), due to its crucial function for growth of skeletal muscle mass, is an important candidate for muscularity. In this study we analyzed the nucleotide sequence and FISH localization of this gene in 4 canids, including 3 farm species. The nucleotide sequence of the MSTN coding fragment turned out to be highly conserved, since its identity among the studied species was very high and varied between 99.4 and 99.7%. Only 1, widely spread, silent single nucleotide polymorphism (SNP) was found in exon 1 of the Chinese raccoon dog. The MSTN gene was localized close to the centromere in one-armed chromosomes of the dog (37q11) and bi-armed chromosomes of the red fox (16p11) and arctic fox (10q11), with an exception of the Chinese raccoon dog chromosome (2q14-q21). This chromosome is orthologous to 3 canine chromosomes and thus the MSTN was found more interstitially. Our results are in agreement with the hypothesis that karyotypes of the canids evolved mainly through centric fusion/fission events, while tandem fusions occurred rarely.

  20. Bellerophon: a program to detect chimeric sequences in multiple sequence alignments.

    PubMed

    Huber, Thomas; Faulkner, Geoffrey; Hugenholtz, Philip

    2004-09-22

    Bellerophon is a program for detecting chimeric sequences in multiple sequence datasets by an adaptation of partial treeing analysis. Bellerophon was specifically developed to detect 16S rRNA gene chimeras in PCR-clone libraries of environmental samples but can be applied to other nucleotide sequence alignments. Bellerophon is available as an interactive web server at http://foo.maths.uq.edu.au/~huber/bellerophon.pl

  1. Complete nucleotide and derived amino acid sequence of cDNA encoding the mitochondrial uncoupling protein of rat brown adipose tissue: lack of a mitochondrial targeting presequence.

    PubMed Central

    Ridley, R G; Patel, H V; Gerber, G E; Morton, R C; Freeman, K B

    1986-01-01

    A cDNA clone spanning the entire amino acid sequence of the nuclear-encoded uncoupling protein of rat brown adipose tissue mitochondria has been isolated and sequenced. With the exception of the N-terminal methionine the deduced N-terminus of the newly synthesized uncoupling protein is identical to the N-terminal 30 amino acids of the native uncoupling protein as determined by protein sequencing. This proves that the protein contains no N-terminal mitochondrial targeting prepiece and that a targeting region must reside within the amino acid sequence of the mature protein. PMID:3012461

  2. The complete genome sequence of the Atlantic salmon paramyxovirus (ASPV)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nylund, Stian; Karlsen, Marius; Nylund, Are

    2008-03-30

    The complete RNA genome of the Atlantic salmon paramyxovirus (ASPV), isolated from Atlantic salmon suffering from proliferative gill inflammation (PGI), has been determined. The genome is 16,965 nucleotides in length and consists of six nonoverlapping genes in the order 3'- N - P/C/V - M - F - HN - L -5', coding for the nucleocapsid, phospho-, matrix, fusion, hemagglutinin-neuraminidase and large polymerase proteins, respectively. The gene junctions contain highly conserved transcription start and stop signal sequences and trinucleotide intergenic regions similar to those of other Paramyxoviridae. The ASPV P-gene expression strategy is like that of the respiro- and morbilliviruses, which express the phosphoprotein from the primary transcript, and edit a portion of the mRNA to encode the accessory proteins V and W. It also encodes the C-protein by ribosomal choice of translation initiation. Pairwise comparisons of amino acid identities, and phylogenetic analysis of deduced ASPV protein sequences with homologous sequences from other Paramyxoviridae, show that ASPV has an affinity for the genus Respirovirus, but may represent a new genus within the subfamily Paramyxovirinae.

  3. The Unique hmuY Gene Sequence as a Specific Marker of Porphyromonas gingivalis

    PubMed Central

    Mackiewicz, Paweł; Radwan-Oczko, Małgorzata; Kantorowicz, Małgorzata; Chomyszyn-Gajewska, Maria; Frąszczak, Magdalena; Bielecki, Marcin; Olczak, Mariusz; Olczak, Teresa

    2013-01-01

    Porphyromonas gingivalis, a major etiological agent of chronic periodontitis, acquires heme from host hemoproteins using the HmuY hemophore. The aim of this study was to develop a specific P. gingivalis marker based on a hmuY gene sequence. Subgingival samples were collected from 66 patients with chronic periodontitis and 40 healthy subjects and the entire hmuY gene was analyzed in positive samples. Phylogenetic analyses demonstrated that both the amino acid sequence of the HmuY protein and the nucleotide sequence of the hmuY gene are unique among P. gingivalis strains/isolates and show low identity to sequences found in other species (below 50 and 56%, respectively). In agreement with these findings, a set of hmuY gene-based primers and standard/real-time PCR with SYBR Green chemistry allowed us to specifically detect P. gingivalis in patients with chronic periodontitis (77.3%) and healthy subjects (20%), the latter possessing lower number of P. gingivalis cells and total bacterial cells. Isolates from healthy subjects possess the hmuY gene-based nucleotide sequence pattern occurring in W83/W50/A7436 (n = 4), 381/ATCC 33277 (n = 3) or TDC60 (n = 1) strains, whereas those from patients typically have TDC60 (n = 21), W83/W50/A7436 (n = 17) and 381/ATCC 33277 (n = 13) strains. We observed a significant correlation between periodontal index of risk of infectiousness (PIRI) and the presence/absence of P. gingivalis (regardless of the hmuY gene-based sequence pattern of the isolate identified [r = 0.43; P = 0.0002] and considering particular isolate pattern [r = 0.38; P = 0.0012]). In conclusion, we demonstrated that the hmuY gene sequence or its fragments may be used as one of the molecular markers of P. gingivalis. PMID:23844074

  4. Identification, Characterization and Full-Length Sequence Analysis of a Novel Polerovirus Associated with Wheat Leaf Yellowing Disease

    PubMed Central

    Zhang, Peipei; Liu, Yan; Liu, Wenwen; Cao, Mengji; Massart, Sebastien; Wang, Xifeng

    2017-01-01

    To identify the pathogens responsible for leaf yellowing symptoms on wheat samples collected from Jinan, China, we tested for the presence of three known barley/wheat yellow dwarf viruses (BYDV-GAV, -PAV, WYDV-GPV) (most likely pathogens) using RT-PCR. A sample that tested negative for the three viruses was selected for small RNA sequencing. Twenty-five million sequences were generated, among which 5% were of viral origin. A novel polerovirus was discovered and temporarily named wheat leaf yellowing-associated virus (WLYaV). The full genome of WLYaV corresponds to 5,772 nucleotides (nt), with six AUG-initiated open reading frames, one non-AUG-initiated open reading frame, and three untranslated regions, showing typical features of the family Luteoviridae. Sequence comparison and phylogenetic analyses suggested that WLYaV had the closest relationship with sugarcane yellow leaf virus (ScYLV), but the identities of full genomic nucleotides and deduced amino acid sequence of coat protein (CP) were 64.9 and 86.2%, respectively, below the species demarcation thresholds (90%) in the family Luteoviridae. Furthermore, agroinoculation of Nicotiana benthamiana leaves with a cDNA clone of WLYaV caused yellowing symptoms on the plant. Our study adds a new polerovirus that is associated with wheat leaf yellowing disease, which would help to identify and control pathogens of wheat. PMID:28932215

  5. Identification, Characterization and Full-Length Sequence Analysis of a Novel Polerovirus Associated with Wheat Leaf Yellowing Disease.

    PubMed

    Zhang, Peipei; Liu, Yan; Liu, Wenwen; Cao, Mengji; Massart, Sebastien; Wang, Xifeng

    2017-01-01

    To identify the pathogens responsible for leaf yellowing symptoms on wheat samples collected from Jinan, China, we tested for the presence of three known barley/wheat yellow dwarf viruses (BYDV-GAV, -PAV, WYDV-GPV) (most likely pathogens) using RT-PCR. A sample that tested negative for the three viruses was selected for small RNA sequencing. Twenty-five million sequences were generated, among which 5% were of viral origin. A novel polerovirus was discovered and temporarily named wheat leaf yellowing-associated virus (WLYaV). The full genome of WLYaV corresponds to 5,772 nucleotides (nt), with six AUG-initiated open reading frames, one non-AUG-initiated open reading frame, and three untranslated regions, showing typical features of the family Luteoviridae. Sequence comparison and phylogenetic analyses suggested that WLYaV had the closest relationship with sugarcane yellow leaf virus (ScYLV), but the identities of full genomic nucleotides and deduced amino acid sequence of coat protein (CP) were 64.9 and 86.2%, respectively, below the species demarcation thresholds (90%) in the family Luteoviridae. Furthermore, agroinoculation of Nicotiana benthamiana leaves with a cDNA clone of WLYaV caused yellowing symptoms on the plant. Our study adds a new polerovirus that is associated with wheat leaf yellowing disease, which would help to identify and control pathogens of wheat.

  6. DNA-DNA hybridization values and their relationship to whole-genome sequence similarities.

    PubMed

    Goris, Johan; Konstantinidis, Konstantinos T; Klappenbach, Joel A; Coenye, Tom; Vandamme, Peter; Tiedje, James M

    2007-01-01

    DNA-DNA hybridization (DDH) values have been used by bacterial taxonomists since the 1960s to determine relatedness between strains and are still the most important criterion in the delineation of bacterial species. Since the extent of hybridization between a pair of strains is ultimately governed by their respective genomic sequences, we examined the quantitative relationship between DDH values and genome sequence-derived parameters, such as the average nucleotide identity (ANI) of common genes and the percentage of conserved DNA. A total of 124 DDH values were determined for 28 strains for which genome sequences were available. The strains belong to six important and diverse groups of bacteria for which the intra-group 16S rRNA gene sequence identity was greater than 94 %. The results revealed a close relationship between DDH values and ANI and between DNA-DNA hybridization and the percentage of conserved DNA for each pair of strains. The recommended cut-off point of 70 % DDH for species delineation corresponded to 95 % ANI and 69 % conserved DNA. When the analysis was restricted to the protein-coding portion of the genome, 70 % DDH corresponded to 85 % conserved genes for a pair of strains. These results reveal extensive gene diversity within the current concept of "species". Examination of reciprocal values indicated that the level of experimental error associated with the DDH method is too high to reveal the subtle differences in genome size among the strains sampled. It is concluded that ANI can accurately replace DDH values for strains for which genome sequences are available.

  7. Regulation of Ion Channels by Pyridine Nucleotides

    PubMed Central

    Kilfoil, Peter J.; Tipparaju, Srinivas M.; Barski, Oleg A.; Bhatnagar, Aruni

    2014-01-01

    Recent research suggests that in addition to their role as soluble electron carriers, pyridine nucleotides [NAD(P)(H)] also regulate ion transport mechanisms. This mode of regulation seems to have been conserved through evolution. Several bacterial ion–transporting proteins or their auxiliary subunits possess nucleotide-binding domains. In eukaryotes, the Kv1 and Kv4 channels interact with pyridine nucleotide–binding β-subunits that belong to the aldo-keto reductase superfamily. Binding of NADP+ to Kvβ removes N-type inactivation of Kv currents, whereas NADPH stabilizes channel inactivation. Pyridine nucleotides also regulate Slo channels by interacting with their cytosolic regulator of potassium conductance domains that show high sequence homology to the bacterial TrkA family of K+ transporters. These nucleotides also have been shown to modify the activity of the plasma membrane KATP channels, the cystic fibrosis transmembrane conductance regulator, the transient receptor potential M2 channel, and the intracellular ryanodine receptor calcium release channels. In addition, pyridine nucleotides also modulate the voltage-gated sodium channel by supporting the activity of its ancillary subunit—the glycerol-3-phosphate dehydrogenase-like protein. Moreover, the NADP+ metabolite, NAADP+, regulates intracellular calcium homeostasis via the 2-pore channel, ryanodine receptor, or transient receptor potential M2 channels. Regulation of ion channels by pyridine nucleotides may be required for integrating cell ion transport to energetics and for sensing oxygen levels or metabolite availability. This mechanism also may be an important component of hypoxic pulmonary vasoconstriction, memory, and circadian rhythms, and disruption of this regulatory axis may be linked to dysregulation of calcium homeostasis and cardiac arrhythmias. PMID:23410881

  8. Telling apart Felidae and Ursidae from the distribution of nucleotides in mitochondrial DNA

    NASA Astrophysics Data System (ADS)

    Rovenchak, Andrij

    2018-02-01

    Rank-frequency distributions of nucleotide sequences in mitochondrial DNA are defined in a way analogous to the linguistic approach, with the most frequent nucleobase serving as a whitespace. For such sequences, entropy and mean length are calculated. These parameters are shown to discriminate the species of the Felidae (cats) and Ursidae (bears) families. From purely numerical values we are able to see in particular that giant pandas are bears while koalas are not. The observed linear relation between the parameters is explained using a simple probabilistic model. The approach based on the non-additive generalization of the Bose distribution is used to analyze the frequency spectra of the nucleotide sequences. In this case, the separation of families is not very sharp. Nevertheless, the distributions for Felidae have on average longer tails compared to Ursidae.
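
    The splitting idea described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `word_statistics` is hypothetical, the entropy is taken to be the Shannon entropy (in bits) of the word frequencies, and the mean length is averaged over all word occurrences; the paper's exact definitions may differ.

```python
from collections import Counter
import math


def word_statistics(sequence: str):
    """Split a nucleotide sequence into 'words', using its most frequent
    base as the separator, and summarize the resulting word distribution.

    Hypothetical helper; the definitions of entropy and mean length used
    here are assumptions, not taken from the paper.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    # The most frequent nucleobase plays the role of whitespace.
    separator = Counter(seq).most_common(1)[0][0]
    # Drop empty strings produced by adjacent separators.
    words = [w for w in seq.split(separator) if w]
    if not words:
        raise ValueError("sequence contains only the separator base")
    counts = Counter(words)
    total = sum(counts.values())
    # Rank-frequency list: most common word first.
    rank_frequency = counts.most_common()
    # Shannon entropy (bits) of the word frequency distribution.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    # Mean word length, averaged over word occurrences.
    mean_length = sum(len(w) for w in words) / len(words)
    return separator, rank_frequency, entropy, mean_length


if __name__ == "__main__":
    sep, ranks, h, mean_len = word_statistics("ACAGATACAG")
    print(sep, ranks, round(h, 3), mean_len)
```

    In this toy input, A is the most frequent base, so the sequence splits into the one-letter words C, G, T, C and G.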

  9. Identification of cDNAs encoding viper venom hyaluronidases: cross-generic sequence conservation of full-length and unusually short variant transcripts.

    PubMed

    Harrison, Robert A; Ibison, Frances; Wilbraham, Davina; Wagstaff, Simon C

    2007-05-01

    The immobilisation of prey by snakes is most efficiently achieved by the rapid dissemination of venom from its site of injection into the blood stream. Hyaluronidase is a common component of snake venoms and has been termed the "venom spreading factor". In the absence of nucleotide or protein sequence data to confirm the functional identity of this venom component, we interrogated a venom gland EST database for the saw-scaled viper, Echis ocellatus (Nigeria), using the gene ontology (GO) term "carbohydrate metabolism". A single hyalurononglucosaminidase-activity matching sequence (EOC00242) was found and used to design PCR primers to acquire the full-length cDNA sequence. Although very different from the bee venom and mammalian hyaluronidase sequences, the E. ocellatus sequence retained all the catalytic, positional and structural residues that characterise this class of carbohydrate metabolising hydrolases. An extraordinarily high level of sequence identity (>95%) was observed in analogous venom gland cDNA sequences isolated (by PCR) from another saw-scaled viper species, E. pyramidum leakeyi (Kenya), and from the Sahara horned viper, Cerastes cerastes cerastes (Egypt) and the puff adder, Bitis arietans (Nigeria). Smaller amplicons, lacking hyaluronidase catalytic residues because of 768 bp or 855 bp central deletions, appear to encode either truncated peptides without hyaluronidase activity, or are non-translated transcripts because they lack consensus translation initiating motifs.

  10. First complete genome sequence of infectious laryngotracheitis virus

    PubMed Central

    2011-01-01

    Background: Infectious laryngotracheitis virus (ILTV) is an alphaherpesvirus that causes acute respiratory disease in chickens worldwide. To date, only one complete genomic sequence of ILTV has been reported. This sequence was generated by concatenating partial sequences from six different ILTV strains. Thus, the full genomic sequence of a single (individual) strain of ILTV has not been determined previously. This study aimed to use high throughput sequencing technology to determine the complete genomic sequence of a live attenuated vaccine strain of ILTV. Results: The complete genomic sequence of the Serva vaccine strain of ILTV was determined, annotated and compared to the concatenated ILTV reference sequence. The genome size of the Serva strain was 152,628 bp, with a G + C content of 48%. A total of 80 predicted open reading frames were identified. The Serva strain had 96.5% DNA sequence identity with the concatenated ILTV sequence. Notably, the concatenated ILTV sequence was found to lack four large regions of sequence, including 528 bp and 594 bp of sequence in the UL29 and UL36 genes, respectively, and two copies of a 1,563 bp sequence in the repeat regions. Considerable differences in the size of the predicted translation products of 4 other genes (UL54, UL30, UL37 and UL38) were also identified. More than 530 single-nucleotide polymorphisms (SNPs) were identified. Most SNPs were located within three genomic regions, corresponding to sequence from the SA-2 ILTV vaccine strain in the concatenated ILTV sequence. Conclusions: This is the first complete genomic sequence of an individual ILTV strain. This sequence will facilitate future comparative genomic studies of ILTV by providing an appropriate reference sequence for the sequence analysis of other ILTV strains. PMID:21501528

  11. Associations between single nucleotide polymorphisms in multiple candidate genes and body weight in rabbits

    PubMed Central

    El-Sabrout, Karim; Aggag, Sarah A.

    2017-01-01

    Aim: In this study, we examined parts of six growth genes (growth hormone [GH], melanocortin 4 receptor [MC4R], growth hormone receptor [GHR], phosphoglycerate mutase [PGAM], myostatin [MSTN], and fibroblast growth factor [FGF]) as specific primers for two rabbit lines (V-line, Alexandria) using nucleotide sequence analysis, to investigate the association between single nucleotide polymorphisms (SNPs) detected in these genes and body weight (BW) at market. Materials and Methods: Kits of each line were grouped into high and low weight rabbits to identify DNA markers useful for association studies with high BW. DNA from blood samples of each group was extracted to amplify the six growth genes. The SNP technique was used to study the association between polymorphism in the six growth genes and marketing BW (at 63 days) in the two rabbit lines. The purified polymerase chain reaction products were sequenced for the animals that had the highest and lowest BW in each line. Results: Alignment of sequence data from each group revealed the following SNPs: at nucleotide 23 (A-C) and nucleotide 35 (T-G) in the MC4R gene (sense mutation) of Alexandria and V-line high BW. Furthermore, we detected the following SNP variation between the two lines: a SNP (T-C) at nucleotide 27 was identified in the MC4R gene (sense mutation) and another one (A-C) at nucleotide 14 was identified in the GHR gene (nonsense mutation) of the Alexandria line. The results of individual BW at market (63 days) indicated that Alexandria rabbits had significantly higher BW compared with V-line rabbits. MC4R polymorphism showed significant association with high BW in rabbits. Conclusion: The results of polymorphism demonstrate the possibility to detect an association between BW in rabbits and the efficiency of the used primers to predict through the genetic specificity using the SNP of MC4R. PMID:28246458

  12. Diversity of partial RNA-dependent RNA polymerase gene sequences of soybean blotchy mosaic virus isolates from different host-, geographical- and temporal origins.

    PubMed

    Strydom, Elrea; Pietersen, Gerhard

    2018-05-01

    Infection of soybean by the plant cytorhabdovirus soybean blotchy mosaic virus (SbBMV) results in significant yield losses in the temperate, lower-lying soybean production regions of South Africa. A 277 bp portion of the RNA-dependent RNA polymerase gene of 66 SbBMV isolates from different hosts, geographical locations in South Africa, and times of collection (spanning 16 years) was amplified by RT-PCR and sequenced to investigate the genetic diversity of isolates. Phylogenetic reconstruction revealed three main lineages, designated Groups A, B and C, with isolates grouping primarily according to geographic origin. Pairwise nucleotide identities ranged between 85.7% and 100% among all isolates, with isolates in Group A exhibiting the highest degree of sequence identity, and isolates of Groups A and B being more closely related to each other than to those in Group C. This is the first study investigating the genetic diversity of SbBMV.
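
    Pairwise nucleotide identities of the kind reported in this abstract can be illustrated with a minimal sketch. It assumes the sequences are already aligned to equal length and that columns containing a gap ("-") in either sequence are skipped; the function names and the toy sequences are hypothetical, and the study's actual software and gap handling are not stated in the abstract.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Columns with a gap in either sequence are ignored (an assumption;
    other conventions count gaps as mismatches).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) columns")
    return 100.0 * matches / compared


def pairwise_identities(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """Percent identity for every unordered pair of named aligned sequences."""
    return {
        (n1, n2): percent_identity(aligned[n1], aligned[n2])
        for n1, n2 in combinations(aligned, 2)
    }


if __name__ == "__main__":
    # Toy aligned fragments, not real SbBMV data.
    toy = {
        "isolate_1": "ACGTACGT",
        "isolate_2": "ACGTACGA",
        "isolate_3": "ACG-ACGT",
    }
    for pair, identity in pairwise_identities(toy).items():
        print(pair, f"{identity:.1f}%")
```

    The minimum and maximum of the returned values give an identity range analogous to the one quoted above.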

  13. Genome organisation and sequence comparison suggest intraspecies incongruence in M RNA of Watermelon bud necrosis virus.

    PubMed

    Kumar, Rakesh; Mandal, B; Geetanjali, A S; Jain, R K; Jaiwal, P K

    2010-08-01

    Watermelon bud necrosis virus (WBNV), a member of the genus Tospovirus, family Bunyaviridae is an important viral pathogen in watermelon cultivation in India. The complete genome sequence properties of WBNV are not available. In the present study, the complete M RNA sequence and the genome organisation of a WBNV isolate infecting watermelon in Delhi (WBNV-wDel) were determined. The M RNA was 4,794 nucleotides (nt) long and potentially coded for a movement protein (NSm) of 34.22 kDa (307 amino acids) on the viral sense strand and a Gn/Gc glycoprotein precursor of 127.15 kDa (1,121 amino acids) on the complementary strand. The two open reading frames were separated by an intergenic region of 402 nt. The 5' and 3' untranslated regions were 55 and 47 nt long, respectively, containing complementary termini typical of tospoviruses. WBNV-wDel was most closely related (79.1% identity) to Groundnut bud necrosis virus, an important tospovirus that occurs in several crops in India, and was different (63.3-75.2% identity) from the other cucurbit-infecting tospoviruses known to occur in Taiwan and Japan. Sequence analysis of NSm and Gn/Gc revealed phylogenetic incongruence between WBNV-wDel and another isolate originating from central India (WBNV-Wm-Som isolate). The Wm-Som isolate showed evolutionary divergence from the wDel isolate in the Gn/Gc protein (74.6% identity) potentially due to recombination with the other tospoviruses that are known to occur in India. This is the first report of a comparison of complete sequences of M RNA of WBNV.

  14. High-resolution melting genotyping of Enterococcus faecium based on multilocus sequence typing derived single nucleotide polymorphisms.

    PubMed

    Tong, Steven Y C; Xie, Shirley; Richardson, Leisha J; Ballard, Susan A; Dakh, Farshid; Grabsch, Elizabeth A; Grayson, M Lindsay; Howden, Benjamin P; Johnson, Paul D R; Giffard, Philip M

    2011-01-01

    We have developed a single nucleotide polymorphism (SNP) nucleated high-resolution melting (HRM) technique to genotype Enterococcus faecium. Eight SNPs were derived from the E. faecium multilocus sequence typing (MLST) database and amplified fragments containing these SNPs were interrogated by HRM. We tested the HRM genotyping scheme on 85 E. faecium bloodstream isolates and compared the results with MLST, pulsed-field gel electrophoresis (PFGE) and an allele specific real-time PCR (AS kinetic PCR) SNP typing method. In silico analysis based on predicted HRM curves according to the G+C content of each fragment for all 567 sequence types (STs) in the MLST database together with empiric data from the 85 isolates demonstrated that HRM analysis resolves E. faecium into 231 "melting types" (MelTs) and provides a Simpson's Index of Diversity (D) of 0.991 with respect to MLST. This is a significant improvement on the AS kinetic PCR SNP typing scheme that resolves 61 SNP types with D of 0.95. The MelTs were concordant with the known ST of the isolates. For the 85 isolates, there were 13 PFGE patterns, 17 STs, 14 MelTs and eight SNP types. There was excellent concordance between PFGE, MLST and MelTs with Adjusted Rand Indices of PFGE to MelT 0.936 and ST to MelT 0.973. In conclusion, this HRM based method appears rapid and reproducible. The results are concordant with MLST and the MLST based population structure.
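
    As an illustration of the Simpson's Index of Diversity quoted above, the following sketch uses the form commonly applied to typing schemes, D = 1 - sum(n_j(n_j - 1)) / (N(N - 1)), where n_j is the number of isolates assigned to type j and N is the total number of isolates. Whether the authors used exactly this form is an assumption, and the labels below are hypothetical.

```python
from collections import Counter


def simpsons_diversity(type_labels):
    """Simpson's index of diversity for a list of per-isolate type labels."""
    counts = Counter(type_labels)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two isolates")
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n * (n - 1))


# Hypothetical melting-type assignments for four isolates.
labels = ["MelT1", "MelT1", "MelT2", "MelT3"]
print(round(simpsons_diversity(labels), 3))
```

    A value near 1 means two randomly chosen isolates are very likely to fall into different types; a value of 0 means every isolate has the same type.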

  15. TargetM6A: Identifying N6-Methyladenosine Sites From RNA Sequences via Position-Specific Nucleotide Propensities and a Support Vector Machine.

    PubMed

    Li, Guang-Qing; Liu, Zi; Shen, Hong-Bin; Yu, Dong-Jun

    2016-10-01

    As one of the most ubiquitous post-transcriptional modifications of RNA, N6-methyladenosine (m6A) plays an essential role in many vital biological processes. The identification of m6A sites in RNAs is significantly important for both basic biomedical research and practical drug development. In this study, we designed a computational-based method, called TargetM6A, to rapidly and accurately target m6A sites solely from the primary RNA sequences. Two new features, i.e., position-specific nucleotide/dinucleotide propensities (PSNP/PSDP), are introduced and combined with the traditional nucleotide composition (NC) feature to formulate RNA sequences. The extracted features are further optimized to obtain a much more compact and discriminative feature subset by applying an incremental feature selection (IFS) procedure. Based on the optimized feature subset, we trained TargetM6A on the training dataset with a support vector machine (SVM) as the prediction engine. We compared the proposed TargetM6A method with existing methods for predicting m6A sites by performing stringent jackknife tests and independent validation tests on benchmark datasets. The experimental results show that the proposed TargetM6A method outperformed the existing methods for predicting m6A sites and remarkably improved the prediction performances, with MCC = 0.526 and AUC = 0.818. We also provided a user-friendly web server for TargetM6A, which is publicly accessible for academic use at http://csbio.njust.edu.cn/bioinf/TargetM6A.
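
    The MCC figure quoted above is the Matthews correlation coefficient. A minimal sketch of how it is computed from a binary confusion matrix follows; the counts are hypothetical and are not taken from the TargetM6A benchmarks.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for a site predictor: 80 true sites found, 20 missed,
# 70 non-sites correctly rejected, 30 falsely called.
print(round(matthews_corrcoef(tp=80, tn=70, fp=30, fn=20), 3))
```

    MCC ranges from -1 to 1, with 0 corresponding to chance-level prediction, which makes it more informative than accuracy when positive and negative sites are imbalanced.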

  16. Deduced sequences of the membrane fusion and attachment proteins of canine distemper viruses isolated from dogs and wild animals in Korea.

    PubMed

    Bae, Chae-Wun; Lee, Joong-Bok; Park, Seung-Yong; Song, Chang-Seon; Lee, Nak-Hyung; Seo, Kun-Ho; Kang, Young-Sun; Park, Choi-Kyu; Choi, In-Soo

    2013-08-01

    Canine distemper virus (CDV) causes highly contagious respiratory, gastrointestinal, and neurological diseases in wild and domestic animal species. Despite a broad vaccination campaign, the disease is still a serious problem worldwide. In this study, six field CDV strains were isolated from three dogs, two raccoon dogs, and one badger in Korea. The full sequences of the genes encoding the fusion (F) and hemagglutinin (H) proteins were compared with those of other CDVs, including field and vaccine strains. The phylogenetic analysis of the F and H genes indicated that the two CDV strains isolated from dogs were most closely related to Chinese strains in the Asia-1 genotype. Another four strains were closely related to Japanese strains in the Asia-2 genotype. The six currently isolated strains shared 90.2-92.1% and 88.2-91.8% identities with eight commercial vaccine strains in their nucleotide and amino acid sequences of the F protein, respectively. They also showed 90.1-91.4% and 87.8-90.7% identities with the same vaccine strains in their nucleotide and deduced amino acid sequences of the H protein, respectively. N-linked glycosylation sites differing from those of the prototype vaccine strain Onderstepoort were identified in the F and H genes of the six isolates. Collectively, these results demonstrate that at least two different CDV genotypes currently exist in Korea. The considerable genetic differences between the vaccine strains and wild-type isolates may be a major factor in the incomplete protection of dogs from CDV infections.

  17. [Complete genomic sequence of a watermelon isolate of cucumber green mottle mosaic virus in northern China].

    PubMed

    Chen, Hong-yun; Lin, Shi-ming; Chen, Qing; Zhao, Wen-jun; Liao, Fu-rong; Chen, Hong-jun; Zhu, Shui-fang

    2009-01-01

    The complete genomic sequence of a watermelon isolate of Cucumber green mottle mosaic virus (CGMMV-LN) from Liaoning province was determined and compared with those of other cucurbit-infecting tobamoviruses. The genomic RNA of CGMMV-LN comprised 6422 nt, and the 5'- and 3'-noncoding regions consisted of 59 nt and 175 nt, respectively. The four encoded proteins were two replicase proteins of 186 kD and 129 kD, a movement protein of 29 kD and a coat protein of 17.4 kD. Alignment of the complete nucleotide sequences showed that CGMMV-LN shared identities of 97.6%-99.3% with four other CGMMV isolates, but only 61.7%-62.8% with three other tobamoviruses. Homology trees generated from the 186 kD replicase proteins and the coat proteins suggested that cucurbit-infecting tobamoviruses could be separated into two subgroups: subgroup I comprising all the isolates of CGMMV and subgroup II comprising Cucumber fruit mottle mosaic virus, Kyuri green mottle mosaic virus and Zucchini green mottle mosaic virus.

  18. SNP discovery through de novo deep sequencing using the next generation of DNA sequencers

    USDA-ARS's Scientific Manuscript database

    The production of high volumes of DNA sequence data using new technologies has permitted more efficient identification of single nucleotide polymorphisms in vertebrate genomes. This chapter presented practical methodology for production and analysis of DNA sequence data for SNP discovery....

  19. Correlation approach to identify coding regions in DNA sequences

    NASA Technical Reports Server (NTRS)

    Ossadnik, S. M.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

    1994-01-01

    Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.

  20. Dynamic Nucleotide Mutation Gradients and Control Region Usage in Squamate Reptile Mitochondrial Genomes

    PubMed Central

    Castoe, T.A.; Gu, W.; de Koning, A.P.J.; Daza, J.M.; Jiang, Z.J.; Parkinson, C.L.; Pollock, D.D.

    2010-01-01

    Gradients of nucleotide bias and substitution rates occur in vertebrate mitochondrial genomes due to the asymmetric nature of the replication process. The evolution of these gradients has previously been studied in detail in primates, but not in other vertebrate groups. From the primate study, the strengths of these gradients are known to evolve in ways that can substantially alter the substitution process, but it is unclear how rapidly they evolve over evolutionary time or how different they may be in different lineages or groups of vertebrates. Given the importance of mitochondrial genomes in phylogenetics and molecular evolutionary research, a better understanding of how asymmetric mitochondrial substitution gradients evolve would contribute key insights into how this gradient evolution may mislead evolutionary inferences, and how it may also be incorporated into new evolutionary models. Most snake mitochondrial genomes have an additional interesting feature, 2 nearly identical control regions (CRs), which vary among different species in the extent to which they are used as origins of replication. Given the expanded sampling of complete snake genomes currently available, together with 2 additional snakes sequenced in this study, we reexamined gradient strength and CR usage in alethinophidian snakes as well as several lizards that possess dual CRs. Our results suggest that nucleotide substitution gradients (and corresponding nucleotide bias) and CR usage are highly labile over the ∼200 m.y. of squamate evolution, and demonstrate greater overall variability than previously shown in primates. The evidence for the existence of such gradients, and their ability to evolve rapidly and converge among unrelated species, suggests that gradient dynamics could easily mislead phylogenetic and molecular evolutionary inferences, and argues strongly that these dynamics should be incorporated into phylogenetic models. PMID:20215734

  1. DNA sequencing using polymerase substrate-binding kinetics

    PubMed Central

    Previte, Michael John Robert; Zhou, Chunhong; Kellinger, Matthew; Pantoja, Rigo; Chen, Cheng-Yao; Shi, Jin; Wang, BeiBei; Kia, Amirali; Etchin, Sergey; Vieceli, John; Nikoomanzar, Ali; Bomati, Erin; Gloeckner, Christian; Ronaghi, Mostafa; He, Molly Min

    2015-01-01

    Next-generation sequencing (NGS) has transformed genomic research by decreasing the cost of sequencing. However, whole-genome sequencing is still costly and complex for diagnostics purposes. In the clinical space, targeted sequencing has the advantage of allowing researchers to focus on specific genes of interest. Routine clinical use of targeted NGS mandates inexpensive instruments, fast turnaround time and an integrated and robust workflow. Here we demonstrate a version of the Sequencing by Synthesis (SBS) chemistry that potentially can become a preferred targeted sequencing method in the clinical space. This sequencing chemistry uses natural nucleotides and is based on real-time recording of the differential polymerase/DNA-binding kinetics in the presence of correct or mismatch nucleotides. This ensemble SBS chemistry has been implemented on an existing Illumina sequencing platform with integrated cluster amplification. We discuss the advantages of this sequencing chemistry for targeted sequencing as well as its limitations for other applications. PMID:25612848

  2. The first genome sequence of a metatherian herpesvirus: Macropodid herpesvirus 1.

    PubMed

    Vaz, Paola K; Mahony, Timothy J; Hartley, Carol A; Fowler, Elizabeth V; Ficorilli, Nino; Lee, Sang W; Gilkerson, James R; Browning, Glenn F; Devlin, Joanne M

    2016-01-22

    While many placental herpesvirus genomes have been fully sequenced, the complete genome of a marsupial herpesvirus has not been described. Here we present the first genome sequence of a metatherian herpesvirus, Macropodid herpesvirus 1 (MaHV-1). The MaHV-1 viral genome was sequenced using an Illumina MiSeq sequencer, de novo assembly was performed and the genome was annotated. The MaHV-1 genome was 140 kbp in length and clustered phylogenetically with the primate simplexviruses, sharing 67% nucleotide sequence identity with Human herpesviruses 1 and 2. The MaHV-1 genome contained 66 predicted open reading frames (ORFs) homologous to those in other herpesvirus genomes, but lacked homologues of UL3, UL4, UL56 and glycoprotein J. This is the first alphaherpesvirus genome that has been found to lack the UL3 and UL4 homologues. We identified six novel ORFs and confirmed their transcription by RT-PCR. This is the first genome sequence of a herpesvirus that infects metatherians, a taxonomically unique mammalian clade. Members of the Simplexvirus genus are remarkably conserved, so the absence of ORFs otherwise retained in eutherian and avian alphaherpesviruses contributes to our understanding of the Alphaherpesvirinae. Further study of metatherian herpesvirus genetics and pathogenesis provides a unique approach to understanding herpesvirus-mammalian interactions.

  3. Non-redundant patent sequence databases with value-added annotations at two levels

    PubMed Central

    Li, Weizhong; McWilliam, Hamish; de la Torre, Ana Richart; Grodowski, Adam; Benediktovich, Irina; Goujon, Mickael; Nauche, Stephane; Lopez, Rodrigo

    2010-01-01

    The European Bioinformatics Institute (EMBL-EBI) provides public access to patent data, including abstracts, chemical compounds and sequences. Sequences can appear multiple times due to the filing of the same invention with multiple patent offices, or the use of the same sequence by different inventors in different contexts. Information relating to the source invention may be incomplete, and biological information available in patent documents elsewhere may not be reflected in the annotation of the sequence. Search and analysis of these data have become increasingly challenging for both the scientific and intellectual-property communities. Here, we report a collection of non-redundant patent sequence databases, which cover the EMBL-Bank nucleotides patent class and the patent protein databases and contain value-added annotations from patent documents. The databases were created at two levels by the use of sequence MD5 checksums. Sequences within a level-1 cluster are 100% identical over their whole length. Level-2 clusters were defined by sub-grouping level-1 clusters based on patent family information. Value-added annotations, such as publication number corrections, earliest publication dates and feature collations, significantly enhance the quality of the data, allowing for better tracking and cross-referencing. The databases are available at: http://www.ebi.ac.uk/patentdata/nr/. PMID:19884134
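
    The two-level clustering described above can be sketched as follows. This is an illustration under stated assumptions, not the EMBL-EBI pipeline: the normalization step (stripping whitespace and upper-casing before hashing), the record layout, and the names level1_clusters, level2_clusters and family_of are all hypothetical.

```python
import hashlib
from collections import defaultdict


def level1_clusters(records):
    """Group entry IDs whose sequences are identical over their whole length.

    records: iterable of (entry_id, sequence) pairs.
    Returns a dict mapping MD5 hex digest -> list of entry IDs.
    """
    clusters = defaultdict(list)
    for entry_id, sequence in records:
        # Assumption: case and whitespace are not significant.
        normalized = "".join(sequence.split()).upper()
        digest = hashlib.md5(normalized.encode("ascii")).hexdigest()
        clusters[digest].append(entry_id)
    return dict(clusters)


def level2_clusters(level1, family_of):
    """Sub-group each level-1 cluster by patent family.

    family_of: dict mapping entry ID -> patent family identifier.
    Returns a dict mapping digest -> {family -> list of entry IDs}.
    """
    result = {}
    for digest, entry_ids in level1.items():
        by_family = defaultdict(list)
        for entry_id in entry_ids:
            by_family[family_of[entry_id]].append(entry_id)
        result[digest] = dict(by_family)
    return result


# Hypothetical entries: P1 and P2 carry the same sequence but belong to different families.
records = [("P1", "ACGT"), ("P2", "acgt"), ("P3", "ACGA")]
families = {"P1": "F1", "P2": "F2", "P3": "F1"}
l1 = level1_clusters(records)
l2 = level2_clusters(l1, families)
print(sorted(l1.values()))
```

    Hashing lets exact duplicates be found in a single pass without pairwise comparison; a checksum only detects whole-length identity, so near-identical sequences fall into different level-1 clusters.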

  4. Non-redundant patent sequence databases with value-added annotations at two levels.

    PubMed

    Li, Weizhong; McWilliam, Hamish; de la Torre, Ana Richart; Grodowski, Adam; Benediktovich, Irina; Goujon, Mickael; Nauche, Stephane; Lopez, Rodrigo

    2010-01-01

    The European Bioinformatics Institute (EMBL-EBI) provides public access to patent data, including abstracts, chemical compounds and sequences. Sequences can appear multiple times due to the filing of the same invention with multiple patent offices, or the use of the same sequence by different inventors in different contexts. Information relating to the source invention may be incomplete, and biological information available in patent documents elsewhere may not be reflected in the annotation of the sequence. Search and analysis of these data have become increasingly challenging for both the scientific and intellectual-property communities. Here, we report a collection of non-redundant patent sequence databases, which cover the EMBL-Bank nucleotides patent class and the patent protein databases and contain value-added annotations from patent documents. The databases were created at two levels by the use of sequence MD5 checksums. Sequences within a level-1 cluster are 100% identical over their whole length. Level-2 clusters were defined by sub-grouping level-1 clusters based on patent family information. Value-added annotations, such as publication number corrections, earliest publication dates and feature collations, significantly enhance the quality of the data, allowing for better tracking and cross-referencing. The databases are available at: http://www.ebi.ac.uk/patentdata/nr/.

  5. Denoising DNA deep sequencing data—high-throughput sequencing errors and their correction

    PubMed Central

    Laehnemann, David; Borkhardt, Arndt

    2016-01-01

    Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies. Both random and systematic errors can show a specific occurrence profile for each of the six prominent sequencing platforms surveyed here: 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, Pacific Biosciences single-molecule real-time sequencing and Oxford Nanopore sequencing. There is a large variety of programs available for error removal in sequencing read data, which differ in the error models and statistical techniques they use, the features of the data they analyse, the parameters they determine from them and the data structures and algorithms they use. We highlight the assumptions they make and for which data types these hold, providing guidance which tools to consider for benchmarking with regard to the data properties. While no benchmarking results are included here, such specific benchmarks would greatly inform tool choices and future software development. The development of stand-alone error correctors, as well as single nucleotide variant and haplotype callers, could also benefit from using more of the knowledge about error profiles and from (re)combining ideas from the existing approaches presented here. PMID:26026159

  6. Identification of a novel species of papillomavirus in giraffe lesions using nanopore sequencing.

    PubMed

    Vanmechelen, Bert; Bertelsen, Mads Frost; Rector, Annabel; Van den Oord, Joost J; Laenen, Lies; Vergote, Valentijn; Maes, Piet

    2017-03-01

    Papillomaviridae form a large family of viruses that are known to infect a variety of vertebrates, including mammals, reptiles, birds and fish. Infections usually give rise to minor skin lesions but can in some cases lead to the development of malignant neoplasia. In this study, we identified a novel species of papillomavirus (PV), isolated from warts of four giraffes (Giraffa camelopardalis). The sequence of the L1 gene was determined and found to be identical for all isolates. Using nanopore sequencing, the full sequence of the PV genome could be determined. The coding region of the genome was found to contain seven open reading frames (ORF), encoding the early proteins E1, E2 and E5-E7 as well as the late proteins L1 and L2. In addition to these ORFs, a region located within the E2 gene is thought, based on sequence similarities to other papillomaviruses, to encode an E4 protein, although no start codon could be identified. Based on the sequence of the L1 gene, this novel PV was found to be most similar to Capreolus capreolus papillomavirus 1 (CcaPV1), with 67.96% nucleotide identity. We therefore suggest that the virus identified here is given the name Giraffa camelopardalis papillomavirus 1 (GcPV1) and is classified as a novel species within the genus Deltapapillomavirus, in line with the current guidelines for the nomenclature and classification of PVs. Copyright © 2017 Elsevier B.V. All rights reserved.

  7. Developmental rearrangement of cyanobacterial nif genes: nucleotide sequence, open reading frames, and cytochrome P-450 homology of the Anabaena sp. strain PCC 7120 nifD element.

    PubMed Central

    Lammers, P J; McLaughlin, S; Papin, S; Trujillo-Provencio, C; Ryncarz, A J

    1990-01-01

    An 11-kbp DNA element of unknown function interrupts the nifD gene in vegetative cells of Anabaena sp. strain PCC 7120. In developing heterocysts the nifD element excises from the chromosome via site-specific recombination between short repeat sequences that flank the element. The nucleotide sequence of the nifH-proximal half of the element was determined to elucidate the genetic potential of the element. Four open reading frames with the same relative orientation as the nifD element-encoded xisA gene were identified in the sequenced region. Each of the open reading frames was preceded by a reasonable ribosome-binding site and had biased codon utilization preferences consistent with low levels of expression. Open reading frame 3 was highly homologous with three cytochrome P-450 omega-hydroxylase proteins and showed regional homology to functionally significant domains common to the cytochrome P-450 superfamily. The sequence encoding open reading frame 2 was the most highly conserved portion of the sequenced region based on heterologous hybridization experiments with three genera of heterocystous cyanobacteria. PMID:2123860

  8. Some identities of generalized Fibonacci sequence

    NASA Astrophysics Data System (ADS)

    Chong, Chin-Yoon; Cheah, C. L.; Ho, C. K.

    2014-07-01

    We introduced the generalized Fibonacci sequence $\{U_n\}$ defined by $U_0 = 0$, $U_1 = 1$, and $U_{n+2} = pU_{n+1} + qU_n$ for all $p, q \in \mathbb{Z}^+$ and for all non-negative integers $n$. In this paper, we obtained some recursive formulas of the sequence.
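
    The recurrence above can be made concrete with a short sketch that generates the first terms for given positive integers p and q. The function name and the choice to return a list are illustrative assumptions.

```python
def generalized_fibonacci(p: int, q: int, count: int) -> list:
    """First `count` terms of U_0 = 0, U_1 = 1, U_{n+2} = p*U_{n+1} + q*U_n."""
    if p <= 0 or q <= 0:
        raise ValueError("p and q must be positive integers")
    if count < 0:
        raise ValueError("count must be non-negative")
    terms = [0, 1]
    while len(terms) < count:
        terms.append(p * terms[-1] + q * terms[-2])
    return terms[:count]


print(generalized_fibonacci(1, 1, 8))  # ordinary Fibonacci numbers
print(generalized_fibonacci(2, 1, 6))  # p = 2, q = 1 gives the Pell numbers
```

    With p = q = 1 the sequence reduces to the classical Fibonacci numbers, which is a convenient check on any identity derived for general p and q.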

  9. Complete genome sequence of a Chinese isolate of pepper vein yellows virus and evolutionary analysis based on the CP, MP and RdRp coding regions.

    PubMed

    Liu, Maoyan; Liu, Xiangning; Li, Xun; Zhang, Deyong; Dai, Liangyin; Tang, Qianjun

    2016-03-01

    The genome sequence of pepper vein yellows virus (PeVYV) (PeVYV-HN, accession number KP326573), isolated from pepper plants (Capsicum annuum L.) grown at the Hunan Vegetables Institute (Changsha, Hunan, China), was determined by deep sequencing of small RNAs. The PeVYV-HN genome consists of 6244 nucleotides, contains six open reading frames (ORFs), and is similar to that of an isolate (AB594828) from Japan. Its genomic organization is similar to that of members of the genus Polerovirus. Sequence analysis revealed that PeVYV-HN shared 92% sequence identity with the Japanese PeVYV genome at both the nucleotide and amino acid levels. Evolutionary analysis based on the coat protein (CP), movement protein (MP), and RNA-dependent RNA polymerase (RdRP) showed that PeVYV could be divided into two major lineages corresponding to their geographical origins. The Asian isolates have a higher population expansion frequency than the African isolates. Negative selection and genetic drift (founder effect) were found to be the potential drivers of the molecular evolution of PeVYV. Moreover, recombination was not the distinct cause of PeVYV evolution. This is the first report of a complete genomic sequence of PeVYV in China.

  10. A Bioluminometric Method of DNA Sequencing

    NASA Technical Reports Server (NTRS)

    Ronaghi, Mostafa; Pourmand, Nader; Stolc, Viktor; Arnold, Jim (Technical Monitor)

    2001-01-01

    Pyrosequencing is a bioluminometric single-tube DNA sequencing method that takes advantage of co-operativity between four enzymes to monitor DNA synthesis. In this sequencing-by-synthesis method, a cascade of enzymatic reactions yields detectable light, which is proportional to incorporated nucleotides. Pyrosequencing has the advantages of accuracy, flexibility and parallel processing. It can be easily automated. Furthermore, the technique dispenses with the need for labeled primers, labeled nucleotides and gel-electrophoresis. In this chapter, the use of this technique for different applications is discussed.

  11. PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences

    PubMed Central

    Mirarab, Siavash; Nguyen, Nam; Guo, Sheng; Wang, Li-San; Kim, Junhyong

    2015-01-01

    We introduce PASTA, a new multiple sequence alignment algorithm. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy and scalability of the leading alignment methods (including SATé). We also show that trees estimated on PASTA alignments are highly accurate, slightly better than SATé trees, but with substantial improvements relative to other methods. Finally, PASTA is faster than SATé, highly parallelizable, and requires relatively little memory. PMID:25549288

  12. PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences.

    PubMed

    Mirarab, Siavash; Nguyen, Nam; Guo, Sheng; Wang, Li-San; Kim, Junhyong; Warnow, Tandy

    2015-05-01

    We introduce PASTA, a new multiple sequence alignment algorithm. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy and scalability of the leading alignment methods (including SATé). We also show that trees estimated on PASTA alignments are highly accurate--slightly better than SATé trees, but with substantial improvements relative to other methods. Finally, PASTA is faster than SATé, highly parallelizable, and requires relatively little memory.

  13. Cloning and sequence analysis of sucrose phosphate synthase gene from varieties of Pennisetum species.

    PubMed

    Li, H C; Lu, H B; Yang, F Y; Liu, S J; Bai, C J; Zhang, Y W

    2015-03-31

    Sucrose phosphate synthase (SPS) is an enzyme used by higher plants for sucrose synthesis. In this study, three primer sets were designed on the basis of known SPS sequences from maize (GenBank: NM_001112224.1) and sugarcane (GenBank: JN584485.1), and five novel SPS genes were identified by RT-PCR from the genomes of Pennisetum spp (the hybrid P. americanum x P. purpureum, P. purpureum Schum., P. purpureum Schum. cv. Red, P. purpureum Schum. cv. Taiwan, and P. purpureum Schum. cv. Mott). The cloned sequences showed 99.9% identity and 80-88% similarity to the SPS sequences of other plants. The SPS gene of hybrid Pennisetum had one nucleotide and four amino acid polymorphisms compared to the other four germplasms, and cluster analysis was performed to assess genetic diversity in this species. Additional characterization of the SPS gene product can potentially allow Pennisetum to be exploited as a biofuel source.

  14. Using Single-Nucleotide Polymorphisms To Discriminate Disease-Associated from Carried Genomes of Neisseria meningitidis

    PubMed Central

    Katz, Lee S.; Sharma, Nitya V.; Harcourt, Brian H.; Thomas, Jennifer Dolan; Wang, Xin; Mayer, Leonard W.; Jordan, I. King

    2011-01-01

    Neisseria meningitidis is one of the main agents of bacterial meningitis, causing substantial morbidity and mortality worldwide. However, most of the time N. meningitidis is carried as a commensal not associated with invasive disease. The genomic basis of the difference between disease-associated and carried isolates of N. meningitidis may provide critical insight into mechanisms of virulence, yet it has remained elusive. Here, we have taken a comparative genomics approach to interrogate the difference between disease-associated and carried isolates of N. meningitidis at the level of individual nucleotide variations (i.e., single nucleotide polymorphisms [SNPs]). We aligned complete genome sequences of 8 disease-associated and 4 carried isolates of N. meningitidis to search for SNPs that show mutually exclusive patterns of variation between the two groups. We found 63 SNPs that distinguish the 8 disease-associated genomes from the 4 carried genomes of N. meningitidis, which is far more than can be expected by chance alone given the level of nucleotide variation among the genomes. The putative list of SNPs that discriminate between disease-associated and carriage genomes may be expected to change with increased sampling or changes in the identities of the isolates being compared. Nevertheless, we show that these discriminating SNPs are more likely to reflect phenotypic differences than shared evolutionary history. Discriminating SNPs were mapped to genes, and the functions of the genes were evaluated for possible connections to virulence mechanisms. A number of overrepresented functional categories related to virulence were uncovered among SNP-associated genes, including genes related to the category “symbiosis, encompassing mutualism through parasitism.” PMID:21622743
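
    The search for SNPs with mutually exclusive patterns of variation between two groups can be sketched as a column scan over an alignment. This is a simplified reading of the abstract, assuming a site discriminates when no allele is shared between the groups and that columns containing gaps are skipped; the function name and sequences are hypothetical.

```python
def discriminating_sites(group_a, group_b):
    """0-based alignment columns where the two groups share no allele."""
    sequences = list(group_a) + list(group_b)
    if not group_a or not group_b:
        raise ValueError("both groups must contain at least one sequence")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must be aligned to the same length")
    sites = []
    for i in range(length):
        alleles_a = {s[i].upper() for s in group_a}
        alleles_b = {s[i].upper() for s in group_b}
        if "-" in alleles_a or "-" in alleles_b:
            continue  # assumption: gapped columns are not considered
        if alleles_a.isdisjoint(alleles_b):
            sites.append(i)
    return sites


# Hypothetical toy alignment: three "disease-associated" and two "carried" sequences.
disease = ["ACGTA", "ACGTA", "ACGTT"]
carried = ["ATGTC", "ATGTC"]
print(discriminating_sites(disease, carried))
```

    With few genomes per group, some columns will separate the groups by chance alone, which is why the abstract compares the observed count with a chance expectation; that significance step is not covered by this sketch.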

  15. Homogeneity of the 16S rDNA sequence among geographically disparate isolates of Taylorella equigenitalis

    PubMed Central

    Matsuda, M; Tazumi, A; Kagawa, S; Sekizuka, T; Murayama, O; Moore, JE; Millar, BC

    2006-01-01

    Background At present, six accessible sequences of 16S rDNA from Taylorella equigenitalis (T. equigenitalis) are available, whose sequence differences occur at a few nucleotide positions. Thus it is important to determine these sequences from additional strains in other countries, if possible, in order to clarify any anomalies regarding 16S rDNA sequence heterogeneity. Here, we clone and sequence the approximately full-length 16S rDNA from additional strains of T. equigenitalis isolated in Japan, Australia and France and compare these sequences to the existing published sequences. Results Clarification of any anomalies regarding 16S rDNA sequence heterogeneity of T. equigenitalis was carried out. On cloning, sequencing and comparing the approximately full-length 16S rDNA from 17 strains of T. equigenitalis isolated in Japan, Australia and France, nucleotide sequence differences were demonstrated at six loci in the 1,469-nucleotide sequence. Moreover, 12 polymorphic sites occurred among 23 sequences of the 16S rDNA, including the six reference sequences. Conclusion High sequence similarity (99.5% or more) was observed throughout, except from nucleotide positions 138 to 501 where substitutions and deletions were noted. PMID:16398935

  16. Detection and molecular characterization of infectious bronchitis virus isolated from recent outbreaks in broiler flocks in Thailand.

    PubMed

    Pohuang, Tawatchai; Chansiripornchai, Niwat; Tawatsin, Achara; Sasipreeyajan, Jiroj

    2009-09-01

    Thirteen field isolates of infectious bronchitis virus (IBV) were isolated from broiler flocks in Thailand between January and June 2008. An 878-bp region of the S1 gene covering a hypervariable region was amplified and sequenced. Phylogenetic analysis based on that region revealed that these viruses were separated into two groups (I and II). IBV isolates in group I were not related to other IBV strains published in the GenBank database. Group I nucleotide sequence identities were less than 85% and amino acid sequence identities less than 84% in common with IBVs published in the GenBank database. This group likely represents the strains indigenous to Thailand. The isolates in group II showed a close relationship with Chinese IBVs. They had nucleotide sequence identities of 97-98% and amino acid sequence identities of 96-98% in common with Chinese IBVs (strain A2, SH and QXIBV). This finding indicated that the recent Thai IBVs evolved separately and at least two groups of viruses are circulating in Thailand.

  17. Individual sequences in large sets of gene sequences may be distinguished efficiently by combinations of shared sub-sequences

    PubMed Central

    Gibbs, Mark J; Armstrong, John S; Gibbs, Adrian J

    2005-01-01

    Background Most current DNA diagnostic tests for identifying organisms use specific oligonucleotide probes that are complementary in sequence to, and hence only hybridise with the DNA of one target species. By contrast, in traditional taxonomy, specimens are usually identified by 'dichotomous keys' that use combinations of characters shared by different members of the target set. Using one specific character for each target is the least efficient strategy for identification. Using combinations of shared bisectionally-distributed characters is much more efficient, and this strategy is most efficient when they separate the targets in a progressively binary way. Results We have developed a practical method for finding minimal sets of sub-sequences that identify individual sequences, and could be targeted by combinations of probes, so that the efficient strategy of traditional taxonomic identification could be used in DNA diagnosis. The sizes of minimal sub-sequence sets depended mostly on sequence diversity and sub-sequence length and interactions between these parameters. We found that 201 distinct cytochrome oxidase subunit-1 (CO1) genes from moths (Lepidoptera) were distinguished using only 15 sub-sequences 20 nucleotides long, whereas only 8–10 sub-sequences 6–10 nucleotides long were required to distinguish the CO1 genes of 92 species from the 9 largest orders of insects. Conclusion The presence/absence of sub-sequences in a set of gene sequences can be used like the questions in a traditional dichotomous taxonomic key; hybridisation probes complementary to such sub-sequences should provide a very efficient means for identifying individual species, subtypes or genotypes. Sequence diversity and sub-sequence length are the major factors that determine the numbers of distinguishing sub-sequences in any set of sequences. PMID:15817134
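
    The idea of using the presence or absence of shared sub-sequences like the questions in a dichotomous key can be sketched as follows. This is a simple greedy heuristic written for illustration, not the authors' published method, and it does not guarantee a minimal set; the function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def kmers(seq, k):
    """All sub-sequences of length k that occur in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_distinguishing_kmers(seqs, k):
    """Pick k-mers whose presence/absence pattern is unique for every sequence."""
    profiles = [kmers(s, k) for s in seqs]
    candidates = sorted(set().union(*profiles))
    # Pairs of sequences that the chosen k-mers cannot yet tell apart.
    unresolved = set(combinations(range(len(seqs)), 2))
    chosen = []
    while unresolved:
        def split_count(kmer):
            return sum((kmer in profiles[i]) != (kmer in profiles[j])
                       for i, j in unresolved)

        best = max(candidates, key=split_count, default=None)
        if best is None or split_count(best) == 0:
            raise ValueError("some sequences have identical k-mer content")
        chosen.append(best)
        # Keep only the pairs that the new k-mer still fails to separate.
        unresolved = {(i, j) for i, j in unresolved
                      if (best in profiles[i]) == (best in profiles[j])}
    return chosen


# Toy input (hypothetical data): four sequences, sub-sequences of length 2.
seqs = ["AAAA", "AAAC", "CCCC", "ACCC"]
probes = greedy_distinguishing_kmers(seqs, 2)
for s in seqs:
    print(s, [p in kmers(s, 2) for p in probes])
```

    Each chosen sub-sequence ideally halves the remaining candidates, which is why n sequences can in the best case be separated by about log2(n) probes rather than n specific ones.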

  18. Nature and distribution of feline sarcoma virus nucleotide sequences.

    PubMed Central

    Frankel, A E; Gilbert, J H; Porzig, K J; Scolnick, E M; Aaronson, S A

    1979-01-01

    The genomes of three independent isolates of feline sarcoma virus (FeSV) were compared by molecular hybridization techniques. Using complementary DNAs prepared from two strains, SM- and ST-FeSV, common complementary DNAs were selected by sequential hybridization to FeSV and feline leukemia virus RNAs. These DNAs were shown to be highly related among the three independent sarcoma virus isolates. FeSV-specific complementary DNAs were prepared by selection for hybridization by the homologous FeSV RNA and against hybridization by feline leukemia virus RNA. Sarcoma virus-specific sequences of SM-FeSV were shown to differ from those of either ST- or GA-FeSV strains, whereas ST-FeSV-specific DNA shared extensive sequence homology with GA-FeSV. By molecular hybridization, each set of FeSV-specific sequences was demonstrated to be present in normal cat cellular DNA in approximately one copy per haploid genome and was conserved throughout Felidae. In contrast, FeSV-common sequences were present in multiple DNA copies and were found only in Mediterranean cats. The present results are consistent with the concept that each FeSV strain has arisen by a mechanism involving recombination between feline leukemia virus and cat cellular DNA sequences, the latter represented within the cat genome in a manner analogous to that of a cellular gene. PMID:225544

  19. Variation in the number of nucleoli and incomplete homogenization of 18S ribosomal DNA sequences in leaf cells of the cultivated Oriental ginseng (Panax ginseng Meyer).

    PubMed

    Chelomina, Galina N; Rozhkovan, Konstantin V; Voronova, Anastasia N; Burundukova, Olga L; Muzarok, Tamara I; Zhuravlev, Yuri N

    2016-04-01

    Wild ginseng, Panax ginseng Meyer, is an endangered species of medicinal plants. In the present study, we analyzed variations within the ribosomal DNA (rDNA) cluster to gain insight into the genetic diversity of the Oriental ginseng, P. ginseng, under artificial cultivation. The roots of wild P. ginseng plants were sampled from a nonprotected natural population of the Russian Far East. The slides were prepared from leaf tissues using the squash technique for cytogenetic analysis. The 18S rDNA sequences were cloned and sequenced. The distribution of nucleotide diversity, recombination events, and interspecific phylogenies for the total 18S rDNA sequence data set was also examined. In mesophyll cells, mononucleolar nuclei were estimated to be dominant (75.7%), while the remaining nuclei contained two to four nucleoli. Among the analyzed 18S rDNA clones, 20% were identical to the 18S rDNA sequence of P. ginseng from Japan, and other clones differed by one to six substitutions. The nucleotide polymorphism was more pronounced at positions 440-640 bp, and was distributed in variable regions, expansion segments, and conservative elements of the core structure. The phylogenetic analysis confirmed conspecificity of ginseng plants cultivated in different regions, with two fixed mutations between P. ginseng and other species. This study identified evidence of intragenomic nucleotide polymorphism in the 18S rDNA sequences of P. ginseng. These data suggest that, in cultivated plants, the observed genome instability may influence the synthesis of biologically active compounds, which are widely used in traditional medicine.

  20. Molecular Cloning and Sequencing of Hemoglobin-Beta Gene of Channel Catfish, Ictalurus Punctatus Rafinesque

    USDA-ARS's Scientific Manuscript database

    Hemoglobin-β gene of channel catfish, Ictalurus punctatus, was cloned and sequenced. Total RNA from head kidneys was isolated, reverse transcribed and amplified. The sequence of the channel catfish hemoglobin-β gene consists of 600 nucleotides. Analysis of the nucleotide sequence reveals one o...

  1. Increased frequency of de novo copy number variants in congenital heart disease by integrative analysis of single nucleotide polymorphism array and exome sequence data.

    PubMed

    Glessner, Joseph T; Bick, Alexander G; Ito, Kaoru; Homsy, Jason; Rodriguez-Murillo, Laura; Fromer, Menachem; Mazaika, Erica; Vardarajan, Badri; Italia, Michael; Leipzig, Jeremy; DePalma, Steven R; Golhar, Ryan; Sanders, Stephan J; Yamrom, Boris; Ronemus, Michael; Iossifov, Ivan; Willsey, A Jeremy; State, Matthew W; Kaltman, Jonathan R; White, Peter S; Shen, Yufeng; Warburton, Dorothy; Brueckner, Martina; Seidman, Christine; Goldmuntz, Elizabeth; Gelb, Bruce D; Lifton, Richard; Seidman, Jonathan; Hakonarson, Hakon; Chung, Wendy K

    2014-10-24

    Congenital heart disease (CHD) is among the most common birth defects. Most cases are of unknown pathogenesis. To determine the contribution of de novo copy number variants (CNVs) in the pathogenesis of sporadic CHD. We studied 538 CHD trios using genome-wide dense single nucleotide polymorphism arrays and whole exome sequencing. Results were experimentally validated using digital droplet polymerase chain reaction. We compared validated CNVs in CHD cases with CNVs in 1301 healthy control trios. The 2 complementary high-resolution technologies identified 63 validated de novo CNVs in 51 CHD cases. A significant increase in CNV burden was observed when comparing CHD trios with healthy trios, using either single nucleotide polymorphism array (P=7×10^-5; odds ratio, 4.6) or whole exome sequencing data (P=6×10^-4; odds ratio, 3.5) and remained after removing 16% of de novo CNV loci previously reported as pathogenic (P=0.02; odds ratio, 2.7). We observed recurrent de novo CNVs on 15q11.2 encompassing CYFIP1, NIPA1, and NIPA2 and single de novo CNVs encompassing DUSP1, JUN, JUP, MED15, MED9, PTPRE, SREBF1, TOP2A, and ZEB2, genes that interact with established CHD proteins NKX2-5 and GATA4. Integrating de novo variants in whole exome sequencing and CNV data suggests that ETS1 is the pathogenic gene altered by 11q24.2-q25 deletions in Jacobsen syndrome and that CTBP2 is the pathogenic gene in 10q subtelomeric deletions. We demonstrate a significantly increased frequency of rare de novo CNVs in CHD patients compared with healthy controls and suggest several novel genetic loci for CHD. © 2014 American Heart Association, Inc.

  2. Sequence-based prediction of protein-binding sites in DNA: comparative study of two SVM models.

    PubMed

    Park, Byungkyu; Im, Jinyong; Tuvshinjargal, Narankhuu; Lee, Wook; Han, Kyungsook

    2014-11-01

    As many structures of protein-DNA complexes have been known in the past years, several computational methods have been developed to predict DNA-binding sites in proteins. However, its inverse problem (i.e., predicting protein-binding sites in DNA) has received much less attention. One of the reasons is that the differences between the interaction propensities of nucleotides are much smaller than those between amino acids. Another reason is that DNA exhibits less diverse sequence patterns than protein. Therefore, predicting protein-binding DNA nucleotides is much harder than predicting DNA-binding amino acids. We computed the interaction propensity (IP) of nucleotide triplets with amino acids using an extensive dataset of protein-DNA complexes, and developed two support vector machine (SVM) models that predict protein-binding nucleotides from sequence data alone. One SVM model predicts protein-binding nucleotides using DNA sequence data alone, and the other SVM model predicts protein-binding nucleotides using both DNA and protein sequences. In a 10-fold cross-validation with 1519 DNA sequences, the SVM model that uses DNA sequence data only predicted protein-binding nucleotides with an accuracy of 67.0%, an F-measure of 67.1%, and a Matthews correlation coefficient (MCC) of 0.340. With an independent dataset of 181 DNAs that were not used in training, it achieved an accuracy of 66.2%, an F-measure 66.3% and a MCC of 0.324. Another SVM model that uses both DNA and protein sequences achieved an accuracy of 69.6%, an F-measure of 69.6%, and a MCC of 0.383 in a 10-fold cross-validation with 1519 DNA sequences and 859 protein sequences. With an independent dataset of 181 DNAs and 143 proteins, it showed an accuracy of 67.3%, an F-measure of 66.5% and a MCC of 0.329. Both in cross-validation and independent testing, the second SVM model that used both DNA and protein sequence data showed better performance than the first model that used DNA sequence data. 
To the best of

  3. Transcript-specific, single-nucleotide polymorphism discovery and linkage analysis in hexaploid bread wheat (Triticum aestivum L.).

    PubMed

    Allen, Alexandra M; Barker, Gary L A; Berry, Simon T; Coghill, Jane A; Gwilliam, Rhian; Kirby, Susan; Robinson, Phil; Brenchley, Rachel C; D'Amore, Rosalinda; McKenzie, Neil; Waite, Darren; Hall, Anthony; Bevan, Michael; Hall, Neil; Edwards, Keith J

    2011-12-01

    Food security is a global concern and substantial yield increases in cereal crops are required to feed the growing world population. Wheat is one of the three most important crops for human and livestock feed. However, the complexity of the genome coupled with a decline in genetic diversity within modern elite cultivars has hindered the application of marker-assisted selection (MAS) in breeding programmes. A crucial step in the successful application of MAS in breeding programmes is the development of cheap and easy to use molecular markers, such as single-nucleotide polymorphisms. To mine selected elite wheat germplasm for intervarietal single-nucleotide polymorphisms, we have used expressed sequence tags derived from public sequencing programmes and next-generation sequencing of normalized wheat complementary DNA libraries, in combination with a novel sequence alignment and assembly approach. Here, we describe the development and validation of a panel of 1114 single-nucleotide polymorphisms in hexaploid bread wheat using competitive allele-specific polymerase chain reaction genotyping technology. We report the genotyping results of these markers on 23 wheat varieties, selected to represent a broad cross-section of wheat germplasm including a number of elite UK varieties. Finally, we show that, using relatively simple technology, it is possible to rapidly generate a linkage map containing several hundred single-nucleotide polymorphism markers in the doubled haploid mapping population of Avalon × Cadenza. © 2011 The Authors. Plant Biotechnology Journal © 2011 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd.

  4. Deep sequencing revealed genome-wide single-nucleotide polymorphism and plasmid content of Erwinia amylovora strains isolated in Middle Atlas, Morocco.

    PubMed

    Hannou, Najat; Mondy, Samuel; Planamente, Sara; Moumni, Mohieddine; Llop, Pablo; López, María; Manceau, Charles; Barny, Marie-Anne; Faure, Denis

    2013-10-01

    Erwinia amylovora causes economic losses that affect pear and apple production in Morocco. Here, we report comparative genomics of four Moroccan E. amylovora strains with the European strain CFBP1430 and North-American strain ATCC49946. Analysis of single nucleotide polymorphisms (SNPs) revealed genetic homogeneity of Moroccan strains and their proximity to the European strain CFBP1430. Moreover, the collected sequences allowed the assembly of a 65 kbp plasmid, which is highly similar to the plasmid pEI70 harbored by several European E. amylovora isolates. This plasmid was found in 33% of the 40 E. amylovora strains collected from several host plants in 2009 and 2010 in Morocco. Copyright © 2013 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.

  5. Bean common mosaic virus isolates causing different symptoms in asparagus bean in China differ greatly in the 5'-parts of their genomes.

    PubMed

    Zheng, Hongying; Chen, Jiong; Chen, Jianping; Adams, Michael J; Hou, Mingsheng

    2002-06-01

    Potyvirus isolates from asparagus bean (Vigna sesquipedalis) plants in Zhejiang province, China, caused either rugose and vein banding mosaic symptoms (isolate R) or severe yellowing (isolate Y) in this host, but were otherwise similar in host range. Both isolates were completely sequenced and shown to be isolates of Bean common mosaic virus (BCMV). The complete sequences were 9992 (R) or 10062 (Y) nucleotides long and shared 91.7% identical nucleotides (93.2% identical amino acids) in their genomes and were more distantly related to the BCMV-Peanut stripe virus sequence (PStV). The isolates were much less similar to one another in the 5'-UTR and the N-terminal region of the P1 protein. In the P1, isolate Y was closer to PStV (76.1% identical amino acids) than to isolate R (64.8%). Phylogenetic analyses of the coat protein region showed that the new isolates grouped with other isolates from Vigna spp., forming the blackeye cowpea mosaic strain subgroup of BCMV with 94-98% nucleotides (96-99% amino acids) identical to one another and about 90% identity to other BCMV isolates. Other significant subgroupings amongst published BCMV isolates were detected.
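
    Figures such as "91.7% identical nucleotides" are percent-identity values computed over a pairwise alignment. A minimal sketch of one common convention follows; it assumes the two sequences are already aligned with "-" as the gap character and that columns containing a gap are excluded from the denominator. Other tools count gaps differently, so values from this sketch need not match published figures, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns in which both sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        # Assumption: any column with a gap in either sequence is ignored.
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Toy alignment (hypothetical data): 9 of 10 columns agree.
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
```

    The same function applies to aligned amino acid sequences, which is how nucleotide and amino acid identities for one pair of isolates can differ.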

  6. Molecular detection of kobuviruses in European roe deer (Capreolus capreolus) in Italy.

    PubMed

    Di Martino, Barbara; Di Profio, Federica; Melegari, Irene; Di Felice, Elisabetta; Robetto, Serena; Guidetti, Cristina; Orusa, Riccardo; Martella, Vito; Marsilio, Fulvio

    2015-08-01

    Kobuvirus RNA was found in 6.6 % (13/198) of stool specimens from roe deer (Capreolus capreolus) captured during the regular hunting season. Upon sequence analysis of a fragment of the 3D gene, nine strains displayed the highest nucleotide sequence identity (91.2-97.4 %) to bovine kobuviruses previously detected in either diarrhoeic or asymptomatic calves. Interestingly, four strains were genetically related to the newly discovered caprine kobuviruses (84.2-87.6 % nucleotide identity) identified in black goats in Korea.

  7. Genetic variation in potential Giardia vaccine candidates cyst wall protein 2 and α1-giardin.

    PubMed

    Radunovic, Matej; Klotz, Christian; Saghaug, Christina Skår; Brattbakk, Hans-Richard; Aebischer, Toni; Langeland, Nina; Hanevik, Kurt

    2017-08-01

    Giardia is a prevalent intestinal parasitic infection. The trophozoite structural protein α1-giardin (α1-g) and the cyst protein cyst wall protein 2 (CWP2) have shown promise as Giardia vaccine antigen candidates in murine models. The present study assesses the genetic diversity of α1-g and CWP2 between and within assemblages A and B in human clinical isolates. α1-g and CWP2 sequences were acquired from 15 Norwegian isolates by PCR amplification and 20 sequences from German cultured isolates by whole genome sequencing. Sequences were aligned to reference genomes from assemblage A2 and B to identify genetic variance. Genetic diversity was found between assemblage A and B reference sequences for both α1-g (90.8% nucleotide identity) and CWP2 (82.5% nucleotide identity). However, for α1-g, this translated into only 3 amino acid (aa) substitutions, while for CWP2 there were 41 aa substitutions, and also one aa deletion. Genetic diversity within assemblage B was larger (nucleotide identity 92.0% for α1-g and 94.3% for CWP2) than within assemblage A (nucleotide identity 99.0% for α1-g and 99.7% for CWP2). For CWP2, the diversity on both nucleotide and protein level was higher in the C-terminal end. Predicted antigenic epitopes were not affected for α1-g, but partially for CWP2. Despite genetic diversity in α1-g, we found aa sequence, characteristics, and antigenicity to be well preserved. CWP2 showed more aa variance and potential antigenic differences. Several CWP2 antigens might be necessary in a future Giardia vaccine to provide cross protection against both Giardia assemblages infecting humans.

  8. Extensive structural variations between mitochondrial genomes of CMS and normal peppers (Capsicum annuum L.) revealed by complete nucleotide sequencing.

    PubMed

    Jo, Yeong Deuk; Choi, Yoomi; Kim, Dong-Hwan; Kim, Byung-Dong; Kang, Byoung-Cheorl

    2014-07-04

    Cytoplasmic male sterility (CMS) is an inability to produce functional pollen that is caused by mutation of the mitochondrial genome. Comparative analyses of mitochondrial genomes of lines with and without CMS in several species have revealed structural differences between genomes, including extensive rearrangements caused by recombination. However, the mitochondrial genome structure and the DNA rearrangements that may be related to CMS have not been characterized in Capsicum spp. We obtained the complete mitochondrial genome sequences of the pepper CMS line FS4401 (507,452 bp) and the fertile line Jeju (511,530 bp). Comparative analysis between mitochondrial genomes of peppers and tobacco, both of which belong to Solanaceae, revealed extensive DNA rearrangements and poor conservation in non-coding DNA. In comparison between pepper lines, FS4401 and Jeju mitochondrial DNAs contained the same complement of protein coding genes except for one additional copy of an atp6 gene (ψatp6-2) in FS4401. In terms of genome structure, we found eighteen syntenic blocks in the two mitochondrial genomes, which have been rearranged in each genome. By contrast, sequences between syntenic blocks, which were specific to each line, accounted for 30,380 and 17,847 bp in FS4401 and Jeju, respectively. The previously reported CMS candidate genes, orf507 and ψatp6-2, were located on the edges of the largest sequence segments that were specific to FS4401. In this region, a large number of small sequence segments that were absent from, or found at different locations in, the Jeju mitochondrial genome were combined together. The incorporation of repeats and overlapping of connected sequence segments by a few nucleotides implied that extensive rearrangements by homologous recombination might be involved in evolution of this region. Further analysis using mtDNA pairs from other plant species revealed common features of DNA regions around CMS-associated genes. Although large portion of sequence context was

  9. The Role of the Y-Chromosome in the Establishment of Murine Hybrid Dysgenesis and in the Analysis of the Nucleotide Sequence Organization, Genetic Transmission and Evolution of Repeated Sequences.

    NASA Astrophysics Data System (ADS)

    Nallaseth, Ferez Soli

    The Y-chromosome presents a unique cytogenetic framework for the evolution of nucleotide sequences. Alignment of nine Y-chromosomal fragments in their increasing Y-specific/non Y-specific (male/female) sequence divergence ratios was directly and inversely related to their interspersion on these two respective genomic fractions. Sequence analysis confirmed a direct relationship between divergence ratios and the Alu, LINE-1, Satellite and their derivative oligonucleotide contents. Thus their relocation on the Y-chromosome is followed by sequence divergence rather than the well documented concerted evolution of these non-coding progenitor repeated sequences. Five of the nine Y-chromosomal fragments are non-pseudoautosomal and transcribed into heterogeneous poly(A)+ RNA and thus can be retrotransposed. Evolutionary and computer analysis identified homologous oligonucleotide tracts in several human loci suggesting common and random mechanistic origins. Dysgenic genomes represent the accelerated evolution driving sequence divergence (McClintock, 1984). Sex reversal and sterility characterizing dysgenesis occurs in C57BL/6JY^Pos but not in 129/SvY^Pos derivative strains. High frequency, random, multi-locus deletion products of the feral Y^Pos-chromosome are generated in the germlines of F1(C57BL/6J X 129/SvY^Pos)(male) and C57BL/6JY^Pos(male) but not in 129/SvY^Pos(male). Equal, 10^-1, 10^-2, and 0 copies (relative to males) of Y^Pos-specific deletion products respectively characterize C57BL/6JY^Pos (HC), (LC), (T) and (F) females. The testes determining loci of inactive Y^Pos-chromosomes in C57BL/6JY^Pos HC females are the preferentially deleted/rearranged Y^Pos-sequences. Disruption of regulation of plasma testosterone and hepatic MUP-A mRNA levels, TRD of a 4.7 Kbp EcoR1 fragment suggest disruption of autosomal/X-chromosomal sequences. These data and the highly repeated progenitor (Alu, GATA, LINE-1

  10. Biochemical and molecular characterization of a novel choline-specific glycerophosphodiester phosphodiesterase belonging to the nucleotide pyrophosphatase/phosphodiesterase family.

    PubMed

    Sakagami, Hideki; Aoki, Junken; Natori, Yumiko; Nishikawa, Kiyotaka; Kakehi, Yoshiyuki; Natori, Yasuhiro; Arai, Hiroyuki

    2005-06-17

    Nucleotide pyrophosphatases/phosphodiesterases (NPPs) are ubiquitous membrane-associated or secreted ectoenzymes that release nucleoside 5'-monophosphate from a variety of nucleotides and nucleotide derivatives. The mammalian NPP family comprises seven members, but only three of these (NPP1-3) have been studied in some detail. Previously we showed that lysophospholipase D, which hydrolyzes lysophosphatidylcholine (LPC) to produce lysophosphatidic acid, is identical to NPP2. More recently an uncharacterized novel NPP member (NPP7) was shown to have alkaline sphingomyelinase activity. These findings raised the possibility that other members of the NPP family act on phospholipids. Here we show that the sixth member of the NPP family, NPP6, is a choline-specific glycerophosphodiester phosphodiesterase. The sequence of NPP6 encodes a transmembrane protein containing an NPP domain with significant homology to NPP4, NPP5, and NPP7/alkaline sphingomyelinase. When expressed in HeLa cells, NPP6 was detected in both the cells and the cell culture medium as judged by Western blotting and by enzymatic activity. Recombinant NPP6 efficiently hydrolyzed the classical substrate for phospholipase C, p-nitrophenyl phosphorylcholine, but not the classical nucleotide phosphodiesterase substrate, p-nitrophenyl thymidine 5'-monophosphate. In addition, NPP6 hydrolyzed LPC to form monoacylglycerol and phosphorylcholine but not lysophosphatidic acid, showing it has a lysophospholipase C activity. NPP6 showed a preference for LPC with short (12:0 and 14:0) or polyunsaturated (18:2 and 20:4) fatty acids. It also hydrolyzed glycerophosphorylcholine and sphingosylphosphorylcholine efficiently. In mice, NPP6 mRNA was predominantly detected in kidney with a lesser expression in brain and heart, and in human it was detected in kidney and brain. The present results suggest that NPP6 has a specific role through the hydrolysis of polyunsaturated LPC, glycerophosphorylcholine, or

  11. Complete mitochondrial genome sequence of black mustard (Brassica nigra; BB) and comparison with Brassica oleracea (CC) and Brassica carinata (BBCC).

    PubMed

    Yamagishi, Hiroshi; Tanaka, Yoshiyuki; Terachi, Toru

    2014-11-01

    Crop species of Brassica (Brassicaceae) consist of three monogenomic species and three amphidiploid species resulting from interspecific hybridizations among them. Until now, mitochondrial genome sequences were available for only five of these species. We sequenced the mitochondrial genome of the sixth species, Brassica nigra (nuclear genome constitution BB), and compared it with those of Brassica oleracea (CC) and Brassica carinata (BBCC). The genome was assembled into a 232 145 bp circular sequence that is slightly larger than that of B. oleracea (219 952 bp). The genome of B. nigra contained 33 protein-coding genes, 3 rRNA genes, and 17 tRNA genes. The cox2-2 gene present in B. oleracea was absent in B. nigra. Although the nucleotide sequences of 52 genes were identical between B. nigra and B. carinata, the second exon of rps3 showed differences including an insertion/deletion (indel) and nucleotide substitutions. A PCR test to detect the indel revealed intraspecific variation in rps3, and in one line of B. nigra it amplified a DNA fragment of the size expected for B. carinata. In addition, the B. carinata lines tested here produced DNA fragments of the size expected for B. nigra. The results indicate that at least two mitotypes of B. nigra were present in the maternal parents of B. carinata.

  12. Multiplexed droplet single-cell RNA-sequencing using natural genetic variation.

    PubMed

    Kang, Hyun Min; Subramaniam, Meena; Targ, Sasha; Nguyen, Michelle; Maliskova, Lenka; McCarthy, Elizabeth; Wan, Eunice; Wong, Simon; Byrnes, Lauren; Lanata, Cristina M; Gate, Rachel E; Mostafavi, Sara; Marson, Alexander; Zaitlen, Noah; Criswell, Lindsey A; Ye, Chun Jimmie

    2018-01-01

    Droplet single-cell RNA-sequencing (dscRNA-seq) has enabled rapid, massively parallel profiling of transcriptomes. However, assessing differential expression across multiple individuals has been hampered by inefficient sample processing and technical batch effects. Here we describe a computational tool, demuxlet, that harnesses natural genetic variation to determine the sample identity of each droplet containing a single cell (singlet) and detect droplets containing two cells (doublets). These capabilities enable multiplexed dscRNA-seq experiments in which cells from unrelated individuals are pooled and captured at higher throughput than in standard workflows. Using simulated data, we show that 50 single-nucleotide polymorphisms (SNPs) per cell are sufficient to assign 97% of singlets and identify 92% of doublets in pools of up to 64 individuals. Given genotyping data for each of eight pooled samples, demuxlet correctly recovers the sample identity of >99% of singlets and identifies doublets at rates consistent with previous estimates. We apply demuxlet to assess cell-type-specific changes in gene expression in 8 pooled lupus patient samples treated with interferon (IFN)-β and perform eQTL analysis on 23 pooled samples.

  13. Predicting protein-binding RNA nucleotides with consideration of binding partners.

    PubMed

    Tuvshinjargal, Narankhuu; Lee, Wook; Park, Byungkyu; Han, Kyungsook

    2015-06-01

    In recent years several computational methods have been developed to predict RNA-binding sites in protein. Most of these methods do not consider interacting partners of a protein, so they predict the same RNA-binding sites for a given protein sequence even if the protein binds to different RNAs. Unlike the problem of predicting RNA-binding sites in protein, the problem of predicting protein-binding sites in RNA has received little attention mainly because it is much more difficult and shows a lower accuracy on average. In our previous study, we developed a method that predicts protein-binding nucleotides from an RNA sequence. In an effort to improve the prediction accuracy and usefulness of the previous method, we developed a new method that uses both RNA and protein sequence data. In this study, we identified effective features of RNA and protein molecules and developed a new support vector machine (SVM) model to predict protein-binding nucleotides from RNA and protein sequence data. The new model that used both protein and RNA sequence data achieved a sensitivity of 86.5%, a specificity of 86.2%, a positive predictive value (PPV) of 72.6%, a negative predictive value (NPV) of 93.8% and Matthews correlation coefficient (MCC) of 0.69 in a 10-fold cross validation; it achieved a sensitivity of 58.8%, a specificity of 87.4%, a PPV of 65.1%, a NPV of 84.2% and MCC of 0.48 in independent testing. For comparative purpose, we built another prediction model that used RNA sequence data alone and ran it on the same dataset. In a 10 fold-cross validation it achieved a sensitivity of 85.7%, a specificity of 80.5%, a PPV of 67.7%, a NPV of 92.2% and MCC of 0.63; in independent testing it achieved a sensitivity of 67.7%, a specificity of 78.8%, a PPV of 57.6%, a NPV of 85.2% and MCC of 0.45. In both cross-validations and independent testing, the new model that used both RNA and protein sequences showed a better performance than the model that used RNA sequence data alone in
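
    The performance measures quoted above (sensitivity, specificity, PPV, NPV and MCC) all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name and the counts in the example are hypothetical and are not the study's data.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from true/false positive and negative counts.

    Assumes both classes occur and both labels are predicted at least once,
    so that none of the ratio denominators is zero.
    """
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),   # binding nucleotides recovered
        "specificity": tn / (tn + fp),   # non-binding nucleotides recovered
        "ppv": tp / (tp + fp),           # predicted binders that are correct
        "npv": tn / (tn + fn),           # predicted non-binders that are correct
        "mcc": (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0,
    }


# Hypothetical counts for illustration only.
for name, value in binary_metrics(tp=40, fp=20, tn=80, fn=10).items():
    print(f"{name}: {value:.3f}")
```

    MCC uses all four cells, which is why it is often reported alongside sensitivity and specificity when binding and non-binding nucleotides are unevenly represented.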

  14. FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation.

    PubMed

    Bolleman, Jerven T; Mungall, Christopher J; Strozzi, Francesco; Baran, Joachim; Dumontier, Michel; Bonnal, Raoul J P; Buels, Robert; Hoehndorf, Robert; Fujisawa, Takatomo; Katayama, Toshiaki; Cock, Peter J A

    2016-06-13

    Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. Using Semantic Web technologies to query biological annotations, there was no standard that described this potentially complex location information as subject-predicate-object triples. We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned "omics" areas. Using the same data format to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Our ontology allows users to uniformly describe - and potentially merge - sequence annotations from multiple sources. Data sources using FALDO can prospectively be retrieved using federalised SPARQL queries against public SPARQL endpoints and/or local private triple stores.
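
    To make the subject-predicate-object idea concrete, the sketch below builds a small Turtle document that places a feature on a reference sequence using FALDO terms (faldo:Region, faldo:begin, faldo:end, faldo:ExactPosition, faldo:position, faldo:reference). It uses plain string formatting rather than an RDF library; the `ex:` namespace, the identifiers and the function name are hypothetical placeholders.

```python
FALDO_NS = "http://biohackathon.org/resource/faldo#"


def faldo_region_turtle(feature, reference, begin, end, forward=True):
    """Return Turtle text describing `feature` as a FALDO region on `reference`.

    Identifiers are emitted under a hypothetical ex: namespace; begin and end
    are integer coordinates on the reference sequence.
    """
    strand = ("faldo:ForwardStrandPosition" if forward
              else "faldo:ReverseStrandPosition")
    lines = [
        f"@prefix faldo: <{FALDO_NS}> .",
        "@prefix ex: <http://example.org/> .",
        "",
        f"ex:{feature} faldo:location ex:{feature}_region .",
        f"ex:{feature}_region a faldo:Region ;",
        f"    faldo:begin ex:{feature}_begin ;",
        f"    faldo:end ex:{feature}_end .",
    ]
    # Each boundary is its own resource carrying a position, a strand and a reference.
    for name, pos in (("begin", begin), ("end", end)):
        lines += [
            f"ex:{feature}_{name} a faldo:ExactPosition, {strand} ;",
            f"    faldo:position {pos} ;",
            f"    faldo:reference ex:{reference} .",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(faldo_region_turtle("gene1", "chr1", 100, 250))
```

    Modelling each boundary as a separate resource is what lets the same vocabulary cover exact, fuzzy and strand-specific positions without changing the shape of the region itself.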

  15. FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation

    DOE PAGES

    Bolleman, Jerven T.; Mungall, Christopher J.; Strozzi, Francesco; ...

    2016-06-13

    Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. Using Semantic Web technologies to query biological annotations, there was no standard that described this potentially complex location information as subject-predicate-object triples. In this paper, we have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned “omics” areas. Using the same data format to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Our ontology allows users to uniformly describe – and potentially merge – sequence annotations from multiple sources. Finally, data sources using FALDO can prospectively be retrieved using federalised SPARQL queries against public SPARQL endpoints and/or local private triple stores.

  16. FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bolleman, Jerven T.; Mungall, Christopher J.; Strozzi, Francesco

    Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. Using Semantic Web technologies to query biological annotations, there was no standard that described this potentially complex location information as subject-predicate-object triples. In this paper, we have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned “omics” areas. Using the same data format to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Our ontology allows users to uniformly describe – and potentially merge – sequence annotations from multiple sources. Finally, data sources using FALDO can prospectively be retrieved using federalised SPARQL queries against public SPARQL endpoints and/or local private triple stores.

  17. Autochthonous sporadic acute hepatitis E caused by two distinct subgenotype 3b hepatitis E virus strains with only 90% nucleotide identity.

    PubMed

    Yamaguchi, Yasuko; Takagi, Hitoshi; Suzuki, Yuhei; Maruhashi, Kyoko; Kosone, Takashi; Kakizaki, Satoru; Sato, Ken; Yamada, Masanobu; Nagashima, Shigeo; Takahashi, Masaharu; Okamoto, Hiroaki

    2017-04-01

    Hepatitis E, which is caused by hepatitis E virus (HEV), is a public health concern in Japan, where the zoonotic food-borne transmission of HEV from domestic pigs and wild boars plays an important role. A 44-year-old Japanese man with autochthonous sporadic acute hepatitis E was admitted with general fatigue and moderate liver dysfunction. In the present study, two distinct HEV strains were recovered from the patient, who had consumed the raw or undercooked pig liver and intestine two or three times per week for 3 months before the disease onset. The recovered HEV strains were segregated into two clusters within subgenotype 3b, the open reading frame (ORF)1 and ORF2 sequences of which each showed ~10% difference, indicating HEV mixed infection. Because most notified patients with clinical HEV infection in Japan are diagnosed based on the detection of IgA-class HEV antibodies and because serum samples from only a limited number of HEV-infected patients are subjected to HEV RNA detection and nucleotide sequencing, it is very likely that patients with HEV mixed infection remain largely overlooked. The identification of sources of autochthonous HEV infection remains an important goal. Continued efforts to trace the sources of acute or chronic autochthonous HEV infection are warranted.

  18. Partial Shotgun Sequencing of the Boechera stricta Genome Reveals Extensive Microsynteny and Promoter Conservation with Arabidopsis

    PubMed Central

    Windsor, Aaron J.; Schranz, M. Eric; Formanová, Nataša; Gebauer-Jung, Steffi; Bishop, John G.; Schnabelrauch, Domenica; Kroymann, Juergen; Mitchell-Olds, Thomas

    2006-01-01

    Comparative genomics provides insight into the evolutionary dynamics that shape discrete sequences as well as whole genomes. To advance comparative genomics within the Brassicaceae, we have end sequenced 23,136 medium-sized insert clones from Boechera stricta, a wild relative of Arabidopsis (Arabidopsis thaliana). A significant proportion of these sequences, 18,797, are nonredundant and display highly significant similarity (BLASTn e-value ≤ 10^−30) to low copy number Arabidopsis genomic regions, including more than 9,000 annotated coding sequences. We have used this dataset to identify orthologous gene pairs in the two species and to perform a global comparison of DNA regions 5′ to annotated coding regions. On average, the 500 nucleotides upstream to coding sequences display 71.4% identity between the two species. In a similar analysis, 61.4% identity was observed between 5′ noncoding sequences of Brassica oleracea and Arabidopsis, indicating that regulatory regions are not as diverged among these lineages as previously anticipated. By mapping the B. stricta end sequences onto the Arabidopsis genome, we have identified nearly 2,000 conserved blocks of microsynteny (bracketing 26% of the Arabidopsis genome). A comparison of fully sequenced B. stricta inserts to their homologous Arabidopsis genomic regions indicates that indel polymorphisms >5 kb contribute substantially to the genome size difference observed between the two species. Further, we demonstrate that microsynteny inferred from end-sequence data can be applied to the rapid identification and cloning of genomic regions of interest from nonmodel species. These results suggest that among diploid relatives of Arabidopsis, small- to medium-scale shotgun sequencing approaches can provide rapid and cost-effective benefits to evolutionary and/or functional comparative genomic frameworks. PMID:16607030
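
    Percent identity figures such as those quoted for the upstream regions are computed from a pairwise alignment. The sketch below shows one common convention (matching columns divided by all alignment columns, gaps counted as mismatches); the abstract does not say which convention the authors used, so the choice of denominator here is an assumption, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two already-aligned, equal-length sequences.

    A column counts as a match only if both sequences carry the same non-gap
    character. Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # uninformative column
        columns += 1
        if a == b and a != gap:
            matches += 1
    return 100.0 * matches / columns if columns else 0.0


if __name__ == "__main__":
    print(round(percent_identity("ACGT-A", "ACCTTA"), 2))
```

    Other tools divide by the shorter sequence length or by ungapped columns only, so identity values from different sources are comparable only when the convention is known.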

  19. Sequence characterization of cDNA sequence of encoding of an antimicrobial Peptide with no disulfide bridge from the Iranian mesobuthus eupeus venomous glands.

    PubMed

    Farajzadeh-Sheikh, Ahmad; Jolodar, Abbas; Ghaemmaghami, Shamsedin

    2013-01-01

    Scorpion venom glands produce antimicrobial peptides (AMPs) that can rapidly kill a broad range of microbes and have additional activities that affect the quality and effectiveness of innate responses and inflammation. In this study, we report the identification of a cDNA sequence encoding a cysteine-free antimicrobial peptide isolated from the venom glands of the Iranian scorpion Mesobuthus eupeus. Total RNA was extracted from the venom glands of Iranian M. eupeus, and cDNA was synthesized using modified oligo(dT). The cDNA was used as the template for semi-nested RT-PCR. PCR products were sequenced directly and the results were compared with the GenBank database. A 213 bp cDNA fragment encoding the entire coding region of an antimicrobial toxin from the venom glands of the Iranian scorpion M. eupeus was isolated. The full-length coding region was 210 bp and contained an open reading frame of 70 amino acids with a predicted molecular mass of 7970.48 Da and a theoretical pI of 9.10. The open reading frame encodes a precursor of 70 amino acid residues, including a signal peptide of 23 residues, a propeptide of 7 residues, and a mature peptide of 34 residues with no disulfide bridge. The peptide has detectable sequence identity to the Lesser Asian Mesobuthus eupeus MeVAMP-2 (98%), MeVAMP-9 (60%) and several previously described AMPs from other scorpion venoms, including Mesobuthus martensii (94%) and Buthus occitanus israelis (82%). The secondary structure of the peptide was mainly α-helical, a feature generally conserved among previously reported scorpion counterparts. The phylogenetic analysis showed that the Iranian MeAMP-like toxin was similar but not identical to the venom antimicrobial peptides of the Lesser Asian scorpion Mesobuthus eupeus.

  20. Massively parallel sequencing of 124 SNPs included in the precision ID identity panel in three East Asian minority ethnicities.

    PubMed

    Liu, Jing; Wang, Zheng; He, Guanglin; Zhao, Xueying; Wang, Mengge; Luo, Tao; Li, Chengtao; Hou, Yiping

    2018-07-01

    Massively parallel sequencing (MPS) technologies can sequence many targeted regions of multiple samples simultaneously and are gaining great interest in the forensic community. The Precision ID Identity Panel contains 90 autosomal SNPs and 34 upper Y-Clade SNPs, which was designed with small amplicons and optimized for forensic degraded or challenging samples. Here, 184 unrelated individuals from three East Asian minority ethnicities (Tibetan, Uygur and Hui) were analyzed using the Precision ID Identity Panel and the Ion PGM System. The sequencing performance and corresponding forensic statistical parameters of this MPS-SNP panel were investigated. The inter-population relationships and substructures among three investigated populations and 30 worldwide populations were further investigated using PCA, MDS, cladogram and STRUCTURE. No significant deviation from Hardy-Weinberg equilibrium (HWE) and Linkage Disequilibrium (LD) tests was observed across all 90 autosomal SNPs. The combined matching probability (CMP) for Tibetan, Uygur and Hui were 2.5880 × 10^-33, 1.7480 × 10^-35 and 4.6326 × 10^-34, respectively, and the combined power of exclusion (CPE) were 0.999999386152271, 0.999999607712827 and 0.999999696360182, respectively. For 34 Y-SNPs, only 16 haplogroups were obtained, but the haplogroup distributions differ among the three populations. Tibetans from the Sino-Tibetan population and Hui with multiple ethnicities with an admixture population have genetic affinity with East Asian populations, while Uygurs of a Eurasian admixture population have similar genetic components to the South Asian populations and are distributed between East Asian and European populations. The aforementioned results suggest that the Precision ID Identity Panel is informative and polymorphic in three investigated populations and could be used as an effective tool for human forensics. Copyright © 2018 Elsevier B.V. All rights reserved.
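
    Combined statistics such as CMP and CPE are conventionally obtained by multiplying across independent loci. The sketch below uses the textbook definitions: per-locus matching probability as the sum of squared genotype frequencies (here derived from allele frequencies under Hardy-Weinberg equilibrium), CMP as the product of those values, and CPE as one minus the product of per-locus non-exclusion probabilities. It illustrates the arithmetic only; the function names and example values are hypothetical and are not the study's data.

```python
import math


def snp_matching_probability(p):
    """Matching probability for a biallelic SNP with allele frequency p under HWE."""
    q = 1.0 - p
    genotype_freqs = (p * p, 2.0 * p * q, q * q)
    return sum(g * g for g in genotype_freqs)


def combined_matching_probability(allele_freqs):
    """CMP: product of per-locus matching probabilities, assuming independent loci."""
    return math.prod(snp_matching_probability(p) for p in allele_freqs)


def combined_power_of_exclusion(per_locus_pe):
    """CPE: 1 - product of (1 - PE_i) over independent loci."""
    return 1.0 - math.prod(1.0 - pe for pe in per_locus_pe)


if __name__ == "__main__":
    # Hypothetical inputs: two SNPs with allele frequency 0.5, two loci with PE 0.5.
    print(combined_matching_probability([0.5, 0.5]))
    print(combined_power_of_exclusion([0.5, 0.5]))
```

    With 90 loci the product shrinks geometrically, which is how per-locus values near 0.4 can yield combined matching probabilities on the order of 10^-33 to 10^-35.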

  1. Comparative and Evolutionary Analyses of Meloidogyne spp. Based on Mitochondrial Genome Sequences

    PubMed Central

    García, Laura Evangelina; Sánchez-Puerta, M. Virginia

    2015-01-01

    Molecular taxonomy and evolution of nematodes have been recently the focus of several studies. Mitochondrial sequences were proposed as an alternative for precise identification of Meloidogyne species, to study intraspecific variability and to follow maternal lineages. We characterized the mitochondrial genomes (mtDNAs) of the root knot nematodes M. floridensis, M. hapla and M. incognita. These were AT rich (81–83%) and highly compact, encoding 12 proteins, 2 rRNAs, and 22 tRNAs. Comparisons with published mtDNAs of M. chitwoodi, M. incognita (another strain) and M. graminicola revealed that they share protein and rRNA gene order but differ in the order of tRNAs. The mtDNAs of M. floridensis and M. incognita were strikingly similar (97–100% identity for all coding regions). In contrast, M. floridensis, M. chitwoodi, M. hapla and M. graminicola showed 65–84% nucleotide identity for coding regions. Variable mitochondrial sequences are potentially useful for evolutionary and taxonomic studies. We developed a molecular taxonomic marker by sequencing a highly-variable ~2 kb mitochondrial region, nad5-cox1, from 36 populations of root-knot nematodes to elucidate relationships within the genus Meloidogyne. Isolates of five species formed monophyletic groups and showed little intraspecific variability. We also present a thorough analysis of the mitochondrial region cox2-rrnS. Phylogenies based on either mitochondrial region had good discrimination power but could not discriminate between M. arenaria, M. incognita and M. floridensis. PMID:25799071

  2. Whole-genome sequencing reveals that Shewanella haliotis Kim et al. 2007 can be considered a later heterotypic synonym of Shewanella algae Simidu et al. 1990.

    PubMed

    Szeinbaum, Nadia; Kellum, Cailin E; Glass, Jennifer B; Janda, J Michael; DiChristina, Thomas J

    2018-04-01

    Previously, experimental DNA-DNA hybridization (DDH) between Shewanella haliotis JCM 14758T and Shewanella algae JCM 21037T had suggested that the two strains could be considered different species, despite minimal phenotypic differences. The recent isolation of Shewanella sp. MN-01, with 99 % 16S rRNA gene identity to S. algae and S. haliotis, revealed a potential taxonomic problem between these two species. In this study, we reassessed the nomenclature of S. haliotis and S. algae using available whole-genome sequences. The whole-genome sequence of S. haliotis JCM 14758T and ten S. algae strains showed ≥97.7 % average nucleotide identity and >78.9 % digital DDH, clearly above the recommended species thresholds. According to the rules of priority and in view of the results obtained, S. haliotis is to be considered a later heterotypic synonym of S. algae. Because the whole-genome sequence of Shewanella sp. strain MN-01 shares >99 % ANI with S. algae JCM 14758T, it can be confidently identified as S. algae.

  3. Quantitative trait nucleotide analysis using Bayesian model selection.

    PubMed

    Blangero, John; Goring, Harald H H; Kent, Jack W; Williams, Jeff T; Peterson, Charles P; Almasy, Laura; Dyer, Thomas D

    2005-10-01

    Although much attention has been given to statistical genetic methods for the initial localization and fine mapping of quantitative trait loci (QTLs), little methodological work has been done to date on the problem of statistically identifying the most likely functional polymorphisms using sequence data. In this paper we provide a general statistical genetic framework, called Bayesian quantitative trait nucleotide (BQTN) analysis, for assessing the likely functional status of genetic variants. The approach requires the initial enumeration of all genetic variants in a set of resequenced individuals. These polymorphisms are then typed in a large number of individuals (potentially in families), and marker variation is related to quantitative phenotypic variation using Bayesian model selection and averaging. For each sequence variant a posterior probability of effect is obtained and can be used to prioritize additional molecular functional experiments. An example of this quantitative nucleotide analysis is provided using the GAW12 simulated data. The results show that the BQTN method may be useful for choosing the most likely functional variants within a gene (or set of genes). We also include instructions on how to use our computer program, SOLAR, for association analysis and BQTN analysis.

  4. Complete genome sequence of a novel avian paramyxovirus isolated from wild birds in South Korea.

    PubMed

    Jeong, Jipseol; Kim, Youngsik; An, Injung; Wang, Seung-Jun; Kim, Yongkwan; Lee, Hyun-Jeong; Choi, Kang-Seuk; Im, Se-Pyeong; Min, Wongi; Oem, Jae-Ku; Jheong, Weonhwa

    2018-01-01

    A novel avian paramyxovirus (APMV), Cheonsu1510, was isolated from wild bird feces in South Korea and serologically and genetically characterized. In hemagglutination inhibition tests, antiserum against Cheonsu1510 showed low reactivity with other APMVs and vice versa. The complete genome of Cheonsu1510 comprised 15,408 nucleotides, contained six open reading frames (3'-N-P-M-F-HN-L-5'), and showed low sequence identity to other APMVs (< 63%) and a unique genomic composition. Phylogenetic analysis revealed that Cheonsu1510 was related to but distinct from APMV-1, -9, and -15. These results suggest that Cheonsu1510 represents a new APMV serotype, APMV-17.

  5. Discovery, Validation and Characterization of 1039 Cattle Single Nucleotide Polymorphisms

    USDA-ARS's Scientific Manuscript database

    We identified approximately 13000 putative single nucleotide polymorphisms (SNPs) by comparison of repeat-masked BAC-end sequences from the cattle RPCI-42 BAC library with whole-genome shotgun contigs of cattle genome assembly Btau 1.0. Genotyping of a subset of these SNPs was performed on a panel ...

  6. Evolution of Nucleotide Punctuation Marks: From Structural to Linear Signals.

    PubMed

    El Houmami, Nawal; Seligmann, Hervé

    2017-01-01

    We present an evolutionary hypothesis assuming that signals marking nucleotide synthesis (DNA replication and RNA transcription) evolved from multi- to unidimensional structures, and were carried over from transcription to translation. This evolutionary scenario presumes that signals combining secondary and primary nucleotide structures are evolutionary transitions. Mitochondrial replication initiation fits this scenario. Some observations reported in the literature corroborate that several signals for nucleotide synthesis function in translation, and vice versa. (a) Polymerase-induced frameshift mutations occur preferentially at translational termination signals (nucleotide deletion is interpreted as termination of nucleotide polymerization, paralleling the role of stop codons in translation). (b) Stem-loop hairpin presence/absence modulates codon-amino acid assignments, showing that translational signals sometimes combine primary and secondary nucleotide structures (here codon and stem-loop). (c) Homopolymer nucleotide triplets (AAA, CCC, GGG, TTT) cause transcriptional and ribosomal frameshifts. Here we find in recently described human mitochondrial RNAs that systematically lack mono-, dinucleotides after each trinucleotide (delRNAs) that delRNA triplets include 2x more homopolymers than mitogenome regions not covered by delRNA. Further analyses of delRNAs show that the natural circular code X (a little-known group of 20 translational signals enabling ribosomal frame retrieval consisting of 20 codons {AAC, AAT, ACC, ATC, ATT, CAG, CTC, CTG, GAA, GAC, GAG, GAT, GCC, GGC, GGT, GTA, GTC, GTT, TAC, TTC} universally overrepresented in coding versus other frames of gene sequences), regulates frameshift in transcription and translation. This dual transcription and translation role confirms for X the hypothesis that translational signals were carried over from transcriptional signals.
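
    The 20 codons of the circular code X listed in this abstract can be checked and counted programmatically. The sketch below verifies that the listed set is closed under reverse complementation and counts how many codons in each of the three reading frames of a sequence belong to X. It is an illustrative helper with hypothetical names, not the analysis used by the authors.

```python
# The 20 codons of the circular code X, exactly as listed in the abstract.
X_CODONS = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_self_complementary(codons):
    """True if the reverse complement of every codon is also in the set."""
    return all(reverse_complement(c) in codons for c in codons)


def x_codon_counts_by_frame(seq):
    """Count codons belonging to X in frames 0, 1 and 2 of `seq`."""
    seq = seq.upper()
    counts = []
    for frame in range(3):
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        counts.append(sum(1 for c in codons if c in X_CODONS))
    return counts


if __name__ == "__main__":
    print(len(X_CODONS), is_self_complementary(X_CODONS))
    print(x_codon_counts_by_frame("AACGTC"))
```

    Comparing the three per-frame counts over a gene is the simplest way to see the overrepresentation of X in the coding frame that the abstract refers to.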

  7. Development of Prevotella intermedia-specific PCR primers based on the nucleotide sequences of a DNA probe Pig27.

    PubMed

    Kim, Min Jung; Hwang, Kyung Hwan; Lee, Young-Seok; Park, Jae-Yoon; Kook, Joong-Ki

    2011-03-01

    The aim of this study was to develop Prevotella intermedia-specific PCR primers based on the P. intermedia-specific DNA probe. The P. intermedia-specific DNA probe was screened by inverted dot blot hybridization and confirmed by Southern blot hybridization. The nucleotide sequences of the species-specific DNA probes were determined using a chain termination method. Southern blot analysis showed that the DNA probe, Pig27, detected only the genomic DNA of P. intermedia strains. PCR showed that the PCR primers, Pin-F1/Pin-R1, had species-specificity for P. intermedia. The detection limits of the PCR primer sets were 0.4 pg of the purified genomic DNA of P. intermedia ATCC 49046. These results suggest that the PCR primers, Pin-F1/Pin-R1, could be useful in the detection of P. intermedia as well as in the development of a PCR kit in epidemiological studies related to periodontal diseases. Crown Copyright © 2010. Published by Elsevier B.V. All rights reserved.

  8. Sequence and phylogenetic analysis of chicken anaemia virus obtained from backyard and commercial chickens in Nigeria.

    PubMed

    Oluwayelu, D O; Todd, D; Olaleye, O D

    2008-12-01

    This work reports the first molecular analysis study of chicken anaemia virus (CAV) in backyard chickens in Africa using molecular cloning and sequence analysis to characterize CAV strains obtained from commercial chickens and Nigerian backyard chickens. Partial VP1 gene sequences were determined for three CAVs from commercial chickens and for six CAV variants present in samples from a backyard chicken. Multiple alignment analysis revealed that the 6% and 4% nucleotide diversity obtained respectively for the commercial and backyard chicken strains translated to only 2% amino acid diversity for each breed. Overall, the amino acid composition of Nigerian CAVs was found to be highly conserved. Since the partial VP1 gene sequence of two backyard chicken cloned CAV strains (NGR/CI-8 and NGR/CI-9) were almost identical and evolutionarily closely related to the commercial chicken strains NGR-1, and NGR-4 and NGR-5, respectively, we concluded that CAV infections had crossed the farm boundary.

  9. Complete genome sequence of Streptococcus troglodytae TKU31 isolated from the oral cavity of a chimpanzee (Pan troglodytes).

    PubMed

    Okamoto, Masaaki; Naito, Mariko; Miyanohara, Mayu; Imai, Susumu; Nomura, Yoshiaki; Saito, Wataru; Momoi, Yasuko; Takada, Kazuko; Miyabe-Nishiwaki, Takako; Tomonaga, Masaki; Hanada, Nobuhiro

    2016-12-01

    Streptococcus troglodytae TKU31 was isolated from the oral cavity of a chimpanzee (Pan troglodytes) and was found to be the most closely related species of the mutans group streptococci to Streptococcus mutans. The complete sequence of TKU31 genome consists of a single circular chromosome that is 2,097,874 base pairs long and has a G + C content of 37.18%. It possesses 2082 coding sequences (CDSs), 65 tRNAs and five rRNA operons (15 rRNAs). Two clustered regularly interspaced short palindromic repeats, six insertion sequences and two predicted prophage elements were identified. The genome of TKU31 harbors some putative virulence associated genes, including gtfB, gtfC and gtfD genes encoding glucosyltransferase and gbpA, gbpB, gbpC and gbpD genes encoding glucan-binding cell wall-anchored protein. The deduced amino acid identity of the rhamnose-glucose polysaccharide F gene (rgpF), which is one of the serotype determinants, is 91% identical with that of S. mutans LJ23 (serotype k) strain. However, two other virulence-associated genes cnm and cbm, which encode the collagen-binding proteins, were not found in the TKU31 genome. The complete genome sequence of S. troglodytae TKU31 has been deposited at DDBJ/European Nucleotide Archive/GenBank under the accession no. AP014612. © 2016 The Societies and John Wiley & Sons Australia, Ltd.

  10. Minimap2: pairwise alignment for nucleotide sequences.

    PubMed

    Li, Heng

    2018-05-10

    Recent advances in sequencing technologies promise ultra-long reads of ∼100 kilobases (kb) on average, full-length mRNA or cDNA reads in high throughput and genomic contigs over 100 megabases (Mb) in length. Existing alignment programs are unable or inefficient to process such data at scale, which presses for the development of new alignment algorithms. Minimap2 is a general-purpose alignment program to map DNA or long mRNA sequences against a large reference database. It works with accurate short reads of ≥100 bp in length, ≥1 kb genomic reads at error rate ∼15%, full-length noisy Direct RNA or cDNA reads, and assembly contigs or closely related full chromosomes of hundreds of megabases in length. Minimap2 does split-read alignment, employs concave gap cost for long insertions and deletions (INDELs) and introduces new heuristics to reduce spurious alignments. It is 3-4 times as fast as mainstream short-read mappers at comparable accuracy, and is ≥30 times faster than long-read genomic or cDNA mappers at higher accuracy, surpassing most aligners specialized in one type of alignment. Availability: https://github.com/lh3/minimap2. Contact: hengli@broadinstitute.org.
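
    The "concave gap cost" mentioned in this abstract can be illustrated with a two-piece affine function: the cost of a gap is the smaller of two affine penalties, one cheap to open but expensive to extend and one expensive to open but cheap to extend. The sketch below assumes that form and uses the open/extend values 4,24 and 2,1 that, to the best of our understanding, are Minimap2's documented defaults for -O and -E; treat both the form and the values as assumptions rather than a description of the program's internals.

```python
def two_piece_gap_cost(length, open1=4, ext1=2, open2=24, ext2=1):
    """Cost of a gap of `length` bases under a two-piece affine model.

    The result is min(open1 + length*ext1, open2 + length*ext2), which is a
    concave function of gap length: long gaps are penalised less per base.
    Default parameters are assumed values, not read from any program.
    """
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0
    return min(open1 + length * ext1, open2 + length * ext2)


if __name__ == "__main__":
    for gap_length in (1, 20, 100):
        print(gap_length, two_piece_gap_cost(gap_length))
```

    With these parameters the two pieces cross at a gap length of 20 bases; beyond that point each additional base costs 1 instead of 2, which is what makes long insertions and deletions affordable for the aligner.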

  11. ABI Base Recall: Automatic Correction and Ends Trimming of DNA Sequences.

    PubMed

    Elyazghi, Zakaria; Yazouli, Loubna El; Sadki, Khalid; Radouani, Fouzia

    2017-12-01

    Automated DNA sequencers produce chromatogram files in ABI format. When viewing chromatograms, some ambiguities are shown at various sites along the DNA sequences, because the program implemented in the sequencing machine and used to call bases cannot always precisely determine the right nucleotide, especially when it is represented by either a broad peak or a set of overlaying peaks. In such cases, a letter other than A, C, G, or T is recorded, most commonly N. Thus, DNA sequencing chromatograms need manual examination: checking for mis-calls and truncating the sequence when errors become too frequent. The purpose of this paper is to develop a program allowing the automatic correction of these ambiguities. This application is a Web-based program powered by Shiny and runs under R platform for an easy exploitation. As a part of the interface, we added the automatic ends clipping option, alignment against reference sequences, and BLAST. To develop and test our tool, we collected several bacterial DNA sequences from different laboratories within Institut Pasteur du Maroc and performed both manual and automatic correction. The comparison between the two methods was carried out. As a result, we note that our program, ABI base recall, accomplishes good correction with a high accuracy. Indeed, it increases the rate of identity and coverage and minimizes the number of mismatches and gaps, hence it provides solution to sequencing ambiguities and saves biologists' time and labor.
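
    The end-trimming step described in this abstract (truncating a read where ambiguous calls become too frequent) can be approximated with a sliding window. The sketch below is a generic illustration in Python, not the algorithm of the ABI base recall tool, which runs under R and Shiny; the window size, the threshold and the function name are hypothetical.

```python
def trim_ambiguous_ends(seq, window=10, max_ambiguous=0):
    """Trim both ends of `seq` until a window with few ambiguous bases is found.

    Any character other than A, C, G or T counts as ambiguous. The read is
    cut at the first window from the left, and the first window from the
    right, that contains at most `max_ambiguous` ambiguous bases. Returns an
    empty string if no such window exists (including reads shorter than
    `window`).
    """
    seq = seq.upper()
    n = len(seq)

    def ambiguous_in(start, stop):
        return sum(1 for base in seq[start:stop] if base not in "ACGT")

    start = 0
    while start + window <= n and ambiguous_in(start, start + window) > max_ambiguous:
        start += 1
    if start + window > n:
        return ""  # no acceptable window anywhere in the read

    end = n
    while end - window >= start and ambiguous_in(end - window, end) > max_ambiguous:
        end -= 1
    return seq[start:end]


if __name__ == "__main__":
    print(trim_ambiguous_ends("NNANNCGTACGTACGTACGTNNTNN", window=5))
```

    Ambiguous calls inside the retained region are left untouched here; correcting them would require the chromatogram peak data, which this sketch does not model.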

  12. Diverse nucleotide compositions and sequence fluctuation in Rubisco protein genes

    NASA Astrophysics Data System (ADS)

    Holden, Todd; Dehipawala, S.; Cheung, E.; Bienaime, R.; Ye, J.; Tremberger, G., Jr.; Schneider, P.; Lieberman, D.; Cheung, T.

    2011-10-01

    The Rubisco protein-enzyme is arguably the most abundant protein on Earth. The biology dogma of transcription and translation necessitates the study of the Rubisco genes and Rubisco-like genes in various species. Stronger correlation of fractal dimension of the atomic number fluctuation along a DNA sequence with Shannon entropy has been observed in the studied Rubisco-like gene sequences, suggesting a more diverse evolutionary pressure and constraints in the Rubisco sequences. The strategy of using metal for structural stabilization appears to be an ancient mechanism, with data from the porphobilinogen deaminase gene in Capsaspora owczarzaki and Monosiga brevicollis. Using the chi-square distance probability, our analysis supports the conjecture that the more ancient Rubisco-like sequence in Microcystis aeruginosa would have experienced very different evolutionary pressure and bio-chemical constraint as compared to Bordetella bronchiseptica, the two microbes occupying either end of the correlation graph. Our exploratory study would indicate that high fractal dimension Rubisco sequence would support high carbon dioxide rate via the Michaelis-Menten coefficient; with implication for the control of the whooping cough pathogen Bordetella bronchiseptica, a microbe containing a high fractal dimension Rubisco-like sequence (2.07). Using the internal comparison of chi-square distance probability for 16S rRNA (~ E-22) versus radiation repair Rec-A gene (~ E-05) in high GC content Deinococcus radiodurans, our analysis supports the conjecture that high GC content microbes containing Rubisco-like sequence are likely to include an extra-terrestrial origin, relative to Deinococcus radiodurans. Similar photosynthesis process that could utilize host star radiation would not compete with radiation resistant process from the biology dogma perspective in environments such as Mars and exoplanets.

  13. Short-Sequence DNA Repeats in Prokaryotic Genomes

    PubMed Central

    van Belkum, Alex; Scherer, Stewart; van Alphen, Loek; Verbrugh, Henri

    1998-01-01

    Short-sequence DNA repeat (SSR) loci can be identified in all eukaryotic and many prokaryotic genomes. These loci harbor short or long stretches of repeated nucleotide sequence motifs. DNA sequence motifs in a single locus can be identical and/or heterogeneous. SSRs are encountered in many different branches of the prokaryote kingdom. They are found in genes encoding products as diverse as microbial surface components recognizing adhesive matrix molecules and specific bacterial virulence factors such as lipopolysaccharide-modifying enzymes or adhesins. SSRs enable genetic and consequently phenotypic flexibility. SSRs function at various levels of gene expression regulation. Variations in the number of repeat units per locus or changes in the nature of the individual repeat sequences may result from recombination processes or polymerase inadequacy such as slipped-strand mispairing (SSM), either alone or in combination with DNA repair deficiencies. These rather complex phenomena can occur with relative ease, with SSM approaching a frequency of 10^−4 per bacterial cell division and allowing high-frequency genetic switching. Bacteria use this random strategy to adapt their genetic repertoire in response to selective environmental pressure. SSR-mediated variation has important implications for bacterial pathogenesis and evolutionary fitness. Molecular analysis of changes in SSRs allows epidemiological studies on the spread of pathogenic bacteria. The occurrence, evolution and function of SSRs, and the molecular methods used to analyze them are discussed in the context of responsiveness to environmental factors, bacterial pathogenicity, epidemiology, and the availability of full-genome sequences for increasing numbers of microorganisms, especially those that are medically relevant. PMID:9618442
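
    Perfect tandem repeats of the kind described in this review can be located with a back-referencing regular expression. The sketch below reports every run in which a motif of 1 to 6 bases occurs at least three times in a row; those size and copy-number limits, and the function name, are hypothetical choices for illustration. Heterogeneous (imperfect) repeats, which the abstract also mentions, are not detected by this approach.

```python
import re


def find_perfect_ssrs(seq, max_unit=6, min_copies=3):
    """Find perfect tandem repeats in `seq`.

    Returns a list of (start, motif, copies) tuples, where `start` is the
    0-based position of the run. The lazy quantifier makes the regex prefer
    the shortest motif that satisfies the copy-number requirement.
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


if __name__ == "__main__":
    print(find_perfect_ssrs("TTGACAATCAATCAATCAATGG"))
```

    Comparing the reported copy number at the same locus across isolates is the basic measurement behind the epidemiological typing applications the review discusses.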

  14. Genetic differentiation between fake abalone and genuine Haliotis species using the forensically informative nucleotide sequencing (FINS) method.

    PubMed

    Ha, Wai Y; Reid, David G; Kam, Wan L; Lau, Yuk Y; Sham, Wing C; Tam, Silvia Y K; Sin, Della W M; Mok, Chuen S

    2011-05-25

    Abalones (Haliotis species) are a popular delicacy and commonly preserved in dried form either whole or in slices or small pieces for consumption in Asian countries. Driven by the huge profit from trading abalones, dishonest traders may substitute other molluscan species for processed abalone, of which the morphological characteristics are frequently lost in the processed form. For protection of consumer rights and law enforcement against fraud, there is a need for an effective methodology to differentiate between fake and genuine abalone. This paper describes a method (validated according to the international forensic guidelines provided by SWGDAM) for the identification of fake abalone species using forensically informative nucleotide sequence (FINS) analysis. A study of the local market revealed that many claimed "abalone slice" samples on sale are not genuine. The fake abalone samples were found to be either volutids of the genus Cymbium (93%) or the muricid Concholepas concholepas (7%). This is the first report of Cymbium species being used for the preparation and sale as "abalone" in dried sliced form in Hong Kong.

  15. Zn-metalloprotease sequences in extremophiles

    NASA Astrophysics Data System (ADS)

    Holden, T.; Dehipawala, S.; Golebiewska, U.; Cheung, E.; Tremberger, G., Jr.; Williams, E.; Schneider, P.; Gadura, N.; Lieberman, D.; Cheung, T.

    2010-09-01

    The Zn-metalloprotease family contains conserved amino acid structures such that the nucleotide fluctuation at the DNA level would exhibit correlated randomness as described by fractal dimension. A nucleotide sequence fractal dimension can be calculated from a numerical series consisting of the atomic numbers of each nucleotide. The structure's vibration modes can also be studied using a Gaussian Network Model. The vibration measure and fractal dimension values form a two-dimensional plot with a standard vector metric that can be used for comparison of structures. The preference for amino acid usage in extremophiles may suppress nucleotide fluctuations that could be analyzed in terms of fractal dimension and Shannon entropy. A protein level cold adaptation study of the thermolysin Zn-metalloprotease family using molecular dynamics simulation was reported recently and our results show that the associated nucleotide fluctuation suppression is consistent with a regression pattern generated from the sequences' fractal dimension and entropy values (R-square = 0.98, N = 5). It was observed that cold adaptation selected for high entropy and low fractal dimension values. Extension to the Archaemetzincin M54 family in extremophiles reveals a similar regression pattern (R-square = 0.98, N = 6). It was observed that the metalloprotease sequences of extremely halophilic organisms possess high fractal dimension and low entropy values as compared with non-halophiles. The zinc atom is usually bonded to the histidine residue, which shows limited levels of vibration in the Gaussian Network Model. The variability of the fractal dimension and entropy for a given protein structure suggests that extremophiles would have evolved after mesophiles, consistent with the biased usage of non-prebiotic amino acids by extremophiles. It may be argued that extremophiles have the capacity to offer extinction protection during drastic changes in astrobiological environments.
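
    The numerical series and entropy mentioned above could be sketched roughly as follows. This is a sketch under stated assumptions, not the authors' procedure: the per-base values are the summed atomic numbers of the free bases (adenine C5H5N5, thymine C5H6N2O2, cytosine C4H5N3O, guanine C5H5N5O), and the entropy is the plain Shannon entropy of the symbol frequencies. The abstract does not say which fractal dimension estimator was used, so none is implemented here.

```python
import math
from collections import Counter

# Assumption: sum of atomic numbers over all atoms of each free base.
# A: 5*6 + 5*1 + 5*7 = 70; T: 30 + 6 + 14 + 16 = 66;
# C: 24 + 5 + 21 + 8 = 58;  G: 30 + 5 + 35 + 8 = 78.
ATOMIC_NUMBER_SUM = {"A": 70, "T": 66, "C": 58, "G": 78}


def atomic_number_series(seq):
    """Map a DNA string to a numerical series (hypothetical helper)."""
    return [ATOMIC_NUMBER_SUM[base] for base in seq.upper()]


def shannon_entropy(values):
    """Shannon entropy in bits of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    series = atomic_number_series("ATGGCGTACCTA")
    print(series)
    print(shannon_entropy(series))
```

    With four possible symbols the entropy ranges from 0 bits (a single repeated base) to 2 bits (all four bases equally frequent).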

  16. Gallium plasmonic nanoparticles for label-free DNA and single nucleotide polymorphism sensing

    NASA Astrophysics Data System (ADS)

    Marín, Antonio García; García-Mendiola, Tania; Bernabeu, Cristina Navio; Hernández, María Jesús; Piqueras, Juan; Pau, Jose Luis; Pariente, Félix; Lorenzo, Encarnación

    2016-05-01

    A label-free DNA and single nucleotide polymorphism (SNP) sensing method is described. It is based on the use of the pseudodielectric function of gallium plasmonic nanoparticles (GaNPs) deposited on Si (100) substrates under reversal of the polarization handedness condition. Under this condition, the pseudodielectric function is extremely sensitive to changes in the surrounding medium of the nanoparticle surface providing an excellent sensing platform competitive to conventional surface plasmon resonance. DNA sensing has been carried out by immobilizing a thiolated capture probe sequence from Helicobacter pylori onto GaNP/Si substrates; complementary target sequences of Helicobacter pylori can be quantified over the range of 10 pM to 3.0 nM with a detection limit of 6.0 pM and a linear correlation coefficient of R2 = 0.990. The selectivity of the device allows the detection of a single nucleotide polymorphism (SNP) in a specific sequence of Helicobacter pylori, without the need for a hybridization suppressor in solution such as formamide. Furthermore, it also allows the detection of this sequence in the presence of other pathogens, such as Escherichia coli in the sample. The broad applicability of the system was demonstrated by the detection of a specific gene mutation directly associated with cystic fibrosis in large genomic DNA isolated from blood cells.

  17. Variation in the number of nucleoli and incomplete homogenization of 18S ribosomal DNA sequences in leaf cells of the cultivated Oriental ginseng (Panax ginseng Meyer)

    PubMed Central

    Chelomina, Galina N.; Rozhkovan, Konstantin V.; Voronova, Anastasia N.; Burundukova, Olga L.; Muzarok, Tamara I.; Zhuravlev, Yuri N.

    2015-01-01

    Background: Wild ginseng, Panax ginseng Meyer, is an endangered species of medicinal plants. In the present study, we analyzed variations within the ribosomal DNA (rDNA) cluster to gain insight into the genetic diversity of the Oriental ginseng, P. ginseng, under artificial cultivation. Methods: The roots of wild P. ginseng plants were sampled from a nonprotected natural population of the Russian Far East. The slides were prepared from leaf tissues using the squash technique for cytogenetic analysis. The 18S rDNA sequences were cloned and sequenced. The distribution of nucleotide diversity, recombination events, and interspecific phylogenies for the total 18S rDNA sequence data set was also examined. Results: In mesophyll cells, mononucleolar nuclei were estimated to be dominant (75.7%), while the remaining nuclei contained two to four nucleoli. Among the analyzed 18S rDNA clones, 20% were identical to the 18S rDNA sequence of P. ginseng from Japan, and other clones differed in one to six substitutions. The nucleotide polymorphism was more pronounced at positions 440–640 bp, and distributed in variable regions, expansion segments, and conservative elements of core structure. The phylogenetic analysis confirmed conspecificity of ginseng plants cultivated in different regions, with two fixed mutations between P. ginseng and other species. Conclusion: This study identified evidence of intragenomic nucleotide polymorphism in the 18S rDNA sequences of P. ginseng. These data suggest that, in cultivated plants, the observed genome instability may influence the synthesis of biologically active compounds, which are widely used in traditional medicine. PMID:27158239

  18. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing.

    PubMed

    Binladen, Jonas; Gilbert, M Thomas P; Bollback, Jonathan P; Panitz, Frank; Bendixen, Christian; Nielsen, Rasmus; Willerslev, Eske

    2007-02-14

    The invention of the Genome Sequence 20 DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, as homologous sequences cannot be subsequently assigned to their original sources. We use conventional PCR with 5'-nucleotide tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing through the high-throughput Genome Sequence 20 DNA Sequencing System (GS20, Roche/454 Life Sciences). Each DNA sequence is subsequently traced back to its individual source through 5' tag analysis. We demonstrate that this new approach enables the assignment of virtually all the generated DNA sequences to the correct source once sequencing anomalies are accounted for (mis-assignment rate < 0.4%). Therefore, the method enables accurate sequencing and assignment of homologous DNA sequences from multiple sources in a single high-throughput GS20 run. We observe a bias in the distribution of the differently tagged primers that is dependent on the 5' nucleotide of the tag. In particular, primers 5' labelled with a cytosine are heavily overrepresented among the final sequences, while those 5' labelled with a thymine are strongly underrepresented. A weaker bias also exists with regard to the distribution of the sequences as sorted by the second nucleotide of the dinucleotide tags. As the results are based on a single GS20 run, the general applicability of the approach requires confirmation. However, our experiments demonstrate that 5' primer tagging is a useful method in which the sequencing power of the GS20 can be applied to PCR-based assays of multiple homologous PCR products. The new approach will be of value to a broad range of research areas, such as those of comparative genomics, complete mitochondrial analyses
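
    The 5' tag analysis described in this abstract amounts to reading a short tag at the start of each sequence and looking up the specimen it was assigned to. A minimal sketch is given below, assuming dinucleotide tags (as the abstract mentions) placed immediately before a known primer sequence. The function name, the example tags, the primer and the specimen labels are all hypothetical, and sequencing errors within the tag are not handled.

```python
from collections import defaultdict

TAG_LENGTH = 2  # dinucleotide tags, as mentioned in the abstract


def assign_reads_by_tag(reads, tag_to_source, primer):
    """Group reads by their 5' tag; return (assigned, unassigned).

    A read is assigned only if its tag is known and the primer follows the
    tag directly. The tag and primer are trimmed from assigned reads.
    """
    assigned = defaultdict(list)
    unassigned = []
    for read in reads:
        read = read.upper()
        tag, rest = read[:TAG_LENGTH], read[TAG_LENGTH:]
        source = tag_to_source.get(tag)
        if source is not None and rest.startswith(primer):
            assigned[source].append(rest[len(primer):])
        else:
            unassigned.append(read)
    return dict(assigned), unassigned


if __name__ == "__main__":
    tags = {"CA": "specimen_1", "TG": "specimen_2"}  # hypothetical tag table
    reads = ["CAGGTTACGT", "TGGGTTAAAA", "GGGGTTACGT"]
    print(assign_reads_by_tag(reads, tags, primer="GGTT"))
```

    Counting the reads in each group would also expose the kind of tag-dependent representation bias reported above.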

  19. Nucleotide sequence of the 3' terminal region of lettuce mosaic potyvirus RNA shows a Gln/Val dipeptide at the cleavage site between the polymerase and the coat protein.

    PubMed

    Dinant, S; Lot, H; Albouy, J; Kuziak, C; Meyer, M; Astier-Manifacier, S

    1991-01-01

    DNA complementary to the 3' terminal 1651 nucleotides of the genome of the common strain of lettuce mosaic virus (LMV-O) has been cloned and sequenced. Microsequencing of the N-terminus enabled localization of the coat protein gene in this sequence. It showed also that the LMV coat protein coding region is at the 3' end of the genome, and that the coat protein is processed from a larger protein by cleavage at an unusual Q/V dipeptide between the polymerase and the coat protein. This is the first report of such a site for cleavage of a potyvirus polyprotein, where only Q/A, Q/S, and Q/G cleavage sites have been reported. The LMV coat protein gene encodes a 278 amino acid polypeptide with a calculated Mr of 31,171 and is flanked by a region which has a high degree of homology with the putative polymerase and a 3' untranslated region of 211 nucleotides in length. Percentage of homology with the coat protein of other potyviruses confirms that LMV is a distinct member of this group. Moreover, amino acid homologies noticed with the coat protein of potexvirus, bymovirus, and carlavirus elongated plant viruses suggest a functional significance for the conserved domains.

  20. The use of sequence-based SSR mining for the development of a vast collection of microsatellites in Aquilegia formosa

    Treesearch

    Brandon Schlautman; Vera Pfeiffer; Juan Zalapa; Johanne Brunet

    2014-01-01

    Numerous microsatellite markers were developed for Aquilegia formosa from sequences deposited within the Expressed Sequence Tag (EST), Genomic Survey Sequence (GSS), and Nucleotide databases in NCBI. Microsatellites (SSRs) were identified and primers were designed for 9 SSR containing sequences in the Nucleotide database, 3803 sequences in the EST...