Sample records for acid sequence studies

  1. Composition for nucleic acid sequencing

    DOEpatents

    Korlach, Jonas [Ithaca, NY; Webb, Watt W [Ithaca, NY; Levene, Michael [Ithaca, NY; Turner, Stephen [Ithaca, NY; Craighead, Harold G [Ithaca, NY; Foquet, Mathieu [Ithaca, NY

    2008-08-26

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  2. Chip-based sequencing nucleic acids

    DOEpatents

    Beer, Neil Reginald

    2014-08-26

    A system for fast DNA sequencing by amplification of genetic material within microreactors, denaturing, demulsifying, and then sequencing the material, while retaining it in a PCR/sequencing zone by a magnetic field. One embodiment includes sequencing nucleic acids on a microchip that includes a microchannel flow channel in the microchip. The nucleic acids are isolated and hybridized to magnetic nanoparticles or to magnetic polystyrene-coated beads. Microreactor droplets are formed in the microchannel flow channel. The microreactor droplets containing the nucleic acids and the magnetic nanoparticles are retained in a magnetic trap in the microchannel flow channel and sequenced.

  3. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-06-06

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  4. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-05-30

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  5. Dipeptide Sequence Determination: Analyzing Phenylthiohydantoin Amino Acids by HPLC

    NASA Astrophysics Data System (ADS)

    Barton, Janice S.; Tang, Chung-Fei; Reed, Steven S.

    2000-02-01

    Amino acid composition and sequence determination, important techniques for characterizing peptides and proteins, are essential for predicting conformation and studying sequence alignment. This experiment presents improved, fundamental methods of sequence analysis for an upper-division biochemistry laboratory. Working in pairs, students use the Edman reagent to prepare phenylthiohydantoin derivatives of amino acids for determination of the sequence of an unknown dipeptide. With a single HPLC technique, students identify both the N-terminal amino acid and the composition of the dipeptide. This method yields good precision of retention times and allows use of a broad range of amino acids as components of the dipeptide. Students learn fundamental principles and techniques of sequence analysis and HPLC.

  6. Method for identifying and quantifying nucleic acid sequence aberrations

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1998-01-01

    A method for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe.

  7. Method for identifying and quantifying nucleic acid sequence aberrations

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1998-07-21

    A method is disclosed for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe. 11 figs.

  8. Solid phase sequencing of double-stranded nucleic acids

    DOEpatents

    Fu, Dong-Jing; Cantor, Charles R.; Koster, Hubert; Smith, Cassandra L.

    2002-01-01

    This invention relates to methods for detecting and sequencing of target double-stranded nucleic acid sequences, to nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probe comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Nucleic acids whose sequences can be determined include nucleic acids in biological samples such as patient biopsies and environmental samples. Probes may be fixed to a solid support such as a hybridization chip to facilitate automated determination of molecular weights and identification of the target sequence.

  9. Detection of nucleic acid sequences by invader-directed cleavage

    DOEpatents

    Brow, Mary Ann D.; Hall, Jeff Steven Grotelueschen; Lyamichev, Victor; Olive, David Michael; Prudent, James Robert

    1999-01-01

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The 5' nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based by charge.

  10. Mouse Vk gene classification by nucleic acid sequence similarity.

    PubMed

    Strohal, R; Helmberg, A; Kroemer, G; Kofler, R

    1989-01-01

    Analyses of immunoglobulin (Ig) variable (V) region gene usage in the immune response, estimates of V gene germline complexity, and other nucleic acid hybridization-based studies depend on the extent to which such genes are related (i.e., sequence similarity) and their organization in gene families. While mouse Igh heavy chain V region (VH) gene families are relatively well-established, a corresponding systematic classification of Igk light chain V region (Vk) genes has not been reported. The present analysis, in the course of which we reviewed the known extent of the Vk germline gene repertoire and Vk gene usage in a variety of responses to foreign and self antigens, provides a classification of mouse Vk genes in gene families composed of members with greater than 80% overall nucleic acid sequence similarity. This classification differed in several aspects from that of VH genes: only some Vk gene families were as clearly separated (by greater than 25% sequence dissimilarity) as typical VH gene families; most Vk gene families were closely related and, in several instances, members from different families were very similar (greater than 80%) over large sequence portions; frequently, classification by nucleic acid sequence similarity diverged from existing classifications based on amino-terminal protein sequence similarity. Our data have implications for Vk gene analyses by nucleic acid hybridization and describe potentially important differences in sequence organization between VH and Vk genes.

  11. Methods and compositions for efficient nucleic acid sequencing

    DOEpatents

    Drmanac, Radoje

    2006-07-04

    Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.

  12. Methods and compositions for efficient nucleic acid sequencing

    DOEpatents

    Drmanac, Radoje

    2002-01-01

    Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.

  13. Nucleotide sequence analysis of the gene encoding the Deinococcus radiodurans surface protein, derived amino acid sequence, and complementary protein chemical studies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Peters, J.; Peters, M.; Lottspeich, F.

    1987-11-01

    The complete nucleotide sequence of the gene encoding the surface (hexagonally packed intermediate (HPI))-layer polypeptide of Deinococcus radiodurans Sark was determined and found to encode a polypeptide of 1036 amino acids. Amino acid sequence analysis of about 30% of the residues revealed that the mature polypeptide consists of at least 978 amino acids. The N terminus was blocked to Edman degradation. The results of proteolytic modification of the HPI layer in situ and M/sub r/ estimations of the HPI polypeptide expressed in Escherichia coli indicated that there is a leader sequence. The N-terminal region contained a very high percentage (29%)more » of threonine and serine, including a cluster of nine consecutive serine or threonine residues, whereas a stretch near the C terminus was extremely rich in aromatic amino acids (29%). The protein contained at least two disulfide bridges, as well as tightly bound reducing sugars and fatty acids.« less

  14. 77 FR 65537 - Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-29

    ... DEPARTMENT OF COMMERCE Patent and Trademark Office Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence Disclosures ACTION: Proposed collection; comment request... Patent applications that contain nucleotide and/or amino acid sequence disclosures must include a copy of...

  15. High speed nucleic acid sequencing

    DOEpatents

    Korlach, Jonas [Ithaca, NY; Webb, Watt W [Ithaca, NY; Levene, Michael [Ithaca, NY; Turner, Stephen [Ithaca, NY; Craighead, Harold G [Ithaca, NY; Foquet, Mathieu [Ithaca, NY

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid. Each type of labeled nucleotide comprises an acceptor fluorophore attached to a phosphate portion of the nucleotide such that the fluorophore is removed upon incorporation into a growing strand. Fluorescent signal is emitted via fluorescent resonance energy transfer between the donor fluorophore and the acceptor fluorophore as each nucleotide is incorporated into the growing strand. The sequence is deduced by identifying which base is being incorporated into the growing strand.

  16. Soil amino acid composition across a boreal forest successional sequence

    Treesearch

    Nancy R. Werdin-Pfisterer; Knut Kielland; Richard D. Boone

    2009-01-01

    Soil amino acids are important sources of organic nitrogen for plant nutrition, yet few studies have examined which amino acids are most prevalent in the soil. In this study, we examined the composition, concentration, and seasonal patterns of soil amino acids across a primary successional sequence encompassing a natural gradient of plant productivity and soil...

  17. Kit for detecting nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    2001-01-01

    A kit is provided for detecting a target nucleic acid sequence in a sample, the kit comprising: a first hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the first hybridization probe including a first complexing agent for forming a binding pair with a second complexing agent; and a second hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the first hybridization probe does not selectively hybridize, the second hybridization probe including a detectable marker; a third hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the third hybridization probe including the same detectable marker as the second hybridization probe; and a fourth hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the third hybridization probe does not selectively hybridize, the fourth hybridization probe including the first complexing agent for forming a binding pair with the second complexing agent; wherein the first and second hybridization probes are capable of simultaneously hybridizing to the target sequence and the third and fourth hybridization probes are capable of simultaneously hybridizing to the target sequence, the detectable marker is not present on the first or fourth hybridization probes and the first, second, third, and fourth hybridization probes each include a competitive nucleic acid sequence which is sufficiently complementary to a third portion of the target sequence that the competitive sequences of the first, second, third, and fourth hybridization probes compete with each other to hybridize to the third portion of the

  18. Amino acid sequence of the human fibronectin receptor

    PubMed Central

    1987-01-01

    The amino acid sequence deduced from cDNA of the human placental fibronectin receptor is reported. The receptor is composed of two subunits: an alpha subunit of 1,008 amino acids which is processed into two polypeptides disulfide bonded to one another, and a beta subunit of 778 amino acids. Each subunit has near its COOH terminus a hydrophobic segment. This and other sequence features suggest a structure for the receptor in which the hydrophobic segments serve as transmembrane domains anchoring each subunit to the membrane and dividing each into a large ectodomain and a short cytoplasmic domain. The alpha subunit ectodomain has five sequence elements homologous to consensus Ca2+- binding sites of several calcium-binding proteins, and the beta subunit contains a fourfold repeat strikingly rich in cysteine. The alpha subunit sequence is 46% homologous to the alpha subunit of the vitronectin receptor. The beta subunit is 44% homologous to the human platelet adhesion receptor subunit IIIa and 47% homologous to a leukocyte adhesion receptor beta subunit. The high degree of homology (85%) of the beta subunit with one of the polypeptides of a chicken adhesion receptor complex referred to as integrin complex strongly suggests that the latter polypeptide is the chicken homologue of the fibronectin receptor beta subunit. These receptor subunit homologies define a superfamily of adhesion receptors. The availability of the entire protein sequence for the fibronectin receptor will facilitate studies on the functions of these receptors. PMID:2958481

  19. Hybridization and sequencing of nucleic acids using base pair mismatches

    DOEpatents

    Fodor, Stephen P. A.; Lipshutz, Robert J.; Huang, Xiaohua

    2001-01-01

    Devices and techniques for hybridization of nucleic acids and for determining the sequence of nucleic acids. Arrays of nucleic acids are formed by techniques, preferably high resolution, light-directed techniques. Positions of hybridization of a target nucleic acid are determined by, e.g., epifluorescence microscopy. Devices and techniques are proposed to determine the sequence of a target nucleic acid more efficiently and more quickly through such synthesis and detection techniques.

  20. WEB-server for search of a periodicity in amino acid and nucleotide sequences

    NASA Astrophysics Data System (ADS)

    E Frenkel, F.; Skryabin, K. G.; Korotkov, E. V.

    2017-12-01

    A new web server (http://victoria.biengi.ac.ru/splinter/login.php) was designed and developed to search for periodicity in nucleotide and amino acid sequences. The web server operation is based upon a new mathematical method of searching for multiple alignments, which is founded on the position weight matrices optimization, as well as on implementation of the two-dimensional dynamic programming. This approach allows the construction of multiple alignments of the indistinctly similar amino acid and nucleotide sequences that accumulated more than 1.5 substitutions per a single amino acid or a nucleotide without performing the sequences paired comparisons. The article examines the principles of the web server operation and two examples of studying amino acid and nucleotide sequences, as well as information that could be obtained using the web server.

  1. The amino acid sequence of Staphylococcus aureus penicillinase.

    PubMed Central

    Ambler, R P

    1975-01-01

    The amino acid sequence of the penicillinase (penicillin amido-beta-lactamhydrolase, EC 3.5.2.6) from Staphylococcus aureus strain PC1 was determined. The protein consists of a single polypeptide chain of 257 residues, and the sequence was determined by characterization of tryptic, chymotryptic, peptic and CNBr peptides, with some additional evidence from thermolysin and S. aureus proteinase peptides. A mistake in the preliminary report of the sequence is corrected; residues 113-116 are now thought to be -Lys-Lys-Val-Lys- rather than -Lys-Val-Lys-Lys-. Detailed evidence for the amino acid sequence has been deposited as Supplementary Publication SUP 50056 (91 pages) at the British Library (Lending Division), Boston Spa, Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies may be obtained on the terms given in Biochem. J. (1975) 145, 5. PMID:1218078

  2. Detection and isolation of nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1997-01-01

    A method for detecting a target nucleic acid sequence in a sample is provided using hybridization probes which competitively hybridize to a target nucleic acid. According to the method, a target nucleic acid sequence is hybridized to first and second hybridization probes which are complementary to overlapping portions of the target nucleic acid sequence, the first hybridization probe including a first complexing agent capable of forming a binding pair with a second complexing agent and the second hybridization probe including a detectable marker. The first complexing agent attached to the first hybridization probe is contacted with a second complexing agent, the second complexing agent being attached to a solid support such that when the first and second complexing agents are attached, target nucleic acid sequences hybridized to the first hybridization probe become immobilized on to the solid support. The immobilized target nucleic acids are then separated and detected by detecting the detectable marker attached to the second hybridization probe. A kit for performing the method is also provided.

  3. Detection and isolation of nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1997-04-01

    A method for detecting a target nucleic acid sequence in a sample is provided using hybridization probes which competitively hybridize to a target nucleic acid. According to the method, a target nucleic acid sequence is hybridized to first and second hybridization probes which are complementary to overlapping portions of the target nucleic acid sequence, the first hybridization probe including a first complexing agent capable of forming a binding pair with a second complexing agent and the second hybridization probe including a detectable marker. The first complexing agent attached to the first hybridization probe is contacted with a second complexing agent, the second complexing agent being attached to a solid support such that when the first and second complexing agents are attached, target nucleic acid sequences hybridized to the first hybridization probe become immobilized on to the solid support. The immobilized target nucleic acids are then separated and detected by detecting the detectable marker attached to the second hybridization probe. A kit for performing the method is also provided. 7 figs.

  4. Phenolic acid esterases, coding sequences and methods

    DOEpatents

    Blum, David L.; Kataeva, Irina; Li, Xin-Liang; Ljungdahl, Lars G.

    2002-01-01

    Described herein are four phenolic acid esterases, three of which correspond to domains of previously unknown function within bacterial xylanases, from XynY and XynZ of Clostridium thermocellum and from a xylanase of Ruminococcus. The fourth specifically exemplified xylanase is a protein encoded within the genome of Orpinomyces PC-2. The amino acids of these polypeptides and nucleotide sequences encoding them are provided. Recombinant host cells, expression vectors and methods for the recombinant production of phenolic acid esterases are also provided.

  5. Identification of random nucleic acid sequence aberrations using dual capture probes which hybridize to different chromosome regions

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1998-01-01

    A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration.

  6. Identification of random nucleic acid sequence aberrations using dual capture probes which hybridize to different chromosome regions

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1998-03-24

    A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration. 14 figs.

  7. Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification.

    PubMed

    Sinclair, Robert M; Ravantti, Janne J; Bamford, Dennis H

    2017-04-15

    Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allow them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids. Importantly

  8. Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification

    PubMed Central

    Sinclair, Robert M.; Ravantti, Janne J.

    2017-01-01

    ABSTRACT Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allow them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids

  9. Amino acid sequence of the Amur tiger prion protein.

    PubMed

    Wu, Changde; Pang, Wanyong; Zhao, Deming

    2006-10-01

    Prion diseases are fatal neurodegenerative disorders in human and animal associated with conformational conversion of a cellular prion protein (PrP(C)) into the pathologic isoform (PrP(Sc)). Various data indicate that the polymorphisms within the open reading frame (ORF) of PrP are associated with the susceptibility and control the species barrier in prion diseases. In the present study, partial Prnp from 25 Amur tigers (tPrnp) were cloned and screened for polymorphisms. Four single nucleotide polymorphisms (T423C, A501G, C511A, A610G) were found; the C511A and A610G nucleotide substitutions resulted in the amino acid changes Lysine171Glutamine and Alanine204Threoine, respectively. The tPrnp amino acid sequence is similar to house cat (Felis catus ) and sheep, but differs significantly from other two cat Prnp sequences that were previously deposited in GenBank.

  10. Quantum-Sequencing: Biophysics of quantum tunneling through nucleic acids

    NASA Astrophysics Data System (ADS)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    Tunneling microscopy and spectroscopy has extensively been used in physical surface sciences to study quantum tunneling to measure electronic local density of states of nanomaterials and to characterize adsorbed species. Quantum-Sequencing (Q-Seq) is a new method based on tunneling microscopy for electronic sequencing of single molecule of nucleic acids. A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free single-molecule sequencing method. Here, we present the unique ``electronic fingerprints'' for all nucleotides on DNA and RNA using Q-Seq along their intrinsic biophysical parameters. We have analyzed tunneling spectra for the nucleotides at different pH conditions and analyzed the HOMO, LUMO and energy gap for all of them. In addition we show a number of biophysical parameters to further characterize all nucleobases (electron and hole transition voltage and energy barriers). These results highlight the robustness of Q-Seq as a technique for next-generation sequencing.

  11. Negative Ion In-Source Decay Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry for Sequencing Acidic Peptides

    NASA Astrophysics Data System (ADS)

    McMillen, Chelsea L.; Wright, Patience M.; Cassady, Carolyn J.

    2016-05-01

    Matrix-assisted laser desorption/ionization (MALDI) in-source decay was studied in the negative ion mode on deprotonated peptides to determine its usefulness for obtaining extensive sequence information for acidic peptides. Eight biological acidic peptides, ranging in size from 11 to 33 residues, were studied by negative ion mode ISD (nISD). The matrices 2,5-dihydroxybenzoic acid, 2-aminobenzoic acid, 2-aminobenzamide, 1,5-diaminonaphthalene, 5-amino-1-naphthol, 3-aminoquinoline, and 9-aminoacridine were used with each peptide. Optimal fragmentation was produced with 1,5-diaminonphthalene (DAN), and extensive sequence informative fragmentation was observed for every peptide except hirudin(54-65). Cleavage at the N-Cα bond of the peptide backbone, producing c' and z' ions, was dominant for all peptides. Cleavage of the N-Cα bond N-terminal to proline residues was not observed. The formation of c and z ions is also found in electron transfer dissociation (ETD), electron capture dissociation (ECD), and positive ion mode ISD, which are considered to be radical-driven techniques. Oxidized insulin chain A, which has four highly acidic oxidized cysteine residues, had less extensive fragmentation. This peptide also exhibited the only charged localized fragmentation, with more pronounced product ion formation adjacent to the highly acidic residues. In addition, spectra were obtained by positive ion mode ISD for each protonated peptide; more sequence informative fragmentation was observed via nISD for all peptides. Three of the peptides studied had no product ion formation in ISD, but extensive sequence informative fragmentation was found in their nISD spectra. The results of this study indicate that nISD can be used to readily obtain sequence information for acidic peptides.

  12. Negative Ion In-Source Decay Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry for Sequencing Acidic Peptides.

    PubMed

    McMillen, Chelsea L; Wright, Patience M; Cassady, Carolyn J

    2016-05-01

    Matrix-assisted laser desorption/ionization (MALDI) in-source decay was studied in the negative ion mode on deprotonated peptides to determine its usefulness for obtaining extensive sequence information for acidic peptides. Eight biological acidic peptides, ranging in size from 11 to 33 residues, were studied by negative ion mode ISD (nISD). The matrices 2,5-dihydroxybenzoic acid, 2-aminobenzoic acid, 2-aminobenzamide, 1,5-diaminonaphthalene, 5-amino-1-naphthol, 3-aminoquinoline, and 9-aminoacridine were used with each peptide. Optimal fragmentation was produced with 1,5-diaminonphthalene (DAN), and extensive sequence informative fragmentation was observed for every peptide except hirudin(54-65). Cleavage at the N-Cα bond of the peptide backbone, producing c' and z' ions, was dominant for all peptides. Cleavage of the N-Cα bond N-terminal to proline residues was not observed. The formation of c and z ions is also found in electron transfer dissociation (ETD), electron capture dissociation (ECD), and positive ion mode ISD, which are considered to be radical-driven techniques. Oxidized insulin chain A, which has four highly acidic oxidized cysteine residues, had less extensive fragmentation. This peptide also exhibited the only charged localized fragmentation, with more pronounced product ion formation adjacent to the highly acidic residues. In addition, spectra were obtained by positive ion mode ISD for each protonated peptide; more sequence informative fragmentation was observed via nISD for all peptides. Three of the peptides studied had no product ion formation in ISD, but extensive sequence informative fragmentation was found in their nISD spectra. The results of this study indicate that nISD can be used to readily obtain sequence information for acidic peptides.

  13. Complete amino acid sequence of bovine colostrum low-Mr cysteine proteinase inhibitor.

    PubMed

    Hirado, M; Tsunasawa, S; Sakiyama, F; Niinobe, M; Fujii, S

    1985-07-01

    The complete amino acid sequence of bovine colostrum cysteine proteinase inhibitor was determined by sequencing native inhibitor and peptides obtained by cyanogen bromide degradation, Achromobacter lysylendopeptidase digestion and partial acid hydrolysis of reduced and S-carboxymethylated protein. Achromobacter peptidase digestion was successfully used to isolate two disulfide-containing peptides. The inhibitor consists of 112 amino acids with an Mr of 12787. Two disulfide bonds were established between Cys 66 and Cys 77 and between Cys 90 and Cys 110. A high degree of homology in the sequence was found between the colostrum inhibitor and human gamma-trace, human salivary acidic protein and chicken egg-white cystatin.

  14. Nucleic acid sequence detection using multiplexed oligonucleotide PCR

    DOEpatents

    Nolan, John P [Santa Fe, NM; White, P Scott [Los Alamos, NM

    2006-12-26

    Methods for rapidly detecting single or multiple sequence alleles in a sample nucleic acid are described. Provided are all of the oligonucleotide pairs capable of annealing specifically to a target allele and discriminating among possible sequences thereof, and ligating to each other to form an oligonucleotide complex when a particular sequence feature is present (or, alternatively, absent) in the sample nucleic acid. The design of each oligonucleotide pair permits the subsequent high-level PCR amplification of a specific amplicon when the oligonucleotide complex is formed, but not when the oligonucleotide complex is not formed. The presence or absence of the specific amplicon is used to detect the allele. Detection of the specific amplicon may be achieved using a variety of methods well known in the art, including without limitation, oligonucleotide capture onto DNA chips or microarrays, oligonucleotide capture onto beads or microspheres, electrophoresis, and mass spectrometry. Various labels and address-capture tags may be employed in the amplicon detection step of multiplexed assays, as further described herein.

  15. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion

    PubMed Central

    Thomsen, Martin Christen Frølund; Nielsen, Morten

    2012-01-01

    Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally valuable information related to amino acid depletion. Seq2logo aims at resolving these issues allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for low number of observations and different logotype representations each capturing different aspects related to amino acid enrichment and depletion. Besides allowing input in the format of peptides and MSA, Seq2Logo accepts input as Blast sequence profiles, providing easy access for non-expert end-users to characterize and identify functionally conserved/variable amino acids in any given protein of interest. The output from the server is a sequence logo and a PSSM. Seq2Logo is available at http://www.cbs.dtu.dk/biotools/Seq2Logo (14 May 2012, date last accessed). PMID:22638583

  16. Detection and isolation of nucleic acid sequences using a bifunctional hybridization probe

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    2000-01-01

    A method for detecting and isolating a target sequence in a sample of nucleic acids is provided using a bifunctional hybridization probe capable of hybridizing to the target sequence that includes a detectable marker and a first complexing agent capable of forming a binding pair with a second complexing agent. A kit is also provided for detecting a target sequence in a sample of nucleic acids using a bifunctional hybridization probe according to this method.

  17. Protein location prediction using atomic composition and global features of the amino acid sequence

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cherian, Betsy Sheena, E-mail: betsy.skb@gmail.com; Nair, Achuthsankar S.

    2010-01-22

    Subcellular location of protein is constructive information in determining its function, screening for drug candidates, vaccine design, annotation of gene products and in selecting relevant proteins for further studies. Computational prediction of subcellular localization deals with predicting the location of a protein from its amino acid sequence. For a computational localization prediction method to be more accurate, it should exploit all possible relevant biological features that contribute to the subcellular localization. In this work, we extracted the biological features from the full length protein sequence to incorporate more biological information. A new biological feature, distribution of atomic composition is effectivelymore » used with, multiple physiochemical properties, amino acid composition, three part amino acid composition, and sequence similarity for predicting the subcellular location of the protein. Support Vector Machines are designed for four modules and prediction is made by a weighted voting system. Our system makes prediction with an accuracy of 100, 82.47, 88.81 for self-consistency test, jackknife test and independent data test respectively. Our results provide evidence that the prediction based on the biological features derived from the full length amino acid sequence gives better accuracy than those derived from N-terminal alone. Considering the features as a distribution within the entire sequence will bring out underlying property distribution to a greater detail to enhance the prediction accuracy.« less

  18. Complete Amino Acid Sequence of a Copper/Zinc-Superoxide Dismutase from Ginger Rhizome.

    PubMed

    Nishiyama, Yuki; Fukamizo, Tamo; Yoneda, Kazunari; Araki, Tomohiro

    2017-04-01

    Superoxide dismutase (SOD) is an antioxidant enzyme protecting cells from oxidative stress. Ginger (Zingiber officinale) is known for its antioxidant properties, however, there are no data on SODs from ginger rhizomes. In this study, we purified SOD from the rhizome of Z. officinale (Zo-SOD) and determined its complete amino acid sequence using N terminal sequencing, amino acid analysis, and de novo sequencing by tandem mass spectrometry. Zo-SOD consists of 151 amino acids with two signature Cu/Zn-SOD motifs and has high similarity to other plant Cu/Zn-SODs. Multiple sequence alignment showed that Cu/Zn-binding residues and cysteines forming a disulfide bond, which are highly conserved in Cu/Zn-SODs, are also present in Zo-SOD. Phylogenetic analysis revealed that plant Cu/Zn-SODs clustered into distinct chloroplastic, cytoplasmic, and intermediate groups. Among them, only chloroplastic enzymes carried amino acid substitutions in the region functionally important for enzymatic activity, suggesting that chloroplastic SODs may have a function distinct from those of SODs localized in other subcellular compartments. The nucleotide sequence of the Zo-SOD coding region was obtained by reverse-translation, and the gene was synthesized, cloned, and expressed. The recombinant Zo-SOD demonstrated pH stability in the range of 5-10, which is similar to other reported Cu/Zn-SODs, and thermal stability in the range of 10-60 °C, which is higher than that for most plant Cu/Zn-SODs but lower compared to the enzyme from a Z. officinale relative Curcuma aromatica.

  19. Preferential amino acid sequences in alumina-catalyzed peptide bond formation.

    PubMed

    Bujdák, J; Rode, B M

    2002-05-21

    The catalytic effect of activated alumina on amino acid condensation was investigated. The readiness of amino acids to form peptide sequences was estimated on the basis of the yield of dipeptides and was found to decrease in the order glycine (Gly), alanine (Ala), leucine (Leu), valine (Val), proline (Pro). For example, approximately 15% Gly was converted to the dipeptide (Gly(2)), 5% to cyclic anhydride (cyc(Gly(2))) and small amounts of tri- (Gly(3)) and tetrapeptide (Gly(4)) were formed after 28 days. On the other hand, only trace amounts of Pro(2) were formed from proline under the same conditions. Preferential formation of certain sequences was observed in the mixed reaction systems containing two amino acids. For example, almost ten times more Gly-Val than Val-Gly was formed in the Gly+Val reaction system. The preferred sequences can be explained on the basis of an inductive effect that side groups have on the nucleophilicity and electrophilicity, respectively, of the amino and carboxyl groups. A comparison with published data of amino acid reactions in other reaction systems revealed that the main trends of preferential sequence formation were the same as those described for the salt-induced peptide formation (SIPF) reaction. The results of this work and other previously published papers show that alumina and related mineral surfaces might have played a crucial role in the prebiotic formation of the first peptides on the primitive earth.

  20. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2010-07-01 2010-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide and...

  1. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2011-07-01 2011-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide and...

  2. The complete amino acid sequence of human skeletal-muscle fructose-bisphosphate aldolase.

    PubMed Central

    Freemont, P S; Dunbar, B; Fothergill-Gilmore, L A

    1988-01-01

    The complete amino acid sequence of human skeletal-muscle fructose-bisphosphate aldolase, comprising 363 residues, was determined. The sequence was deduced by automated sequencing of CNBr-cleavage, o-iodosobenzoic acid-cleavage, trypsin-digest and staphylococcal-proteinase-digest fragments. Comparison of the sequence with other class I aldolase sequences shows that the mammalian muscle isoenzyme is one of the most highly conserved enzymes known, with only about 2% of the residues changing per 100 million years. Non-mammalian aldolases appear to be evolving at the same rate as other glycolytic enzymes, with about 4% of the residues changing per 100 million years. Secondary-structure predictions are analysed in an accompanying paper [Sawyer, Fothergill-Gilmore & Freemont (1988) Biochem. J. 249, 789-793]. PMID:3355497

  3. Sequences Of Amino Acids For Human Serum Albumin

    NASA Technical Reports Server (NTRS)

    Carter, Daniel C.

    1992-01-01

    Sequences of amino acids defined for use in making polypeptides one-third to one-sixth as large as parent human serum albumin molecule. Smaller, chemically stable peptides have diverse applications including service as artificial human serum and as active components of biosensors and chromatographic matrices. In applications involving production of artificial sera from new sequences, little or no concern about viral contaminants. Smaller genetically engineered polypeptides more easily expressed and produced in large quantities, making commercial isolation and production more feasible and profitable.

  4. Evidence of Divergent Amino Acid Usage in Comparative Analyses of R5- and X4-Associated HIV-1 Vpr Sequences

    PubMed Central

    Antell, Gregory C.; Zhong, Wen; Kercher, Katherine; Passic, Shendra; Williams, Jean; Liu, Yucheng; James, Tony; Jacobson, Jeffrey M.; Szep, Zsofia

    2017-01-01

    Vpr is an HIV-1 accessory protein that plays numerous roles during viral replication, and some of which are cell type dependent. To test the hypothesis that HIV-1 tropism extends beyond the envelope into the vpr gene, studies were performed to identify the associations between coreceptor usage and Vpr variation in HIV-1-infected patients. Colinear HIV-1 Env-V3 and Vpr amino acid sequences were obtained from the LANL HIV-1 sequence database and from well-suppressed patients in the Drexel/Temple Medicine CNS AIDS Research and Eradication Study (CARES) Cohort. Genotypic classification of Env-V3 sequences as X4 (CXCR4-utilizing) or R5 (CCR5-utilizing) was used to group colinear Vpr sequences. To reveal the sequences associated with a specific coreceptor usage genotype, Vpr amino acid sequences were assessed for amino acid diversity and Jensen-Shannon divergence between the two groups. Five amino acid alphabets were used to comprehensively examine the impact of amino acid substitutions involving side chains with similar physiochemical properties. Positions 36, 37, 41, 89, and 96 of Vpr were characterized by statistically significant divergence across multiple alphabets when X4 and R5 sequence groups were compared. In addition, consensus amino acid switches were found at positions 37 and 41 in comparisons of the R5 and X4 sequence populations. These results suggest an evolutionary link between Vpr and gp120 in HIV-1-infected patients. PMID:28620613

  5. Comparative characterization of random-sequence proteins consisting of 5, 12, and 20 kinds of amino acids

    PubMed Central

    Tanaka, Junko; Doi, Nobuhide; Takashima, Hideaki; Yanagawa, Hiroshi

    2010-01-01

    Screening of functional proteins from a random-sequence library has been used to evolve novel proteins in the field of evolutionary protein engineering. However, random-sequence proteins consisting of the 20 natural amino acids tend to aggregate, and the occurrence rate of functional proteins in a random-sequence library is low. From the viewpoint of the origin of life, it has been proposed that primordial proteins consisted of a limited set of amino acids that could have been abundantly formed early during chemical evolution. We have previously found that members of a random-sequence protein library constructed with five primitive amino acids show high solubility (Doi et al., Protein Eng Des Sel 2005;18:279–284). Although such a library is expected to be appropriate for finding functional proteins, the functionality may be limited, because they have no positively charged amino acid. Here, we constructed three libraries of 120-amino acid, random-sequence proteins using alphabets of 5, 12, and 20 amino acids by preselection using mRNA display (to eliminate sequences containing stop codons and frameshifts) and characterized and compared the structural properties of random-sequence proteins arbitrarily chosen from these libraries. We found that random-sequence proteins constructed with the 12-member alphabet (including five primitive amino acids and positively charged amino acids) have higher solubility than those constructed with the 20-member alphabet, though other biophysical properties are very similar in the two libraries. Thus, a library of moderate complexity constructed from 12 amino acids may be a more appropriate resource for functional screening than one constructed from 20 amino acids. PMID:20162614

  6. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data shall...

  7. Genome sequence of the highly weak-acid-tolerant Zygosaccharomyces bailii IST302, amenable to genetic manipulations and physiological studies.

    PubMed

    Palma, Margarida; Münsterkötter, Martin; Peça, João; Güldener, Ulrich; Sá-Correia, Isabel

    2017-06-01

    Zygosaccharomyces bailii is one of the most problematic spoilage yeast species found in the food and beverage industry particularly in acidic products, due to its exceptional resistance to weak acid stress. This article describes the annotation of the genome sequence of Z. bailii IST302, a strain recently proven to be amenable to genetic manipulations and physiological studies. The work was based on the annotated genomes of strain ISA1307, an interspecies hybrid between Z. bailii and a closely related species, and the Z. bailii reference strain CLIB 213T. The resulting genome sequence of Z. bailii IST302 is distributed through 105 scaffolds, comprising a total of 5142 genes and a size of 10.8 Mb. Contrasting with CLIB 213T, strain IST302 does not form cell aggregates, allowing its manipulation in the laboratory for genetic and physiological studies. Comparative cell cycle analysis with the haploid and diploid Saccharomyces cerevisiae strains BY4741 and BY4743, respectively, suggests that Z. bailii IST302 is haploid. This is an additional trait that makes this strain attractive for the functional analysis of non-essential genes envisaging the elucidation of mechanisms underlying its high tolerance to weak acid food preservatives, or the investigation and exploitation of the potential of this resilient yeast species as cell factory. © FEMS 2017.

  8. Amino acid sequence of a trypsin inhibitor from a Spirometra (Spirometra erinaceieuropaei).

    PubMed

    Sanda, A; Uchida, A; Itagaki, T; Kobayashi, H; Inokuchi, N; Koyama, T; Iwama, M; Ohgi, K; Irie, M

    2001-12-01

    A trypsin inhibitor that is highly homologous with bovine pancreatic trypsin inhibitor (BPTI) was co-purified along with RNase from Spirometra (Spirometra erinaceieuropaei). The amino acid sequence of this inhibitor (SETI) and the nucleotide sequence of the cDNA encoding this protein were determined by protein chemistry and gene technology. SETI contains 68 amino acid residues and has a molecular mass of 7,798 Da. SETI has 31 amino acid residues that are identical with BPTI's sequence, including 6 half-cystine and 5 aromatic amino acid residues. The active site Lys residue in BPTI is replaced by an Arg residue in SETI. SETI is an effective inhibitor of trypsin and moderately inhibits a-chymotrypsin, but less inhibits elastase or subtilisin. SETI was expressed by E. coli containing a PelB vector carrying the SETI encoding cDNA; an expression yield of 0.68 mg/l was obtained. The phylogenetic relationship of SETI and the other BPTI-like trypsin inhibitors was analyzed using most likelihood inference methods.

  9. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...

  10. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...

  11. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...

  12. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...

  13. 37 CFR 1.823 - Requirements for nucleotide and/or amino acid sequences as part of the application.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... and/or amino acid sequences as part of the application. 1.823 Section 1.823 Patents, Trademarks, and... Amino Acid Sequences § 1.823 Requirements for nucleotide and/or amino acid sequences as part of the... incorporation-by-reference of the Sequence Listing as required by § 1.52(e)(5). The presentation of the...

  14. NullSeq: A Tool for Generating Random Coding Sequences with Desired Amino Acid and GC Contents.

    PubMed

    Liu, Sophia S; Hockenberry, Adam J; Lancichinetti, Andrea; Jewett, Michael C; Amaral, Luís A N

    2016-11-01

    The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. In order to accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to generate accurate null models that are appropriate for the sequences under study. While many tools have been developed to create random nucleotide sequences, protein coding sequences are subject to a unique set of constraints that complicates the process of generating appropriate null models. There are currently no tools available that allow users to create random coding sequences with specified amino acid composition and GC content for the purpose of hypothesis testing. Using the principle of maximum entropy, we developed a method that generates unbiased random sequences with pre-specified amino acid and GC content, which we have developed into a python package. Our method is the simplest way to obtain maximally unbiased random sequences that are subject to GC usage and primary amino acid sequence constraints. Furthermore, this approach can easily be expanded to create unbiased random sequences that incorporate more complicated constraints such as individual nucleotide usage or even di-nucleotide frequencies. The ability to generate correctly specified null models will allow researchers to accurately identify sequence motifs which will lead to a better understanding of biological processes as well as more effective engineering of biological systems.

  15. Amino acid sequence of tyrosinase from Neurospora crassa.

    PubMed Central

    Lerch, K

    1978-01-01

    The amino-acid sequence of tyrosinase from Neurospora crassa (monophenol,dihydroxyphenylalanine:oxygen oxidoreductase, EC 1.14.18.1) is reported. This copper-containing oxidase consists of a single polypeptide chain of 407 amino acids. The primary structure was determined by automated and manual sequence analysis on fragments produced by cleavage with cyanogen bromide and on peptides obtained by digestion with trypsin, pepsin, thermolysin, or chymotrypsin. The amino terminus of the protein is acetylated and the single cysteinyl residue 96 is covalently linked via a thioether bridge to histidyl residue 94. The formation and the possible role of this unusual structure in Neurospora tyrosinase is discussed. Dye-sensitized photooxidation of apotyrosinase and active-site-directed inactivation of the native enzyme indicate the possible involvement of histidyl residues 188, 192, 289, and 305 or 306 as ligands to the active-site copper as well as in the catalytic mechanism of this monooxygenase. PMID:151279

  16. Nanopores and nucleic acids: prospects for ultrarapid sequencing

    NASA Technical Reports Server (NTRS)

    Deamer, D. W.; Akeson, M.

    2000-01-01

    DNA and RNA molecules can be detected as they are driven through a nanopore by an applied electric field at rates ranging from several hundred microseconds to a few milliseconds per molecule. The nanopore can rapidly discriminate between pyrimidine and purine segments along a single-stranded nucleic acid molecule. Nanopore detection and characterization of single molecules represents a new method for directly reading information encoded in linear polymers. If single-nucleotide resolution can be achieved, it is possible that nucleic acid sequences can be determined at rates exceeding a thousand bases per second.

  17. PubDNA Finder: a web database linking full-text articles to sequences of nucleic acids.

    PubMed

    García-Remesal, Miguel; Cuevas, Alejandro; Pérez-Rey, David; Martín, Luis; Anguita, Alberto; de la Iglesia, Diana; de la Calle, Guillermo; Crespo, José; Maojo, Víctor

    2010-11-01

    PubDNA Finder is an online repository that we have created to link PubMed Central manuscripts to the sequences of nucleic acids appearing in them. It extends the search capabilities provided by PubMed Central by enabling researchers to perform advanced searches involving sequences of nucleic acids. This includes, among other features (i) searching for papers mentioning one or more specific sequences of nucleic acids and (ii) retrieving the genetic sequences appearing in different articles. These additional query capabilities are provided by a searchable index that we created by using the full text of the 176 672 papers available at PubMed Central at the time of writing and the sequences of nucleic acids appearing in them. To automatically extract the genetic sequences occurring in each paper, we used an original method we have developed. The database is updated monthly by automatically connecting to the PubMed Central FTP site to retrieve and index new manuscripts. Users can query the database via the web interface provided. PubDNA Finder can be freely accessed at http://servet.dia.fi.upm.es:8080/pubdnafinder

  18. Complete complementary DNA-derived amino acid sequence of canine cardiac phospholamban.

    PubMed Central

    Fujii, J; Ueno, A; Kitano, K; Tanaka, S; Kadoma, M; Tada, M

    1987-01-01

    Complementary DNA (cDNA) clones specific for phospholamban of sarcoplasmic reticulum membranes have been isolated from a canine cardiac cDNA library. The amino acid sequence deduced from the cDNA sequence indicates that phospholamban consists of 52 amino acid residues and lacks an amino-terminal signal sequence. The protein has an inferred mol wt 6,080 that is in agreement with its apparent monomeric mol wt 6,000, estimated previously by sodium dodecyl sulfate-polyacrylamide gel electrophoresis. Phospholamban contains two distinct domains, a hydrophilic region at the amino terminus (domain I) and a hydrophobic region at the carboxy terminus (domain II). We propose that domain I is localized at the cytoplasmic surface and offers phosphorylatable sites whereas domain II is anchored into the sarcoplasmic reticulum membrane. PMID:3793929

  19. CDSbank: taxonomy-aware extraction, selection, renaming and formatting of protein-coding DNA or amino acid sequences.

    PubMed

    Hazes, Bart

    2014-02-28

    Protein-coding DNA sequences and their corresponding amino acid sequences are routinely used to study relationships between sequence, structure, function, and evolution. The rapidly growing size of sequence databases increases the power of such comparative analyses but it makes it more challenging to prepare high quality sequence data sets with control over redundancy, quality, completeness, formatting, and labeling. Software tools for some individual steps in this process exist but manual intervention remains a common and time consuming necessity. CDSbank is a database that stores both the protein-coding DNA sequence (CDS) and amino acid sequence for each protein annotated in Genbank. CDSbank also stores Genbank feature annotation, a flag to indicate incomplete 5' and 3' ends, full taxonomic data, and a heuristic to rank the scientific interest of each species. This rich information allows fully automated data set preparation with a level of sophistication that aims to meet or exceed manual processing. Defaults ensure ease of use for typical scenarios while allowing great flexibility when needed. Access is via a free web server at http://hazeslab.med.ualberta.ca/CDSbank/. CDSbank presents a user-friendly web server to download, filter, format, and name large sequence data sets. Common usage scenarios can be accessed via pre-programmed default choices, while optional sections give full control over the processing pipeline. Particular strengths are: extract protein-coding DNA sequences just as easily as amino acid sequences, full access to taxonomy for labeling and filtering, awareness of incomplete sequences, and the ability to take one protein sequence and extract all synonymous CDS or identical protein sequences in other species. Finally, CDSbank can also create labeled property files to, for instance, annotate or re-label phylogenetic trees.

  20. The complete amino acid sequence of human erythrocyte diphosphoglycerate mutase.

    PubMed Central

    Haggarty, N W; Dunbar, B; Fothergill, L A

    1983-01-01

    The complete amino acid sequence of human erythrocyte diphosphoglycerate mutase, comprising 239 residues, was determined. The sequence was deduced from the four cyanogen bromide fragments, and from the peptides derived from these fragments after digestion with a number of proteolytic enzymes. Comparison of this sequence with that of the yeast glycolytic enzyme, phosphoglycerate mutase, shows that these enzymes are 47% identical. Most, but not all, of the residues implicated as being important for the activity of the glycolytic mutase are conserved in the erythrocyte diphosphoglycerate mutase. PMID:6313356

  1. Amino acid selective unlabeling for sequence specific resonance assignments in proteins

    PubMed Central

    Krishnarjuna, B.; Jaipuria, Garima; Thakur, Anushikha

    2010-01-01

    Sequence specific resonance assignment constitutes an important step towards high-resolution structure determination of proteins by NMR and is aided by selective identification and assignment of amino acid types. The traditional approach to selective labeling yields only the chemical shifts of the particular amino acid being selected and does not help in establishing a link between adjacent residues along the polypeptide chain, which is important for sequential assignments. An alternative approach is the method of amino acid selective ‘unlabeling’ or reverse labeling, which involves selective unlabeling of specific amino acid types against a uniformly 13C/15N labeled background. Based on this method, we present a novel approach for sequential assignments in proteins. The method involves a new NMR experiment named, {12COi–15Ni+1}-filtered HSQC, which aids in linking the 1HN/15N resonances of the selectively unlabeled residue, i, and its C-terminal neighbor, i + 1, in HN-detected double and triple resonance spectra. This leads to the assignment of a tri-peptide segment from the knowledge of the amino acid types of residues: i − 1, i and i + 1, thereby speeding up the sequential assignment process. The method has the advantage of being relatively inexpensive, applicable to 2H labeled protein and can be coupled with cell-free synthesis and/or automated assignment approaches. A detailed survey involving unlabeling of different amino acid types individually or in pairs reveals that the proposed approach is also robust to misincorporation of 14N at undesired sites. Taken together, this study represents the first application of selective unlabeling for sequence specific resonance assignments and opens up new avenues to using this methodology in protein structural studies. Electronic supplementary material The online version of this article (doi:10.1007/s10858-010-9459-z) contains supplementary material, which is available to authorized users. PMID:21153044

  2. Method for high-volume sequencing of nucleic acids: random and directed priming with libraries of oligonucleotides

    DOEpatents

    Studier, F. William

    1995-04-18

    Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient.

  3. Method for high-volume sequencing of nucleic acids: random and directed priming with libraries of oligonucleotides

    DOEpatents

    Studier, F.W.

    1995-04-18

    Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient. 2 figs.

  4. Conservation of Shannon's redundancy for proteins. [information theory applied to amino acid sequences

    NASA Technical Reports Server (NTRS)

    Gatlin, L. L.

    1974-01-01

    Concepts of information theory are applied to examine various proteins in terms of their redundancy in natural originators such as animals and plants. The Monte Carlo method is used to derive information parameters for random protein sequences. Real protein sequence parameters are compared with the standard parameters of protein sequences having a specific length. The tendency of a chain to contain some amino acids more frequently than others and the tendency of a chain to contain certain amino acid pairs more frequently than other pairs are used as randomness measures of individual protein sequences. Non-periodic proteins are generally found to have random Shannon redundancies except in cases of constraints due to short chain length and genetic codes. Redundant characteristics of highly periodic proteins are discussed. A degree of periodicity parameter is derived.

  5. N-Terminal Amino Acid Sequence Determination of Proteins by N-Terminal Dimethyl Labeling: Pitfalls and Advantages When Compared with Edman Degradation Sequence Analysis.

    PubMed

    Chang, Elizabeth; Pourmal, Sergei; Zhou, Chun; Kumar, Rupesh; Teplova, Marianna; Pavletich, Nikola P; Marians, Kenneth J; Erdjument-Bromage, Hediye

    2016-07-01

    In recent history, alternative approaches to Edman sequencing have been investigated, and to this end, the Association of Biomolecular Resource Facilities (ABRF) Protein Sequencing Research Group (PSRG) initiated studies in 2014 and 2015, looking into bottom-up and top-down N-terminal (Nt) dimethyl derivatization of standard quantities of intact proteins with the aim to determine Nt sequence information. We have expanded this initiative and used low picomole amounts of myoglobin to determine the efficiency of Nt-dimethylation. Application of this approach on protein domains, generated by limited proteolysis of overexpressed proteins, confirms that it is a universal labeling technique and is very sensitive when compared with Edman sequencing. Finally, we compared Edman sequencing and Nt-dimethylation of the same polypeptide fragments; results confirm that there is agreement in the identity of the Nt amino acid sequence between these 2 methods.

  6. Human retroviruses and AIDS 1996. A compilation and analysis of nucleic acid and amino acid sequences

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Myers, G.; Foley, B.; Korber, B.

    1997-04-01

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (1) Nuclear Acid Alignments and Sequences; (2) Amino Acid Alignments; (3) Analysis; (4) Related Sequences; and (5) Database Communications. Information within all the parts is updated throughout the year on the Web site, http://hiv-web.lanl.gov. While this publication could take the form of a review or sequence monograph, it is not so conceived.more » Instead, the literature from which the database is derived has simply been summarized and some elementary computational analyses have been performed upon the data. Interpretation and commentary have been avoided insofar as possible so that the reader can form his or her own judgments concerning the complex information. In addition to the general descriptions of the parts of the compendium, the user should read the individual introductions for each part.« less

  7. Streptococcal phosphoenolpyruvate-sugar phosphotransferase system: amino acid sequence and site of ATP-dependent phosphorylation of HPr

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Deutscher, J.; Pevec, B.; Beyreuther, K.

    1986-10-21

    The amino acid sequence of histidine-containing protein (HPr) from Streptococcus faecalis has been determined by direct Edman degradation of intact HPr and by amino acid sequence analysis of tryptic peptides, V8 proteolyptic peptides, thermolytic peptides, and cyanogen bromide cleavage products. HPr from S. faecalis was found to contain 89 amino acid residues, corresponding to a molecular weight of 9438. The amino acid sequence of HPr from S. faecalis shows extended homology to the primary structure of HPr proteins from other bacteria. Besides the phosphoenolpyruvate-dependent phosphorylation of a histidyl residue in HPr, catalyzed by enzyme I of the bacterial phosphotransferase system,more » HPr was also found to be phosphorylated at a seryl residue in an ATP-dependent protein kinase catalyzed reaction. The site of ATP-dependent phosphorylation in HPr of S faecalis has now been determined. (/sup 32/P)P-Ser-HPr was digested with three different proteases, and in each case, a single labeled peptide was isolated. Following digestion with subtilisin, they obtained a peptide with the sequence -(P)Ser-Ile-Met-. Using chymotrypsin, they isolated a peptide with the sequence -Ser-Val-Asn-Leu-Lys-(P)Ser-Ile-Met-Gly-Val-Met-. The longest labeled peptide was obtained with V8 staphylococcal protease. According to amino acid analysis, this peptide contained 36 out of the 89 amino acid residues of HPr. The following sequence of 12 amino acid residues of the V8 peptide was determined: -Tyr-Lys-Gly-Lys-Ser-Val-Asn-Leu-Lys-(P)Ser-Ile-Met-. Thus, the site of ATP-dependent phosphorylation was determined to be Ser-46 within the primary structure of HPr.« less

  8. Diagnostics based on nucleic acid sequence variant profiling: PCR, hybridization, and NGS approaches.

    PubMed

    Khodakov, Dmitriy; Wang, Chunyan; Zhang, David Yu

    2016-10-01

    Nucleic acid sequence variations have been implicated in many diseases, and reliable detection and quantitation of DNA/RNA biomarkers can inform effective therapeutic action, enabling precision medicine. Nucleic acid analysis technologies being translated into the clinic can broadly be classified into hybridization, PCR, and sequencing, as well as their combinations. Here we review the molecular mechanisms of popular commercial assays, and their progress in translation into in vitro diagnostics. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  9. "De-novo" amino acid sequence elucidation of protein G'e by combined "top-down" and "bottom-up" mass spectrometry.

    PubMed

    Yefremova, Yelena; Al-Majdoub, Mahmoud; Opuni, Kwabena F M; Koy, Cornelia; Cui, Weidong; Yan, Yuetian; Gross, Michael L; Glocker, Michael O

    2015-03-01

    Mass spectrometric de-novo sequencing was applied to review the amino acid sequence of a commercially available recombinant protein G´ with great scientific and economic importance. Substantial deviations to the published amino acid sequence (Uniprot Q54181) were found by the presence of 46 additional amino acids at the N-terminus, including a so-called "His-tag" as well as an N-terminal partial α-N-gluconoylation and α-N-phosphogluconoylation, respectively. The unexpected amino acid sequence of the commercial protein G' comprised 241 amino acids and resulted in a molecular mass of 25,998.9 ± 0.2 Da for the unmodified protein. Due to the higher mass that is caused by its extended amino acid sequence compared with the original protein G' (185 amino acids), we named this protein "protein G'e." By means of mass spectrometric peptide mapping, the suggested amino acid sequence, as well as the N-terminal partial α-N-gluconoylations, was confirmed with 100% sequence coverage. After the protein G'e sequence was determined, we were able to determine the expression vector pET-28b from Novagen with the Xho I restriction enzyme cleavage site as the best option that was used for cloning and expressing the recombinant protein G'e in E. coli. A dissociation constant (K(d)) value of 9.4 nM for protein G'e was determined thermophoretically, showing that the N-terminal flanking sequence extension did not cause significant changes in the binding affinity to immunoglobulins.

  10. Implication of the cause of differences in 3D structures of proteins with high sequence identity based on analyses of amino acid sequences and 3D structures.

    PubMed

    Matsuoka, Masanari; Sugita, Masatake; Kikuchi, Takeshi

    2014-09-18

    Proteins that share a high sequence homology while exhibiting drastically different 3D structures are investigated in this study. Recently, artificial proteins related to the sequences of the GA and IgG binding GB domains of human serum albumin have been designed. These artificial proteins, referred to as GA and GB, share 98% amino acid sequence identity but exhibit different 3D structures, namely, a 3α bundle versus a 4β + α structure. Discriminating between their 3D structures based on their amino acid sequences is a very difficult problem. In the present work, in addition to using bioinformatics techniques, an analysis based on inter-residue average distance statistics is used to address this problem. It was hard to distinguish which structure a given sequence would take only with the results of ordinary analyses like BLAST and conservation analyses. However, in addition to these analyses, with the analysis based on the inter-residue average distance statistics and our sequence tendency analysis, we could infer which part would play an important role in its structural formation. The results suggest possible determinants of the different 3D structures for sequences with high sequence identity. The possibility of discriminating between the 3D structures based on the given sequences is also discussed.

  11. Extension of the COG and arCOG databases by amino acid and nucleotide sequences

    PubMed Central

    Meereis, Florian; Kaufmann, Michael

    2008-01-01

    Background The current versions of the COG and arCOG databases, both excellent frameworks for studies in comparative and functional genomics, do not contain the nucleotide sequences corresponding to their protein or protein domain entries. Results Using sequence information obtained from GenBank flat files covering the completely sequenced genomes of the COG and arCOG databases, we constructed NUCOCOG (nucleotide sequences containing COG databases) as an extended version including all nucleotide sequences and in addition the amino acid sequences originally utilized to construct the current COG and arCOG databases. We make available three comprehensive single XML files containing the complete databases including all sequence information. In addition, we provide a web interface as a utility suitable to browse the NUCOCOG database for sequence retrieval. The database is accessible at . Conclusion NUCOCOG offers the possibility to analyze any sequence related property in the context of the COG and arCOG framework simply by using script languages such as PERL applied to a large but single XML document. PMID:19014535

  12. Correlation between fibroin amino acid sequence and physical silk properties.

    PubMed

    Fedic, Robert; Zurovec, Michal; Sehnal, Frantisek

    2003-09-12

    The fiber properties of lepidopteran silk depend on the amino acid repeats that interact during H-fibroin polymerization. The aim of our research was to relate repeat composition to insect biology and fiber strength. Representative regions of the H-fibroin genes were sequenced and analyzed in three pyralid species: wax moth (Galleria mellonella), European flour moth (Ephestia kuehniella), and Indian meal moth (Plodia interpunctella). The amino acid repeats are species-specific, evidently a diversification of an ancestral region of 43 residues, and include three types of regularly dispersed motifs: modifications of GSSAASAA sequence, stretches of tripeptides GXZ where X and Z represent bulky residues, and sequences similar to PVIVIEE. No concatenations of GX dipeptide or alanine, which are typical for Bombyx silkworms and Antheraea silk moths, respectively, were found. Despite different repeat structure, the silks of G. mellonella and E. kuehniella exhibit similar tensile strength as the Bombyx and Antheraea silks. We suggest that in these latter two species, variations in the repeat length obstruct repeat alignment, but sufficiently long stretches of iterated residues get superposed to interact. In the pyralid H-fibroins, interactions of the widely separated and diverse motifs depend on the precision of repeat matching; silk is strong in G. mellonella and E. kuehniella, with 2-3 types of long homogeneous repeats, and nearly 10 times weaker in P. interpunctella, with seven types of shorter erratic repeats. The high proportion of large amino acids in the H-fibroin of pyralids has probably evolved in connection with the spinning habit of caterpillars that live in protective silk tubes and spin continuously, enlarging the tubes on one end and partly devouring the other one. The silk serves as a depot of energetically rich and essential amino acids that may be scarce in the diet.

  13. Homology analyses of the protein sequences of fatty acid synthases from chicken liver, rat mammary gland, and yeast

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chang, Soo-Ik; Hammes, G.G.

    1989-11-01

    Homology analyses of the protein sequences of chicken liver and rat mammary gland fatty acid synthases were carried out. The amino acid sequences of the chicken and rat enzymes are 67% identical. If conservative substitutions are allowed, 78% of the amino acids are matched. A region of low homologies exists between the functional domains, in particular around amino acid residues 1059-1264 of the chicken enzyme. Homologies between the active sites of chicken and rat and of chicken and yeast enzymes have been analyzed by an alignment method. A high degree of homology exists between the active sites of the chickenmore » and rat enzymes. However, the chicken and yeast enzymes show a lower degree of homology. The DADPH-binding dinucleotide folds of the {beta}-ketoacyl reductase and the enoyl reductase sites were identified by comparison with a known consensus sequence for the DADP- and FAD-binding dinucleotide folds. The active sites of all of the enzymes are primarily in hydrophobic regions of the protein. This study suggests that the genes for the functional domains of fatty acid synthase were originally separated, and these genes were connected to each other by using different connecting nucleotide sequences in different species. An alternative explanation for the differences in rat and chicken is a common ancestry and mutations in the joining regions during evolution.« less

  14. Characterization of tannase protein sequences of bacteria and fungi: an in silico study.

    PubMed

    Banerjee, Amrita; Jana, Arijit; Pati, Bikash R; Mondal, Keshab C; Das Mohapatra, Pradeep K

    2012-04-01

    The tannase protein sequences of 149 bacteria and 36 fungi were retrieved from NCBI database. Among them only 77 bacterial and 31 fungal tannase sequences were taken which have different amino acid compositions. These sequences were analysed for different physical and chemical properties, superfamily search, multiple sequence alignment, phylogenetic tree construction and motif finding to find out the functional motif and the evolutionary relationship among them. The superfamily search for these tannase exposed the occurrence of proline iminopeptidase-like, biotin biosynthesis protein BioH, O-acetyltransferase, carboxylesterase/thioesterase 1, carbon-carbon bond hydrolase, haloperoxidase, prolyl oligopeptidase, C-terminal domain and mycobacterial antigens families and alpha/beta hydrolase superfamily. Some bacterial and fungal sequence showed similarity with different families individually. The multiple sequence alignment of these tannase protein sequences showed conserved regions at different stretches with maximum homology from amino acid residues 389-469 and 482-523 which could be used for designing degenerate primers or probes specific for tannase producing bacterial and fungal species. Phylogenetic tree showed two different clusters; one has only bacteria and another have both fungi and bacteria showing some relationship between these different genera. Although in second cluster near about all fungal species were found together in a corner which indicates the sequence level similarity among fungal genera. The distributions of fourteen motifs analysis revealed Motif 1 with a signature amino acid sequence of 29 amino acids, i.e. GCSTGGREALKQAQRWPHDYDGIIANNPA, was uniformly observed in 83.3 % of studied tannase sequences representing its participation with the structure and enzymatic function.

  15. Synthetic oligonucleotide probes deduced from amino acid sequence data. Theoretical and practical considerations.

    PubMed

    Lathe, R

    1985-05-05

    Synthetic probes deduced from amino acid sequence data are widely used to detect cognate coding sequences in libraries of cloned DNA segments. The redundancy of the genetic code dictates that a choice must be made between (1) a mixture of probes reflecting all codon combinations, and (2) a single longer "optimal" probe. The second strategy is examined in detail. The frequency of sequences matching a given probe by chance alone can be determined and also the frequency of sequences closely resembling the probe and contributing to the hybridization background. Gene banks cannot be treated as random associations of the four nucleotides, and probe sequences deduced from amino acid sequence data occur more often than predicted by chance alone. Probe lengths must be increased to confer the necessary specificity. Examination of hybrids formed between unique homologous probes and their cognate targets reveals that short stretches of perfect homology occurring by chance make a significant contribution to the hybridization background. Statistical methods for improving homology are examined, taking human coding sequences as an example, and considerations of codon utilization and dinucleotide frequencies yield an overall homology of greater than 82%. Recommendations for probe design and hybridization are presented, and the choice between using multiple probes reflecting all codon possibilities and a unique optimal probe is discussed.

  16. Complete cDNA sequence and amino acid analysis of a bovine ribonuclease K6 gene.

    PubMed

    Pietrowski, D; Förster, M

    2000-01-01

    The complete cDNA sequence of a ribonuclease k6 gene of Bos Taurus has been determined. It codes for a protein with 154 amino acids and contains the invariant cysteine, histidine and lysine residues as well as the characteristic motifs specific to ribonuclease active sites. The deduced protein sequence is 27 residues longer than other known ribonucleases k6 and shows amino acids exchanges which could reflect a strain specificity or polymorphism within the bovine genome. Based on sequence similarity we have termed the identified gene bovine ribonuclease k6 b (brk6b).

  17. Evolutionary Distance of Amino Acid Sequence Orthologs across Macaque Subspecies: Identifying Candidate Genes for SIV Resistance in Chinese Rhesus Macaques

    PubMed Central

    Ross, Cody T.; Roodgar, Morteza; Smith, David Glenn

    2015-01-01

    We use the Reciprocal Smallest Distance (RSD) algorithm to identify amino acid sequence orthologs in the Chinese and Indian rhesus macaque draft sequences and estimate the evolutionary distance between such orthologs. We then use GOanna to map gene function annotations and human gene identifiers to the rhesus macaque amino acid sequences. We conclude methodologically by cross-tabulating a list of amino acid orthologs with large divergence scores with a list of genes known to be involved in SIV or HIV pathogenesis. We find that many of the amino acid sequences with large evolutionary divergence scores, as calculated by the RSD algorithm, have been shown to be related to HIV pathogenesis in previous laboratory studies. Four of the strongest candidate genes for SIVmac resistance in Chinese rhesus macaques identified in this study are CDK9, CXCL12, TRIM21, and TRIM32. Additionally, ANKRD30A, CTSZ, GORASP2, GTF2H1, IL13RA1, MUC16, NMDAR1, Notch1, NT5M, PDCD5, RAD50, and TM9SF2 were identified as possible candidates, among others. We failed to find many laboratory experiments contrasting the effects of Indian and Chinese orthologs at these sites on SIVmac pathogenesis, but future comparative studies might hold fertile ground for research into the biological mechanisms underlying innate resistance to SIVmac in Chinese rhesus macaques. PMID:25884674

  18. Fatty Acid Profile and Unigene-Derived Simple Sequence Repeat Markers in Tung Tree (Vernicia fordii)

    PubMed Central

    Zhang, Lin; Jia, Baoguang; Tan, Xiaofeng; Thammina, Chandra S.; Long, Hongxu; Liu, Min; Wen, Shanna; Song, Xianliang; Cao, Heping

    2014-01-01

    Tung tree (Vernicia fordii) provides the sole source of tung oil widely used in industry. Lack of fatty acid composition and molecular markers hinders biochemical, genetic and breeding research. The objectives of this study were to determine fatty acid profiles and develop unigene-derived simple sequence repeat (SSR) markers in tung tree. Fatty acid profiles of 41 accessions showed that the ratio of α-eleostearic acid was increasing continuously with a parallel trend to the amount of tung oil accumulation while the ratios of other fatty acids were decreasing in different stages of the seeds and that α-eleostearic acid (18∶3) consisted of 77% of the total fatty acids in tung oil. Transcriptome sequencing identified 81,805 unigenes from tung cDNA library constructed using seed mRNA and discovered 6,366 SSRs in 5,404 unigenes. The di- and tri-nucleotide microsatellites accounted for 92% of the SSRs with AG/CT and AAG/CTT being the most abundant SSR motifs. Fifteen polymorphic genic-SSR markers were developed from 98 unigene loci tested in 41 cultivated tung accessions by agarose gel and capillary electrophoresis. Genbank database search identified 10 of them putatively coding for functional proteins. Quantitative PCR demonstrated that all 15 polymorphic SSR-associated unigenes were expressed in tung seeds and some of them were highly correlated with oil composition in the seeds. Dendrogram revealed that most of the 41 accessions were clustered according to the geographic region. These new polymorphic genic-SSR markers will facilitate future studies on genetic diversity, molecular fingerprinting, comparative genomics and genetic mapping in tung tree. The lipid profiles in the seeds of 41 tung accessions will be valuable for biochemical and breeding studies. PMID:25167054

  19. Amino acid sequence analysis of the annexin super-gene family of proteins.

    PubMed

    Barton, G J; Newman, R H; Freemont, P S; Crumpton, M J

    1991-06-15

    The annexins are a widespread family of calcium-dependent membrane-binding proteins. No common function has been identified for the family and, until recently, no crystallographic data existed for an annexin. In this paper we draw together 22 available annexin sequences consisting of 88 similar repeat units, and apply the techniques of multiple sequence alignment, pattern matching, secondary structure prediction and conservation analysis to the characterisation of the molecules. The analysis clearly shows that the repeats cluster into four distinct families and that greatest variation occurs within the repeat 3 units. Multiple alignment of the 88 repeats shows amino acids with conserved physicochemical properties at 22 positions, with only Gly at position 23 being absolutely conserved in all repeats. Secondary structure prediction techniques identify five conserved helices in each repeat unit and patterns of conserved hydrophobic amino acids are consistent with one face of a helix packing against the protein core in predicted helices a, c, d, e. Helix b is generally hydrophobic in all repeats, but contains a striking pattern of repeat-specific residue conservation at position 31, with Arg in repeats 4 and Glu in repeats 2, but unconserved amino acids in repeats 1 and 3. This suggests repeats 2 and 4 may interact via a buried saltbridge. The loop between predicted helices a and b of repeat 3 shows features distinct from the equivalent loop in repeats 1, 2 and 4, suggesting an important structural and/or functional role for this region. No compelling evidence emerges from this study for uteroglobin and the annexins sharing similar tertiary structures, or for uteroglobin representing a derivative of a primordial one-repeat structure that underwent duplication to give the present day annexins. The analyses performed in this paper are re-evaluated in the Appendix, in the light of the recently published X-ray structure for human annexin V. The structure confirms most of

  20. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ...” means those amino acids other than “Xaa” and those nucleotide bases other than “n”defined in accordance... 37 Patents, Trademarks, and Copyrights 1 2014-07-01 2014-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences...

  1. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ...” means those amino acids other than “Xaa” and those nucleotide bases other than “n”defined in accordance... 37 Patents, Trademarks, and Copyrights 1 2013-07-01 2013-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences...

  2. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ...” means those amino acids other than “Xaa” and those nucleotide bases other than “n”defined in accordance... 37 Patents, Trademarks, and Copyrights 1 2012-07-01 2012-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences...

  3. Full genome virus detection in fecal samples using sensitive nucleic acid preparation, deep sequencing, and a novel iterative sequence classification algorithm.

    PubMed

    Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta; Deijs, Martin; Watson, Simon J; Kellam, Paul; van der Hoek, Lia

    2014-01-01

    We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with a novel iterative Python algorithm SLIM for the identification of sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated and the deep sequencing data were sufficient to assembly 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults, but may be involved in the pathogenesis of HIV-1 infection and will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis.

  4. Full Genome Virus Detection in Fecal Samples Using Sensitive Nucleic Acid Preparation, Deep Sequencing, and a Novel Iterative Sequence Classification Algorithm

    PubMed Central

    Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta; Deijs, Martin; Watson, Simon J.; Kellam, Paul; van der Hoek, Lia

    2014-01-01

    We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with a novel iterative Python algorithm SLIM for the identification of sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated and the deep sequencing data were sufficient to assembly 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults, but may be involved in the pathogenesis of HIV-1 infection and will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis. PMID:24695106

  5. [Complete genome sequencing of polymalic acid-producing strain Aureobasidium pullulans CCTCC M2012223].

    PubMed

    Wang, Yongkang; Song, Xiaodan; Li, Xiaorong; Yang, Sang-tian; Zou, Xiang

    2017-01-04

    To explore the genome sequence of Aureobasidium pullulans CCTCC M2012223, analyze the key genes related to the biosynthesis of important metabolites, and provide genetic background for metabolic engineering. Complete genome of A. pullulans CCTCC M2012223 was sequenced by Illumina HiSeq high throughput sequencing platform. Then, fragment assembly, gene prediction, functional annotation, and GO/COG cluster were analyzed in comparison with those of other five A. pullulans varieties. The complete genome sequence of A. pullulans CCTCC M2012223 was 30756831 bp with an average GC content of 47.49%, and 9452 genes were successfully predicted. Genome-wide analysis showed that A. pullulans CCTCC M2012223 had the biggest genome assembly size. Protein sequences involved in the pullulan and polymalic acid pathway were highly conservative in all of six A. pullulans varieties. Although both A. pullulans CCTCC M2012223 and A. pullulans var. melanogenum have a close affinity, some point mutation and inserts were occurred in protein sequences involved in melanin biosynthesis. Genome information of A. pullulans CCTCC M2012223 was annotated and genes involved in melanin, pullulan and polymalic acid pathway were compared, which would provide a theoretical basis for genetic modification of metabolic pathway in A. pullulans.

  6. Complete amino acid sequence of ananain and a comparison with stem bromelain and other plant cysteine proteases.

    PubMed Central

    Lee, K L; Albee, K L; Bernasconi, R J; Edmunds, T

    1997-01-01

    The amino acid sequences of ananain (EC3.4.22.31) and stem bromelain (3.4.22.32), two cysteine proteases from pineapple stem, are similar yet ananain and stem bromelain possess distinct specificities towards synthetic peptide substrates and different reactivities towards the cysteine protease inhibitors E-64 and chicken egg white cystatin. We present here the complete amino acid sequence of ananain and compare it with the reported sequences of pineapple stem bromelain, papain and chymopapain from papaya and actinidin from kiwifruit. Ananain is comprised of 216 residues with a theoretical mass of 23464 Da. This primary structure includes a sequence insert between residues 170 and 174 not present in stem bromelain or papain and a hydrophobic series of amino acids adjacent to His-157. It is possible that these sequence differences contribute to the different substrate and inhibitor specificities exhibited by ananain and stem bromelain. PMID:9355753

  7. Evolution of sequence-defined highly functionalized nucleic acid polymers

    NASA Astrophysics Data System (ADS)

    Chen, Zhen; Lichtor, Phillip A.; Berliner, Adrian P.; Chen, Jonathan C.; Liu, David R.

    2018-03-01

    The evolution of sequence-defined synthetic polymers made of building blocks beyond those compatible with polymerase enzymes or the ribosome has the potential to generate new classes of receptors, catalysts and materials. Here we describe a ligase-mediated DNA-templated polymerization and in vitro selection system to evolve highly functionalized nucleic acid polymers (HFNAPs) made from 32 building blocks that contain eight chemically diverse side chains on a DNA backbone. Through iterated cycles of polymer translation, selection and reverse translation, we discovered HFNAPs that bind proprotein convertase subtilisin/kexin type 9 (PCSK9) and interleukin-6, two protein targets implicated in human diseases. Mutation and reselection of an active PCSK9-binding polymer yielded evolved polymers with high affinity (KD = 3 nM). This evolved polymer potently inhibited the binding between PCSK9 and the low-density lipoprotein receptor. Structure-activity relationship studies revealed that specific side chains at defined positions in the polymers are required for binding to their respective targets. Our findings expand the chemical space of evolvable polymers to include densely functionalized nucleic acids with diverse, researcher-defined chemical repertoires.

  8. SNBRFinder: A Sequence-Based Hybrid Algorithm for Enhanced Prediction of Nucleic Acid-Binding Residues.

    PubMed

    Yang, Xiaoxia; Wang, Jia; Sun, Jun; Liu, Rong

    2015-01-01

    Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequence are assuming ever-increasing importance. The majority of current algorithms rely on feature-based prediction, but their accuracy remains to be further improved. Here we propose a sequence-based hybrid algorithm SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder) by merging a feature predictor SNBRFinderF and a template predictor SNBRFinderT. SNBRFinderF was established using the support vector machine whose inputs include sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with the sequence alignment algorithm based on profile hidden Markov models to capture the weakly homologous template of query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and SNBRFinderT can achieve comparable performance to the structure-based template methods. Leveraging the complementary relationship between these two predictors, SNBRFinder reasonably improved the performance of both DNA- and RNA-binding residue predictions. More importantly, the sequence-based hybrid prediction reached competitive performance relative to our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has obvious advantages over the existing sequence-based prediction algorithms. The value of our algorithm is highlighted by establishing an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.

  9. ANCAC: amino acid, nucleotide, and codon analysis of COGs--a tool for sequence bias analysis in microbial orthologs.

    PubMed

    Meiler, Arno; Klinger, Claudia; Kaufmann, Michael

    2012-09-08

    The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. Per definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC's NUCOCOG dataset as the largest one available for that purpose thus far. Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence-bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming-skills.

  10. Identification and Analysis of Novel Amino-Acid Sequence Repeats in Bacillus anthracis str. Ames Proteome Using Computational Tools

    PubMed Central

    Hemalatha, G. R.; Rao, D. Satyanarayana; Guruprasad, L.

    2007-01-01

    We have identified four repeats and ten domains that are novel in proteins encoded by the Bacillus anthracis str. Ames proteome using automated in silico methods. A “repeat” corresponds to a region comprising less than 55-amino-acid residues that occur more than once in the protein sequence and sometimes present in tandem. A “domain” corresponds to a conserved region with greater than 55-amino-acid residues and may be present as single or multiple copies in the protein sequence. These correspond to (1) 57-amino-acid-residue PxV domain, (2) 122-amino-acid-residue FxF domain, (3) 111-amino-acid-residue YEFF domain, (4) 109-amino-acid-residue IMxxH domain, (5) 103-amino-acid-residue VxxT domain, (6) 84-amino-acid-residue ExW domain, (7) 104-amino-acid-residue NTGFIG domain, (8) 36-amino-acid-residue NxGK repeat, (9) 95-amino-acid-residue VYV domain, (10) 75-amino-acid-residue KEWE domain, (11) 59-amino-acid-residue AFL domain, (12) 53-amino-acid-residue RIDVK repeat, (13) (a) 41-amino-acid-residue AGQF repeat and (b) 42-amino-acid-residue GSAL repeat. A repeat or domain type is characterized by specific conserved sequence motifs. We discuss the presence of these repeats and domains in proteins from other genomes and their probable secondary structure. PMID:17538688

  11. Identification of a novel bovine enterovirus possessing highly divergent amino acid sequences in capsid protein.

    PubMed

    Tsuchiaka, Shinobu; Rahpaya, Sayed Samim; Otomaru, Konosuke; Aoki, Hiroshi; Kishimoto, Mai; Naoi, Yuki; Omatsu, Tsutomu; Sano, Kaori; Okazaki-Terashima, Sachiko; Katayama, Yukie; Oba, Mami; Nagai, Makoto; Mizutani, Tetsuya

    2017-01-17

    Bovine enterovirus (BEV) belongs to the species Enterovirus E or F, genus Enterovirus and family Picornaviridae. Although numerous studies have identified BEVs in the feces of cattle with diarrhea, the pathogenicity of BEVs remains unclear. Previously, we reported the detection of novel kobu-like virus in calf feces, by metagenomics analysis. In the present study, we identified a novel BEV in diarrheal feces collected for that survey. Complete genome sequences were determined by deep sequencing in feces. Secondary RNA structure analysis of the 5' untranslated region (UTR), phylogenetic tree construction and pairwise identity analysis were conducted. The complete genome sequences of BEV were genetically distant from other EVs and the VP1 coding region contained novel and unique amino acid sequences. We named this strain as BEV AN12/Bos taurus/JPN/2014 (referred to as BEV-AN12). According to genome analysis, the genome length of this virus is 7414 nucleotides excluding the poly (A) tail and its genome consists of a 5'UTR, open reading frame encoding a single polyprotein, and 3'UTR. The results of secondary RNA structure analysis showed that in the 5'UTR, BEV-AN12 had an additional clover leaf structure and small stem loop structure, similarly to other BEVs. In pairwise identity analysis, BEV-AN12 showed high amino acid (aa) identities to Enterovirus F in the polyprotein, P2 and P3 regions (aa identity ≥82.4%). Therefore, BEV-AN12 is closely related to Enterovirus F. However, aa sequences in the capsid protein regions, particularly the VP1 encoding region, showed significantly low aa identity to other viruses in genus Enterovirus (VP1 aa identity ≤58.6%). In addition, BEV-AN12 branched separately from Enterovirus E and F in phylogenetic trees based on the aa sequences of P1 and VP1, although it clustered with Enterovirus F in trees based on sequences in the P2 and P3 genome region. We identified novel BEV possessing highly divergent aa sequences in the VP1 coding

  12. An Alignment-Free Algorithm in Comparing the Similarity of Protein Sequences Based on Pseudo-Markov Transition Probabilities among Amino Acids

    PubMed Central

    Li, Yushuang; Yang, Jiasheng; Zhang, Yi

    2016-01-01

    In this paper, we have proposed a novel alignment-free method for comparing the similarity of protein sequences. We first encode a protein sequence into a 440 dimensional feature vector consisting of a 400 dimensional Pseudo-Markov transition probability vector among the 20 amino acids, a 20 dimensional content ratio vector, and a 20 dimensional position ratio vector of the amino acids in the sequence. By evaluating the Euclidean distances among the representing vectors, we compare the similarity of protein sequences. We then apply this method into the ND5 dataset consisting of the ND5 protein sequences of 9 species, and the F10 and G11 datasets representing two of the xylanases containing glycoside hydrolase families, i.e., families 10 and 11. As a result, our method achieves a correlation coefficient of 0.962 with the canonical protein sequence aligner ClustalW in the ND5 dataset, much higher than those of other 5 popular alignment-free methods. In addition, we successfully separate the xylanases sequences in the F10 family and the G11 family and illustrate that the F10 family is more heat stable than the G11 family, consistent with a few previous studies. Moreover, we prove mathematically an identity equation involving the Pseudo-Markov transition probability vector and the amino acids content ratio vector. PMID:27918587

  13. Human Retroviruses and AIDS. A compilation and analysis of nucleic acid and amino acid sequences: I--II; III--V

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Myers, G.; Korber, B.; Wain-Hobson, S.

    1993-12-31

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (I) HIV and SIV Nucleotide Sequences; (II) Amino Acid Sequences; (III) Analyses; (IV) Related Sequences; and (V) Database Communications. Information within all the parts is updated at least twice in each year, which accounts for the modes of binding and pagination in the compendium.

  14. GCPred: a web tool for guanylyl cyclase functional centre prediction from amino acid sequence.

    PubMed

    Xu, Nuo; Fu, Dongfang; Li, Shiang; Wang, Yuxuan; Wong, Aloysius

    2018-06-15

    GCPred is a webserver for the prediction of guanylyl cyclase (GC) functional centres from amino acid sequence. GCs are enzymes that generate the signalling molecule cyclic guanosine 3', 5'-monophosphate from guanosine-5'-triphosphate. A novel class of GC centres (GCCs) has been identified in complex plant proteins. Using currently available experimental data, GCPred is created to automate and facilitate the identification of similar GCCs. The server features GCC values that consider in its calculation, the physicochemical properties of amino acids constituting the GCC and the conserved amino acids within the centre. From user input amino acid sequence, the server returns a table of GCC values and graphs depicting deviations from mean values. The utility of this server is demonstrated using plant proteins and the human interleukin-1 receptor-associated kinase family of proteins as example. The GCPred server is available at http://gcpred.com. Supplementary data are available at Bioinformatics online.

  15. Nucleotide sequence of the phosphoglycerate kinase gene from the extreme thermophile Thermus thermophilus. Comparison of the deduced amino acid sequence with that of the mesophilic yeast phosphoglycerate kinase.

    PubMed Central

    Bowen, D; Littlechild, J A; Fothergill, J E; Watson, H C; Hall, L

    1988-01-01

    Using oligonucleotide probes derived from amino acid sequencing information, the structural gene for phosphoglycerate kinase from the extreme thermophile, Thermus thermophilus, was cloned in Escherichia coli and its complete nucleotide sequence determined. The gene consists of an open reading frame corresponding to a protein of 390 amino acid residues (calculated Mr 41,791) with an extreme bias for G or C (93.1%) in the codon third base position. Comparison of the deduced amino acid sequence with that of the corresponding mesophilic yeast enzyme indicated a number of significant differences. These are discussed in terms of the unusual codon bias and their possible role in enhanced protein thermal stability. Images Fig. 1. PMID:3052437

  16. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... in WIPO Standard ST.25 (1998), Appendix 2, Tables 1 and 3. This incorporation by reference was... ST.25 (1998), Appendix 2, Tables 1 and 3, shall be listed in a given sequence as “n” or “Xaa... acids. (1) The amino acids in a protein or peptide sequence shall be listed using the three-letter...

  17. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... in WIPO Standard ST.25 (1998), Appendix 2, Tables 1 and 3. This incorporation by reference was... ST.25 (1998), Appendix 2, Tables 1 and 3, shall be listed in a given sequence as “n” or “Xaa... acids. (1) The amino acids in a protein or peptide sequence shall be listed using the three-letter...

  18. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... in WIPO Standard ST.25 (1998), Appendix 2, Tables 1 and 3. This incorporation by reference was... ST.25 (1998), Appendix 2, Tables 1 and 3, shall be listed in a given sequence as “n” or “Xaa... acids. (1) The amino acids in a protein or peptide sequence shall be listed using the three-letter...

  19. Draft genome sequence of the docosahexaenoic acid producing thraustochytrid Aurantiochytrium sp. T66.

    PubMed

    Liu, Bin; Ertesvåg, Helga; Aasen, Inga Marie; Vadstein, Olav; Brautaset, Trygve; Heggeset, Tonje Marita Bjerkan

    2016-06-01

    Thraustochytrids are unicellular, marine protists, and there is a growing industrial interest in these organisms, particularly because some species, including strains belonging to the genus Aurantiochytrium, accumulate high levels of docosahexaenoic acid (DHA). Here, we report the draft genome sequence of Aurantiochytrium sp. T66 (ATCC PRA-276), with a size of 43 Mbp, and 11,683 predicted protein-coding sequences. The data has been deposited at DDBJ/EMBL/Genbank under the accession LNGJ00000000. The genome sequence will contribute new insight into DHA biosynthesis and regulation, providing a basis for metabolic engineering of thraustochytrids.

  20. ANCAC: amino acid, nucleotide, and codon analysis of COGs – a tool for sequence bias analysis in microbial orthologs

    PubMed Central

    2012-01-01

    Background The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. Per definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Results Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC’s NUCOCOG dataset as the largest one available for that purpose thus far. Conclusions Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence-bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming-skills. PMID:22958836

  1. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2010-07-01 2010-07-01 false Form and format for... And/or Amino Acid Sequences § 1.824 Form and format for nucleotide and/or amino acid sequence... Code for Information Interchange (ASCII) text. No other formats shall be allowed. (3) The computer...

  2. Sequencing, bioinformatic characterization and expression pattern of a putative amino acid transporter from the parasitic cestode Echinococcus granulosus.

    PubMed

    Camicia, Federico; Paredes, Rodolfo; Chalar, Cora; Galanti, Norbel; Kamenetzky, Laura; Gutierrez, Ariana; Rosenzvit, Mara C

    2008-03-31

    We have sequenced and partially characterized an Echinococcus granulosus cDNA, termed egat1, from a protoscolex signal sequence trap (SST) cDNA library. The isolated 1627 bp long cDNA contains an ORF of 489 amino acids and shows an amino acid identity of 30% with neutral and excitatory amino acid transporters members of the Dicarboxylate/Amino Acid Na+ and/or H+ Cation Symporter family (DAACS) (TC 2.A.23). Additional bioinformatics analysis of EgAT1, confirmed the results obtained by similarity searches and showed the presence of 9 to 10 transmembrane domains, consensus sequences for N-glycosylation between the third and fourth transmembrane domain, a highly similar hydropathy profile with ASCT1 (a known member of DAACS family), high score with SDF (Sodium Dicarboxilate Family) and similar motifs with EDTRANSPORT, a fingerprint of excitatory amino acid transporters. The localization of the putative amino acid transporter was analyzed by in situ hybridization and immunofluorescence in protoscoleces and associated germinal layer. The in situ hybridization labelling indicates the distribution of egat1 mRNA throughout the tegument. EgAT1 protein, which showed in Western blots a molecular mass of approximately 60 kD, is localized in the subtegumental region of the metacestode, particularly around suckers and rostellum of protoscoleces and layers from brood capsules. The sequence and expression analyses of EgAT1 pave the way for functional analysis of amino acids transporters of E. granulosus and its evaluation as new drug targets against cystic echinococcosis.

  3. Amino- and carboxyl-terminal amino acid sequences of proteins coded by gag gene of murine leukemia virus

    PubMed Central

    Oroszlan, Stephen; Henderson, Louis E.; Stephenson, John R.; Copeland, Terry D.; Long, Cedric W.; Ihle, James N.; Gilden, Raymond V.

    1978-01-01

    The amino- and carboxyl-terminal amino acid sequences of proteins (p10, p12, p15, and p30) coded by the gag gene of Rauscher and AKR murine leukemia viruses were determined. Among these proteins, p15 from both viruses appears to have a blocked amino end. Proline was found to be the common NH2 terminus of both p30s and both p12s, and alanine of both p10s. The amino-terminal sequences of p30s are identical, as are those of p10s, while the p12 sequences are clearly distinctive but also show substantial homology. The carboxyl-terminal amino acids of both viral p30s and p12s are leucine and phenylalanine, respectively. Rauscher leukemia virus p15 has tyrosine as the carboxyl terminus while AKR virus p15 has phenylalanine in this position. The compositional and sequence data provide definite chemical criteria for the identification of analogous gag gene products and for the comparison of viral proteins isolated in different laboratories. On the basis of amino acid sequences and the previously proposed H-p15-p12-p30-p10-COOH peptide sequence in the precursor polyprotein, a model for cleavage sites involved in the post-translational processing of the precursor coded for by the gag gene is proposed. PMID:206897

  4. Purification, amino acid sequence and characterisation of kangaroo IGF-I.

    PubMed

    Yandell, C A; Francis, G L; Wheldrake, J F; Upton, Z

    1998-01-01

    Insulin-like growth factor-I (IGF-I) and IGF-II have been purified to homogeneity from kangaroo (Macropus fuliginosus) serum, thus this represents the first report of the purification, sequencing and characterisation of marsupial IGFs. N-Terminal protein sequencing reveals that there are six amino acid differences between kangaroo and human IGF-I. Kangaroo IGF-II has been partially sequenced and no differences were found between human and kangaroo IGF-II in the 53 residues identified. Thus the IGFs appear to be remarkably structurally conserved during mammalian radiation. In addition, in vitro characterisation of kangaroo IGF-I demonstrated that the functional properties of human, kangaroo and chicken IGF-I are very similar. In an assay measuring the ability of the proteins to stimulate protein synthesis in rat L6 myoblasts, all IGF-I proteins were found to be equally potent. The ability of all three proteins to compete for binding with radiolabelled human IGF-I to type-1 IGF receptors in L6 myoblasts and in Sminthopsis crassicaudata transformed lung fibroblasts, a marsupial cell line, was comparable. Furthermore, kangaroo and human IGF-I react equally in a human IGF-I RIA using a human reference standard, radiolabelled human IGF-I and a polyclonal antibody raised against recombinant human IGF-I. This study indicates that not only is the primary structure of eutherian and metatherian IGF-I conserved, but also the proteins appear to be functionally similar.

  5. Sequence-dependent DNA deformability studied using molecular dynamics simulations.

    PubMed

    Fujii, Satoshi; Kono, Hidetoshi; Takenaka, Shigeori; Go, Nobuhiro; Sarai, Akinori

    2007-01-01

    Proteins recognize specific DNA sequences not only through direct contact between amino acids and bases, but also indirectly based on the sequence-dependent conformation and deformability of the DNA (indirect readout). We used molecular dynamics simulations to analyze the sequence-dependent DNA conformations of all 136 possible tetrameric sequences sandwiched between CGCG sequences. The deformability of dimeric steps obtained by the simulations is consistent with that by the crystal structures. The simulation results further showed that the conformation and deformability of the tetramers can highly depend on the flanking base pairs. The conformations of xATx tetramers show the most rigidity and are not affected by the flanking base pairs and the xYRx show by contrast the greatest flexibility and change their conformations depending on the base pairs at both ends, suggesting tetramers with the same central dimer can show different deformabilities. These results suggest that analysis of dimeric steps alone may overlook some conformational features of DNA and provide insight into the mechanism of indirect readout during protein-DNA recognition. Moreover, the sequence dependence of DNA conformation and deformability may be used to estimate the contribution of indirect readout to the specificity of protein-DNA recognition as well as nucleosome positioning and large-scale behavior of nucleic acids.

  6. Amino acid sequence of the smaller basic protein from rat brain myelin

    PubMed Central

    Dunkley, Peter R.; Carnegie, Patrick R.

    1974-01-01

    1. The complete amino acid sequence of the smaller basic protein from rat brain myelin was determined. This protein differs from myelin basic proteins of other species in having a deletion of a polypeptide of 40 amino acid residues from the centre of the molecule. 2. A detailed comparison is made of the constant and variable regions in a group of myelin basic proteins from six species. 3. An arginine residue in the rat protein was found to be partially methylated. The ratio of methylated to unmethylated arginine at this position differed from that found for the human basic protein. 4. Three tryptic peptides were isolated in more than one form. The differences between the two forms of each peptide are discussed in relation to the electrophoretic heterogeneity of myelin basic proteins, which is known to occur at alkaline pH values. 5. Detailed evidence for the amino acid sequence of the protein has been deposited as Supplementary Publication SUP 50029 at the British Library (Lending Division) (formerly the National Lending Library for Science and Technology), Boston Spa, Yorks. LS23 7BQ, U.K., from whom copies may be obtained on the terms given in Biochem. J. (1973) 131, 5. PMID:4141893

  7. Predicting protein amidation sites by orchestrating amino acid sequence features

    NASA Astrophysics Data System (ADS)

    Zhao, Shuqiu; Yu, Hua; Gong, Xiujun

    2017-08-01

    Amidation is the fourth major category of post-translational modifications, which plays an important role in physiological and pathological processes. Identifying amidation sites can help us understanding the amidation and recognizing the original reason of many kinds of diseases. But the traditional experimental methods for predicting amidation sites are often time-consuming and expensive. In this study, we propose a computational method for predicting amidation sites by orchestrating amino acid sequence features. Three kinds of feature extraction methods are used to build a feature vector enabling to capture not only the physicochemical properties but also position related information of the amino acids. An extremely randomized trees algorithm is applied to choose the optimal features to remove redundancy and dependence among components of the feature vector by a supervised fashion. Finally the support vector machine classifier is used to label the amidation sites. When tested on an independent data set, it shows that the proposed method performs better than all the previous ones with the prediction accuracy of 0.962 at the Matthew's correlation coefficient of 0.89 and area under curve of 0.964.

  8. Solid phase sequencing of biopolymers

    DOEpatents

    Cantor, Charles; Koster, Hubert

    2010-09-28

    This invention relates to methods for detecting and sequencing target nucleic acid sequences, to mass modified nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probes comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Nucleic acids whose sequences can be determined include DNA or RNA in biological samples such as patient biopsies and environmental samples. Probes may be fixed to a solid support such as a hybridization chip to facilitate automated molecular weight analysis and identification of the target sequence.

  9. Conversion of amino-acid sequence in proteins to classical music: search for auditory patterns

    PubMed Central

    2007-01-01

    We have converted genome-encoded protein sequences into musical notes to reveal auditory patterns without compromising musicality. We derived a reduced range of 13 base notes by pairing similar amino acids and distinguishing them using variations of three-note chords and codon distribution to dictate rhythm. The conversion will help make genomic coding sequences more approachable for the general public, young children, and vision-impaired scientists. PMID:17477882

  10. The amino acid motif L/IIxxFE defines a novel actin-binding sequence in PDZ-RhoGEF

    PubMed Central

    Banerjee, Jayashree; Fischer, Christopher C.; Wedegaertner, Philip B.

    2009-01-01

    PDZ-RhoGEF is a member of the regulator of G protein signaling (RGS) domain-containing RhoGEFs (RGS-RhoGEFs) that link activated heterotrimeric G protein α subunits of the G12 family to activation of the small GTPase RhoA. Unique among the RGS-RhoGEFs, PDZ-RhoGEF contains a short sequence that localizes the protein to the actin cytoskeleton. In this report, we demonstrate that the actin-binding domain, located between amino acids 561–585, directly binds to F-actin in vitro. Extensive mutagenesis identifies isoleucine 568, isoleucine 569, phenylalanine 572, and glutamic acid 573 as necessary for binding to actin and for co-localization with the actin cytoskeleton in cells. These results define a novel actin-binding sequence in PDZ-RhoGEF with a critical amino acid motif of IIxxFE. Moreover, sequence analysis identifies a similar actin-binding motif in the N-terminus of the RhoGEF frabin, and, as with PDZ-RhoGEF, mutagenesis and actin interaction experiments demonstrate a motif of LIxxFE, consisting of the key amino acids leucine 23, isoleucine 24, phenylalanine 27, and glutamic acid 28. Taken together, results with PDZ-RhoGEF and frabin identify a novel actin binding sequence. Lastly, inducible dimerization of the actin-binding region of PDZ-RhoGEF revealed a dimerization-dependent actin bundling activity in vitro. PDZ-RhoGEF exists in cells as a dimer, raising the possibility that PDZ-RhoGEF could influence actin structure independent of its ability to activate RhoA. PMID:19618964

  11. alpha-Amylase gene of Streptomyces limosus: nucleotide sequence, expression motifs, and amino acid sequence homology to mammalian and invertebrate alpha-amylases.

    PubMed Central

    Long, C M; Virolle, M J; Chang, S Y; Chang, S; Bibb, M J

    1987-01-01

    The nucleotide sequence of the coding and regulatory regions of the alpha-amylase gene (aml) of Streptomyces limosus was determined. High-resolution S1 mapping was used to locate the 5' end of the transcript and demonstrated that the gene is transcribed from a unique promoter. The predicted amino acid sequence has considerable identity to mammalian and invertebrate alpha-amylases, but not to those of plant, fungal, or eubacterial origin. Consistent with this is the susceptibility of the enzyme to an inhibitor of mammalian alpha-amylases. The amino-terminal sequence of the extracellular enzyme was determined, revealing the presence of a typical signal peptide preceding the mature form of the alpha-amylase. Images PMID:3500166

  12. Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Patel, Kamlesh D.

    2018-01-22

    Kamlesh (Ken) Patel from Sandia National Laboratories (Livermore, California) presents "Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology " at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  13. Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Patel, Kamlesh D.

    2012-06-01

    Kamlesh (Ken) Patel from Sandia National Laboratories (Livermore, California) presents "Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology " at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  14. Partial amino acid sequence of the branched chain amino acid aminotransferase (TmB) of E. coli JA199 pDU11

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Feild, M.J.; Armstrong, F.B.

    1987-05-01

    E. coli JA199 pDU11 harbors a multicopy plasmid containing the ilv GEDAY gene cluster of S. typhimurium. TmB, gene product of ilv E, was purified, crystallized, and subjected to Edman degradation using a gas phase sequencer. The intact protein yielded an amino terminal 31 residue sequence. Both carboxymethylated apoenzyme and (/sup 3/H)-NaBH-reduced holoenzyme were then subjected to digestion by trypsin. The digests were fractionated using reversed phase HPLC, and the peptides isolated were sequenced. The borohydride-treated holoenzyme was used to isolate the cofactor-binding peptide. The peptide is 27 residues long and a comparison with known sequences of other aminotransferases revealedmore » limited homology. Peptides accounting for 211 of 288 predicted residues have been sequenced, including 9 residues of the carboxyl terminus. Comparison of peptides with the inferred amino acid sequence of the E. coli K-12 enzyme has helped determine the sequence of the amino terminal 59 residues; only two differences between the sequences are noted in this region.« less

  15. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... base or modified or unusual amino acid may be presented in a given sequence as the corresponding unmodified base or amino acid if the modified base or modified or unusual amino acid is one of those listed... the Feature section. Otherwise, each occurrence of a base or amino acid not appearing in WIPO Standard...

  16. Self-sequencing of amino acids and origins of polyfunctional protocells

    NASA Technical Reports Server (NTRS)

    Fox, S. W.

    1984-01-01

    The role of proteins in the origin of living things is discussed. It has been experimentally established that amino acids can sequence themselves under simulated geological conditions with highly nonrandom products which accordingly contain diverse information. Multiple copies of each type of macromolecule are formed, resulting in greater power for any protoenzymic molecule than would accrue from a single copy of each type. Thermal proteins are readily incorporated into laboratory protocells. The experimental evidence for original polyfunctional protocells is discussed.

  17. Microwave-assisted acid and base hydrolysis of intact proteins containing disulfide bonds for protein sequence analysis by mass spectrometry.

    PubMed

    Reiz, Bela; Li, Liang

    2010-09-01

    Controlled hydrolysis of proteins to generate peptide ladders combined with mass spectrometric analysis of the resultant peptides can be used for protein sequencing. In this paper, two methods of improving the microwave-assisted protein hydrolysis process are described to enable rapid sequencing of proteins containing disulfide bonds and increase sequence coverage, respectively. It was demonstrated that proteins containing disulfide bonds could be sequenced by MS analysis by first performing hydrolysis for less than 2 min, followed by 1 h of reduction to release the peptides originally linked by disulfide bonds. It was shown that a strong base could be used as a catalyst for microwave-assisted protein hydrolysis, producing complementary sequence information to that generated by microwave-assisted acid hydrolysis. However, using either acid or base hydrolysis, amide bond breakages in small regions of the polypeptide chains of the model proteins (e.g., cytochrome c and lysozyme) were not detected. Dynamic light scattering measurement of the proteins solubilized in an acid or base indicated that protein-protein interaction or aggregation was not the cause of the failure to hydrolyze certain amide bonds. It was speculated that there were some unknown local structures that might play a role in preventing an acid or base from reacting with the peptide bonds therein. 2010 American Society for Mass Spectrometry. Published by Elsevier Inc. All rights reserved.

  18. Computational studies of sequence-specific driving forces in peptide self-assembly

    NASA Astrophysics Data System (ADS)

    Jeon, Joohyun

    Peptides are biopolymers made from various sequences of twenty different types of amino acids, connected by peptide bonds. There are practically an infinite number of possible sequences and tremendous possible combinations of peptide-peptide interactions. Recently, an increasing number of studies have shown a stark variety of peptide self-assembled nanomaterials whose detailed structures depend on their sequences and environmental factors; these have end uses in medical and bio-electronic applications, for example. To understand the underlying physics of complex peptide self-assembly processes and to delineate sequence specific effects, in this study, I use various simulation tools spanning all-atom molecular dynamics to simple lattice models and quantify the balance of interactions in the peptide self-assembly processes. In contrast to the existing view that peptides' aggregation propensities are proportional to the net sequence hydrophobicity and inversely proportional to the net charge, I show the more nuanced effects of electrostatic interactions, including the cooperative effects between hydrophobic and electrostatic interactions. Notably, I suggest rather unexpected, yet important roles of entropies in the small scale oligomerization processes. Overall, this study broadens our understanding of the role of thermodynamic driving forces in peptide self-assembly.

  19. Differences in acid tolerance between Bifidobacterium breve BB8 and its acid-resistant derivative B. breve BB8dpH, revealed by RNA-sequencing and physiological analysis.

    PubMed

    Yang, Xu; Hang, Xiaomin; Tan, Jing; Yang, Hong

    2015-06-01

    Bifidobacteria are common inhabitants of the human gastrointestinal tract, and their application has increased dramatically in recent years due to their health-promoting effects. The ability of bifidobacteria to tolerate acidic environments is particularly important for their function as probiotics because they encounter such environments in food products and during passage through the gastrointestinal tract. In this study, we generated a derivative, Bifidobacterium breve BB8dpH, which displayed a stable, acid-resistant phenotype. To investigate the possible reasons for the higher acid tolerance of B. breve BB8dpH, as compared with its parental strain B. breve BB8, a combined transcriptome and physiological approach was used to characterize differences between the two strains. An analysis of the transcriptome by RNA-sequencing indicated that the expression of 121 genes was increased by more than 2-fold, while the expression of 146 genes was reduced more than 2-fold, in B. breve BB8dpH. Validation of the RNA-sequencing data using real-time quantitative PCR analysis demonstrated that the RNA-sequencing results were highly reliable. The comparison analysis, based on differentially expressed genes, suggested that the acid tolerance of B. breve BB8dpH was enhanced by regulating the expression of genes involved in carbohydrate transport and metabolism, energy production, synthesis of cell envelope components (peptidoglycan and exopolysaccharide), synthesis and transport of glutamate and glutamine, and histidine synthesis. Furthermore, an analysis of physiological data showed that B. breve BB8dpH displayed higher production of exopolysaccharide and lower H(+)-ATPase activity than B. breve BB8. The results presented here will improve our understanding of acid tolerance in bifidobacteria, and they will lead to the development of new strategies to enhance the acid tolerance of bifidobacterial strains. Copyright © 2015 Elsevier Ltd. All rights reserved.

  20. Structure-related statistical singularities along protein sequences: a correlation study.

    PubMed

    Colafranceschi, Mauro; Colosimo, Alfredo; Zbilut, Joseph P; Uversky, Vladimir N; Giuliani, Alessandro

    2005-01-01

    A data set composed of 1141 proteins representative of all eukaryotic protein sequences in the Swiss-Prot Protein Knowledge base was coded by seven physicochemical properties of amino acid residues. The resulting numerical profiles were submitted to correlation analysis after the application of a linear (simple mean) and a nonlinear (Recurrence Quantification Analysis, RQA) filter. The main RQA variables, Recurrence and Determinism, were subsequently analyzed by Principal Component Analysis. The RQA descriptors showed that (i) within protein sequences is embedded specific information neither present in the codes nor in the amino acid composition and (ii) the most sensitive code for detecting ordered recurrent (deterministic) patterns of residues in protein sequences is the Miyazawa-Jernigan hydrophobicity scale. The most deterministic proteins in terms of autocorrelation properties of primary structures were found (i) to be involved in protein-protein and protein-DNA interactions and (ii) to display a significantly higher proportion of structural disorder with respect to the average data set. A study of the scaling behavior of the average determinism with the setting parameters of RQA (embedding dimension and radius) allows for the identification of patterns of minimal length (six residues) as possible markers of zones specifically prone to inter- and intramolecular interactions.

  1. Statistical distribution of amino acid sequences: a proof of Darwinian evolution.

    PubMed

    Eitner, Krystian; Koch, Uwe; Gaweda, Tomasz; Marciniak, Jedrzej

    2010-12-01

    The article presents results of the listing of the quantity of amino acids, dipeptides and tripeptides for all proteins available in the UNIPROT-TREMBL database and the listing for selected species and enzymes. UNIPROT-TREMBL contains protein sequences associated with computationally generated annotations and large-scale functional characterization. Due to the distinct metabolic pathways of amino acid syntheses and their physicochemical properties, the quantities of subpeptides in proteins vary. We have proved that the distribution of amino acids, dipeptides and tripeptides is statistical which confirms that the evolutionary biodiversity development model is subject to the theory of independent events. It seems interesting that certain short peptide combinations occur relatively rarely or even not at all. First, it confirms the Darwinian theory of evolution and second, it opens up opportunities for designing pharmaceuticals among rarely represented short peptide combinations. Furthermore, an innovative approach to the mass analysis of bioinformatic data is presented. eitner@amu.edu.pl Supplementary data are available at Bioinformatics online.

  2. Purification, characterization, gene cloning and nucleotide sequencing of D: -stereospecific amino acid amidase from soil bacterium: Delftia acidovorans.

    PubMed

    Hongpattarakere, Tipparat; Komeda, Hidenobu; Asano, Yasuhisa

    2005-12-01

    The D-amino acid amidase-producing bacterium was isolated from soil samples using an enrichment culture technique in medium broth containing D-phenylalanine amide as a sole source of nitrogen. The strain exhibiting the strongest activity was identified as Delftia acidovorans strain 16. This strain produced intracellular D-amino acid amidase constitutively. The enzyme was purified about 380-fold to homogeneity and its molecular mass was estimated to be about 50 kDa, on sodium dodecyl sulfate polyacrylamide gel electrophoresis. The enzyme was active preferentially toward D-amino acid amides rather than their L-counterparts. It exhibited strong amino acid amidase activity toward aromatic amino acid amides including D-phenylalanine amide, D-tryptophan amide and D-tyrosine amide, yet it was not specifically active toward low-molecular-weight D-amino acid amides such as D-alanine amide, L-alanine amide and L-serine amide. Moreover, it was not specifically active toward oligopeptides. The enzyme showed maximum activity at 40 degrees C and pH 8.5 and appeared to be very stable, with 92.5% remaining activity after the reaction was performed at 45 degrees C for 30 min. However, it was mostly inactivated in the presence of phenylmethanesulfonyl fluoride or Cd2+, Ag+, Zn2+, Hg2+ and As3+ . The NH2 terminal and internal amino acid sequences of the enzyme were determined; and the gene was cloned and sequenced. The enzyme gene damA encodes a 466-amino-acid protein (molecular mass 49,860.46 Da); and the deduced amino acid sequence exhibits homology to the D-amino acid amidase from Variovorax paradoxus (67.9% identity), the amidotransferase A subunit from Burkholderia fungorum (50% identity) and other enantioselective amidases.

  3. A Novel Phytase with Sequence Similarity to Purple Acid Phosphatases Is Expressed in Cotyledons of Germinating Soybean Seedlings 1

    PubMed Central

    Hegeman, Carla E.; Grabau, Elizabeth A.

    2001-01-01

    Phytic acid (myo-inositol hexakisphosphate) is the major storage form of phosphorus in plant seeds. During germination, stored reserves are used as a source of nutrients by the plant seedling. Phytic acid is degraded by the activity of phytases to yield inositol and free phosphate. Due to the lack of phytases in the non-ruminant digestive tract, monogastric animals cannot utilize dietary phytic acid and it is excreted into manure. High phytic acid content in manure results in elevated phosphorus levels in soil and water and accompanying environmental concerns. The use of phytases to degrade seed phytic acid has potential for reducing the negative environmental impact of livestock production. A phytase was purified to electrophoretic homogeneity from cotyledons of germinated soybeans (Glycine max L. Merr.). Peptide sequence data generated from the purified enzyme facilitated the cloning of the phytase sequence (GmPhy) employing a polymerase chain reaction strategy. The introduction of GmPhy into soybean tissue culture resulted in increased phytase activity in transformed cells, which confirmed the identity of the phytase gene. It is surprising that the soybean phytase was unrelated to previously characterized microbial or maize (Zea mays) phytases, which were classified as histidine acid phosphatases. The soybean phytase sequence exhibited a high degree of similarity to purple acid phosphatases, a class of metallophosphoesterases. PMID:11500558

  4. The isolation, purification and amino-acid sequence of insulin from the teleost fish Cottus scorpius (daddy sculpin).

    PubMed

    Cutfield, J F; Cutfield, S M; Carne, A; Emdin, S O; Falkmer, S

    1986-07-01

    Insulin from the principal islets of the teleost fish, Cottus scorpius (daddy sculpin), has been isolated and sequenced. Purification involved acid/alcohol extraction, gel filtration, and reverse-phase high-performance liquid chromatography to yield nearly 1 mg pure insulin/g wet weight islet tissue. Biological potency was estimated as 40% compared to porcine insulin. The sculpin insulin crystallised in the absence of zinc ions although zinc is known to be present in the islets in significant amounts. Two other hormones, glucagon and pancreatic polypeptide, were copurified with the insulin, and an N-terminal sequence for pancreatic polypeptide was determined. The primary structure of sculpin insulin shows a number of sequence changes unique so far amongst teleost fish. These changes occur at A14 (Arg), A15 (Val), and B2 (Asp). The B chain contains 29 amino acids and there is no N-terminal extension as seen with several other fish. Presumably as a result of the amino acid substitutions, sculpin insulin does not readily form crystals containing zinc-insulin hexamers, despite the presence of the coordinating B10 His.

  5. Design of nucleic acid sequences for DNA computing based on a thermodynamic approach

    PubMed Central

    Tanaka, Fumiaki; Kameda, Atsushi; Yamamoto, Masahito; Ohuchi, Azuma

    2005-01-01

    We have developed an algorithm for designing multiple sequences of nucleic acids that have a uniform melting temperature between the sequence and its complement and that do not hybridize non-specifically with each other based on the minimum free energy (ΔGmin). Sequences that satisfy these constraints can be utilized in computations, various engineering applications such as microarrays, and nano-fabrications. Our algorithm is a random generate-and-test algorithm: it generates a candidate sequence randomly and tests whether the sequence satisfies the constraints. The novelty of our algorithm is that the filtering method uses a greedy search to calculate ΔGmin. This effectively excludes inappropriate sequences before ΔGmin is calculated, thereby reducing computation time drastically when compared with an algorithm without the filtering. Experimental results in silico showed the superiority of the greedy search over the traditional approach based on the hamming distance. In addition, experimental results in vitro demonstrated that the experimental free energy (ΔGexp) of 126 sequences correlated well with ΔGmin (|R| = 0.90) than with the hamming distance (|R| = 0.80). These results validate the rationality of a thermodynamic approach. We implemented our algorithm in a graphic user interface-based program written in Java. PMID:15701762

  6. cis-β-Bromostyrene derivatives from cinnamic acids via a tandem substitutive bromination-decarboxylation sequence.

    PubMed

    Tang, Khanh G; Kent, Greggory T; Erden, Ihsan; Wu, Weiming

    2017-10-04

    cis -β-Bromostyrene derivatives were synthesized stereospecifically from cinnamic acids through β-lactone intermediates. The synthetic sequence did not require the purification of the β-lactone intermediates although they were found to be stable and readily purified in most cases.

  7. Nucleic Acid Amplification Testing and Sequencing Combined with Acid-Fast Staining in Needle Biopsy Lung Tissues for the Diagnosis of Smear-Negative Pulmonary Tuberculosis.

    PubMed

    Jiang, Faming; Huang, Weiwei; Wang, Ye; Tian, Panwen; Chen, Xuerong; Liang, Zongan

    2016-01-01

    Smear-negative pulmonary tuberculosis (PTB) is common and difficult to diagnose. In this study, we investigated the diagnostic value of nucleic acid amplification testing and sequencing combined with acid-fast bacteria (AFB) staining of needle biopsy lung tissues for patients with suspected smear-negative PTB. Patients with suspected smear-negative PTB who underwent percutaneous transthoracic needle biopsy between May 1, 2012, and June 30, 2015, were enrolled in this retrospective study. Patients with AFB in sputum smears were excluded. All lung biopsy specimens were fixed in formalin, embedded in paraffin, and subjected to acid-fast staining and tuberculous polymerase chain reaction (TB-PCR). For patients with positive AFB and negative TB-PCR results in lung tissues, probe assays and 16S rRNA sequencing were used for identification of nontuberculous mycobacteria (NTM). The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and diagnostic accuracy of PCR and AFB staining were calculated separately and in combination. Among the 220 eligible patients, 133 were diagnosed with TB (men/women: 76/57; age range: 17-80 years, confirmed TB: 9, probable TB: 124). Forty-eight patients who were diagnosed with other specific diseases were assigned as negative controls, and 39 patients with indeterminate final diagnosis were excluded from statistical analysis. The sensitivity, specificity, PPV, NPV, and accuracy of histological AFB (HAFB) for the diagnosis of smear-negative were 61.7% (82/133), 100% (48/48), 100% (82/82), 48.5% (48/181), and 71.8% (130/181), respectively. The sensitivity, specificity, PPV, and NPV of histological PCR were 89.5% (119/133), 95.8% (46/48), 98.3% (119/121), and 76.7% (46/60), respectively, demonstrating that histological PCR had significantly higher accuracy (91.2% [165/181]) than histological acid-fast staining (71.8% [130/181]), P < 0.001. Parallel testing of histological AFB staining and PCR showed the

  8. Repeat sequence chromosome specific nucleic acid probes and methods of preparing and using

    DOEpatents

    Weier, H.U.G.; Gray, J.W.

    1995-06-27

    A primer directed DNA amplification method to isolate efficiently chromosome-specific repeated DNA wherein degenerate oligonucleotide primers are used is disclosed. The probes produced are a heterogeneous mixture that can be used with blocking DNA as a chromosome-specific staining reagent, and/or the elements of the mixture can be screened for high specificity, size and/or high degree of repetition among other parameters. The degenerate primers are sets of primers that vary in sequence but are substantially complementary to highly repeated nucleic acid sequences, preferably clustered within the template DNA, for example, pericentromeric alpha satellite repeat sequences. The template DNA is preferably chromosome-specific. Exemplary primers and probes are disclosed. The probes of this invention can be used to determine the number of chromosomes of a specific type in metaphase spreads, in germ line and/or somatic cell interphase nuclei, micronuclei and/or in tissue sections. Also provided is a method to select arbitrarily repeat sequence probes that can be screened for chromosome-specificity. 18 figs.

  9. Repeat sequence chromosome specific nucleic acid probes and methods of preparing and using

    DOEpatents

    Weier, Heinz-Ulrich G.; Gray, Joe W.

    1995-01-01

    A primer directed DNA amplification method to isolate efficiently chromosome-specific repeated DNA wherein degenerate oligonucleotide primers are used is disclosed. The probes produced are a heterogeneous mixture that can be used with blocking DNA as a chromosome-specific staining reagent, and/or the elements of the mixture can be screened for high specificity, size and/or high degree of repetition among other parameters. The degenerate primers are sets of primers that vary in sequence but are substantially complementary to highly repeated nucleic acid sequences, preferably clustered within the template DNA, for example, pericentromeric alpha satellite repeat sequences. The template DNA is preferably chromosome-specific. Exemplary primers ard probes are disclosed. The probes of this invention can be used to determine the number of chromosomes of a specific type in metaphase spreads, in germ line and/or somatic cell interphase nuclei, micronuclei and/or in tissue sections. Also provided is a method to select arbitrarily repeat sequence probes that can be screened for chromosome-specificity.

  10. DNA Cloning of Plasmodium falciparum Circumsporozoite Gene: Amino Acid Sequence of Repetitive Epitope

    NASA Astrophysics Data System (ADS)

    Enea, Vincenzo; Ellis, Joan; Zavala, Fidel; Arnot, David E.; Asavanich, Achara; Masuda, Aoi; Quakyi, Isabella; Nussenzweig, Ruth S.

    1984-08-01

    A clone of complementary DNA encoding the circumsporozoite (CS) protein of the human malaria parasite Plasmodium falciparum has been isolated by screening an Escherichia coli complementary DNA library with a monoclonal antibody to the CS protein. The DNA sequence of the complementary DNA insert encodes a four-amino acid sequence: proline-asparagine-alanine-asparagine, tandemly repeated 23 times. The CS β -lactamase fusion protein specifically binds monoclonal antibodies to the CS protein and inhibits the binding of these antibodies to native Plasmodium falciparum CS protein. These findings provide a basis for the development of a vaccine against Plasmodium falciparum malaria.

  11. Hydroquinone: O-glucosyltransferase from cultivated Rauvolfia cells: enrichment and partial amino acid sequences.

    PubMed

    Arend, J; Warzecha, H; Stöckigt, J

    2000-01-01

    Plant cell suspension cultures of Rauvolfia are able to produce a high amount of arbutin by glucosylation of exogenously added hydroquinone. A four step purification procedure using anion exchange, hydrophobic interaction, hydroxyapatite-chromatography and chromatofocusing delivered in a yield of 0.5%, an approximately 390 fold enrichment of the involved glucosyltransferase. SDS-PAGE showed a M(r) for the enzyme of 52 kDa. Proteolysis of the pure enzyme with endoproteinase LysC revealed six peptide fragments with 9-23 amino acids which were sequenced. Sequence alignment of the six peptides showed high homologies to glycosyltransferases from other higher plants.

  12. Partial amino-acid sequence of the precursor of an immunoglobulin light chain containing NH2-terminal pyroglutamic acid.

    PubMed Central

    Burstein, Y; Kantour, F; Schechter, I

    1976-01-01

    Analyses of amino-acid sequences of the total cell-free products programmed by the mRNA of MOPC-104E gamma light (L)-chain show that over 95% of the products have sequences of a distinct protein that correspond to the L-chain precursor. In this precursor an extra piece is coupled to the NH2-terminus of the mature L-chain. Analyses of products labeled with [3H]alanine, [3H]leucine, and [3H]proline demonstrate that the extra piece is composed of at least 18 residues. Analyses of [35S]methione-labeled product indicate that the extra piece may contain an additional NH2-terminal methionine, which is detected in about 10% of the molecules. Partial recovery of the NJ2-terminal methionine (alanine, leucine, and proline are recovered in yields close to theoretical, greater than 95%) suggests that it is the initiator methionine, which is known to be short lived in eukaryotes due to rapid hydrolysis. Thus, the extra piece seems to be 19 residues in length, and it contains one methionine at the NH2-terminus, three alanines at positions 2, 12, and 17, and five leucines at positions 6, 8, 10, 11, and 13. The close gathering of leucine residues, as well as their abundance (26%), suggest that the extra piece would be quite hydrophobic. Hydrophobicity seems to be a general property of the extra piece, since similar clusters of leucine were found in the precursors of 3 KL-chains (Burstein, Y. & Schechter, I. (1976) Biochem. J. 157, 145-151). The NH2-terminus of the mature MOPC-104E gamma L-chain is blocked by pyroglutamic acid. The fact that in the precursor a peptide segment precedes this NH2-terminus establishes that pyroglutamic acid is not the initiator residue for synthesis of the L-chain. Apparently, the pyroglutamic acid is formed by cyclization of glutamic acid or glutamine during cleavage of the extra piece to yield the mature L-chain. Images PMID:822420

  13. Amino acid sequence of human cholinesterase. Annual report, 30 September 1984-30 September 1985

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lockridge, O.

    1985-10-01

    The active-site serine residue is located 198 amino acids from the N-terminal. The active-site peptide was isolated from three different genetic types of human serum cholinesterase: from usual, atypical, and atypical-silent genotypes. It was found that the amino acid sequence of the active-site peptide was identical in all three genotypes. Comparison of the complete sequences of cholinesterase from human serum and acetylcholinesterase from the electric organ of Torpedo californica shows an identity of 53%. Cholinesterase is of interest to the Department of Defense because cholinesterase protects against organophosphate poisons of the type used in chemical warfare. The structural results presentedmore » here will serve as the basis for cloning the gene for cholinesterase. The potential uses of large amounts of cholinesterase would be for cleaning up spills of organophosphates and possibly for detoxifying exposed personnel.« less

  14. The sequence of sequencers: The history of sequencing DNA

    PubMed Central

    Heather, James M.; Chain, Benjamin

    2016-01-01

    Determining the order of nucleic acid residues in biological samples is an integral component of a wide variety of research applications. Over the last fifty years large numbers of researchers have applied themselves to the production of techniques and technologies to facilitate this feat, sequencing DNA and RNA molecules. This time-scale has witnessed tremendous changes, moving from sequencing short oligonucleotides to millions of bases, from struggling towards the deduction of the coding sequence of a single gene to rapid and widely available whole genome sequencing. This article traverses those years, iterating through the different generations of sequencing technology, highlighting some of the key discoveries, researchers, and sequences along the way. PMID:26554401

  15. Two-level QSAR network (2L-QSAR) for peptide inhibitor design based on amino acid properties and sequence positions.

    PubMed

    Du, Q S; Ma, Y; Xie, N Z; Huang, R B

    2014-01-01

    In the design of peptide inhibitors the huge possible variety of the peptide sequences is of high concern. In collaboration with the fast accumulation of the peptide experimental data and database, a statistical method is suggested for peptide inhibitor design. In the two-level peptide prediction network (2L-QSAR) one level is the physicochemical properties of amino acids and the other level is the peptide sequence position. The activity contributions of amino acids are the functions of physicochemical properties and the sequence positions. In the prediction equation two weight coefficient sets {ak} and {bl} are assigned to the physicochemical properties and to the sequence positions, respectively. After the two coefficient sets are optimized based on the experimental data of known peptide inhibitors using the iterative double least square (IDLS) procedure, the coefficients are used to evaluate the bioactivities of new designed peptide inhibitors. The two-level prediction network can be applied to the peptide inhibitor design that may aim for different target proteins, or different positions of a protein. A notable advantage of the two-level statistical algorithm is that there is no need for host protein structural information. It may also provide useful insight into the amino acid properties and the roles of sequence positions.

  16. Analyses of mitochondrial amino acid sequence datasets support the proposal that specimens of Hypodontus macropi from three species of macropodid hosts represent distinct species

    PubMed Central

    2013-01-01

    Background Hypodontus macropi is a common intestinal nematode of a range of kangaroos and wallabies (macropodid marsupials). Based on previous multilocus enzyme electrophoresis (MEE) and nuclear ribosomal DNA sequence data sets, H. macropi has been proposed to be complex of species. To test this proposal using independent molecular data, we sequenced the whole mitochondrial (mt) genomes of individuals of H. macropi from three different species of hosts (Macropus robustus robustus, Thylogale billardierii and Macropus [Wallabia] bicolor) as well as that of Macropicola ocydromi (a related nematode), and undertook a comparative analysis of the amino acid sequence datasets derived from these genomes. Results The mt genomes sequenced by next-generation (454) technology from H. macropi from the three host species varied from 13,634 bp to 13,699 bp in size. Pairwise comparisons of the amino acid sequences predicted from these three mt genomes revealed differences of 5.8% to 18%. Phylogenetic analysis of the amino acid sequence data sets using Bayesian Inference (BI) showed that H. macropi from the three different host species formed distinct, well-supported clades. In addition, sliding window analysis of the mt genomes defined variable regions for future population genetic studies of H. macropi in different macropodid hosts and geographical regions around Australia. Conclusions The present analyses of inferred mt protein sequence datasets clearly supported the hypothesis that H. macropi from M. robustus robustus, M. bicolor and T. billardierii represent distinct species. PMID:24261823

  17. Amino-acid sequence and predicted three-dimensional structure of pea seed (Pisum sativum) ferritin.

    PubMed Central

    Lobreaux, S; Yewdall, S J; Briat, J F; Harrison, P M

    1992-01-01

    The iron storage protein, ferritin, is widely distributed in the living kingdom. Here the complete cDNA and derived amino-acid sequence of pea seed ferritin are described, together with its predicted secondary structure, namely a four-helix-bundle fold similar to those of mammalian ferritins, with a fifth short helix at the C-terminus. An N-terminal extension of 71 residues contains a transit peptide (first 47 residues) responsible for plastid targetting as in other plant ferritins, and this is cleaved before assembly. The second part of the extension (24 residues) belongs to the mature subunit; it is cleaved during germination. The amino-acid sequence of pea seed ferritin is aligned with those of other ferritins (49% amino-acid identity with H-chains and 40% with L-chains of human liver ferritin in the aligned region). A three-dimensional model has been constructed by fitting the aligned sequence to the coordinates of human H-chains, with appropriate modifications. A folded conformation with an 11-residue helix is predicted for the N-terminal extension. As in mammalian ferritins, 24 subunits assemble into a hollow shell. In pea seed ferritin, its N-terminal extension is exposed on the outside surface of the shell. Within each pea subunit is a ferroxidase centre resembling those of human ferritin H-chains except for a replacement of Glu-62 by His. The channel at the 4-fold-symmetry axes defined by E-helices, is predicted to be hydrophilic in plant ferritins, whereas it is hydrophobic in mammalian ferritins. Images Fig. 3. Fig. 5. Fig. 6. PMID:1472006

  18. Amino acid and nucleotide recurrence in aligned sequences: synonymous substitution patterns in association with global and local base compositions.

    PubMed

    Nishizawa, M; Nishizawa, K

    2000-10-01

    The tendency for repetitiveness of nucleotides in DNA sequences has been reported for a variety of organisms. We show that the tendency for repetitive use of amino acids is widespread and is observed even for segments conserved between human and Drosophila melanogaster at the level of >50% amino acid identity. This indicates that repetitiveness influences not only the weakly constrained segments but also those sequence segments conserved among phyla. Not only glutamine (Q) but also many of the 20 amino acids show a comparable level of repetitiveness. Repetitiveness in bases at codon position 3 is stronger for human than for D.melanogaster, whereas local repetitiveness in intron sequences is similar between the two organisms. While genes for immune system-specific proteins, but not ancient human genes (i.e. human homologs of Escherichia coli genes), have repetitiveness at codon bases 1 and 2, repetitiveness at codon base 3 for these groups is similar, suggesting that the human genome has at least two mechanisms generating local repetitiveness. Neither amino acid nor nucleotide repetitiveness is observed beyond the exon boundary, denying the possibility that such repetitiveness could mainly stem from natural selection on mRNA or protein sequences. Analyses of mammalian sequence alignments show that while the 'between gene' GC content heterogeneity, which is linked to 'isochores', is a principal factor associated with the bias in substitution patterns in human, 'within gene' heterogeneity in nucleotide composition is also associated with such bias on a more local scale. The relationship amongst the various types of repetitiveness is discussed.

  19. Amino acid and nucleotide recurrence in aligned sequences: synonymous substitution patterns in association with global and local base compositions

    PubMed Central

    Nishizawa, Manami; Nishizawa, Kazuhisa

    2000-01-01

    The tendency for repetitiveness of nucleotides in DNA sequences has been reported for a variety of organisms. We show that the tendency for repetitive use of amino acids is widespread and is observed even for segments conserved between human and Drosophila melanogaster at the level of >50% amino acid identity. This indicates that repetitiveness influences not only the weakly constrained segments but also those sequence segments conserved among phyla. Not only glutamine (Q) but also many of the 20 amino acids show a comparable level of repetitiveness. Repetitiveness in bases at codon position 3 is stronger for human than for D.melanogaster, whereas local repetitiveness in intron sequences is similar between the two organisms. While genes for immune system-specific proteins, but not ancient human genes (i.e. human homologs of Escherichia coli genes), have repetitiveness at codon bases 1 and 2, repetitiveness at codon base 3 for these groups is similar, suggesting that the human genome has at least two mechanisms generating local repetitiveness. Neither amino acid nor nucleotide repetitiveness is observed beyond the exon boundary, denying the possibility that such repetitiveness could mainly stem from natural selection on mRNA or protein sequences. Analyses of mammalian sequence alignments show that while the ‘between gene’ GC content heterogeneity, which is linked to ‘isochores’, is a principal factor associated with the bias in substitution patterns in human, ‘within gene’ heterogeneity in nucleotide composition is also associated with such bias on a more local scale. The relationship amongst the various types of repetitiveness is discussed. PMID:11000273

  20. Amino acid sequence of bovine muzzle epithelial desmocollin derived from cloned cDNA: a novel subtype of desmosomal cadherins.

    PubMed

    Koch, P J; Goldschmidt, M D; Walsh, M J; Zimbelmann, R; Schmelz, M; Franke, W W

    1991-05-01

    Desmosomes are cell-type-specific intercellular junctions found in epithelium, myocardium and certain other tissues. They consist of assemblies of molecules involved in the adhesion of specific cell types and in the anchorage of cell-type-specific cytoskeletal elements, the intermediate-size filaments, to the plasma membrane. To explore the individual desmosomal components and their functions we have isolated DNA clones encoding the desmosomal glycoprotein, desmocollin, using antibodies and a cDNA expression library from bovine muzzle epithelium. The cDNA-deduced amino-acid sequence of desmocollin (presently we cannot decide to which of the two desmocollins, DC I or DC II, this clone relates) defines a polypeptide with a calculated molecular weight of 85,000, with a single candidate sequence of 24 amino acids sufficiently long for a transmembrane arrangement, and an extracellular aminoterminal portion of 561 amino acid residues, compared to a cytoplasmic part of only 176 amino acids. Amino acid sequence comparisons have revealed that desmocollin is highly homologous to members of the cadherin family of cell adhesion molecules, including the previously sequenced desmoglein, another desmosome-specific cadherin. Using riboprobes derived from cDNAs for Northern-blot analyses, we have identified an mRNA of approximately 6 kb in stratified epithelia such as muzzle epithelium and tongue mucosa but not in two epithelial cell culture lines containing desmosomes and desmoplakins. The difference may indicate drastic differences in mRNA concentration or the existence of cell-type-specific desmocollin subforms. The molecular topology of desmocollin(s) is discussed in relation to possible functions of the individual molecular domains.

  1. The amino acid sequence around the active-site cysteine and histidine residues of stem bromelain

    PubMed Central

    Husain, S. S.; Lowe, G.

    1970-01-01

    Stem bromelain that had been irreversibly inhibited with 1,3-dibromo[2-14C]-acetone was reduced with sodium borohydride and carboxymethylated with iodoacetic acid. After digestion with trypsin and α-chymotrypsin three radioactive peptides were isolated chromatographically. The amino acid sequences around the cross-linked cysteine and histidine residues were determined and showed a high degree of homology with those around the active-site cysteine and histidine residues of papain and ficin. PMID:5420046

  2. Predicted secondary structure similarity in the absence of primary amino acid sequence homology: hepatitis B virus open reading frames.

    PubMed Central

    Schaeffer, E; Sninsky, J J

    1984-01-01

    Proteins that are related evolutionarily may have diverged at the level of primary amino acid sequence while maintaining similar secondary structures. Computer analysis has been used to compare the open reading frames of the hepatitis B virus to those of the woodchuck hepatitis virus at the level of amino acid sequence, and to predict the relative hydrophilic character and the secondary structure of putative polypeptides. Similarity is seen at the levels of relative hydrophilicity and secondary structure, in the absence of sequence homology. These data reinforce the proposal that these open reading frames encode viral proteins. Computer analysis of this type can be more generally used to establish structural similarities between proteins that do not share obvious sequence homology as well as to assess whether an open reading frame is fortuitous or codes for a protein. PMID:6585835

  3. The sequence of sequencers: The history of sequencing DNA.

    PubMed

    Heather, James M; Chain, Benjamin

    2016-01-01

    Determining the order of nucleic acid residues in biological samples is an integral component of a wide variety of research applications. Over the last fifty years large numbers of researchers have applied themselves to the production of techniques and technologies to facilitate this feat, sequencing DNA and RNA molecules. This time-scale has witnessed tremendous changes, moving from sequencing short oligonucleotides to millions of bases, from struggling towards the deduction of the coding sequence of a single gene to rapid and widely available whole genome sequencing. This article traverses those years, iterating through the different generations of sequencing technology, highlighting some of the key discoveries, researchers, and sequences along the way. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  4. Comparative sequence analysis of acid sensitive/resistance proteins in Escherichia coli and Shigella flexneri

    PubMed Central

    Manikandan, Selvaraj; Balaji, Seetharaaman; Kumar, Anil; Kumar, Rita

    2007-01-01

    The molecular basis for the survival of bacteria under extreme conditions in which growth is inhibited is a question of great current interest. A preliminary study was carried out to determine residue pattern conservation among the antiporters of enteric bacteria, responsible for extreme acid sensitivity especially in Escherichia coli and Shigella flexneri. Here we found the molecular evidence that proved the relationship between E. coli and S. flexneri. Multiple sequence alignment of the gadC coded acid sensitive antiporter showed many conserved residue patterns at regular intervals at the N-terminal region. It was observed that as the alignment approaches towards the C-terminal, the number of conserved residues decreases, indicating that the N-terminal region of this protein has much active role when compared to the carboxyl terminal. The motif, FHLVFFLLLGG, is well conserved within the entire gadC coded protein at the amino terminal. The motif is also partially conserved among other antiporters (which are not coded by gadC) but involved in acid sensitive/resistance mechanism. Phylogenetic cluster analysis proves the relationship of Escherichia coli and Shigella flexneri. The gadC coded proteins are converged as a clade and diverged from other antiporters belongs to the amino acid-polyamine-organocation (APC) superfamily. PMID:21670792

  5. From amino acid sequence to bioactivity: The biomedical potential of antitumor peptides.

    PubMed

    Blanco-Míguez, Aitor; Gutiérrez-Jácome, Alberto; Pérez-Pérez, Martín; Pérez-Rodríguez, Gael; Catalán-García, Sandra; Fdez-Riverola, Florentino; Lourenço, Anália; Sánchez, Borja

    2016-06-01

    Chemoprevention is the use of natural and/or synthetic substances to block, reverse, or retard the process of carcinogenesis. In this field, the use of antitumor peptides is of interest as, (i) these molecules are small in size, (ii) they show good cell diffusion and permeability, (iii) they affect one or more specific molecular pathways involved in carcinogenesis, and (iv) they are not usually genotoxic. We have checked the Web of Science Database (23/11/2015) in order to collect papers reporting on bioactive peptide (1691 registers), which was further filtered searching terms such as "antiproliferative," "antitumoral," or "apoptosis" among others. Works reporting the amino acid sequence of an antiproliferative peptide were kept (60 registers), and this was complemented with the peptides included in CancerPPD, an extensive resource for antiproliferative peptides and proteins. Peptides were grouped according to one of the following mechanism of action: inhibition of cell migration, inhibition of tumor angiogenesis, antioxidative mechanisms, inhibition of gene transcription/cell proliferation, induction of apoptosis, disorganization of tubulin structure, cytotoxicity, or unknown mechanisms. The main mechanisms of action of those antiproliferative peptides with known amino acid sequences are presented and finally, their potential clinical usefulness and future challenges on their application is discussed. © 2016 The Protein Society.

  6. From amino acid sequence to bioactivity: The biomedical potential of antitumor peptides

    PubMed Central

    Blanco‐Míguez, Aitor; Gutiérrez‐Jácome, Alberto; Pérez‐Pérez, Martín; Pérez‐Rodríguez, Gael; Catalán‐García, Sandra; Fdez‐Riverola, Florentino; Lourenço, Anália

    2016-01-01

    Abstract Chemoprevention is the use of natural and/or synthetic substances to block, reverse, or retard the process of carcinogenesis. In this field, the use of antitumor peptides is of interest as, (i) these molecules are small in size, (ii) they show good cell diffusion and permeability, (iii) they affect one or more specific molecular pathways involved in carcinogenesis, and (iv) they are not usually genotoxic. We have checked the Web of Science Database (23/11/2015) in order to collect papers reporting on bioactive peptide (1691 registers), which was further filtered searching terms such as “antiproliferative,” “antitumoral,” or “apoptosis” among others. Works reporting the amino acid sequence of an antiproliferative peptide were kept (60 registers), and this was complemented with the peptides included in CancerPPD, an extensive resource for antiproliferative peptides and proteins. Peptides were grouped according to one of the following mechanism of action: inhibition of cell migration, inhibition of tumor angiogenesis, antioxidative mechanisms, inhibition of gene transcription/cell proliferation, induction of apoptosis, disorganization of tubulin structure, cytotoxicity, or unknown mechanisms. The main mechanisms of action of those antiproliferative peptides with known amino acid sequences are presented and finally, their potential clinical usefulness and future challenges on their application is discussed. PMID:27010507

  7. Complete nucleotide and derived amino acid sequence of cDNA encoding the mitochondrial uncoupling protein of rat brown adipose tissue: lack of a mitochondrial targeting presequence.

    PubMed Central

    Ridley, R G; Patel, H V; Gerber, G E; Morton, R C; Freeman, K B

    1986-01-01

    A cDNA clone spanning the entire amino acid sequence of the nuclear-encoded uncoupling protein of rat brown adipose tissue mitochondria has been isolated and sequenced. With the exception of the N-terminal methionine the deduced N-terminus of the newly synthesized uncoupling protein is identical to the N-terminal 30 amino acids of the native uncoupling protein as determined by protein sequencing. This proves that the protein contains no N-terminal mitochondrial targeting prepiece and that a targeting region must reside within the amino acid sequence of the mature protein. Images PMID:3012461

  8. TaALMT1 promoter sequence compositions, acid tolerance, and Al tolerance in wheat cultivars and landraces from Sichuan in China.

    PubMed

    Han, C; Dai, S F; Liu, D C; Pu, Z J; Wei, Y M; Zheng, Y L; Wen, D J; Zhao, L; Yan, Z H

    2013-11-18

    Previous genetic studies on wheat from various sources have indicated that aluminum (Al) tolerance may have originated independently in USA, Brazil, and China. Here, TaALMT1 promoter sequences of 92 landraces and cultivars from Sichuan, China, were sequenced. Five promoter types (I', II, III, IV, and V) were observed in 39 cultivars, and only three promoter types (I, II, and III) were observed in 53 landraces. Among the wheat collections worldwide, only the Chinese Spring (CS) landrace native to Sichuan, China, carried the TaALMT1 promoter type III. Besides CS, two other Sichuan-bred landraces and six cultivars with TaALMT1 promoter type III were identified in this study. In the phylogenetic tree constructed based on the TaALMT1 promoter sequences, type III formed a separate branch, which was supported by a high bootstrap value. It is likely that TaALMT1 promoter type III originated from Sichuan-bred wheat landraces of China. In addition, the landraces with promoter type I showed the lowest Al tolerance among all landraces and cultivars. Furthermore, the cultivars with promoter type IV showed better Al tolerance than landraces with promoter type II. A comparison of acid tolerance and Al tolerance between cultivars and landraces showed that the landraces had better acid tolerance than the cultivars, whereas the cultivars showed better Al tolerance than the landraces. Moreover, significant difference in Al tolerance was also observed between the cultivars raised by the National Ministry of Agriculture and by Sichuan Province. Among the landraces from different regions, those from the East showed better acid tolerance and Al tolerance than those from the South and West of Sichuan. Additional Al-tolerant and acid-tolerant wheat lines were also identified.

  9. Isolation, sequencing and expression of RED, a novel human gene encoding an acidic-basic dipeptide repeat.

    PubMed

    Assier, E; Bouzinba-Segard, H; Stolzenberg, M C; Stephens, R; Bardos, J; Freemont, P; Charron, D; Trowsdale, J; Rich, T

    1999-04-16

    A novel human gene RED, and the murine homologue, MuRED, were cloned. These genes were named after the extensive stretch of alternating arginine (R) and glutamic acid (E) or aspartic acid (D) residues that they contain. We term this the 'RED' repeat. The genes of both species were expressed in a wide range of tissues and we have mapped the human gene to chromosome 5q22-24. MuRED and RED shared 98% sequence identity at the amino acid level. The open reading frame of both genes encodes a 557 amino acid protein. RED fused to a fluorescent tag was expressed in nuclei of transfected cells and localised to nuclear dots. Co-localisation studies showed that these nuclear dots did not contain either PML or Coilin, which are commonly found in the POD or coiled body nuclear compartments. Deletion of the amino terminal 265 amino acids resulted in a failure to sort efficiently to the nucleus, though nuclear dots were formed. Deletion of a further 50 amino acids from the amino terminus generates a protein that can sort to the nucleus but is unable to generate nuclear dots. Neither construct localised to the nucleolus. The characteristics of RED and its nuclear localisation implicate it as a regulatory protein, possibly involved in transcription.

  10. Genome Sequence of Lactobacillus rhamnosus Strain CASL, an Efficient l-Lactic Acid Producer from Cheap Substrate Cassava

    PubMed Central

    Yu, Bo; Su, Fei; Wang, Limin; Zhao, Bo; Qin, Jiayang; Ma, Cuiqing; Xu, Ping; Ma, Yanhe

    2011-01-01

    Lactobacillus rhamnosus is a type of probiotic bacteria with industrial potential for l-lactic acid production. We announce the draft genome sequence of L. rhamnosus CASL (2,855,156 bp with a G+C content of 46.6%), which is an efficient producer of l-lactic acid from cheap, nonfood substrate cassava with a high production titer. PMID:22123765

  11. Fluorescence energy transfer as a probe for nucleic acid structures and sequences.

    PubMed Central

    Mergny, J L; Boutorine, A S; Garestier, T; Belloc, F; Rougée, M; Bulychev, N V; Koshkin, A A; Bourson, J; Lebedev, A V; Valeur, B

    1994-01-01

    The primary or secondary structure of single-stranded nucleic acids has been investigated with fluorescent oligonucleotides, i.e., oligonucleotides covalently linked to a fluorescent dye. Five different chromophores were used: 2-methoxy-6-chloro-9-amino-acridine, coumarin 500, fluorescein, rhodamine and ethidium. The chemical synthesis of derivatized oligonucleotides is described. Hybridization of two fluorescent oligonucleotides to adjacent nucleic acid sequences led to fluorescence excitation energy transfer between the donor and the acceptor dyes. This phenomenon was used to probe primary and secondary structures of DNA fragments and the orientation of oligodeoxynucleotides synthesized with the alpha-anomers of nucleoside units. Fluorescence energy transfer can be used to reveal the formation of hairpin structures and the translocation of genes between two chromosomes. PMID:8152922

  12. Relative quantification of 40 nucleic acid sequences by multiplex ligation-dependent probe amplification

    PubMed Central

    Schouten, Jan P.; McElgunn, Cathal J.; Waaijer, Raymond; Zwijnenburg, Danny; Diepvens, Filip; Pals, Gerard

    2002-01-01

    We describe a new method for relative quantification of 40 different DNA sequences in an easy to perform reaction requiring only 20 ng of human DNA. Applications shown of this multiplex ligation-dependent probe amplification (MLPA) technique include the detection of exon deletions and duplications in the human BRCA1, MSH2 and MLH1 genes, detection of trisomies such as Down’s syndrome, characterisation of chromosomal aberrations in cell lines and tumour samples and SNP/mutation detection. Relative quantification of mRNAs by MLPA will be described elsewhere. In MLPA, not sample nucleic acids but probes added to the samples are amplified and quantified. Amplification of probes by PCR depends on the presence of probe target sequences in the sample. Each probe consists of two oligonucleotides, one synthetic and one M13 derived, that hybridise to adjacent sites of the target sequence. Such hybridised probe oligonucleotides are ligated, permitting subsequent amplification. All ligated probes have identical end sequences, permitting simultaneous PCR amplification using only one primer pair. Each probe gives rise to an amplification product of unique size between 130 and 480 bp. Probe target sequences are small (50–70 nt). The prerequisite of a ligation reaction provides the opportunity to discriminate single nucleotide differences. PMID:12060695

  13. Relative quantification of 40 nucleic acid sequences by multiplex ligation-dependent probe amplification.

    PubMed

    Schouten, Jan P; McElgunn, Cathal J; Waaijer, Raymond; Zwijnenburg, Danny; Diepvens, Filip; Pals, Gerard

    2002-06-15

    We describe a new method for relative quantification of 40 different DNA sequences in an easy to perform reaction requiring only 20 ng of human DNA. Applications shown of this multiplex ligation-dependent probe amplification (MLPA) technique include the detection of exon deletions and duplications in the human BRCA1, MSH2 and MLH1 genes, detection of trisomies such as Down's syndrome, characterisation of chromosomal aberrations in cell lines and tumour samples and SNP/mutation detection. Relative quantification of mRNAs by MLPA will be described elsewhere. In MLPA, not sample nucleic acids but probes added to the samples are amplified and quantified. Amplification of probes by PCR depends on the presence of probe target sequences in the sample. Each probe consists of two oligonucleotides, one synthetic and one M13 derived, that hybridise to adjacent sites of the target sequence. Such hybridised probe oligonucleotides are ligated, permitting subsequent amplification. All ligated probes have identical end sequences, permitting simultaneous PCR amplification using only one primer pair. Each probe gives rise to an amplification product of unique size between 130 and 480 bp. Probe target sequences are small (50-70 nt). The prerequisite of a ligation reaction provides the opportunity to discriminate single nucleotide differences.

  14. Experimental and Theoretical Studies on the Nazarov Cyclization/Wagner-Meerwein Rearrangement Sequence

    PubMed Central

    Lebœuf, David; Ciesielski, Jennifer

    2012-01-01

    Highly functionalized cyclopentenones can be generated stereospecifically by a chemoselective copper(II)-mediated Nazarov/Wagner-Meerwein rearrangement sequence of divinyl ketones. A detailed investigation of this sequence is described including a study of substrate scope and limitations. After the initial 4π electrocyclization, this reaction proceeds via two different sequential [1,2]-shifts, with selectivity that depends upon either migratory ability or the steric bulkiness of the substituents at C1 and C5. This methodology allows the creation of vicinal stereogenic centers, including adjacent quaternary centers. This sequence can also be achieved by using a catalytic amount of copper(II) in combination with NaBAr4f, a weak Lewis acid. During the study of the scope of the reaction, a partial or complete E / Z isomerization of the enone moiety was observed in some cases prior to the cyclization, which resulted in a mixture of diastereomeric products. Use of a Cu(II)-bisoxazoline complex prevented the isomerization, allowing high diastereoselectivity to be obtained in all substrate types. In addition, the reaction sequence was studied by DFT computations at the UB3LYP/6-31G(d,p) level, which are consistent with the proposed sequences observed, including E / Z isomerizations and chemoselective Wagner-Meerwein shifts. PMID:22471833

  15. Sequence Diversity Diagram for comparative analysis of multiple sequence alignments.

    PubMed

    Sakai, Ryo; Aerts, Jan

    2014-01-01

    The sequence logo is a graphical representation of a set of aligned sequences, commonly used to depict conservation of amino acid or nucleotide sequences. Although it effectively communicates the amount of information present at every position, this visual representation falls short when the domain task is to compare between two or more sets of aligned sequences. We present a new visual presentation called a Sequence Diversity Diagram and validate our design choices with a case study. Our software was developed using the open-source program called Processing. It loads multiple sequence alignment FASTA files and a configuration file, which can be modified as needed to change the visualization. The redesigned figure improves on the visual comparison of two or more sets, and it additionally encodes information on sequential position conservation. In our case study of the adenylate kinase lid domain, the Sequence Diversity Diagram reveals unexpected patterns and new insights, for example the identification of subgroups within the protein subfamily. Our future work will integrate this visual encoding into interactive visualization tools to support higher level data exploration tasks.

  16. PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences

    PubMed Central

    Mirarab, Siavash; Nguyen, Nam; Guo, Sheng; Wang, Li-San; Kim, Junhyong

    2015-01-01

    Abstract We introduce PASTA, a new multiple sequence alignment algorithm. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy and scalability of the leading alignment methods (including SATé). We also show that trees estimated on PASTA alignments are highly accurate—slightly better than SATé trees, but with substantial improvements relative to other methods. Finally, PASTA is faster than SATé, highly parallelizable, and requires relatively little memory. PMID:25549288

  17. PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences.

    PubMed

    Mirarab, Siavash; Nguyen, Nam; Guo, Sheng; Wang, Li-San; Kim, Junhyong; Warnow, Tandy

    2015-05-01

    We introduce PASTA, a new multiple sequence alignment algorithm. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy and scalability of the leading alignment methods (including SATé). We also show that trees estimated on PASTA alignments are highly accurate--slightly better than SATé trees, but with substantial improvements relative to other methods. Finally, PASTA is faster than SATé, highly parallelizable, and requires relatively little memory.

  18. Subcellular location prediction of proteins using support vector machines with alignment of block sequences utilizing amino acid composition.

    PubMed

    Tamura, Takeyuki; Akutsu, Tatsuya

    2007-11-30

    Subcellular location prediction of proteins is an important and well-studied problem in bioinformatics. This is a problem of predicting which part in a cell a given protein is transported to, where an amino acid sequence of the protein is given as an input. This problem is becoming more important since information on subcellular location is helpful for annotation of proteins and genes and the number of complete genomes is rapidly increasing. Since existing predictors are based on various heuristics, it is important to develop a simple method with high prediction accuracies. In this paper, we propose a novel and general predicting method by combining techniques for sequence alignment and feature vectors based on amino acid composition. We implemented this method with support vector machines on plant data sets extracted from the TargetP database. Through fivefold cross validation tests, the obtained overall accuracies and average MCC were 0.9096 and 0.8655 respectively. We also applied our method to other datasets including that of WoLF PSORT. Although there is a predictor which uses the information of gene ontology and yields higher accuracy than ours, our accuracies are higher than existing predictors which use only sequence information. Since such information as gene ontology can be obtained only for known proteins, our predictor is considered to be useful for subcellular location prediction of newly-discovered proteins. Furthermore, the idea of combination of alignment and amino acid frequency is novel and general so that it may be applied to other problems in bioinformatics. Our method for plant is also implemented as a web-system and available on http://sunflower.kuicr.kyoto-u.ac.jp/~tamura/slpfa.html.

  19. Construction Strategy for an Internal Amplification Control for Real-Time Diagnostic Assays Using Nucleic Acid Sequence-Based Amplification: Development and Clinical Application

    PubMed Central

    Rodríguez-Lázaro, David; D'Agostino, Martin; Pla, Maria; Cook, Nigel

    2004-01-01

    An important analytical control in molecular amplification-based methods is an internal amplification control (IAC), which should be included in each reaction mixture. An IAC is a nontarget nucleic acid sequence which is coamplified simultaneously with the target sequence. With negative results for the target nucleic acid, the absence of an IAC signal indicates that amplification has failed. A general strategy for the construction of an IAC for inclusion in molecular beacon-based real-time nucleic acid sequence-based amplification (NASBA) assays is presented. Construction proceeds in two phases. In the first phase, a double-stranded DNA molecule that contains nontarget sequences flanked by target sequences complementary to the NASBA primers is produced. At the 5′ end of this DNA molecule is a T7 RNA polymerase binding sequence. In the second phase of construction, RNA transcripts are produced from the DNA by T7 RNA polymerase. This RNA is the IAC; it is amplified by the target NASBA primers and is detected by a molecular beacon probe complementary to the internal nontarget sequences. As a practical example, an IAC for use in an assay for the detection of Mycobacterium avium subsp. paratuberculosis is described, its incorporation and optimization within the assay are detailed, and its application to spiked and natural clinical samples is shown to illustrate the correct interpretation of the diagnostic results. PMID:15583319

  20. Amino acid sequences of peptides from a chymotryptic digest of a urea-soluble protein fraction (U.S.3) from oxidized wool

    PubMed Central

    Corfield, M. C.; Fletcher, J. C.

    1969-01-01

    1. A chymotryptic digest of the protein fraction U.S.3. from oxidized wool was separated into 51 peptide fractions by chromatography on a column of cation-exchange resin. 2. The less acidic fractions were separated into their component peptides by a combination of cation-exchange-resin chromatography, paper chromatography and paper electrophoresis. 3. The amino acid sequences of 34 of these peptides were elucidated, and those of 14 others partially determined. 4. Overlaps between the tryptic and chymotryptic peptides from fraction U.S.3 have enabled ten extended amino acid sequences to be deduced, the longest containing 20 amino acid residues. 5. The relevance of the results to the structures of the helical and non-helical regions of wool is discussed. PMID:5395876

  1. An improved TCF sequence for biobleaching kenaf pulp: influence of the hexenuronic acid content and the use of xylanase.

    PubMed

    Andreu, Glòria; Vidal, Teresa

    2014-01-01

    Enzymatic delignification with laccase from Trametes villosa used in combination with chemical mediators (acetosyringone, acetovanillone and 1-hydroxybenzotriazole) to improve the totally chlorine-free (TCF) bleaching of kenaf pulp was studied. The best final pulp properties were obtained by using an LHBTQPo sequence developed by incorporating a laccase-mediator stage into an industrial bleaching sequence involving chelation and peroxide stages. The new sequence resulted in increased kenaf pulp delignification (90.4%) and brightness (77.2%ISO) relative to a conventional TCF chemical sequence (74.5% delignification and 74.5% brightness). Also, the sequence provided bleached kenaf fibers with high cellulose content (pulp viscosity of 890 g·mL(-1) vs 660 g·mL(-1)). Scanning electron micrographs revealed that xylanase altered fiber surfaces and facilitated reagent access as a result. However, the LHBTX (xylanase) stage removed 21% of hexenuronic acids in kenaf pulp. These recalcitrant compounds spent additional bleaching reagents and affected pulp properties after peroxide stage. Copyright © 2013 Elsevier Ltd. All rights reserved.

  2. Formation of specific amino acid sequences during carbodiimide-mediated condensation of amino acids in aqueous solution, and computer-simulated sequence generation

    NASA Astrophysics Data System (ADS)

    Hartmann, Jürgen; Nawroth, Thomas; Dose, Klaus

    1984-12-01

    Carbodiimide-mediated peptide synthesis in aqueous solution has been studied with respect to self-ordering of amino acids. The copolymerisation of amino acids in the presence of glutamic acid or pyroglutamic acid leads to short pyroglutamyl peptides. Without pyroglutamic acid the formation of higher polymers is favoured. The interactions of the amino acids and the peptides, however, are very complex. Therefore, the experimental results are rather difficult to explain. Some of the experimental results, however, can be explained with the aid of computer simulation programs. Regarding only the tripeptide fraction the copolymerisation of pyroGlu, Ala and Leu, as well as the simulated copolymerisation lead to pyroGlu-Ala-Leu as the main reaction product. The amino acid composition of the insoluble peptides formed during the copolymerisation of Ser, Gly, Ala, Val, Phe, Leu and Ile corresponds in part to the computer-simulated copolymerisation data.

  3. fCCAC: functional canonical correlation analysis to evaluate covariance between nucleic acid sequencing datasets.

    PubMed

    Madrigal, Pedro

    2017-03-01

    Computational evaluation of variability across DNA or RNA sequencing datasets is a crucial step in genomic science, as it allows both to evaluate reproducibility of biological or technical replicates, and to compare different datasets to identify their potential correlations. Here we present fCCAC, an application of functional canonical correlation analysis to assess covariance of nucleic acid sequencing datasets such as chromatin immunoprecipitation followed by deep sequencing (ChIP-seq). We show how this method differs from other measures of correlation, and exemplify how it can reveal shared covariance between histone modifications and DNA binding proteins, such as the relationship between the H3K4me3 chromatin mark and its epigenetic writers and readers. An R/Bioconductor package is available at http://bioconductor.org/packages/fCCAC/ . pmb59@cam.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  4. Prediction of beta-turns from amino acid sequences using the residue-coupled model.

    PubMed

    Guruprasad, K; Shukla, S

    2003-04-01

    We evaluated the prediction of beta-turns from amino acid sequences using the residue-coupled model with an enlarged representative protein data set selected from the Protein Data Bank. Our results show that the probability values derived from a data set comprising 425 protein chains yielded an overall beta-turn prediction accuracy 68.74%, compared with 94.7% reported earlier on a data set of 30 proteins using the same method. However, we noted that the overall beta-turn prediction accuracy using probability values derived from the 30-protein data set reduces to 40.74% when tested on the data set comprising 425 protein chains. In contrast, using probability values derived from the 425 data set used in this analysis, the overall beta-turn prediction accuracy yielded consistent results when tested on either the 30-protein data set (64.62%) used earlier or a more recent representative data set comprising 619 protein chains (64.66%) or on a jackknife data set comprising 476 representative protein chains (63.38%). We therefore recommend the use of probability values derived from the 425 representative protein chains data set reported here, which gives more realistic and consistent predictions of beta-turns from amino acid sequences.

  5. The shikimate pathway: review of amino acid sequence, function and three-dimensional structures of the enzymes.

    PubMed

    Mir, Rafia; Jallu, Shais; Singh, T P

    2015-06-01

    The aromatic compounds such as aromatic amino acids, vitamin K and ubiquinone are important prerequisites for the metabolism of an organism. All organisms can synthesize these aromatic metabolites through shikimate pathway, except for mammals which are dependent on their diet for these compounds. The pathway converts phosphoenolpyruvate and erythrose 4-phosphate to chorismate through seven enzymatically catalyzed steps and chorismate serves as a precursor for the synthesis of variety of aromatic compounds. These enzymes have shown to play a vital role for the viability of microorganisms and thus are suggested to present attractive molecular targets for the design of novel antimicrobial drugs. This review focuses on the seven enzymes of the shikimate pathway, highlighting their primary sequences, functions and three-dimensional structures. The understanding of their active site amino acid maps, functions and three-dimensional structures will provide a framework on which the rational design of antimicrobial drugs would be based. Comparing the full length amino acid sequences and the X-ray crystal structures of these enzymes from bacteria, fungi and plant sources would contribute in designing a specific drug and/or in developing broad-spectrum compounds with efficacy against a variety of pathogens.

  6. Ultra high-throughput nucleic acid sequencing as a tool for virus discovery in the turkey gut.

    USDA-ARS?s Scientific Manuscript database

    Recently, the use of the next generation of nucleic acid sequencing technology (i.e., 454 pyrosequencing, as developed by Roche/454 Life Sciences) has allowed an in-depth look at the uncultivated microorganisms present in complex environmental samples, including samples with agricultural importance....

  7. Fast computational methods for predicting protein structure from primary amino acid sequence

    DOEpatents

    Agarwal, Pratul Kumar [Knoxville, TN

    2011-07-19

    The present invention provides a method utilizing primary amino acid sequence of a protein, energy minimization, molecular dynamics and protein vibrational modes to predict three-dimensional structure of a protein. The present invention also determines possible intermediates in the protein folding pathway. The present invention has important applications to the design of novel drugs as well as protein engineering. The present invention predicts the three-dimensional structure of a protein independent of size of the protein, overcoming a significant limitation in the prior art.

  8. Complete amino acid sequence of the myoglobin from the Pacific sei whale, Balaenoptera borealis.

    PubMed

    Jones, B N; Rothgeb, T M; England, R D; Gurd, F R

    1979-04-25

    The complete amino acid sequence of the major component myoglobin from Pacific sei whale, Balaenoptera borealis, was determined by specific cleavage of the protein to obtain large peptides which are readily degraded by the automatic sequencer. The acetimidated apomyoglobin was selectively cleaved at its two methionyl residues with cyanogen bromide and at its three arginyl residues by trypsin. From the sequence analysis of four of these peptides and the apomyoglobin, over 75% of the covalent structure of the protein was obtained. The remainder of the primary structure was determined by the sequence analysis of peptides that resulted from further digestion of the amino-terminal and central cyanogen bromide fragments. The amino-terminal fragment was specifically cleaved at its two tryptophanyl residues with N-chlorosuccinimide and the central cyanogen bromide fragment was cleaved at its glutamyl residues with staphylococcal protease and at its single tyrosyl residue with N-bromosuccinimide. The primary structure of this myoglobin proved identical with that from the gray whale but differs from that of the finback whale at four positions, from that of the minke whale at three positions and from the myoglobin of the humpback whale at one position. The above sequence identities and differences reflect the close taxonomic relationship of these five species of Cetacea.

  9. Codes in the codons: construction of a codon/amino acid periodic table and a study of the nature of specific nucleic acid-protein interactions.

    PubMed

    Benyo, B; Biro, J C; Benyo, Z

    2004-01-01

    The theory of "codon-amino acid coevolution" was first proposed by Woese in 1967. It suggests that there is a stereochemical matching - that is, affinity - between amino acids and certain of the base triplet sequences that code for those amino acids. We have constructed a common periodic table of codons and amino acids, where the nucleic acid table showed perfect axial symmetry for codons and the corresponding amino acid table also displayed periodicity regarding the biochemical properties (charge and hydrophobicity) of the 20 amino acids and the position of the stop signals. The table indicates that the middle (2/sup nd/) amino acid in the codon has a prominent role in determining some of the structural features of the amino acids. The possibility that physical contact between codons and amino acids might exist was tested on restriction enzymes. Many recognition site-like sequences were found in the coding sequences of these enzymes and as many as 73 examples of codon-amino acid co-location were observed in the 7 known 3D structures (December 2003) of endonuclease-nucleic acid complexes. These results indicate that the smallest possible units of specific nucleic acid-protein interaction are indeed the stereochemically compatible codons and amino acids.

  10. Amino acid sequence surrounding the chondroitin sulfate attachment site of thrombomodulin regulates chondroitin polymerization.

    PubMed

    Izumikawa, Tomomi; Kitagawa, Hiroshi

    2015-05-01

    Thrombomodulin (TM) is a cell-surface glycoprotein and a critical mediator of endothelial anticoagulant function. TM exists as both a chondroitin sulfate (CS) proteoglycan (PG) form and a non-PG form lacking a CS chain (α-TM); therefore, TM can be described as a part-time PG. Previously, we reported that α-TM bears an immature, truncated linkage tetrasaccharide structure (GlcAβ1-3Galβ1-3Galβ1-4Xyl). However, the biosynthetic mechanism to generate part-time PGs remains unclear. In this study, we used several mutants to demonstrate that the amino acid sequence surrounding the CS attachment site influences the efficiency of chondroitin polymerization. In particular, the presence of acidic residues surrounding the CS attachment site was indispensable for the elongation of CS. In addition, mutants defective in CS elongation did not exhibit anti-coagulant activity, as in the case with α-TM. Together, these data support a model for CS chain assembly in which specific core protein determinants are recognized by a key biosynthetic enzyme involved in chondroitin polymerization. Copyright © 2015 Elsevier Inc. All rights reserved.

  11. Synthesis and evaluations of an acid-cleavable, fluorescently labeled nucleotide as a reversible terminator for DNA sequencing.

    PubMed

    Tan, Lianjiang; Liu, Yazhi; Li, Xiaowei; Wu, Xin-Yan; Gong, Bing; Shen, Yu-Mei; Shao, Zhifeng

    2016-02-11

    An acid-cleavable linker based on a dimethylketal moiety was synthesized and used to connect a nucleotide with a fluorophore to produce a 3'-OH unblocked nucleotide analogue as an excellent reversible terminator for DNA sequencing by synthesis.

  12. Statistical potential-based amino acid similarity matrices for aligning distantly related protein sequences.

    PubMed

    Tan, Yen Hock; Huang, He; Kihara, Daisuke

    2006-08-15

    Aligning distantly related protein sequences is a long-standing problem in bioinformatics, and a key for successful protein structure prediction. Its importance is increasing recently in the context of structural genomics projects because more and more experimentally solved structures are available as templates for protein structure modeling. Toward this end, recent structure prediction methods employ profile-profile alignments, and various ways of aligning two profiles have been developed. More fundamentally, a better amino acid similarity matrix can improve a profile itself; thereby resulting in more accurate profile-profile alignments. Here we have developed novel amino acid similarity matrices from knowledge-based amino acid contact potentials. Contact potentials are used because the contact propensity to the other amino acids would be one of the most conserved features of each position of a protein structure. The derived amino acid similarity matrices are tested on benchmark alignments at three different levels, namely, the family, the superfamily, and the fold level. Compared to BLOSUM45 and the other existing matrices, the contact potential-based matrices perform comparably in the family level alignments, but clearly outperform in the fold level alignments. The contact potential-based matrices perform even better when suboptimal alignments are considered. Comparing the matrices themselves with each other revealed that the contact potential-based matrices are very different from BLOSUM45 and the other matrices, indicating that they are located in a different basin in the amino acid similarity matrix space.

  13. Nucleotide and deduced amino acid sequence of the envelope gene of the Vasilchenko strain of TBE virus; comparison with other flaviviruses.

    PubMed

    Gritsun, T S; Frolova, T V; Pogodina, V V; Lashkevich, V A; Venugopal, K; Gould, E A

    1993-02-01

    A strain of tick-borne encephalitis virus known as Vasilchenko (Vs) exhibits relatively low virulence characteristics in monkeys, Syrian hamsters and humans. The gene encoding the envelope glycoprotein of this virus was cloned and sequenced. Alignment of the sequence with those of other known tick-borne flaviviruses and identification of the recognised amino acid genetic marker EHLPTA confirmed its identity as a member of the TBE complex. However, Vs virus was distinguishable from eastern and western tick-borne serotypes by the presence of the sequence AQQ at amino acid positions 232-234 and also by the presence of other specific amino acid substitutions which may be genetic markers for these viruses and could determine their pathogenetic characteristics. When compared with other tick-borne flaviviruses, Vs virus had 12 unique amino acid substitutions including an additional potential glycosylation site at position (315-317). The Vs virus strain shared closest nucleotide and amino acid homology (84.5% and 95.5% respectively) with western and far eastern strains of tick-borne encephalitis virus. Comparison with the far eastern serotype of tick-borne encephalitis virus, by cross-immunoelectrophoresis of Vs virions and PAGE analysis of the extracted virion proteins, revealed differences in surface charge and virus stability that may account for the different virulence characteristics of Vs virus. These results support and enlarge upon previous data obtained from molecular and serological analysis.

  14. Cleavage of nucleic acids

    DOEpatents

    Prudent, James R.; Hall, Jeff G.; Lyamichev, Victor L.; Brow, Mary Ann D.; Dahlberg, James E.

    2007-12-11

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof.

  15. Cleavage of nucleic acids

    DOEpatents

    Prudent, James R.; Hall, Jeff G.; Lyamichev, Victor I.; Brow; Mary Ann D.; Dahlberg, James E.

    2010-11-09

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof.

  16. Cleavage of nucleic acids

    DOEpatents

    Prudent, James R.; Hall, Jeff G.; Lyamichev, Victor I.; Brow, Mary Ann D.; Dahlberg, James E.

    2000-01-01

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof.

  17. Nucleic acid detection assays

    DOEpatents

    Prudent, James R.; Hall, Jeff G.; Lyamichev, Victor I.; Brow, Mary Ann; Dahlberg, James E.

    2005-04-05

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof.

  18. Complete Genome Sequence of a thermotolerant sporogenic lactic acid bacterium, Bacillus coagulans strain 36D1

    PubMed Central

    Rhee, Mun Su; Moritz, Brélan E.; Xie, Gary; Glavina del Rio, T.; Dalin, E.; Tice, H.; Bruce, D.; Goodwin, L.; Chertkov, O.; Brettin, T.; Han, C.; Detter, C.; Pitluck, S.; Land, Miriam L.; Patel, Milind; Ou, Mark; Harbrucker, Roberta; Ingram, Lonnie O.; Shanmugam, K. T.

    2011-01-01

    Bacillus coagulans is a ubiquitous soil bacterium that grows at 50-55 °C and pH 5.0 and ferments various sugars that constitute plant biomass to L (+)-lactic acid. The ability of this sporogenic lactic acid bacterium to grow at 50-55 °C and pH 5.0 makes this organism an attractive microbial biocatalyst for production of optically pure lactic acid at industrial scale not only from glucose derived from cellulose but also from xylose, a major constituent of hemicellulose. This bacterium is also considered as a potential probiotic. Complete genome sequence of a representative strain, B. coagulans strain 36D1, is presented and discussed. PMID:22675583

  19. Full genome sequence of Rocio virus reveal substantial variations from the prototype Rocio virus SPH 34675 sequence.

    PubMed

    Setoh, Yin Xiang; Amarilla, Alberto A; Peng, Nias Y; Slonchak, Andrii; Periasamy, Parthiban; Figueiredo, Luiz T M; Aquino, Victor H; Khromykh, Alexander A

    2018-01-01

    Rocio virus (ROCV) is an arbovirus belonging to the genus Flavivirus, family Flaviviridae. We present an updated sequence of ROCV strain SPH 34675 (GenBank: AY632542.4), the only available full genome sequence prior to this study. Using next-generation sequencing of the entire genome, we reveal substantial sequence variation from the prototype sequence, with 30 nucleotide differences amounting to 14 amino acid changes, as well as significant changes to predicted 3'UTR RNA structures. Our results present an updated and corrected sequence of a potential emerging human-virulent flavivirus uniquely indigenous to Brazil (GenBank: MF461639).

  20. The hypervariable region 1 protein of hepatitis C virus broadly reactive with sera of patients with chronic hepatitis C has a similar amino acid sequence with the consensus sequence.

    PubMed

    Watanabe, K; Yoshioka, K; Ito, H; Ishigami, M; Takagi, K; Utsunomiya, S; Kobayashi, M; Kishimoto, H; Yano, M; Kakumu, S

    1999-11-10

    Hypervariable region 1 (HVR1) proteins of hepatitis C virus (HCV) have been reported to react broadly with sera of patients with HCV infection. However, the variability of the broad reactivity of individual HVR1 proteins has not been elucidated. We assessed the reactivity of 25 different HVR1 proteins (genotype 1b) with sera of 81 patients with HCV infection (genotype 1b) by Western blot. HVR1 proteins reacted with 2-60 sera. The number of sera reactive with each HVR1 protein significantly correlated with the number of amino acid residues identical to the consensus sequence defined by Puntoriero et al. (G. Puntoriero, A. Lahm, S. Zucchelli, B. B. Ercole, R. Tafi, M. Penzzanera, M. U. Mondelli, R. Cortese, A. Tramontano, G. Galfre', and A. Nicosia. 1998. EMBO J. 17, 3521-3533. ) (r = 0.561, P < 0.005). The most widely reactive HVR1 protein, 12-22, had a sequence similar to the consensus sequence. The peptide with C-terminal 13-amino-acids sequence of HVR1 protein 12-22 (NH2-CSFTSLFTPGPSQK) was injected into rabbits as an immunogen. The rabbit immune sera reacted with 9 of 25 HVR1 proteins of genotype 1b including HVR1 protein 12-22 and with 3 of 12 proteins of genotype 2a. These results indicate that the HVR1 protein broadly reactive with patients' sera has a sequence similar to the consensus sequence, can induce broadly reactive sera, and could be one of the candidate immunogens in a prophylactic vaccine against HCV. Copyright 1999 Academic Press.

  1. Position-dependent effects of locked nucleic acid (LNA) on DNA sequencing and PCR primers

    PubMed Central

    Levin, Joshua D.; Fiala, Dean; Samala, Meinrado F.; Kahn, Jason D.; Peterson, Raymond J.

    2006-01-01

    Genomes are becoming heavily annotated with important features. Analysis of these features often employs oligonucleotides that hybridize at defined locations. When the defined location lies in a poor sequence context, traditional design strategies may fail. Locked Nucleic Acid (LNA) can enhance oligonucleotide affinity and specificity. Though LNA has been used in many applications, formal design rules are still being defined. To further this effort we have investigated the effect of LNA on the performance of sequencing and PCR primers in AT-rich regions, where short primers yield poor sequencing reads or PCR yields. LNA was used in three positional patterns: near the 5′ end (LNA-5′), near the 3′ end (LNA-3′) and distributed throughout (LNA-Even). Quantitative measures of sequencing read length (Phred Q30 count) and real-time PCR signal (cycle threshold, CT) were characterized using two-way ANOVA. LNA-5′ increased the average Phred Q30 score by 60% and it was never observed to decrease performance. LNA-5′ generated cycle thresholds in quantitative PCR that were comparable to high-yielding conventional primers. In contrast, LNA-3′ and LNA-Even did not improve read lengths or CT. ANOVA demonstrated the statistical significance of these results and identified significant interaction between the positional design rule and primer sequence. PMID:17071964

  2. Comparison of the nucleotide and amino acid sequences of the RsrI and EcoRI restriction endonucleases.

    PubMed

    Stephenson, F H; Ballard, B T; Boyer, H W; Rosenberg, J M; Greene, P J

    1989-12-21

    The RsrI endonuclease, a type-II restriction endonuclease (ENase) found in Rhodobacter sphaeroides, is an isoschizomer of the EcoRI ENase. A clone containing an 11-kb BamHI fragment was isolated from an R. sphaeroides genomic DNA library by hybridization with synthetic oligodeoxyribonucleotide probes based on the N-terminal amino acid (aa) sequence of RsrI. Extracts of E. coli containing a subclone of the 11-kb fragment display RsrI activity. Nucleotide sequence analysis reveals an 831-bp open reading frame encoding a polypeptide of 277 aa. A 50% identity exists within a 266-aa overlap between the deduced aa sequences of RsrI and EcoRI. Regions of 75-100% aa sequence identity correspond to key structural and functional regions of EcoRI. The type-II ENases have many common properties, and a common origin might have been expected. Nevertheless, this is the first demonstration of aa sequence similarity between ENases produced by different organisms.

  3. Draft Genome Sequence of the Butyric Acid Producer Clostridium tyrobutyricum Strain CIP I-776 (IFP923).

    PubMed

    Wasels, François; Clément, Benjamin; Lopes Ferreira, Nicolas

    2016-03-03

    Here, we report the draft genome sequence of Clostridium tyrobutyricum CIP I-776 (IFP923), an efficient producer of butyric acid. The genome consists of a single chromosome of 3.19 Mb and provides useful data concerning the metabolic capacities of the strain. Copyright © 2016 Wasels et al.

  4. Sequence dependent aggregation of peptides and fibril formation

    NASA Astrophysics Data System (ADS)

    Hung, Nguyen Ba; Le, Duy-Manh; Hoang, Trinh X.

    2017-09-01

    Deciphering the links between amino acid sequence and amyloid fibril formation is key for understanding protein misfolding diseases. Here we use Monte Carlo simulations to study the aggregation of short peptides in a coarse-grained model with hydrophobic-polar (HP) amino acid sequences and correlated side chain orientations for hydrophobic contacts. A significant heterogeneity is observed in the aggregate structures and in the thermodynamics of aggregation for systems of different HP sequences and different numbers of peptides. Fibril-like ordered aggregates are found for several sequences that contain the common HPH pattern, while other sequences may form helix bundles or disordered aggregates. A wide variation of the aggregation transition temperatures among sequences, even among those of the same hydrophobic fraction, indicates that not all sequences undergo aggregation at a presumable physiological temperature. The transition is found to be the most cooperative for sequences forming fibril-like structures. For a fibril-prone sequence, it is shown that fibril formation follows the nucleation and growth mechanism. Interestingly, a binary mixture of peptides of an aggregation-prone and a non-aggregation-prone sequence shows the association and conversion of the latter to the fibrillar structure. Our study highlights the role of a sequence in selecting fibril-like aggregates and also the impact of a structural template on fibril formation by peptides of unrelated sequences.

  5. Large-Scale Concatenation cDNA Sequencing

    PubMed Central

    Yu, Wei; Andersson, Björn; Worley, Kim C.; Muzny, Donna M.; Ding, Yan; Liu, Wen; Ricafrente, Jennifer Y.; Wentland, Meredith A.; Lennon, Greg; Gibbs, Richard A.

    1997-01-01

    A total of 100 kb of DNA derived from 69 individual human brain cDNA clones of 0.7–2.0 kb were sequenced by concatenated cDNA sequencing (CCS), whereby multiple individual DNA fragments are sequenced simultaneously in a single shotgun library. The method yielded accurate sequences and a similar efficiency compared with other shotgun libraries constructed from single DNA fragments (>20 kb). Computer analyses were carried out on 65 cDNA clone sequences and their corresponding end sequences to examine both nucleic acid and amino acid sequence similarities in the databases. Thirty-seven clones revealed no DNA database matches, 12 clones generated exact matches (≥98% identity), and 16 clones generated nonexact matches (57%–97% identity) to either known human or other species genes. Of those 28 matched clones, 8 had corresponding end sequences that failed to identify similarities. In a protein similarity search, 27 clone sequences displayed significant matches, whereas only 20 of the end sequences had matches to known protein sequences. Our data indicate that full-length cDNA insert sequences provide significantly more nucleic acid and protein sequence similarity matches than expressed sequence tags (ESTs) for database searching. [All 65 cDNA clone sequences described in this paper have been submitted to the GenBank data library under accession nos. U79240–U79304.] PMID:9110174

  6. Genome Sequence Analysis of the Naphthenic Acid Degrading and Metal Resistant Bacterium Cupriavidus gilardii CR3

    PubMed Central

    Xiao, Jingfa; Hao, Lirui; Crowley, David E.; Zhang, Zhewen; Yu, Jun; Huang, Ning; Huo, Mingxin; Wu, Jiayan

    2015-01-01

    Cupriavidus sp. are generally heavy metal tolerant bacteria with the ability to degrade a variety of aromatic hydrocarbon compounds, although the degradation pathways and substrate versatilities remain largely unknown. Here we studied the bacterium Cupriavidus gilardii strain CR3, which was isolated from a natural asphalt deposit, and which was shown to utilize naphthenic acids as a sole carbon source. Genome sequencing of C. gilardii CR3 was carried out to elucidate possible mechanisms for the naphthenic acid biodegradation. The genome of C. gilardii CR3 was composed of two circular chromosomes chr1 and chr2 of respectively 3,539,530 bp and 2,039,213 bp in size. The genome for strain CR3 encoded 4,502 putative protein-coding genes, 59 tRNA genes, and many other non-coding genes. Many genes were associated with xenobiotic biodegradation and metal resistance functions. Pathway prediction for degradation of cyclohexanecarboxylic acid, a representative naphthenic acid, suggested that naphthenic acid undergoes initial ring-cleavage, after which the ring fission products can be degraded via several plausible degradation pathways including a mechanism similar to that used for fatty acid oxidation. The final metabolic products of these pathways are unstable or volatile compounds that were not toxic to CR3. Strain CR3 was also shown to have tolerance to at least 10 heavy metals, which was mainly achieved by self-detoxification through ion efflux, metal-complexation and metal-reduction, and a powerful DNA self-repair mechanism. Our genomic analysis suggests that CR3 is well adapted to survive the harsh environment in natural asphalts containing naphthenic acids and high concentrations of heavy metals. PMID:26301592

  7. Structure and Sequence Search on Aptamer-Protein Docking

    NASA Astrophysics Data System (ADS)

    Xiao, Jiajie; Bonin, Keith; Guthold, Martin; Salsbury, Freddie

    2015-03-01

    Interactions between proteins and deoxyribonucleic acid (DNA) play a significant role in the living systems, especially through gene regulation. However, short nucleic acids sequences (aptamers) with specific binding affinity to specific proteins exhibit clinical potential as therapeutics. Our capillary and gel electrophoresis selection experiments show that specific sequences of aptamers can be selected that bind specific proteins. Computationally, given the experimentally-determined structure and sequence of a thrombin-binding aptamer, we can successfully dock the aptamer onto thrombin in agreement with experimental structures of the complex. In order to further study the conformational flexibility of this thrombin-binding aptamer and to potentially develop a predictive computational model of aptamer-binding, we use GPU-enabled molecular dynamics simulations to both examine the conformational flexibility of the aptamer in the absence of binding to thrombin, and to determine our ability to fold an aptamer. This study should help further de-novo predictions of aptamer sequences by enabling the study of structural and sequence-dependent effects on aptamer-protein docking specificity.

  8. Genome Sequence of Lactobacillus sakei LK-145 Isolated from a Japanese Sake Cellar as a High Producer of d-Amino Acids

    PubMed Central

    Kato, Shiro

    2017-01-01

    ABSTRACT This announcement reports the complete genome sequence of strain LK-145 of Lactobacillus sakei isolated from a Japanese sake cellar as a potent strain for the production of large amounts of d-amino acids. Three putative genes encoding an amino acid racemase were identified. PMID:28818888

  9. Genome Sequence of Lactobacillus plantarum Strain UCMA 3037.

    PubMed

    Naz, Saima; Tareb, Raouf; Bernardeau, Marion; Vaisse, Melissa; Lucchetti-Miganeh, Celine; Rechenmann, Mathias; Vernoux, Jean-Paul

    2013-05-23

    Nucleic acid of the strain Lactobacillus plantarum UCMA 3037, isolated from raw milk camembert cheese in our laboratory, was sequenced. We present its draft genome sequence with the aim of studying its functional properties and relationship to the cheese ecosystem.

  10. Graphene Nanopores for Protein Sequencing.

    PubMed

    Wilson, James; Sloman, Leila; He, Zhiren; Aksimentiev, Aleksei

    2016-07-19

    An inexpensive, reliable method for protein sequencing is essential to unraveling the biological mechanisms governing cellular behavior and disease. Current protein sequencing methods suffer from limitations associated with the size of proteins that can be sequenced, the time, and the cost of the sequencing procedures. Here, we report the results of all-atom molecular dynamics simulations that investigated the feasibility of using graphene nanopores for protein sequencing. We focus our study on the biologically significant phenylalanine-glycine repeat peptides (FG-nups)-parts of the nuclear pore transport machinery. Surprisingly, we found FG-nups to behave similarly to single stranded DNA: the peptides adhere to graphene and exhibit step-wise translocation when subject to a transmembrane bias or a hydrostatic pressure gradient. Reducing the peptide's charge density or increasing the peptide's hydrophobicity was found to decrease the translocation speed. Yet, unidirectional and stepwise translocation driven by a transmembrane bias was observed even when the ratio of charged to hydrophobic amino acids was as low as 1:8. The nanopore transport of the peptides was found to produce stepwise modulations of the nanopore ionic current correlated with the type of amino acids present in the nanopore, suggesting that protein sequencing by measuring ionic current blockades may be possible.

  11. Genome wide identification of microRNAs involved in fatty acid and lipid metabolism of Brassica napus by small RNA and degradome sequencing.

    PubMed

    Wang, Zhiwei; Qiao, Yan; Zhang, Jingjing; Shi, Wenhui; Zhang, Jinwen

    2017-07-01

    Rapeseed (Brassica napus) is an important cash crop considered as the third largest oil crop worldwide. Rapeseed oil contains various saturation or unsaturation fatty acids, these fatty acids, whose could incorporation with TAG form into lipids stored in seeds play various roles in the metabolic activity. The different fatty acids in B. napus seeds determine oil quality, define if the oil is edible or must be used as industrial material. miRNAs are kind of non-coding sRNAs that could regulate gene expressions through post-transcriptional modification to their target transcripts playing important roles in plant metabolic activities. We employed high-throughput sequencing to identify the miRNAs and their target transcripts involved in fatty acids and lipids metabolism in different development of B. napus seeds. As a result, we identified 826 miRNA sequences, including 523 conserved and 303 newly miRNAs. From the degradome sequencing, we found 589 mRNA could be targeted by 236 miRNAs, it includes 49 novel miRNAs and 187 conserved miRNAs. The miRNA-target couple suggests that bna-5p-163957_18, bna-5p-396192_7, miR9563a-p3, miR9563b-p5, miR838-p3, miR156e-p3, miR159c and miR1134 could target PDP, LACS9, MFPA, ADSL1, ACO32, C0401, GDL73, PlCD6, OLEO3 and WSD1. These target transcripts are involving in acetyl-CoA generate and carbon chain desaturase, regulating the levels of very long chain fatty acids, β-oxidation and lipids transport and metabolism process. At the same, we employed the q-PCR to valid the expression of miRNAs and their target transcripts that involve in fatty acid and lipid metabolism, the result suggested that the miRNA and their transcript expression are negative correlation, which in accord with the expression of miRNA and its target transcript. The study findings suggest that the identified miRNA may play important role in the fatty acids and lipids metabolism in seeds of B. napus. Copyright © 2017 The Author(s). Published by Elsevier B.V. All

  12. Sequence-based prediction of protein-binding sites in DNA: comparative study of two SVM models.

    PubMed

    Park, Byungkyu; Im, Jinyong; Tuvshinjargal, Narankhuu; Lee, Wook; Han, Kyungsook

    2014-11-01

    As many structures of protein-DNA complexes have been known in the past years, several computational methods have been developed to predict DNA-binding sites in proteins. However, its inverse problem (i.e., predicting protein-binding sites in DNA) has received much less attention. One of the reasons is that the differences between the interaction propensities of nucleotides are much smaller than those between amino acids. Another reason is that DNA exhibits less diverse sequence patterns than protein. Therefore, predicting protein-binding DNA nucleotides is much harder than predicting DNA-binding amino acids. We computed the interaction propensity (IP) of nucleotide triplets with amino acids using an extensive dataset of protein-DNA complexes, and developed two support vector machine (SVM) models that predict protein-binding nucleotides from sequence data alone. One SVM model predicts protein-binding nucleotides using DNA sequence data alone, and the other SVM model predicts protein-binding nucleotides using both DNA and protein sequences. In a 10-fold cross-validation with 1519 DNA sequences, the SVM model that uses DNA sequence data only predicted protein-binding nucleotides with an accuracy of 67.0%, an F-measure of 67.1%, and a Matthews correlation coefficient (MCC) of 0.340. With an independent dataset of 181 DNAs that were not used in training, it achieved an accuracy of 66.2%, an F-measure 66.3% and a MCC of 0.324. Another SVM model that uses both DNA and protein sequences achieved an accuracy of 69.6%, an F-measure of 69.6%, and a MCC of 0.383 in a 10-fold cross-validation with 1519 DNA sequences and 859 protein sequences. With an independent dataset of 181 DNAs and 143 proteins, it showed an accuracy of 67.3%, an F-measure of 66.5% and a MCC of 0.329. Both in cross-validation and independent testing, the second SVM model that used both DNA and protein sequence data showed better performance than the first model that used DNA sequence data. To the best of

  13. Comparative In silico Study of Sex-Determining Region Y (SRY) Protein Sequences Involved in Sex-Determining.

    PubMed

    Vakili Azghandi, Masoume; Nasiri, Mohammadreza; Shamsa, Ali; Jalali, Mohsen; Shariati, Mohammad Mahdi

    2016-04-01

    The SRY gene (SRY) provides instructions for making a transcription factor called the sex-determining region Y protein. The sex-determining region Y protein causes a fetus to develop as a male. In this study, SRY of 15 spices included of human, chimpanzee, dog, pig, rat, cattle, buffalo, goat, sheep, horse, zebra, frog, urial, dolphin and killer whale were used for determine of bioinformatic differences. Nucleotide sequences of SRY were retrieved from the NCBI databank. Bioinformatic analysis of SRY is done by CLC Main Workbench version 5.5 and ClustalW (http:/www.ebi.ac.uk/clustalw/) and MEGA6 softwares. The multiple sequence alignment results indicated that SRY protein sequences from Orcinus orca (killer whale) and Tursiopsaduncus (dolphin) have least genetic distance of 0.33 in these 15 species and are 99.67% identical at the amino acid level. Homosapiens and Pantroglodytes (chimpanzee) have the next lowest genetic distance of 1.35 and are 98.65% identical at the amino acid level. These findings indicate that the SRY proteins are conserved in the 15 species, and their evolutionary relationships are similar.

  14. Terminal sequence importance of de novo proteins from binary-patterned library: stable artificial proteins with 11- or 12-amino acid alphabet.

    PubMed

    Okura, Hiromichi; Takahashi, Tsuyoshi; Mihara, Hisakazu

    2012-06-01

    Successful approaches of de novo protein design suggest a great potential to create novel structural folds and to understand natural rules of protein folding. For these purposes, smaller and simpler de novo proteins have been developed. Here, we constructed smaller proteins by removing the terminal sequences from stable de novo vTAJ proteins and compared stabilities between mutant and original proteins. vTAJ proteins were screened from an α3β3 binary-patterned library which was designed with polar/ nonpolar periodicities of α-helix and β-sheet. vTAJ proteins have the additional terminal sequences due to the method of constructing the genetically repeated library sequences. By removing the parts of the sequences, we successfully obtained the stable smaller de novo protein mutants with fewer amino acid alphabets than the originals. However, these mutants showed the differences on ANS binding properties and stabilities against denaturant and pH change. The terminal sequences, which were designed just as flexible linkers not as secondary structure units, sufficiently affected these physicochemical details. This study showed implications for adjusting protein stabilities by designing N- and C-terminal sequences.

  15. Genome sequence of the thermophilic strain Bacillus coagulans 2-6, an efficient producer of high-optical-purity L-lactic acid.

    PubMed

    Su, Fei; Yu, Bo; Sun, Jibin; Ou, Hong-Yu; Zhao, Bo; Wang, Limin; Qin, Jiayang; Tang, Hongzhi; Tao, Fei; Jarek, Michael; Scharfe, Maren; Ma, Cuiqing; Ma, Yanhe; Xu, Ping

    2011-09-01

    Bacillus coagulans 2-6 is an efficient producer of lactic acid. The genome of B. coagulans 2-6 has the smallest genome among the members of the genus Bacillus known to date. The frameshift mutation at the start of the d-lactate dehydrogenase sequence might be responsible for the production of high-optical-purity l-lactic acid.

  16. Taste, umami-enhance effect and amino acid sequence of peptides separated from silkworm pupa hydrolysate.

    PubMed

    Yu, Zilin; Jiang, Hongrui; Guo, Rongcan; Yang, Bo; You, Gang; Zhao, Mouming; Liu, Xiaoling

    2018-06-01

    Four umami peptides were separated and purified by ultrafiltration, gel filtration chromatography and identified by ultra-performance liquid chromatography tandem mass-spectrometry (UPLC-MS/MS), the amino acid sequences of four peptides are Val-Pro-Tyr (VPY), Thr-Ala-Tyr (TAY), Ala-Ala-Pro-Tyr (AAPY) and Gly-Phe-Pro (GFP). The result illustrates that the umami amino acids are not the content of umami peptides, but bitter amino acids are included. The threshold of VPY, TAY, AAPY and GFP were 1.65 mmol/L, 1.76 mmol/L, 2.97 mmol/L and 6.26 mmol/L, respectively. The peptide TAY, VPY and AAPY had an umami-enhancement effect on the monosodium glutamate (MSG) + sodium chloride (NaCl) solution, their concentrations were 2.5 g/L, 5 g/L and 5 g/L, respectively, while GFP has no significant umami-enhancement effect in solution. In addition, the peptides have better taste than its composing amino acids, which indicates that the taste of peptide does not depend on its composing amino acids. Copyright © 2018. Published by Elsevier Ltd.

  17. The nucleotide sequence of HLA-B{sup *}2704 reveals a new amino acid substitution in exon 4 which is also present in HLA-B{sup *}2706

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rudwaleit, M.; Bowness, P.; Wordsworth, P.

    1996-12-31

    The HLA-B27 subtype HLA-B{sup *}2704 is virtually absent in Caucasians but common in Orientals, where it is associated with ankylosing spondylitis. The amino acid sequence of HLA-B{sup *}2704 has been established by peptide mapping and was shown to differ by two amino acids from HLA-B{sup *}2705, HLA-B{sup *}2704 is characterized by a serine for aspartic acid substitution at position 77 and glutamic acid for valine at position 152. To date, however, no nucleotide sequence confirming these changes at the DNA level has been published. 13 refs., 2 figs.

  18. Nucleic acid arrays and methods of synthesis

    DOEpatents

    Sabanayagam, Chandran R.; Sano, Takeshi; Misasi, John; Hatch, Anson; Cantor, Charles

    2001-01-01

    The present invention generally relates to high density nucleic acid arrays and methods of synthesizing nucleic acid sequences on a solid surface. Specifically, the present invention contemplates the use of stabilized nucleic acid primer sequences immobilized on solid surfaces, and circular nucleic acid sequence templates combined with the use of isothermal rolling circle amplification to thereby increase nucleic acid sequence concentrations in a sample or on an array of nucleic acid sequences.

  19. Chameleon sequences in neurodegenerative diseases.

    PubMed

    Bahramali, Golnaz; Goliaei, Bahram; Minuchehr, Zarrin; Salari, Ali

    2016-03-25

    Chameleon sequences can adopt either alpha helix sheet or a coil conformation. Defining chameleon sequences in PDB (Protein Data Bank) may yield to an insight on defining peptides and proteins responsible in neurodegeneration. In this research, we benefitted from the large PDB and performed a sequence analysis on Chameleons, where we developed an algorithm to extract peptide segments with identical sequences, but different structures. In order to find new chameleon sequences, we extracted a set of 8315 non-redundant protein sequences from the PDB with an identity less than 25%. Our data was classified to "helix to strand (HE)", "helix to coil (HC)" and "strand to coil (CE)" alterations. We also analyzed the occurrence of singlet and doublet amino acids and the solvent accessibility in the chameleon sequences; we then sorted out the proteins with the most number of chameleon sequences and named them Chameleon Flexible Proteins (CFPs) in our dataset. Our data revealed that Gly, Val, Ile, Tyr and Phe, are the major amino acids in Chameleons. We also found that there are proteins such as Insulin Degrading Enzyme IDE and GTP-binding nuclear protein Ran (RAN) with the most number of chameleons (640 and 405 respectively). These proteins have known roles in neurodegenerative diseases. Therefore it can be inferred that other CFP's can serve as key proteins in neurodegeneration, and a study on them can shed light on curing and preventing neurodegenerative diseases. Copyright © 2016 Elsevier Inc. All rights reserved.

  20. Chameleon sequences in neurodegenerative diseases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bahramali, Golnaz; Goliaei, Bahram, E-mail: goliaei@ut.ac.ir; Minuchehr, Zarrin, E-mail: minuchehr@nigeb.ac.ir

    2016-03-25

    Chameleon sequences can adopt either alpha helix sheet or a coil conformation. Defining chameleon sequences in PDB (Protein Data Bank) may yield to an insight on defining peptides and proteins responsible in neurodegeneration. In this research, we benefitted from the large PDB and performed a sequence analysis on Chameleons, where we developed an algorithm to extract peptide segments with identical sequences, but different structures. In order to find new chameleon sequences, we extracted a set of 8315 non-redundant protein sequences from the PDB with an identity less than 25%. Our data was classified to “helix to strand (HE)”, “helix tomore » coil (HC)” and “strand to coil (CE)” alterations. We also analyzed the occurrence of singlet and doublet amino acids and the solvent accessibility in the chameleon sequences; we then sorted out the proteins with the most number of chameleon sequences and named them Chameleon Flexible Proteins (CFPs) in our dataset. Our data revealed that Gly, Val, Ile, Tyr and Phe, are the major amino acids in Chameleons. We also found that there are proteins such as Insulin Degrading Enzyme IDE and GTP-binding nuclear protein Ran (RAN) with the most number of chameleons (640 and 405 respectively). These proteins have known roles in neurodegenerative diseases. Therefore it can be inferred that other CFP's can serve as key proteins in neurodegeneration, and a study on them can shed light on curing and preventing neurodegenerative diseases.« less

  1. NGSCheckMate: software for validating sample identity in next-generation sequencing studies within and across data types.

    PubMed

    Lee, Sejoon; Lee, Soohyun; Ouellette, Scott; Park, Woong-Yang; Lee, Eunjung A; Park, Peter J

    2017-06-20

    In many next-generation sequencing (NGS) studies, multiple samples or data types are profiled for each individual. An important quality control (QC) step in these studies is to ensure that datasets from the same subject are properly paired. Given the heterogeneity of data types, file types and sequencing depths in a multi-dimensional study, a robust program that provides a standardized metric for genotype comparisons would be useful. Here, we describe NGSCheckMate, a user-friendly software package for verifying sample identities from FASTQ, BAM or VCF files. This tool uses a model-based method to compare allele read fractions at known single-nucleotide polymorphisms, considering depth-dependent behavior of similarity metrics for identical and unrelated samples. Our evaluation shows that NGSCheckMate is effective for a variety of data types, including exome sequencing, whole-genome sequencing, RNA-seq, ChIP-seq, targeted sequencing and single-cell whole-genome sequencing, with a minimal requirement for sequencing depth (>0.5X). An alignment-free module can be run directly on FASTQ files for a quick initial check. We recommend using this software as a QC step in NGS studies. https://github.com/parklab/NGSCheckMate. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. Invasive cleavage of nucleic acids

    DOEpatents

    Prudent, James R.; Hall, Jeff G.; Lyamichev, Victor I.; Brow, Mary Ann D.; Dahlberg, James E.

    1999-01-01

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof.

  3. Invasive cleavage of nucleic acids

    DOEpatents

    Prudent, James R.; Hall, Jeff G.; Lyamichev, Victor I.; Brow, Mary Ann D.; Dahlberg, James E.

    2002-01-01

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof.

  4. Single-cell sequencing unveils the lifestyle and CRISPR-based population history of Hydrotalea sp. in acid mine drainage.

    PubMed

    Medeiros, J D; Leite, L R; Pylro, V S; Oliveira, F S; Almeida, V M; Fernandes, G R; Salim, A C M; Araújo, F M G; Volpini, A C; Oliveira, G; Cuadros-Orellana, S

    2017-10-01

    Acid mine drainage (AMD) is characterized by an acid and metal-rich run-off that originates from mining systems. Despite having been studied for many decades, much remains unknown about the microbial community dynamics in AMD sites, especially during their early development, when the acidity is moderate. Here, we describe draft genome assemblies from single cells retrieved from an early-stage AMD sample. These cells belong to the genus Hydrotalea and are closely related to Hydrotalea flava. The phylogeny and average nucleotide identity analysis suggest that all single amplified genomes (SAGs) form two clades that may represent different strains. These cells have the genomic potential for denitrification, copper and other metal resistance. Two coexisting CRISPR-Cas loci were recovered across SAGs, and we observed heterogeneity in the population with regard to the spacer sequences, together with the loss of trailer-end spacers. Our results suggest that the genomes of Hydrotalea sp. strains studied here are adjusting to a quickly changing selective pressure at the microhabitat scale, and an important form of this selective pressure is infection by foreign DNA. © 2017 John Wiley & Sons Ltd.

  5. Comparative RNA-Sequence Transcriptome Analysis of Phenolic Acid Metabolism in Salvia miltiorrhiza, a Traditional Chinese Medicine Model Plant

    PubMed Central

    Song, Zhenqiao; Guo, Linlin; Liu, Tian; Lin, Caicai; Wang, Jianhua

    2017-01-01

    Salvia miltiorrhiza Bunge is an important traditional Chinese medicine (TCM). In this study, two S. miltiorrhiza genotypes (BH18 and ZH23) with different phenolic acid concentrations were used for de novo RNA sequencing (RNA-seq). A total of 170,787 transcripts and 56,216 unigenes were obtained. There were 670 differentially expressed genes (DEGs) identified between BH18 and ZH23, 250 of which were upregulated in ZH23, with genes involved in the phenylpropanoid biosynthesis pathway being the most upregulated genes. Nine genes involved in the lignin biosynthesis pathway were upregulated in BH18 and thus result in higher lignin content in BH18. However, expression profiles of most genes involved in the core common upstream phenylpropanoid biosynthesis pathway were higher in ZH23 than that in BH18. These results indicated that genes involved in the core common upstream phenylpropanoid biosynthesis pathway might play an important role in downstream secondary metabolism and demonstrated that lignin biosynthesis was a putative partially competing pathway with phenolic acid biosynthesis. The results of this study expanded our understanding of the regulation of phenolic acid biosynthesis in S. miltiorrhiza. PMID:28194403

  6. Genome Sequence of Sphingomonas wittichii DP58, the First Reported Phenazine-1-Carboxylic Acid-Degrading Strain

    PubMed Central

    Ma, Zhiwei; Shen, Xuemei; Wang, Wei; Peng, Huasong; Xu, Ping; Zhang, Xuehong

    2012-01-01

    Sphingomonas wittichii DP58 (CCTCC M 2012027), the first reported phenazine-1-carboxylic acid (PCA)-degrading strain, was isolated from pimiento rhizosphere soils. Here we present a 5.6-Mb assembly of its genome. This sequence would contribute to the elucidation of the molecular mechanism of PCA degradation to improve the antifungal's effectiveness or remove superfluous PCA. PMID:22689229

  7. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.; Wang, Chunwei; Jevons, Luis C.; Bernhart, Derek H.; Lipshutz, Robert J.

    2004-05-11

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  8. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    1998-08-18

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  9. Structural studies of polypeptides: Mechanism of immunoglobin catalysis and helix propagation in hybrid sequence, disulfide containing peptides

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Storrs, Richard Wood

    1992-08-01

    Catalytic immunoglobin fragments were studied Nuclear Magnetic Resonance spectroscopy to identify amino acid residues responsible for the catalytic activity. Small, hybrid sequence peptides were analyzed for helix propagation following covalent initiation and for activity related to the protein from which the helical sequence was derived. Hydrolysis of p-nitrophenyl carbonates and esters by specific immunoglobins is thought to involve charge complementarity. The pK of the transition state analog P-nitrophenyl phosphate bound to the immunoglobin fragment was determined by 31P-NMR to verify the juxtaposition of a positively charged amino acid to the binding/catalytic site. Optical studies of immunoglobin mediated photoreversal of cis,more » syn cyclobutane thymine dimers implicated tryptophan as the photosensitizing chromophore. Research shows the chemical environment of a single tryptophan residue is altered upon binding of the thymine dimer. This tryptophan residue was localized to within 20 Å of the binding site through the use of a nitroxide paramagnetic species covalently attached to the thymine dimer. A hybrid sequence peptide was synthesized based on the bee venom peptide apamin in which the helical residues of apamin were replaced with those from the recognition helix of the bacteriophage 434 repressor protein. Oxidation of the disufide bonds occured uniformly in the proper 1-11, 3-15 orientation, stabilizing the 434 sequence in an α-helix. The glycine residue stopped helix propagation. Helix propagation in 2,2,2-trifluoroethanol mixtures was investigated in a second hybrid sequence peptide using the apamin-derived disulfide scaffold and the S-peptide sequence. The helix-stop signal previously observed was not observed in the NMR NOESY spectrum. Helical connectivities were seen throughout the S-peptide sequence. The apamin/S-peptide hybrid binded to the S-protein (residues 21-166 of ribonuclease A) and reconstituted enzymatic activity.« less

  10. Structural studies of polypeptides: Mechanism of immunoglobin catalysis and helix propagation in hybrid sequence, disulfide containing peptides

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Storrs, R.W.

    1992-08-01

    Catalytic immunoglobin fragments were studied Nuclear Magnetic Resonance spectroscopy to identify amino acid residues responsible for the catalytic activity. Small, hybrid sequence peptides were analyzed for helix propagation following covalent initiation and for activity related to the protein from which the helical sequence was derived. Hydrolysis of p-nitrophenyl carbonates and esters by specific immunoglobins is thought to involve charge complementarity. The pK of the transition state analog P-nitrophenyl phosphate bound to the immunoglobin fragment was determined by [sup 31]P-NMR to verify the juxtaposition of a positively charged amino acid to the binding/catalytic site. Optical studies of immunoglobin mediated photoreversal ofmore » cis, syn cyclobutane thymine dimers implicated tryptophan as the photosensitizing chromophore. Research shows the chemical environment of a single tryptophan residue is altered upon binding of the thymine dimer. This tryptophan residue was localized to within 20 [Angstrom] of the binding site through the use of a nitroxide paramagnetic species covalently attached to the thymine dimer. A hybrid sequence peptide was synthesized based on the bee venom peptide apamin in which the helical residues of apamin were replaced with those from the recognition helix of the bacteriophage 434 repressor protein. Oxidation of the disufide bonds occured uniformly in the proper 1-11, 3-15 orientation, stabilizing the 434 sequence in an [alpha]-helix. The glycine residue stopped helix propagation. Helix propagation in 2,2,2-trifluoroethanol mixtures was investigated in a second hybrid sequence peptide using the apamin-derived disulfide scaffold and the S-peptide sequence. The helix-stop signal previously observed was not observed in the NMR NOESY spectrum. Helical connectivities were seen throughout the S-peptide sequence. The apamin/S-peptide hybrid binded to the S-protein (residues 21-166 of ribonuclease A) and reconstituted enzymatic activity.« less

  11. Amino acid "little Big Bang": representing amino acid substitution matrices as dot products of Euclidian vectors.

    PubMed

    Zimmermann, Karel; Gibrat, Jean-François

    2010-01-04

    Sequence comparisons make use of a one-letter representation for amino acids, the necessary quantitative information being supplied by the substitution matrices. This paper deals with the problem of finding a representation that provides a comprehensive description of amino acid intrinsic properties consistent with the substitution matrices. We present a Euclidian vector representation of the amino acids, obtained by the singular value decomposition of the substitution matrices. The substitution matrix entries correspond to the dot product of amino acid vectors. We apply this vector encoding to the study of the relative importance of various amino acid physicochemical properties upon the substitution matrices. We also characterize and compare the PAM and BLOSUM series substitution matrices. This vector encoding introduces a Euclidian metric in the amino acid space, consistent with substitution matrices. Such a numerical description of the amino acid is useful when intrinsic properties of amino acids are necessary, for instance, building sequence profiles or finding consensus sequences, using machine learning algorithms such as Support Vector Machine and Neural Networks algorithms.

  12. First draft genome sequencing of indole acetic acid producing and plant growth promoting fungus Preussia sp. BSL10.

    PubMed

    Khan, Abdul Latif; Asaf, Sajjad; Khan, Abdur Rahim; Al-Harrasi, Ahmed; Al-Rawahi, Ahmed; Lee, In-Jung

    2016-05-10

    Preussia sp. BSL10, family Sporormiaceae, was actively producing phytohormone (indole-3-acetic acid) and extra-cellular enzymes (phosphatases and glucosidases). The fungus was also promoting the growth of arid-land tree-Boswellia sacra. Looking at such prospects of this fungus, we sequenced its draft genome for the first time. The Illumina based sequence analysis reveals an approximate genome size of 31.4Mbp for Preussia sp. BSL10. Based on ab initio gene prediction, total 32,312 coding sequences were annotated consisting of 11,967 coding genes, pseudogenes, and 221 tRNA genes. Furthermore, 321 carbohydrate-active enzymes were predicted and classified into many functional families. Copyright © 2016 Elsevier B.V. All rights reserved.

  13. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, M.S.

    1998-08-18

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device. 27 figs.

  14. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    2003-08-19

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  15. Influence of the Amino Acid Sequence on Protein-Mineral Interactions in Soil

    NASA Astrophysics Data System (ADS)

    Chacon, S. S.; Reardon, P. N.; Purvine, S.; Lipton, M. S.; Washton, N.; Kleber, M.

    2017-12-01

    The intimate associations between protein and mineral surfaces have profound impacts on nutrient cycling in soil. Proteins are an important source of organic C and N, and a subset of proteins, extracellular enzymes (EE), can catalyze the depolymerization of soil organic matter (SOM). Our goal was to determine how variation in the amino acid sequence could influence a protein's susceptibility to become chemically altered by mineral surfaces to infer the fate of adsorbed EE function in soil. We hypothesized that (1) addition of charged amino acids would enhance the adsorption onto oppositely charged mineral surfaces (2) addition of aromatic amino acids would increase adsorption onto zero charged surfaces (3) Increase adsorption of modified proteins would enhance their susceptibility to alterations by redox active minerals. To test these hypotheses, we generated three engineered proxies of a model protein Gb1 (IEP 4.0, 6.2 kDA) by inserting either negatively charged, positively charged or aromatic amino acids in the second loop. These modified proteins were allowed to interact with functionally different mineral surfaces (goethite, montmorillonite, kaolinite and birnessite) at pH 5 and 7. We used LC-MS/MS and solution-state Heteronuclear Single Quantum Coherence Spectroscopy NMR to observe modifications on engineered proteins as a consequence to mineral interactions. Preliminary results indicate that addition of any amino acids to a protein increase its susceptibility to fragmentation and oxidation by redox active mineral surfaces, and alter adsorption to the other mineral surfaces. This suggest that not all mineral surfaces in soil may act as sorbents for EEs and chemical modification of their structure should also be considered as an explanation for decrease in EE activity. Fragmentation of proteins by minerals can bypass the need to produce proteases, but microbial acquisition of other nutrients that require enzymes such as cellulases, ligninases or phosphatases

  16. Amino acid sequences of ribosomal proteins S11 from Bacillus stearothermophilus and S19 from Halobacterium marismortui. Comparison of the ribosomal protein S11 family.

    PubMed

    Kimura, M; Kimura, J; Hatakeyama, T

    1988-11-21

    The complete amino acid sequences of ribosomal proteins S11 from the Gram-positive eubacterium Bacillus stearothermophilus and of S19 from the archaebacterium Halobacterium marismortui have been determined. A search for homologous sequences of these proteins revealed that they belong to the ribosomal protein S11 family. Homologous proteins have previously been sequenced from Escherichia coli as well as from chloroplast, yeast and mammalian ribosomes. A pairwise comparison of the amino acid sequences showed that Bacillus protein S11 shares 68% identical residues with S11 from Escherichia coli and a slightly lower homology (52%) with the homologous chloroplast protein. The halophilic protein S19 is more related to the eukaryotic (45-49%) than to the eubacterial counterparts (35%).

  17. Variation of amino acid sequences of serum amyloid a (SAA) and immunohistochemical analysis of amyloid a (AA) in Japanese domestic cats.

    PubMed

    Tei, Meina; Uchida, Kazuyuki; Chambers, James K; Watanabe, Ken-Ichi; Tamamoto, Takashi; Ohno, Koichi; Nakayama, Hiroyuki

    2018-02-02

    Amyloid A (AA) amyloidosis, a fatal systemic amyloid disease, occurs secondary to chronic inflammatory conditions in humans. Although persistently elevated serum amyloid A (SAA) levels are required for its pathogenesis, not all individuals with chronic inflammation necessarily develop AA amyloidosis. Furthermore, many diseases in cats are associated with the elevated production of SAA, whereas only a small number actually develop AA amyloidosis. We hypothesized that a genetic mutation in the SAA gene may strongly contribute to the pathogenesis of feline AA amyloidosis. In the present study, genomic DNA from four Japanese domestic cats (JDCs) with AA amyloidosis and from five without amyloidosis was analyzed using polymerase chain reaction (PCR) amplification and direct sequencing. We identified the novel variation combination of 45R-51A in the deduced amino acid sequences of four JDCs with amyloidosis and five without. However, there was no relationship between amino acid variations and the distribution of AA amyloid deposits, indicating that differences in SAA sequences do not contribute to the pathogenesis of AA amyloidosis. Immunohistochemical analysis using antisera against the three different parts of the feline SAA protein-i.e., the N-terminal, central, and C-terminal regions-revealed that feline AA contained the C-terminus, unlike human AA. These results indicate that the cleavage and degradation of the C-terminus are not essential for amyloid fibril formation in JDCs.

  18. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    1999-10-26

    A computer system (1) for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area (814) and sample sequences in another area (816) on a display device (3).

  19. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    2001-06-05

    A computer system (1) for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area (814) and sample sequences in another area (816) on a display device (3).

  20. Nucleic acid detection kits

    DOEpatents

    Hall, Jeff G.; Lyamichev, Victor I.; Mast, Andrea L.; Brow, Mary Ann; Kwiatkowski, Robert W.; Vavra, Stephanie H.

    2005-03-29

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge. The present invention also provides methods for the detection of non-target cleavage products via the formation of a complete and activated protein binding region. The invention further provides sensitive and specific methods for the detection of nucleic acid from various viruses in a sample.

  1. Quantum-Sequencing: Fast electronic single DNA molecule sequencing

    NASA Astrophysics Data System (ADS)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free, high-throughput and cost-effective, single-molecule sequencing method. Here, we present the first demonstration of unique ``electronic fingerprint'' of all nucleotides (A, G, T, C), with single-molecule DNA sequencing, using Quantum-tunneling Sequencing (Q-Seq) at room temperature. We show that the electronic state of the nucleobases shift depending on the pH, with most distinct states identified at acidic pH. We also demonstrate identification of single nucleotide modifications (methylation here). Using these unique electronic fingerprints (or tunneling data), we report a partial sequence of beta lactamase (bla) gene, which encodes resistance to beta-lactam antibiotics, with over 95% success rate. These results highlight the potential of Q-Seq as a robust technique for next-generation sequencing.

  2. Method of increasing conversion of a fatty acid to its corresponding dicarboxylic acid

    DOEpatents

    Craft, David L.; Wilson, C. Ron; Eirich, Dudley; Zhang, Yeyan

    2004-09-14

    A nucleic acid sequence including a CYP promoter operably linked to nucleic acid encoding a heterologous protein is provided to increase transcription of the nucleic acid. Expression vectors and host cells containing the nucleic acid sequence are also provided. The methods and compositions described herein are especially useful in the production of polycarboxylic acids by yeast cells.

  3. Amino acid sequences of peptides from a tryptic digest of a urea-soluble protein fraction (U.S.3) from oxidized wool

    PubMed Central

    Corfield, M. C.; Fletcher, J. C.; Robson, A.

    1967-01-01

    1. A tryptic digest of the protein fraction U.S.3 from oxidized wool has been separated into 32 peptide fractions by cation-exchange resin chromatography. 2. Most of these fractions have been resolved into their component peptides by a combination of the techniques of cation-exchange resin chromatography, paper chromatography and paper electrophoresis. 3. The amino acid compositions of 58 of the peptides in the digest present in the largest amounts have been determined. 4. The amino acid sequences of 38 of these have been completely elucidated and those of six others partially derived. 5. These findings indicate that the parent protein in wool from which the protein fraction U.S.3 is derived has a minimum molecular weight of 74000. 6. The structures of wool proteins are discussed in the light of the peptide sequences determined, and, in particular, of those sequences in fraction U.S.3 that could not be elucidated. PMID:16742497

  4. An improved procedure, involving mass spectrometry, for N-terminal amino acid sequence determination of proteins which are N alpha-blocked.

    PubMed Central

    Rose, K; Kocher, H P; Blumberg, B M; Kolakofsky, D

    1984-01-01

    A modification to a previously described procedure [Gray & del Valle (1970) Biochemistry 9, 2134-2137; Rose, Simona & Offord (1983) Biochem. J. 215, 261-272] for mass-spectral identification of the N-terminal regions of proteins is shown to be useful in cases where the N-terminus is blocked. Three proteins were studied: vesicular-stomatitis-virus N protein, Sendai-virus NP protein, and a rabbit immunoglobulin lambda-light chain. These proteins, found to be blocked at the N-terminus with either the acetyl group or a pyroglutamic acid residue, had all failed to yield to attempted Edman degradation, in one case even after attempted enzymic removal of the pyroglutamic acid residue. The N-terminal regions of all three proteins were sequenced by using the new procedure. PMID:6421284

  5. The myoglobin of Emperor penguin (Aptenodytes forsteri): amino acid sequence and functional adaptation to extreme conditions.

    PubMed

    Tamburrini, M; Romano, M; Giardina, B; di Prisco, G

    1999-02-01

    In the framework of a study on molecular adaptations of the oxygen-transport and storage systems to extreme conditions in Antarctic marine organisms, we have investigated the structure/function relationship in Emperor penguin (Aptenodytes forsteri) myoglobin, in search of correlation with the bird life style. In contrast with previous reports, the revised amino acid sequence contains one additional residue and 15 differences. The oxygen-binding parameters seem well adapted to the diving behaviour of the penguin and to the environmental conditions of the Antarctic habitat. Addition of lactate has no major effect on myoglobin oxygenation over a large temperature range. Therefore, metabolic acidosis does not impair myoglobin function under conditions of prolonged physical effort, such as diving.

  6. Isolation and amino acid sequences of squirrel monkey (Saimiri sciurea) insulin and glucagon

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yu, Jinghua; Eng, J.; Yalow, R.S.

    1990-12-01

    It was reported two decades ago that insulin was not detectable in the glucose-stimulated state in Saimiri sciurea, the New World squirrel monkey, by a radioimmunoassay system developed with guinea pig anti-pork insulin antibody and labeled park insulin. With the same system, reasonable levels were observed in rhesus monkeys and chimpanzees. This suggested that New World monkeys, like the New World hystricomorph rodents such as the guinea pig and the coypu, might have insulins whose sequences differ markedly from those of Old World mammals. In this report the authors describe the purification and amino acid sequences of squirrel monkey insulinmore » and glucagon. They demonstrate that the substitutions at B29, B27, A2, A4, and A17 of squirrel monkey insulin are identical with those previously found in another New World primate, the owl monkey (Aotus trivirgatus). The immunologic cross-reactivity of this insulin in their immunoassay system is only a few percent of that of human insulin. It appears that the peptides of the New World monkeys have diverged less from those of the Old World mammals than have those of the New World hystricomorph rodents. The striking improvements in peptide purification and sequencing have the potential for adding new information concerning the evolutionary divergence of species.« less

  7. Cloning, sequencing, and expression of cDNA for human. beta. -glucuronidase

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Oshima, A.; Kyle, J.W.; Miller, R.D.

    1987-02-01

    The authors report here the cDNA sequence for human placental ..beta..-glucuronidase (..beta..-D-glucuronoside glucuronosohydrolase, EC 3.2.1.31) and demonstrate expression of the human enzyme in transfected COS cells. They also sequenced a partial cDNA clone from human fibroblasts that contained a 153-base-pair deletion within the coding sequence and found a second type of cDNA clone from placenta that contained the same deletion. Nuclease S1 mapping studies demonstrated two types of mRNAs in human placenta that corresponded to the two types of cDNA clones isolated. The NH/sub 2/-terminal amino acid sequence determined for human spleen ..beta..-glucuronidase agreed with that inferred from the DNAmore » sequence of the two placental clones, beginning at amino acid 23, suggesting a cleaved signal sequence of 22 amino acids. When transfected into COS cells, plasmids containing either placental clone expressed an immunoprecipitable protein that contained N-linked oligosaccharides as evidenced by sensitivity to endoglycosidase F. However, only transfection with the clone containing the 153-base-pair segment led to expression of human ..beta..-glucuronidase activity. These studies provide the sequence for the full-length cDNA for human ..beta..-glucuronidase, demonstrate the existence of two populations of mRNA for ..beta..-glucuronidase in human placenta, only one of which specifies a catalytically active enzyme, and illustrate the importance of expression studies in verifying that a cDNA is functionally full-length.« less

  8. Amino acid sequences of the ribosomal proteins HL30 and HmaL5 from the archaebacterium Halobacterium marismortui.

    PubMed

    Hatakeyama, T; Hatakeyama, T

    1990-07-06

    The complete amino acid sequences of the ribosomal proteins HL30 and HmaL5 from the archaebacterium Halobacterium marismortui were determined. Protein HL30 was found to be acetylated at its N-terminal amino acid and shows homology to the eukaryotic ribosomal proteins YL34 from yeast and RL31 from rat. Protein HmaL5 was homologous to the protein L5 from Escherichia coli and Bacillus stearothermophilus as well as to YL16 from yeast. HmaL5 shows more similarities to its eukaryotic counterpart than to eubacterial ones.

  9. A putative carbohydrate-binding domain of the lactose-binding Cytisus sessilifolius anti-H(O) lectin has a similar amino acid sequence to that of the L-fucose-binding Ulex europaeus anti-H(O) lectin.

    PubMed

    Konami, Y; Yamamoto, K; Osawa, T; Irimura, T

    1995-04-01

    The complete amino acid sequence of a lactose-binding Cytisus sessilifolius anti-H(O) lectin II (CSA-II) was determined using a protein sequencer. After digestion of CSA-II with endoproteinase Lys-C or Asp-N, the resulting peptides were purified by reversed-phase high performance liquid chromatography (HPLC) and then subjected to sequence analysis. Comparison of the complete amino acid sequence of CSA-II with the sequences of other leguminous seed lectins revealed regions of extensive homology. The amino acid sequence of a putative carbohydrate-binding domain of CSA-II was found to be similar to those of several anti-H(O) leguminous lectins, especially to that of the L-fucose-binding Ulex europaeus lectin I (UEA-I).

  10. Sequence quality analysis tool for HIV type 1 protease and reverse transcriptase.

    PubMed

    Delong, Allison K; Wu, Mingham; Bennett, Diane; Parkin, Neil; Wu, Zhijin; Hogan, Joseph W; Kantor, Rami

    2012-08-01

    Access to antiretroviral therapy is increasing globally and drug resistance evolution is anticipated. Currently, protease (PR) and reverse transcriptase (RT) sequence generation is increasing, including the use of in-house sequencing assays, and quality assessment prior to sequence analysis is essential. We created a computational HIV PR/RT Sequence Quality Analysis Tool (SQUAT) that runs in the R statistical environment. Sequence quality thresholds are calculated from a large dataset (46,802 PR and 44,432 RT sequences) from the published literature ( http://hivdb.Stanford.edu ). Nucleic acid sequences are read into SQUAT, identified, aligned, and translated. Nucleic acid sequences are flagged if with >five 1-2-base insertions; >one 3-base insertion; >one deletion; >six PR or >18 RT ambiguous bases; >three consecutive PR or >four RT nucleic acid mutations; >zero stop codons; >three PR or >six RT ambiguous amino acids; >three consecutive PR or >four RT amino acid mutations; >zero unique amino acids; or <0.5% or >15% genetic distance from another submitted sequence. Thresholds are user modifiable. SQUAT output includes a summary report with detailed comments for troubleshooting of flagged sequences, histograms of pairwise genetic distances, neighbor joining phylogenetic trees, and aligned nucleic and amino acid sequences. SQUAT is a stand-alone, free, web-independent tool to ensure use of high-quality HIV PR/RT sequences in interpretation and reporting of drug resistance, while increasing awareness and expertise and facilitating troubleshooting of potentially problematic sequences.

  11. Multiplex, Rapid, and Sensitive Isothermal Detection of Nucleic-Acid Sequence by Endonuclease Restriction-Mediated Real-Time Multiple Cross Displacement Amplification.

    PubMed

    Wang, Yi; Wang, Yan; Zhang, Lu; Liu, Dongxin; Luo, Lijuan; Li, Hua; Cao, Xiaolong; Liu, Kai; Xu, Jianguo; Ye, Changyun

    2016-01-01

    We have devised a novel isothermal amplification technology, termed endonuclease restriction-mediated real-time multiple cross displacement amplification (ET-MCDA), which facilitated multiplex, rapid, specific and sensitive detection of nucleic-acid sequences at a constant temperature. The ET-MCDA integrated multiple cross displacement amplification strategy, restriction endonuclease cleavage and real-time fluorescence detection technique. In the ET-MCDA system, the functional cross primer E-CP1 or E-CP2 was constructed by adding a short sequence at the 5' end of CP1 or CP2, respectively, and the new E-CP1 or E-CP2 primer was labeled at the 5' end with a fluorophore and in the middle with a dark quencher. The restriction endonuclease Nb.BsrDI specifically recognized the short sequence and digested the newly synthesized double-stranded terminal sequences (5' end short sequences and their complementary sequences), which released the quenching, resulting on a gain of fluorescence signal. Thus, the ET-MCDA allowed real-time detection of single or multiple targets in only a single reaction, and the positive results were observed in as short as 12 min, detecting down to 3.125 fg of genomic DNA per tube. Moreover, the analytical specificity and the practical application of the ET-MCDA were also successfully evaluated in this study. Here, we provided the details on the novel ET-MCDA technique and expounded the basic ET-MCDA amplification mechanism.

  12. Solid-Phase Nucleic Acid Sequence-Based Amplification and Length-Scale Effects during RNA Amplification.

    PubMed

    Ma, Youlong; Teng, Feiyue; Libera, Matthew

    2018-06-05

    Solid-phase oligonucleotide amplification is of interest because of possible applications to next-generation sequencing, multiplexed microarray-based detection, and cell-free synthetic biology. Its efficiency is, however, less than that of traditional liquid-phase amplification involving unconstrained primers and enzymes, and understanding how to optimize the solid-phase amplification process remains challenging. Here, we demonstrate the concept of solid-phase nucleic acid sequence-based amplification (SP-NASBA) and use it to study the effect of tethering density on amplification efficiency. SP-NASBA involves two enzymes, avian myeloblastosis virus reverse transcriptase (AMV-RT) and RNase H, to convert tethered forward and reverse primers into tethered double-stranded DNA (ds-DNA) bridges from which RNA - amplicons can be generated by a third enzyme, T7 RNA polymerase. We create microgels on silicon surfaces using electron-beam patterning of thin-film blends of hydroxyl-terminated and biotin-terminated poly(ethylene glycol) (PEG-OH, PEG-B). The tethering density is linearly related to the PEG-B concentration, and biotinylated primers and molecular beacon detection probes are tethered to streptavidin-activated microgels. While SP-NASBA is very efficient at low tethering densities, the efficiency decreases dramatically with increasing tethering density due to three effects: (a) a reduced hybridization efficiency of tethered molecular beacon detection probes; (b) a decrease in T7 RNA polymerase efficiency; (c) inhibition of T7 RNA polymerase activity by AMV-RT.

  13. A Single Molecular Beacon Probe Is Sufficient for the Analysis of Multiple Nucleic Acid Sequences

    PubMed Central

    Gerasimova, Yulia V.; Hayson, Aaron; Ballantyne, Jack; Kolpashchikov, Dmitry M.

    2010-01-01

    Molecular beacon (MB) probes are dual-labeled hairpin-shaped oligodeoxyribonucleotides that are extensively used for real-time detection of specific RNA/DNA analytes. In the MB probe, the loop fragment is complementary to the analyte: therefore, a unique probe is required for the analysis of each new analyte sequence. The conjugation of an oligonucleotide with two dyes and subsequent purification procedures add to the cost of MB probes, thus reducing their application in multiplex formats. Here we demonstrate how one MB probe can be used for the analysis of an arbitrary nucleic acid. The approach takes advantage of two oligonucleotide adaptor strands, each of which contains a fragment complementary to the analyte and a fragment complementary to an MB probe. The presence of the analyte leads to association of MB probe and the two DNA strands in quadripartite complex. The MB probe fluorescently reports the formation of this complex. In this design, the MB does not bind the analyte directly; therefore, the MB sequence is independent of the analyte. In this study one universal MB probe was used to genotype three human polymorphic sites. This approach promises to reduce the cost of multiplex real-time assays and improve the accuracy of single-nucleotide polymorphism genotyping. PMID:20665615

  14. Dna Sequencing

    DOEpatents

    Tabor, Stanley; Richardson, Charles C.

    1995-04-25

    A method for sequencing a strand of DNA, including the steps off: providing the strand of DNA; annealing the strand with a primer able to hybridize to the strand to give an annealed mixture; incubating the mixture with four deoxyribonucleoside triphosphates, a DNA polymerase, and at least three deoxyribonucleoside triphosphates in different amounts, under conditions in favoring primer extension to form nucleic acid fragments complementory to the DNA to be sequenced; labelling the nucleic and fragments; separating them and determining the position of the deoxyribonucleoside triphosphates by differences in the intensity of the labels, thereby to determine the DNA sequence.

  15. Sequence of a cDNA encoding pancreatic preprosomatostatin-22.

    PubMed Central

    Magazin, M; Minth, C D; Funckes, C L; Deschenes, R; Tavianini, M A; Dixon, J E

    1982-01-01

    We report the nucleotide sequence of a precursor to somatostatin that upon proteolytic processing may give rise to a hormone of 22 amino acids. The nucleotide sequence of a cDNA from the channel catfish (Ictalurus punctatus) encodes a precursor to somatostatin that is 105 amino acids (Mr, 11,500). The cDNA coding for somatostatin-22 consists of 36 nucleotides in the 5' untranslated region, 315 nucleotides that code for the precursor to somatostatin-22, 269 nucleotides at the 3' untranslated region, and a variable length of poly(A). The putative preprohormone contains a sequence of hydrophobic amino acids at the amino terminus that has the properties of a "signal" peptide. A connecting sequence of approximately 57 amino acids is followed by a single Arg-Arg sequence, which immediately precedes the hormone. Somatostatin-22 is homologous to somatostatin-14 in 7 of the 14 amino acids, including the Phe-Trp-Lys sequence. Hybridization selection of mRNA, followed by its translation in a wheat germ cell-free system, resulted in the synthesis of a single polypeptide having a molecular weight of approximately 10,000 as estimated on Na-DodSO4/polyacrylamide gels. Images PMID:6127673

  16. An artificial intelligence approach fit for tRNA gene studies in the era of big sequence data.

    PubMed

    Iwasaki, Yuki; Abe, Takashi; Wada, Kennosuke; Wada, Yoshiko; Ikemura, Toshimichi

    2017-09-12

    Unsupervised data mining capable of extracting a wide range of knowledge from big data without prior knowledge or particular models is a timely application in the era of big sequence data accumulation in genome research. By handling oligonucleotide compositions as high-dimensional data, we have previously modified the conventional self-organizing map (SOM) for genome informatics and established BLSOM, which can analyze more than ten million sequences simultaneously. Here, we develop BLSOM specialized for tRNA genes (tDNAs) that can cluster (self-organize) more than one million microbial tDNAs according to their cognate amino acid solely depending on tetra- and pentanucleotide compositions. This unsupervised clustering can reveal combinatorial oligonucleotide motifs that are responsible for the amino acid-dependent clustering, as well as other functionally and structurally important consensus motifs, which have been evolutionarily conserved. BLSOM is also useful for identifying tDNAs as phylogenetic markers for special phylotypes. When we constructed BLSOM with 'species-unknown' tDNAs from metagenomic sequences plus 'species-known' microbial tDNAs, a large portion of metagenomic tDNAs self-organized with species-known tDNAs, yielding information on microbial communities in environmental samples. BLSOM can also enhance accuracy in the tDNA database obtained from big sequence data. This unsupervised data mining should become important for studying numerous functionally unclear RNAs obtained from a wide range of organisms.

  17. RoboOligo: software for mass spectrometry data to support manual and de novo sequencing of post-transcriptionally modified ribonucleic acids

    PubMed Central

    Sample, Paul J.; Gaston, Kirk W.; Alfonzo, Juan D.; Limbach, Patrick A.

    2015-01-01

    Ribosomal ribonucleic acid (RNA), transfer RNA and other biological or synthetic RNA polymers can contain nucleotides that have been modified by the addition of chemical groups. Traditional Sanger sequencing methods cannot establish the chemical nature and sequence of these modified-nucleotide containing oligomers. Mass spectrometry (MS) has become the conventional approach for determining the nucleotide composition, modification status and sequence of modified RNAs. Modified RNAs are analyzed by MS using collision-induced dissociation tandem mass spectrometry (CID MS/MS), which produces a complex dataset of oligomeric fragments that must be interpreted to identify and place modified nucleosides within the RNA sequence. Here we report the development of RoboOligo, an interactive software program for the robust analysis of data generated by CID MS/MS of RNA oligomers. There are three main functions of RoboOligo: (i) automated de novo sequencing via the local search paradigm. (ii) Manual sequencing with real-time spectrum labeling and cumulative intensity scoring. (iii) A hybrid approach, coined ‘variable sequencing’, which combines the user intuition of manual sequencing with the high-throughput sampling of automated de novo sequencing. PMID:25820423

  18. Graphene for amino acid biosensing: Theoretical study of the electronic transport

    NASA Astrophysics Data System (ADS)

    Rodríguez, S. J.; Makinistian, L.; Albanesi, E. A.

    2017-10-01

    The study of biosensors based on graphene has increased in the last years, the combination of excellent electrical properties and low noise makes graphene a material for next generation electronic devices. This work discusses the application of a graphene-based biosensor for the detection of amino acids histidine (His), alanine (Ala), aspartic acid (Asp), and tyrosine (Tyr). First, we present the results of modeling from first principles the adsorption of the four amino acids on a graphene sheet, we calculate adsorption energy, substrate-adsorbate distance, equilibrium geometrical configurations (upon relaxation) and densities of states (DOS) for each biomolecule adsorbed. Furthermore, in order to evaluate the effects of amino acid adsorption on the electronic transport of graphene, we modeled a device using first-principles calculations with a combination of Density Functional Theory (DFT) and Nonequilibrium Greens Functions (NEGF). We provide with a detailed discussion in terms of transmission, current-voltage curves, and charge transfer. We found evidence of differences in the electronic transport through the graphene sheet due to amino acid adsorption, reinforcing the possibility of graphene-based sensors for amino acid sequencing of proteins.

  19. Random Amplification and Pyrosequencing for Identification of Novel Viral Genome Sequences

    PubMed Central

    Hang, Jun; Forshey, Brett M.; Kochel, Tadeusz J.; Li, Tao; Solórzano, Víctor Fiestas; Halsey, Eric S.; Kuschner, Robert A.

    2012-01-01

    ssRNA viruses have high levels of genomic divergence, which can lead to difficulty in genomic characterization of new viruses using traditional PCR amplification and sequencing methods. In this study, random reverse transcription, anchored random PCR amplification, and high-throughput pyrosequencing were used to identify orthobunyavirus sequences from total RNA extracted from viral cultures of acute febrile illness specimens. Draft genome sequence for the orthobunyavirus L segment was assembled and sequentially extended using de novo assembly contigs from pyrosequencing reads and orthobunyavirus sequences in GenBank as guidance. Accuracy and continuous coverage were achieved by mapping all reads to the L segment draft sequence. Subsequently, RT-PCR and Sanger sequencing were used to complete the genome sequence. The complete L segment was found to be 6936 bases in length, encoding a 2248-aa putative RNA polymerase. The identified L segment was distinct from previously published South American orthobunyaviruses, sharing 63% and 54% identity at the nucleotide and amino acid level, respectively, with the complete Oropouche virus L segment and 73% and 81% identity at the nucleotide and amino acid level, respectively, with a partial Caraparu virus L segment. The result demonstrated the effectiveness of a sequence-independent amplification and next-generation sequencing approach for obtaining complete viral genomes from total nucleic acid extracts and its use in pathogen discovery. PMID:22468136

  20. Complete amino acid sequences of the ribosomal proteins L25, L29 and L31 from the archaebacterium Halobacterium marismortui.

    PubMed

    Hatakeyama, T; Kimura, M

    1988-03-15

    Ribosomal proteins were extracted from 50S ribosomal subunits of the archaebacterium Halobacterium marismortui by decreasing the concentration of Mg2+ and K+, and the proteins were separated and purified by ion-exchange column chromatography on DEAE-cellulose. Ten proteins were purified to homogeneity and three of these proteins were subjected to sequence analysis. The complete amino acid sequences of the ribosomal proteins L25, L29 and L31 were established by analyses of the peptides obtained by enzymatic digestion with trypsin, Staphylococcus aureus protease, chymotrypsin and lysylendopeptidase. Proteins L25, L29 and L31 consist of 84, 115 and 95 amino acid residues with the molecular masses of 9472 Da, 12293 Da and 10418 Da respectively. A comparison of their sequences with those of other large-ribosomal-subunit proteins from other organisms revealed that protein L25 from H. marismortui is homologous to protein L23 from Escherichia coli (34.6%), Bacillus stearothermophilus (41.8%), and tobacco chloroplasts (16.3%) as well as to protein L25 from yeast (38.0%). Proteins L29 and L31 do not appear to be homologous to any other ribosomal proteins whose structures are so far known.

  1. Amino-terminal sequence of glycoprotein D of herpes simplex virus types 1 and 2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Eisenberg, R.J.; Long, D.; Hogue-Angeletti, R.

    1984-01-01

    Glycoprotein D (gD) of herpes simplex virus is a structural component of the virion envelope which stimulates production of high titers of herpes simplex virus type-common neutralizing antibody. The authors caried out automated N-terminal amino acid sequencing studies on radiolabeled preparations of gD-1 (gD of herpes simplex virus type 1) and gD-2 (gD of herpes simplex virus type 2). Although some differences were noted, particularly in the methionine and alanine profiles for gD-1 and gD-2, the amino acid sequence of a number of the first 30 residues of the amino terminus of gD-1 and gD-2 appears to be quite similar.more » For both proteins, the first residue is a lysine. When we compared out sequence data for gD-1 with those predicted by nucleic acid sequencing, the two sequences could be aligned (with one exception) starting at residue 26 (lysine) of the predicted sequence. Thus, the first 25 amino acids of the predicted sequence are absent from the polypeptides isolated from infected cells.« less

  2. Sequence signatures of allosteric proteins towards rational design.

    PubMed

    Namboodiri, Saritha; Verma, Chandra; Dhar, Pawan K; Giuliani, Alessandro; Nair, Achuthsankar S

    2010-12-01

    Allostery is the phenomenon of changes in the structure and activity of proteins that appear as a consequence of ligand binding at sites other than the active site. Studying mechanistic basis of allostery leading to protein design with predetermined functional endpoints is an important unmet need of synthetic biology. Here, we screened the amino acid sequence landscape in search of sequence-signatures of allostery using Recurrence Quantitative Analysis (RQA) method. A characteristic vector, comprised of 10 features extracted from RQA was defined for amino acid sequences. Using Principal Component Analysis, four factors were found to be important determinants of allosteric behavior. Our sequence-based predictor method shows 82.6% accuracy, 85.7% sensitivity and 77.9% specificity with the current dataset. Further, we show that Laminarity-Mean-hydrophobicity representing repeated hydrophobic patches is the most crucial indicator of allostery. To our best knowledge this is the first report that describes sequence determinants of allostery based on hydrophobicity. As an outcome of these findings, we plan to explore possibility of inducing allostery in proteins.

  3. Whole-Genome Sequence Analysis of Bombella intestini LMG 28161T, a Novel Acetic Acid Bacterium Isolated from the Crop of a Red-Tailed Bumble Bee, Bombus lapidarius.

    PubMed

    Li, Leilei; Illeghems, Koen; Van Kerrebroeck, Simon; Borremans, Wim; Cleenwerck, Ilse; Smagghe, Guy; De Vuyst, Luc; Vandamme, Peter

    2016-01-01

    The whole-genome sequence of Bombella intestini LMG 28161T, an endosymbiotic acetic acid bacterium (AAB) occurring in bumble bees, was determined to investigate the molecular mechanisms underlying its metabolic capabilities. The draft genome sequence of B. intestini LMG 28161T was 2.02 Mb. Metabolic carbohydrate pathways were in agreement with the metabolite analyses of fermentation experiments and revealed its oxidative capacity towards sucrose, D-glucose, D-fructose and D-mannitol, but not ethanol and glycerol. The results of the fermentation experiments also demonstrated that the lack of effective aeration in small-scale carbohydrate consumption experiments may be responsible for the lack of reproducibility of such results in taxonomic studies of AAB. Finally, compared to the genome sequences of its nearest phylogenetic neighbor and of three other insect associated AAB strains, the B. intestini LMG 28161T genome lost 69 orthologs and included 89 unique genes. Although many of the latter were hypothetical they also included several type IV secretion system proteins, amino acid transporter/permeases and membrane proteins which might play a role in the interaction with the bumble bee host.

  4. Identification of metal ion binding sites based on amino acid sequences.

    PubMed

    Cao, Xiaoyong; Hu, Xiuzhen; Zhang, Xiaojin; Gao, Sujuan; Ding, Changjiang; Feng, Yonge; Bao, Weihua

    2017-01-01

    The identification of metal ion binding sites is important for protein function annotation and the design of new drug molecules. This study presents an effective method of analyzing and identifying the binding residues of metal ions based solely on sequence information. Ten metal ions were extracted from the BioLip database: Zn2+, Cu2+, Fe2+, Fe3+, Ca2+, Mg2+, Mn2+, Na+, K+ and Co2+. The analysis showed that Zn2+, Cu2+, Fe2+, Fe3+, and Co2+ were sensitive to the conservation of amino acids at binding sites, and promising results can be achieved using the Position Weight Scoring Matrix algorithm, with an accuracy of over 79.9% and a Matthews correlation coefficient of over 0.6. The binding sites of other metals can also be accurately identified using the Support Vector Machine algorithm with multifeature parameters as input. In addition, we found that Ca2+ was insensitive to hydrophobicity and hydrophilicity information and Mn2+ was insensitive to polarization charge information. An online server was constructed based on the framework of the proposed method and is freely available at http://60.31.198.140:8081/metal/HomePage/HomePage.html.

  5. Identification of metal ion binding sites based on amino acid sequences

    PubMed Central

    Cao, Xiaoyong; Zhang, Xiaojin; Gao, Sujuan; Ding, Changjiang; Feng, Yonge; Bao, Weihua

    2017-01-01

    The identification of metal ion binding sites is important for protein function annotation and the design of new drug molecules. This study presents an effective method of analyzing and identifying the binding residues of metal ions based solely on sequence information. Ten metal ions were extracted from the BioLip database: Zn2+, Cu2+, Fe2+, Fe3+, Ca2+, Mg2+, Mn2+, Na+, K+ and Co2+. The analysis showed that Zn2+, Cu2+, Fe2+, Fe3+, and Co2+ were sensitive to the conservation of amino acids at binding sites, and promising results can be achieved using the Position Weight Scoring Matrix algorithm, with an accuracy of over 79.9% and a Matthews correlation coefficient of over 0.6. The binding sites of other metals can also be accurately identified using the Support Vector Machine algorithm with multifeature parameters as input. In addition, we found that Ca2+ was insensitive to hydrophobicity and hydrophilicity information and Mn2+ was insensitive to polarization charge information. An online server was constructed based on the framework of the proposed method and is freely available at http://60.31.198.140:8081/metal/HomePage/HomePage.html. PMID:28854211

  6. Automated Sanger Analysis Pipeline (ASAP): A Tool for Rapidly Analyzing Sanger Sequencing Data with Minimum User Interference.

    PubMed

    Singh, Aditya; Bhatia, Prateek

    2016-12-01

    Sanger sequencing platforms, such as applied biosystems instruments, generate chromatogram files. Generally, for 1 region of a sequence, we use both forward and reverse primers to sequence that area, in that way, we have 2 sequences that need to be aligned and a consensus generated before mutation detection studies. This work is cumbersome and takes time, especially if the gene is large with many exons. Hence, we devised a rapid automated command system to filter, build, and align consensus sequences and also optionally extract exonic regions, translate them in all frames, and perform an amino acid alignment starting from raw sequence data within a very short time. In full capabilities of Automated Mutation Analysis Pipeline (ASAP), it is able to read "*.ab1" chromatogram files through command line interface, convert it to the FASTQ format, trim the low-quality regions, reverse-complement the reverse sequence, create a consensus sequence, extract the exonic regions using a reference exonic sequence, translate the sequence in all frames, and align the nucleic acid and amino acid sequences to reference nucleic acid and amino acid sequences, respectively. All files are created and can be used for further analysis. ASAP is available as Python 3.x executable at https://github.com/aditya-88/ASAP. The version described in this paper is 0.28.

  7. Assignment of fatty acid-beta-oxidizing syntrophic bacteria to Syntrophomonadaceae fam. nov. on the basis of 16S rRNA sequence analyses

    NASA Technical Reports Server (NTRS)

    Zhao, H.; Yang, D.; Woese, C. R.; Bryant, M. P.

    1993-01-01

    After enrichment from Chinese rural anaerobic digestor sludge, anaerobic, sporing and nonsporing, saturated fatty acid-beta-oxidizing syntrophic bacteria were isolated as cocultures with H2- and formate-utilizing Methanospirillum hungatei or Desulfovibrio sp. strain G-11. The syntrophs degraded C4 to C8 saturated fatty acids, including isobutyrate and 2-methylbutyrate. They were adapted to grow on crotonate and were isolated as pure cultures. The crotonate-grown pure cultures alone did not grow on butyrate in either the presence or the absence of some common electron acceptors. However, when they were reconstituted with M. hungatei, growth on butyrate again occurred. In contrast, crotonate-grown Clostridium kluyveri and Clostridium sticklandii, as well as Clostridium sporogenes, failed to grow on butyrate when these organisms were cocultured with M. hungatei. The crotonate-grown pure subcultures of the syntrophs described above were subjected to 16S rRNA sequence analysis. Several previously documented fatty acid-beta-oxidizing syntrophs grown in pure cultures with crotonate were also subjected to comparative sequence analyses. The sequence analyses revealed that the new sporing and nonsporing isolates and other syntrophs that we sequenced, which had either gram-negative or gram-positive cell wall ultrastructure, all belonged to the phylogenetically gram-positive phylum. They were not closely related to any of the previously known subdivisions in the gram-positive phylum with which they were compared, but were closely related to each other, forming a new subdivision in the phylum. We recommend that this group be designated Syntrophomonadaceae fam. nov.; a description is given.

  8. Comparative study of IDH1 mutations in gliomas by immunohistochemistry and DNA sequencing.

    PubMed

    Agarwal, Shipra; Sharma, Mehar Chand; Jha, Prerana; Pathak, Pankaj; Suri, Vaishali; Sarkar, Chitra; Chosdol, Kunzang; Suri, Ashish; Kale, Shashank Sharad; Mahapatra, Ashok Kumar; Jha, Pankaj

    2013-06-01

    Mutations involving isocitrate dehydrogenase 1 (IDH 1) occur in a high proportion of diffuse gliomas, with implications on diagnosis and prognosis. About 90% involve exon 4 at codon 132, replacing amino acid arginine with histidine (R132H). Rarer ones include R132C, R132S, R132G, R132L, R132V, and R132P. Most authors have used DNA-based methods to assess IDH1 status. Preliminary studies comparing imunohistochemistry (IHC) with IDH1-R132H mutation-specific antibodies have shown concordance with DNA sequencing and no cross-reactivity with wild-type IDH1 or other mutant proteins. The present study compares results of IHC with DNA sequencing in diffuse gliomas. Fifty diffuse gliomas with frozen tissue samples for DNA sequencing and adequate tissue in paraffin blocks for IHC using IDH1-R132H specific antibody were assessed for IDH1 mutations. Concordance of findings between IHC and DNA sequencing was noted in 88% (44/50) cases. All 6 cases with discrepancy were immunopositive with DIA-H09 antibody. While in 3 of these 6 cases, DNA sequencing failed to reveal any mutations, R132L (arginine replaced by leucine) mutation was found in the rest 3 cases. Interestingly, of the immunopositive cases, 46.6% (14/30) showed immunostaining in only a fraction of tumor cells. IHC is an easy and quick method of detecting IDH1-R132H mutations, but there may be some discrepancies between IHC and DNA sequencing. Although there were no false-negative cases, cross-reactivity with IDH1-R132L was seen in 3, a finding not reported thus far. Because of more universal availability of IHC over genetic testing, cross-reactivity and staining heterogeneity may have bearing over its use in detecting IDH1-R132H mutation in gliomas.

  9. Comprehensive genetic study of fatty acids helps explain the role of noncoding inflammatory bowel disease associated SNPs and fatty acid metabolism in disease pathogenesis.

    PubMed

    Jezernik, Gregor; Potočnik, Uroš

    2018-03-01

    Fatty acids and their derivatives play an important role in inflammation. Diet and genetics influence fatty acid profiles. Abnormalities of fatty acid profiles have been observed in inflammatory bowel diseases (IBD), a group of complex diseases defined by chronic gastrointestinal inflammation. IBD associated fatty acid profile abnormalities were observed independently of nutritional status or disease activity, suggesting a common genetic background. However, no study so far has attempted to look for overlap between IBD loci and fatty acid associated loci or investigate the genetics of fatty acid profiles in IBD. To this end, we conducted a comprehensive genetic study of fatty acid profiles in IBD using iCHIP, a custom microarray platform designed for deep sequencing of immune-mediated disease associated loci. This study identifies 10 loci associated with fatty acid profiles in IBD. The most significant associations were a locus near CBS (p = 7.62 × 10 -8 ) and a locus in LRRK2 (p = 1.4 × 10 -7 ). Of note, this study replicates the FADS gene cluster locus, previously associated with both fatty acid profiles and IBD pathogenesis. Furthermore, we identify 18 carbon chain trans-fatty acids (p = 1.12 × 10 -3 ), total trans-fatty acids (p = 4.49 × 10 -3 ), palmitic acid (p = 5.85 × 10 -3 ) and arachidonic acid (p = 8.58 × 10 -3 ) as significantly associated with IBD pathogenesis. Copyright © 2018 Elsevier Ltd. All rights reserved.

  10. Parameters of proteome evolution from histograms of amino-acid sequence identities of paralogous proteins

    PubMed Central

    Axelsen, Jacob Bock; Yan, Koon-Kiu; Maslov, Sergei

    2007-01-01

    Background The evolution of the full repertoire of proteins encoded in a given genome is mostly driven by gene duplications, deletions, and sequence modifications of existing proteins. Indirect information about relative rates and other intrinsic parameters of these three basic processes is contained in the proteome-wide distribution of sequence identities of pairs of paralogous proteins. Results We introduce a simple mathematical framework based on a stochastic birth-and-death model that allows one to extract some of this information and apply it to the set of all pairs of paralogous proteins in H. pylori, E. coli, S. cerevisiae, C. elegans, D. melanogaster, and H. sapiens. It was found that the histogram of sequence identities p generated by an all-to-all alignment of all protein sequences encoded in a genome is well fitted with a power-law form ~ p-γ with the value of the exponent γ around 4 for the majority of organisms used in this study. This implies that the intra-protein variability of substitution rates is best described by the Gamma-distribution with the exponent α ≈ 0.33. Different features of the shape of such histograms allow us to quantify the ratio between the genome-wide average deletion/duplication rates and the amino-acid substitution rate. Conclusion We separately measure the short-term ("raw") duplication and deletion rates rdup∗, rdel∗ which include gene copies that will be removed soon after the duplication event and their dramatically reduced long-term counterparts rdup, rdel. High deletion rate among recently duplicated proteins is consistent with a scenario in which they didn't have enough time to significantly change their functional roles and thus are to a large degree disposable. Systematic trends of each of the four duplication/deletion rates with the total number of genes in the genome were analyzed. All but the deletion rate of recent duplicates rdel∗ were shown to systematically increase with Ngenes. Abnormally flat shapes

  11. Nucleotide sequence and regulatory studies of VGF, a nervous system-specific mRNA that is rapidly and relatively selectively induced by nerve growth factor.

    PubMed

    Salton, S R

    1991-09-01

    A nervous system-specific mRNA that is rapidly induced in PC12 cells to a greater extent by nerve growth factor (NGF) than by epidermal growth factor treatment has been cloned. The polypeptide deduced from the nucleic acid sequence of the NGF33.1 cDNA clone contains regions of amino acid sequence identity with that predicted by the cDNA clone VGF, and further analysis suggests that both NGF33.1 and VGF cDNA clones very likely correspond to the same mRNA (VGF). In this report both the nucleic acid sequence that corresponds to VGF mRNA and the polypeptide predicted by the NGF33.1 cDNA clone are presented. Genomic Southern analysis and database comparison did not detect additional sequences with high homology to the VGF gene. Induction of VGF mRNA by depolarization and phorbol 12-myristate 13-acetate treatment was greater than by serum stimulation or protein kinase A pathway activation. These studies suggest that VGF mRNA is induced to the greatest extent by NGF treatment and that VGF is one of the most rapidly regulated neuronal mRNAs identified in PC12 cells.

  12. Human somatostatin I: sequence of the cDNA.

    PubMed Central

    Shen, L P; Pictet, R L; Rutter, W J

    1982-01-01

    RNA has been isolated from a human pancreatic somatostatinoma and used to prepare a cDNA library. After prescreening, clones containing somatostatin I sequences were identified by hybridization with an anglerfish somatostatin I-cloned cDNA probe. From the nucleotide sequence of two of these clones, we have deduced an essentially full-length mRNA sequence, including the preprosomatostatin coding region, 105 nucleotides from the 5' untranslated region and the complete 150-nucleotide 3' untranslated region. The coding region predicts a 116-amino acid precursor protein (Mr, 12.727) that contains somatostatin-14 and -28 at its COOH terminus. The predicted amino acid sequence of human somatostatin-28 is identical to that of somatostatin-28 isolated from the porcine and ovine species. A comparison of the amino acid sequences of human and anglerfish preprosomatostatin I indicated that the COOH-terminal region encoding somatostatin-14 and the adjacent 6 amino acids are highly conserved, whereas the remainder of the molecule, including the signal peptide region, is more divergent. However, many of the amino acid differences found in the pro region of the human and anglerfish proteins are conservative changes. This suggests that the propeptides have a similar secondary structure, which in turn may imply a biological function for this region of the molecule. Images PMID:6126875

  13. A Single Electrochemical Probe Used for Analysis of Multiple Nucleic Acid Sequences

    PubMed Central

    Mills, Dawn M.; Calvo-Marzal, Percy; Pinzon, Jeffer M.; Armas, Stephanie; Kolpashchikov, Dmitry M.; Chumbimuni-Torres, Karin Y.

    2017-01-01

    Electrochemical hybridization sensors have been explored extensively for analysis of specific nucleic acids. However, commercialization of the platform is hindered by the need for attachment of separate oligonucleotide probes complementary to a RNA or DNA target to an electrode’s surface. Here we demonstrate that a single probe can be used to analyze several nucleic acid targets with high selectivity and low cost. The universal electrochemical four-way junction (4J)-forming (UE4J) sensor consists of a universal DNA stem-loop (USL) probe attached to the electrode’s surface and two adaptor strands (m and f) which hybridize to the USL probe and the analyte to form a 4J associate. The m adaptor strand was conjugated with a methylene blue redox marker for signal ON sensing and monitored using square wave voltammetry. We demonstrated that a single sensor can be used for detection of several different DNA/RNA sequences and can be regenerated in 30 seconds by a simple water rinse. The UE4J sensor enables a high selectivity by recognition of a single base substitution, even at room temperature. The UE4J sensor opens a venue for a re-useable universal platform that can be adopted at low cost for the analysis of DNA or RNA targets. PMID:29371782

  14. Identification of single amino acid substitutions (SAAS) in neuraminidase from influenza a virus (H1N1) via mass spectrometry analysis coupled with de novo peptide sequencing.

    PubMed

    Peng, Qisheng; Wang, Zijian; Wu, Donglin; Li, Xiaoou; Liu, Xiaofeng; Sun, Wanchun; Liu, Ning

    2016-08-01

    Amino acid substitutions in the neuraminidase of the influenza virus are the main cause of the emergence of resistance to zanamivir or oseltamivir during seasonal influenza treatment; they are the result of non-synonymous mutations in the viral genome that can be successfully detected by polymer chain reaction (PCR)-based approaches. There is always an urgent need to detect variation in amino acid sequences directly at the protein level. Mass spectrometry coupled with de novo sequencing has been explored as an alternative and straightforward strategy for detecting amino acid substitutions, as well - this approach is the primary focus of the present study. Influenza virus (A/Puerto Rico/8/1934 H1N1) propagated in embryonated chicken eggs was purified by ultracentrifugation, followed by PNGase F treatment. The deglycosylated virion was lysed and separated by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE). The gel band corresponding to neuraminidase was picked up and subjected to liquid chromatography tandem mass spectrometry (LC-MS/MS) analysis. LC-MS/MS analyses, coupled with manual de novo sequencing, allowed the determination of three amino acid substitutions: R346K, S349 N, and S370I/L, in the neuraminidase from the influenza virus (A/Puerto Rico/8/1934 H1N1), which were located in three mutated peptides of the neuraminidase: YGNGVWIGK, TKNHSSR, and PNGWTETDI/LK, respectively. We found that the amino acid substitutions in the proteins of RNA viruses (including influenza A virus) resulting from non-synonymous gene mutations can indeed be directly analyzed via mass spectrometry, and that manual interpretation of the MS/MS data may be beneficial. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  15. Nucleic and amino acid sequences relating to a novel transketolase, and methods for the expression thereof

    DOEpatents

    Croteau, Rodney Bruce; Wildung, Mark Raymond; Lange, Bernd Markus; McCaskill, David G.

    2001-01-01

    cDNAs encoding 1-deoxyxylulose-5-phosphate synthase from peppermint (Mentha piperita) have been isolated and sequenced, and the corresponding amino acid sequences have been determined. Accordingly, isolated DNA sequences (SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7) are provided which code for the expression of 1-deoxyxylulose-5-phosphate synthase from plants. In another aspect the present invention provides for isolated, recombinant DXPS proteins, such as the proteins having the sequences set forth in SEQ ID NO:4, SEQ ID NO:6 and SEQ ID NO:8. In other aspects, replicable recombinant cloning vehicles are provided which code for plant 1-deoxyxylulose-5-phosphate synthases, or for a base sequence sufficiently complementary to at least a portion of 1-deoxyxylulose-5-phosphate synthase DNA or RNA to enable hybridization therewith. In yet other aspects, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding a plant 1-deoxyxylulose-5-phosphate synthase. Thus, systems and methods are provided for the recombinant expression of the aforementioned recombinant 1-deoxyxylulose-5-phosphate synthase that may be used to facilitate its production, isolation and purification in significant amounts. Recombinant 1-deoxyxylulose-5-phosphate synthase may be used to obtain expression or enhanced expression of 1-deoxyxylulose-5-phosphate synthase in plants in order to enhance the production of 1-deoxyxylulose-5-phosphate, or its derivatives such as isopentenyl diphosphate (BP), or may be otherwise employed for the regulation or expression of 1-deoxyxylulose-5-phosphate synthase, or the production of its products.

  16. Whole-Genome Sequence of the Anaerobic Isosaccharinic Acid Degrading Isolate, Macellibacteroides fermentans Strain HH-ZS

    PubMed Central

    Rout, Simon P.; Salah, Zohier B.; Charles, Christopher J.

    2017-01-01

    Abstract The ability of micro-organisms to degrade isosaccharinic acids (ISAs) while tolerating hyperalkaline conditions is pivotal to our understanding of the biogeochemistry associated within these environs, but also in scenarios pertaining to the cementitious disposal of radioactive wastes. An alkalitolerant, ISA degrading micro-organism was isolated from the hyperalkaline soils resulting from lime depositions. Here, we report the first whole-genome sequence, ISA degradation profile and carbohydrate preoteome of a Macellibacteroides fermentans strain HH-ZS, 4.08 Mb in size, coding 3,241 proteins, 64 tRNA, and 1 rRNA. PMID:28859355

  17. Identification of amino acid sequences in the polyomavirus capsid proteins that serve as nuclear localization signals

    NASA Technical Reports Server (NTRS)

    Chang, D.; Haynes, J. I. Jr; Brady, J. N.; Consigli, R. A.; Spooner, B. S. (Principal Investigator)

    1993-01-01

    The molecular mechanism participating in the transport of newly synthesized proteins from the cytoplasm to the nucleus in mammalian cells is poorly understood. Recently, the nuclear localization signal sequences (NLS) of many nuclear proteins have been identified, and most have been found to be composed of a highly basic amino acid stretch. A genetic "subtractive" and a biochemical "additive" approach were used in our studies to identify the NLS's of the polyomavirus structural capsid proteins. An NLS was identified at the N-terminus (Ala1-Pro-Lys-Arg-Lys-Ser-Gly-Val-Ser-Lys-Cys11) of the major capsid protein VP1 and at the C-terminus (Glu307 -Glu-Asp-Gly-Pro-Glu-Lys-Lys-Lys-Arg-Arg-Leu318) of the VP2/VP3 minor capsid proteins.

  18. Exome sequencing and SNP analysis detect novel compound heterozygosity in fatty acid hydroxylase-associated neurodegeneration

    PubMed Central

    Pierson, Tyler Mark; Simeonov, Dimitre R; Sincan, Murat; Adams, David A; Markello, Thomas; Golas, Gretchen; Fuentes-Fajardo, Karin; Hansen, Nancy F; Cherukuri, Praveen F; Cruz, Pedro; Blackstone, Craig; Tifft, Cynthia; Boerkoel, Cornelius F; Gahl, William A

    2012-01-01

    Fatty acid hydroxylase-associated neurodegeneration due to fatty acid 2-hydroxylase deficiency presents with a wide range of phenotypes including spastic paraplegia, leukodystrophy, and/or brain iron deposition. All previously described families with this disorder were consanguineous, with homozygous mutations in the probands. We describe a 10-year-old male, from a non-consanguineous family, with progressive spastic paraplegia, dystonia, ataxia, and cognitive decline associated with a sural axonal neuropathy. The use of high-throughput sequencing techniques combined with SNP array analyses revealed a novel paternally derived missense mutation and an overlapping novel maternally derived ∼28-kb genomic deletion in FA2H. This patient provides further insight into the consistent features of this disorder and expands our understanding of its phenotypic presentation. The presence of a sural nerve axonal neuropathy had not been previously associated with this disorder and so may extend the phenotype. PMID:22146942

  19. Studies of a biochemical factory: tomato trichome deep expressed sequence tag sequencing and proteomics.

    PubMed

    Schilmiller, Anthony L; Miner, Dennis P; Larson, Matthew; McDowell, Eric; Gang, David R; Wilkerson, Curtis; Last, Robert L

    2010-07-01

    Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces beta-caryophyllene and alpha-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells.

  20. Next-generation sequencing library preparation method for identification of RNA viruses on the Ion Torrent Sequencing Platform.

    PubMed

    Chen, Guiqian; Qiu, Yuan; Zhuang, Qingye; Wang, Suchun; Wang, Tong; Chen, Jiming; Wang, Kaicheng

    2018-05-09

    Next generation sequencing (NGS) is a powerful tool for the characterization, discovery, and molecular identification of RNA viruses. There were multiple NGS library preparation methods published for strand-specific RNA-seq, but some methods are not suitable for identifying and characterizing RNA viruses. In this study, we report a NGS library preparation method to identify RNA viruses using the Ion Torrent PGM platform. The NGS sequencing adapters were directly inserted into the sequencing library through reverse transcription and polymerase chain reaction, without fragmentation and ligation of nucleic acids. The results show that this method is simple to perform, able to identify multiple species of RNA viruses in clinical samples.

  1. Protein-Protein Interactions Prediction Using a Novel Local Conjoint Triad Descriptor of Amino Acid Sequences

    PubMed Central

    Zhang, Long; Jia, Lianyin; Ren, Yazhou

    2017-01-01

    Protein-protein interactions (PPIs) play crucial roles in almost all cellular processes. Although a large amount of PPIs have been verified by high-throughput techniques in the past decades, currently known PPIs pairs are still far from complete. Furthermore, the wet-lab experiments based techniques for detecting PPIs are time-consuming and expensive. Hence, it is urgent and essential to develop automatic computational methods to efficiently and accurately predict PPIs. In this paper, a sequence-based approach called DNN-LCTD is developed by combining deep neural networks (DNNs) and a novel local conjoint triad description (LCTD) feature representation. LCTD incorporates the advantage of local description and conjoint triad, thus, it is capable to account for the interactions between residues in both continuous and discontinuous regions of amino acid sequences. DNNs can not only learn suitable features from the data by themselves, but also learn and discover hierarchical representations of data. When performing on the PPIs data of Saccharomyces cerevisiae, DNN-LCTD achieves superior performance with accuracy as 93.12%, precision as 93.75%, sensitivity as 93.83%, area under the receiver operating characteristic curve (AUC) as 97.92%, and it only needs 718 s. These results indicate DNN-LCTD is very promising for predicting PPIs. DNN-LCTD can be a useful supplementary tool for future proteomics study. PMID:29117139

  2. Protein-Protein Interactions Prediction Using a Novel Local Conjoint Triad Descriptor of Amino Acid Sequences.

    PubMed

    Wang, Jun; Zhang, Long; Jia, Lianyin; Ren, Yazhou; Yu, Guoxian

    2017-11-08

    Protein-protein interactions (PPIs) play crucial roles in almost all cellular processes. Although a large amount of PPIs have been verified by high-throughput techniques in the past decades, currently known PPIs pairs are still far from complete. Furthermore, the wet-lab experiments based techniques for detecting PPIs are time-consuming and expensive. Hence, it is urgent and essential to develop automatic computational methods to efficiently and accurately predict PPIs. In this paper, a sequence-based approach called DNN-LCTD is developed by combining deep neural networks (DNNs) and a novel local conjoint triad description (LCTD) feature representation. LCTD incorporates the advantage of local description and conjoint triad, thus, it is capable to account for the interactions between residues in both continuous and discontinuous regions of amino acid sequences. DNNs can not only learn suitable features from the data by themselves, but also learn and discover hierarchical representations of data. When performing on the PPIs data of Saccharomyces cerevisiae , DNN-LCTD achieves superior performance with accuracy as 93.12%, precision as 93.75%, sensitivity as 93.83%, area under the receiver operating characteristic curve (AUC) as 97.92%, and it only needs 718 s. These results indicate DNN-LCTD is very promising for predicting PPIs. DNN-LCTD can be a useful supplementary tool for future proteomics study.

  3. Complete Genome Sequence of Moraxella osloensis Strain KMC41, a Producer of 4-Methyl-3-Hexenoic Acid, a Major Malodor Compound in Laundry

    PubMed Central

    Hirakawa, Hideki; Morita, Yuji; Tomida, Junko; Sato, Jun; Matsumura, Yuta; Mitani, Asako; Niwano, Yu; Takeuchi, Kohei; Kubota, Hiromi; Kawamura, Yoshiaki

    2016-01-01

    We report the complete genome sequence of Moraxella osloensis strain KMC41, isolated from laundry with malodor. The KMC41 genome comprises a 2,445,556-bp chromosome and three plasmids. A fatty acid desaturase and at least four β-oxidation-related genes putatively associated with 4-methyl-3-hexenoic acid generation were detected in the KMC41 chromosome. PMID:27445387

  4. Biosynthesis of Lipoic Acid in Arabidopsis: Cloning and Characterization of the cDNA for Lipoic Acid Synthase1

    PubMed Central

    Yasuno, Rie; Wada, Hajime

    1998-01-01

    Lipoic acid is a coenzyme that is essential for the activity of enzyme complexes such as those of pyruvate dehydrogenase and glycine decarboxylase. We report here the isolation and characterization of LIP1 cDNA for lipoic acid synthase of Arabidopsis. The Arabidopsis LIP1 cDNA was isolated using an expressed sequence tag homologous to the lipoic acid synthase of Escherichia coli. This cDNA was shown to code for Arabidopsis lipoic acid synthase by its ability to complement a lipA mutant of E. coli defective in lipoic acid synthase. DNA-sequence analysis of the LIP1 cDNA revealed an open reading frame predicting a protein of 374 amino acids. Comparisons of the deduced amino acid sequence with those of E. coli and yeast lipoic acid synthase homologs showed a high degree of sequence similarity and the presence of a leader sequence presumably required for import into the mitochondria. Southern-hybridization analysis suggested that LIP1 is a single-copy gene in Arabidopsis. Western analysis with an antibody against lipoic acid synthase demonstrated that this enzyme is located in the mitochondrial compartment in Arabidopsis cells as a 43-kD polypeptide. PMID:9808738

  5. 5S ribosomal ribonucleic acid sequences in Bacteroides and Fusobacterium: evolutionary relationships within these genera and among eubacteria in general

    NASA Technical Reports Server (NTRS)

    Van den Eynde, H.; De Baere, R.; Shah, H. N.; Gharbia, S. E.; Fox, G. E.; Michalik, J.; Van de Peer, Y.; De Wachter, R.

    1989-01-01

    The 5S ribosomal ribonucleic acid (rRNA) sequences were determined for Bacteroides fragilis, Bacteroides thetaiotaomicron, Bacteroides capillosus, Bacteroides veroralis, Porphyromonas gingivalis, Anaerorhabdus furcosus, Fusobacterium nucleatum, Fusobacterium mortiferum, and Fusobacterium varium. A dendrogram constructed by a clustering algorithm from these sequences, which were aligned with all other hitherto known eubacterial 5S rRNA sequences, showed differences as well as similarities with respect to results derived from 16S rRNA analyses. In the 5S rRNA dendrogram, Bacteroides clustered together with Cytophaga and Fusobacterium, as in 16S rRNA analyses. Intraphylum relationships deduced from 5S rRNAs suggested that Bacteroides is specifically related to Cytophaga rather than to Fusobacterium, as was suggested by 16S rRNA analyses. Previous taxonomic considerations concerning the genus Bacteroides, based on biochemical and physiological data, were confirmed by the 5S rRNA sequence analysis.

  6. DNA sequence similarity recognition by hybridization to short oligomers

    DOEpatents

    Milosavljevic, Aleksandar

    1999-01-01

    Methods are disclosed for the comparison of nucleic acid sequences. Data is generated by hybridizing sets of oligomers with target nucleic acids. The data thus generated is manipulated simultaneously with respect to both (i) matching between oligomers and (ii) matching between oligomers and putative reference sequences available in databases. Using data compression methods to manipulate this mutual information, sequences for the target can be constructed.

  7. Analysis of Draft Genome Sequence of Pseudomonas sp. QTF5 Reveals Its Benzoic Acid Degradation Ability and Heavy Metal Tolerance

    PubMed Central

    Li, Yang; Ren, Yi

    2017-01-01

    Pseudomonas sp. QTF5 was isolated from the continuous permafrost near the bitumen layers in the Qiangtang basin of Qinghai-Tibetan Plateau in China (5,111 m above sea level). It is psychrotolerant and highly and widely tolerant to heavy metals and has the ability to metabolize benzoic acid and salicylic acid. To gain insight into the genetic basis for its adaptation, we performed whole genome sequencing and analyzed the resistant genes and metabolic pathways. Based on 120 published and annotated genomes representing 31 species in the genus Pseudomonas, in silico genomic DNA-DNA hybridization (<54%) and average nucleotide identity calculation (<94%) revealed that QTF5 is closest to Pseudomonas lini and should be classified into a novel species. This study provides the genetic basis to identify the genes linked to its specific mechanisms for adaptation to extreme environment and application of this microorganism in environmental conservation. PMID:29270429

  8. Evolutionary connections of biological kingdoms based on protein and nucleic acid sequence evidence

    NASA Technical Reports Server (NTRS)

    Dayhoff, M. O.

    1983-01-01

    Prokaryotic and eukaryotic evolutionary trees are developed from protein and nucleic-acid sequences by the methods of numerical taxonomy. Trees are presented for bacterial ferredoxins, 5S ribosomal RNA, c-type cytochromes , cytochromes c2 and c', and 5.8S ribosomal RNA; the implications for early evolution are discussed; and a composite tree showing the branching of the anaerobes, aerobes, archaebacteria, and eukaryotes is shown. Single lines are found for all oxygen-evolving photosynthetic forms and for the salt-loving and high-temperature forms of archaebacteria. It is argued that the eukaryote mitochondria, chloroplasts, and cytoplasmic host material are descended from free-living prokaryotes that formed symbiotic associations, with more than one symbiotic event involved in the evolution of each organelle.

  9. Sequencing proteins with transverse ionic transport in nanochannels.

    PubMed

    Boynton, Paul; Di Ventra, Massimiliano

    2016-05-03

    De novo protein sequencing is essential for understanding cellular processes that govern the function of living organisms and all sequence modifications that occur after a protein has been constructed from its corresponding DNA code. By obtaining the order of the amino acids that compose a given protein one can then determine both its secondary and tertiary structures through structure prediction, which is used to create models for protein aggregation diseases such as Alzheimer's Disease. Here, we propose a new technique for de novo protein sequencing that involves translocating a polypeptide through a synthetic nanochannel and measuring the ionic current of each amino acid through an intersecting perpendicular nanochannel. We find that the distribution of ionic currents for each of the 20 proteinogenic amino acids encoded by eukaryotic genes is statistically distinct, showing this technique's potential for de novo protein sequencing.

  10. Development and Evaluation of Novel Real-Time Reverse Transcription-PCR Assays with Locked Nucleic Acid Probes Targeting Leader Sequences of Human-Pathogenic Coronaviruses

    PubMed Central

    Chan, Jasper Fuk-Woo; Choi, Garnet Kwan-Yue; Tsang, Alan Ka-Lun; Tee, Kah-Meng; Lam, Ho-Yin; Yip, Cyril Chik-Yan; To, Kelvin Kai-Wang; Cheng, Vincent Chi-Chung; Yeung, Man-Lung; Lau, Susanna Kar-Pui; Woo, Patrick Chiu-Yat; Chan, Kwok-Hung; Tang, Bone Siu-Fai

    2015-01-01

    Based on findings in small RNA-sequencing (Seq) data analysis, we developed highly sensitive and specific real-time reverse transcription (RT)-PCR assays with locked nucleic acid probes targeting the abundantly expressed leader sequences of Middle East respiratory syndrome coronavirus (MERS-CoV) and other human coronaviruses. Analytical and clinical evaluations showed their noninferiority to a commercial multiplex PCR test for the detection of these coronaviruses. PMID:26019210

  11. An In Vivo Study of Self-Regulated Study Sequencing in Introductory Psychology Courses

    PubMed Central

    de Leeuw, Joshua R.; Motz, Benjamin A.; Goldstone, Robert L.

    2016-01-01

    Study sequence can have a profound influence on learning. In this study we investigated how students decide to sequence their study in a naturalistic context and whether their choices result in improved learning. In the study reported here, 2061 undergraduate students enrolled in an Introductory Psychology course completed an online homework tutorial on measures of central tendency, a topic relevant to an exam that counted towards their grades. One group of students was enabled to choose their own study sequence during the tutorial (Self-Regulated group), while the other group of students studied the same materials in sequences chosen by other students (Yoked group). Students who chose their sequence of study showed a clear tendency to block their study by concept, and this tendency was positively associated with subsequent exam performance. In the Yoked group, study sequence had no effect on exam performance. These results suggest that despite findings that blocked study is maladaptive when assigned by an experimenter, it may actually be adaptive when chosen by the learner in a naturalistic context. PMID:27003164

  12. An In Vivo Study of Self-Regulated Study Sequencing in Introductory Psychology Courses.

    PubMed

    Carvalho, Paulo F; Braithwaite, David W; de Leeuw, Joshua R; Motz, Benjamin A; Goldstone, Robert L

    2016-01-01

    Study sequence can have a profound influence on learning. In this study we investigated how students decide to sequence their study in a naturalistic context and whether their choices result in improved learning. In the study reported here, 2061 undergraduate students enrolled in an Introductory Psychology course completed an online homework tutorial on measures of central tendency, a topic relevant to an exam that counted towards their grades. One group of students was enabled to choose their own study sequence during the tutorial (Self-Regulated group), while the other group of students studied the same materials in sequences chosen by other students (Yoked group). Students who chose their sequence of study showed a clear tendency to block their study by concept, and this tendency was positively associated with subsequent exam performance. In the Yoked group, study sequence had no effect on exam performance. These results suggest that despite findings that blocked study is maladaptive when assigned by an experimenter, it may actually be adaptive when chosen by the learner in a naturalistic context.

  13. Complete Genome Sequence of Moraxella osloensis Strain KMC41, a Producer of 4-Methyl-3-Hexenoic Acid, a Major Malodor Compound in Laundry.

    PubMed

    Goto, Takatsugu; Hirakawa, Hideki; Morita, Yuji; Tomida, Junko; Sato, Jun; Matsumura, Yuta; Mitani, Asako; Niwano, Yu; Takeuchi, Kohei; Kubota, Hiromi; Kawamura, Yoshiaki

    2016-07-21

    We report the complete genome sequence of Moraxella osloensis strain KMC41, isolated from laundry with malodor. The KMC41 genome comprises a 2,445,556-bp chromosome and three plasmids. A fatty acid desaturase and at least four β-oxidation-related genes putatively associated with 4-methyl-3-hexenoic acid generation were detected in the KMC41 chromosome. Copyright © 2016 Goto et al.

  14. EGVII endoglucanase and nucleic acids encoding the same

    DOEpatents

    Dunn-Coleman, Nigel; Goedegebuur, Frits; Ward, Michael; Yao, Jian

    2014-02-25

    The present invention provides a novel endoglucanase nucleic acid sequence, designated egl7, and the corresponding EGVII amino acid sequence. The invention also provides expression vectors and host cells comprising a nucleic acid sequence encoding EGVII, recombinant EGVII proteins and methods for producing the same.

  15. EGVII endoglucanase and nucleic acids encoding the same

    DOEpatents

    Dunn-Coleman, Nigel; Goedegebuur, Frits; Ward, Michael; Yao, Jian

    2006-05-16

    The present invention provides a novel endoglucanase nucleic acid sequence, designated egl7, and the corresponding EGVII amino acid sequence. The invention also provides expression vectors and host cells comprising a nucleic acid sequence encoding EGVII, recombinant EGVII proteins and methods for producing the same.

  16. EGVI endoglucanase and nucleic acids encoding the same

    DOEpatents

    Dunn-Coleman, Nigel [Los Gatos, CA; Goedegebuur, Frits [Vlaardingen, NL; Ward, Michael [San Francisco, CA; Yao, Jian [Sunnyvale, CA

    2008-04-01

    The present invention provides a novel endoglucanase nucleic acid sequence, designated egl6, and the corresponding EGVI amino acid sequence. The invention also provides expression vectors and host cells comprising a nucleic acid sequence encoding EGVI, recombinant EGVI proteins and methods for producing the same.

  17. EGVI endoglucanase and nucleic acids encoding the same

    DOEpatents

    Dunn-Coleman, Nigel; Goedegebuur, Frits; Ward, Michael; Yao, Jian

    2010-10-12

    The present invention provides a novel endoglucanase nucleic acid sequence, designated egl6, and the corresponding EGVI amino acid sequence. The invention also provides expression vectors and host cells comprising a nucleic acid sequence encoding EGVI, recombinant EGVI proteins and methods for producing the same.

  18. EGVIII endoglucanase and nucleic acids encoding the same

    DOEpatents

    Dunn-Coleman, Nigel; Goedegebuur, Frits; Ward, Michael; Yao, Jian

    2006-05-23

    The present invention provides a novel endoglucanase nucleic acid sequence, designated egl8, and the corresponding EGVIII amino acid sequence. The invention also provides expression vectors and host cells comprising a nucleic acid sequence encoding EGVIII, recombinant EGVIII proteins and methods for producing the same.

  19. EGVI endoglucanase and nucleic acids encoding the same

    DOEpatents

    Dunn-Coleman, Nigel; Goedegebuur, Frits; Ward, Michael; Yao, Jian

    2010-10-05

    The present invention provides a novel endoglucanase nucleic acid sequence, designated egl6, and the corresponding EGVI amino acid sequence. The invention also provides expression vectors and host cells comprising a nucleic acid sequence encoding EGVI, recombinant EGVI proteins and methods for producing the same.

  20. EGVI endoglucanase and nucleic acids encoding the same

    DOEpatents

    Dunn-Coleman, Nigel; Goedegebuur, Frits; Ward, Michael; Yao, Jian

    2006-06-06

    The present invention provides a novel endoglucanase nucleic acid sequence, designated egl6, and the corresponding EGVI amino acid sequence. The invention also provides expression vectors and host cells comprising a nucleic acid sequence encoding EGVI, recombinant EGVI proteins and methods for producing the same.

  1. EGVII endoglucanase and nucleic acids encoding the same

    DOEpatents

    Dunn-Coleman, Nigel [Los Gatos, CA; Goedegebuur, Frits [Vlaardingen, NL; Ward, Michael [San Francisco, CA; Yao, Jian [Sunnyvale, CA

    2009-05-05

    The present invention provides an endoglucanase nucleic acid sequence, designated egl7, and the corresponding EGVII amino acid sequence. The invention also provides expression vectors and host cells comprising a nucleic acid sequence encoding EGVII, recombinant EGVII proteins and methods for producing the same.

  2. EGVII endoglucanase and nucleic acids encoding the same

    DOEpatents

    Dunn-Coleman, Nigel; Goedegebuur, Frits; Ward, Michael; Yao, Jian

    2013-07-16

    The present invention provides a novel endoglucanase nucleic acid sequence, designated egl7, and the corresponding EGVII amino acid sequence. The invention also provides expression vectors and host cells comprising a nucleic acid sequence encoding EGVII, recombinant EGVII proteins and methods for producing the same.

  3. EGVII endoglucanase and nucleic acids encoding the same

    DOEpatents

    Dunn-Coleman, Nigel [Los Gatos, CA; Goedegebuur, Frits [Vlaardingen, NL; Ward, Michael [San Francisco, CA; Yao, Jian [Sunnyvale, CA

    2012-02-14

    The present invention provides a novel endoglucanase nucleic acid sequence, designated egl7, and the corresponding EGVII amino acid sequence. The invention also provides expression vectors and host cells comprising a nucleic acid sequence encoding EGVII, recombinant EGVII proteins and methods for producing the same.

  4. EGVII endoglucanase and nucleic acids encoding the same

    DOEpatents

    Dunn-Coleman, Nigel; Goedegebuur, Frits; Ward, Michael; Yao, Jian

    2015-04-14

    The present invention provides a novel endoglucanase nucleic acid sequence, designated egl7, and the corresponding EGVII amino acid sequence. The invention also provides expression vectors and host cells comprising a nucleic acid sequence encoding EGVII, recombinant EGVII proteins and methods for producing the same.

  5. Cloning, sequencing, and expression of the gene coding for bile acid 7 alpha-hydroxysteroid dehydrogenase from Eubacterium sp. strain VPI 12708.

    PubMed Central

    Baron, S F; Franklund, C V; Hylemon, P B

    1991-01-01

    Southern blot analysis indicated that the gene encoding the constitutive, NADP-linked bile acid 7 alpha-hydroxysteroid dehydrogenase of Eubacterium sp. strain VPI 12708 was located on a 6.5-kb EcoRI fragment of the chromosomal DNA. This fragment was cloned into bacteriophage lambda gt11, and a 2.9-kb piece of this insert was subcloned into pUC19, yielding the recombinant plasmid pBH51. DNA sequence analysis of the 7 alpha-hydroxysteroid dehydrogenase gene in pBH51 revealed a 798-bp open reading frame, coding for a protein with a calculated molecular weight of 28,500. A putative promoter sequence and ribosome binding site were identified. The 7 alpha-hydroxysteroid dehydrogenase mRNA transcript in Eubacterium sp. strain VPI 12708 was about 0.94 kb in length, suggesting that it is monocistronic. An Escherichia coli DH5 alpha transformant harboring pBH51 had approximately 30-fold greater levels of 7 alpha-hydroxysteroid dehydrogenase mRNA, immunoreactive protein, and specific activity than Eubacterium sp. strain VPI 12708. The 7 alpha-hydroxysteroid dehydrogenase purified from the pBH51 transformant was similar in subunit molecular weight, specific activity, and kinetic properties to that from Eubacterium sp. strain VPI 12708, and it reached with antiserum raised against the authentic enzyme on Western immunoblots. Alignment of the amino acid sequence of the 7 alpha-hydroxysteroid dehydrogenase with those of 10 other pyridine nucleotide-linked alcohol/polyol dehydrogenases revealed six conserved amino acid residues in the N-terminal regions thought to function in coenzyme binding. Images PMID:1856160

  6. Amino acid sequence and the cellular location of the Na(+)-dependent D-glucose symporters (SGLT1) in the ovine enterocyte and the parotid acinar cell.

    PubMed Central

    Tarpey, P S; Wood, I S; Shirazi-Beechey, S P; Beechey, R B

    1995-01-01

    The Na(+)-dependent D-glucose symporter has been shown to be located on the basolateral domain of the plasma membrane of ovine parotid acinar cells. This is in contrast to the apical location of this transporter in the ovine enterocyte. The amino acid sequences of these two proteins have been determined. They are identical. The results indicated that the signals responsible for the differential targeting of these two proteins to the apical and the basal domains of the plasma membrane are not contained within the primary amino acid sequence. Images Figure 1 Figure 2 Figure 3 Figure 4 Figure 5 Figure 6 PMID:7492327

  7. Quick identification of acetic acid bacteria based on nucleotide sequences of the 16S-23S rDNA internal transcribed spacer region and of the PQQ-dependent alcohol dehydrogenase gene.

    PubMed

    Trcek, Janja

    2005-10-01

    Acetic acid bacteria (AAB) are well known for oxidizing different ethanol-containing substrates into various types of vinegar. They are also used for production of some biotechnologically important products, such as sorbose and gluconic acids. However, their presence is not always appreciated since certain species also spoil wine, juice, beer and fruits. To be able to follow AAB in all these processes, the species involved must be identified accurately and quickly. Because of inaccuracy and very time-consuming phenotypic analysis of AAB, the application of molecular methods is necessary. Since the pairwise comparison among the 16S rRNA gene sequences of AAB shows very high similarity (up to 99.9%) other DNA-targets should be used. Our previous studies showed that the restriction analysis of 16S-23S rDNA internal transcribed spacer region is a suitable approach for quick affiliation of an acetic acid bacterium to a distinct group of restriction types and also for quick identification of a potentially novel species of acetic acid bacterium (Trcek & Teuber 2002; Trcek 2002). However, with the exception of two conserved genes, encoding tRNAIle and tRNAAla, the sequences of 16S-23S rDNA are highly divergent among AAB species. For this reason we analyzed in this study a gene encoding PQQ-dependent ADH as a possible DNA-target. First we confirmed the expression of subunit I of PQQ-dependent ADH (AdhA) also in Asaia, the only genus of AAB which exhibits little or no ADH-activity. Further we analyzed the partial sequences of adhA among some representative species of the genera Acetobacter, Gluconobacter and Gluconacetobacter. The conserved and variable regions in these sequences made possible the construction of A. acetispecific oligonucleotide the specificity of which was confirmed in PCR-reaction using 45 well-defined strains of AAB as DNA-templates. The primer was also successfully used in direct identification of A. aceti from home made cider vinegar as well as for

  8. Viewing multiple sequence alignments with the JavaScript Sequence Alignment Viewer (JSAV)

    PubMed Central

    Martin, Andrew C. R.

    2014-01-01

    The JavaScript Sequence Alignment Viewer (JSAV) is designed as a simple-to-use JavaScript component for displaying sequence alignments on web pages. The display of sequences is highly configurable with options to allow alternative coloring schemes, sorting of sequences and ’dotifying’ repeated amino acids. An option is also available to submit selected sequences to another web site, or to other JavaScript code. JSAV is implemented purely in JavaScript making use of the JQuery and JQuery-UI libraries. It does not use any HTML5-specific options to help with browser compatibility. The code is documented using JSDOC and is available from http://www.bioinf.org.uk/software/jsav/. PMID:25653836

  9. Viewing multiple sequence alignments with the JavaScript Sequence Alignment Viewer (JSAV).

    PubMed

    Martin, Andrew C R

    2014-01-01

    The JavaScript Sequence Alignment Viewer (JSAV) is designed as a simple-to-use JavaScript component for displaying sequence alignments on web pages. The display of sequences is highly configurable with options to allow alternative coloring schemes, sorting of sequences and 'dotifying' repeated amino acids. An option is also available to submit selected sequences to another web site, or to other JavaScript code. JSAV is implemented purely in JavaScript making use of the JQuery and JQuery-UI libraries. It does not use any HTML5-specific options to help with browser compatibility. The code is documented using JSDOC and is available from http://www.bioinf.org.uk/software/jsav/.

  10. Bioinformatic Analysis of the Contribution of Primer Sequences to Aptamer Structures

    PubMed Central

    Ellington, Andrew D.

    2009-01-01

    Aptamers are nucleic acid molecules selected in vitro to bind a particular ligand. While numerous experimental studies have examined the sequences, structures, and functions of individual aptamers, considerably fewer studies have applied bioinformatics approaches to try to infer more general principles from these individual studies. We have used a large Aptamer Database to parse the contributions of both random and constant regions to the secondary structures of more than 2000 aptamers. We find that the constant, primer-binding regions do not, in general, contribute significantly to aptamer structures. These results suggest that (a) binding function is not contributed to nor constrained by constant regions; (b) in consequence, the landscape of functional binding sequences is sparse but robust, favoring scenarios for short, functional nucleic acid sequences near origins; and (c) many pool designs for the selection of aptamers are likely to prove robust. PMID:18594898

  11. Comparative study of IDH1 mutations in gliomas by immunohistochemistry and DNA sequencing

    PubMed Central

    Agarwal, Shipra; Sharma, Mehar Chand; Jha, Prerana; Pathak, Pankaj; Suri, Vaishali; Sarkar, Chitra; Chosdol, Kunzang; Suri, Ashish; Kale, Shashank Sharad; Mahapatra, Ashok Kumar; Jha, Pankaj

    2013-01-01

    Background Mutations involving isocitrate dehydrogenase 1 (IDH 1) occur in a high proportion of diffuse gliomas, with implications on diagnosis and prognosis. About 90% involve exon 4 at codon 132, replacing amino acid arginine with histidine (R132H). Rarer ones include R132C, R132S, R132G, R132L, R132V, and R132P. Most authors have used DNA-based methods to assess IDH1 status. Preliminary studies comparing imunohistochemistry (IHC) with IDH1-R132H mutation-specific antibodies have shown concordance with DNA sequencing and no cross-reactivity with wild-type IDH1 or other mutant proteins. The present study compares results of IHC with DNA sequencing in diffuse gliomas. Materials and methods Fifty diffuse gliomas with frozen tissue samples for DNA sequencing and adequate tissue in paraffin blocks for IHC using IDH1-R132H specific antibody were assessed for IDH1 mutations. Results Concordance of findings between IHC and DNA sequencing was noted in 88% (44/50) cases. All 6 cases with discrepancy were immunopositive with DIA-H09 antibody. While in 3 of these 6 cases, DNA sequencing failed to reveal any mutations, R132L (arginine replaced by leucine) mutation was found in the rest 3 cases. Interestingly, of the immunopositive cases, 46.6% (14/30) showed immunostaining in only a fraction of tumor cells. Conclusions IHC is an easy and quick method of detecting IDH1-R132H mutations, but there may be some discrepancies between IHC and DNA sequencing. Although there were no false-negative cases, cross-reactivity with IDH1-R132L was seen in 3, a finding not reported thus far. Because of more universal availability of IHC over genetic testing, cross-reactivity and staining heterogeneity may have bearing over its use in detecting IDH1-R132H mutation in gliomas. PMID:23486690

  12. Sequence repeats and protein structure

    NASA Astrophysics Data System (ADS)

    Hoang, Trinh X.; Trovato, Antonio; Seno, Flavio; Banavar, Jayanth R.; Maritan, Amos

    2012-11-01

    Repeats are frequently found in known protein sequences. The level of sequence conservation in tandem repeats correlates with their propensities to be intrinsically disordered. We employ a coarse-grained model of a protein with a two-letter amino acid alphabet, hydrophobic (H) and polar (P), to examine the sequence-structure relationship in the realm of repeated sequences. A fraction of repeated sequences comprises a distinct class of bad folders, whose folding temperatures are much lower than those of random sequences. Imperfection in sequence repetition improves the folding properties of the bad folders while deteriorating those of the good folders. Our results may explain why nature has utilized repeated sequences for their versatility and especially to design functional proteins that are intrinsically unstructured at physiological temperatures.

  13. Sequence space and the ongoing expansion of the protein universe.

    PubMed

    Povolotskaya, Inna S; Kondrashov, Fyodor A

    2010-06-17

    The need to maintain the structural and functional integrity of an evolving protein severely restricts the repertoire of acceptable amino-acid substitutions. However, it is not known whether these restrictions impose a global limit on how far homologous protein sequences can diverge from each other. Here we explore the limits of protein evolution using sequence divergence data. We formulate a computational approach to study the rate of divergence of distant protein sequences and measure this rate for ancient proteins, those that were present in the last universal common ancestor. We show that ancient proteins are still diverging from each other, indicating an ongoing expansion of the protein sequence universe. The slow rate of this divergence is imposed by the sparseness of functional protein sequences in sequence space and the ruggedness of the protein fitness landscape: approximately 98 per cent of sites cannot accept an amino-acid substitution at any given moment but a vast majority of all sites may eventually be permitted to evolve when other, compensatory, changes occur. Thus, approximately 3.5 x 10(9) yr has not been enough to reach the limit of divergent evolution of proteins, and for most proteins the limit of sequence similarity imposed by common function may not exceed that of random sequences.

  14. Parallel nucleic acid recognition by the LNA (locked nucleic acid) stereoisomers beta-L-LNA and alpha-D-LNA; studies in the mirror image world.

    PubMed

    Christensen, Nanna K; Bryld, Torsten; Sørensen, Mads D; Arar, Khalil; Wengel, Jesper; Nielsen, Poul

    2004-02-07

    Two LNA (locked nucleic acid) stereoisomers (beta-L-LNA and alpha-D-LNA) are evaluated in the mirror-image world, that is by the study of two mixed sequences of LNA and alpha-L-LNA and their L-DNA and L-RNA complements. Both are found to display high-affinity RNA-recognition by the formation of duplexes with parallel strand orientation.

  15. Effect of amino acid sequence and pH on nanofiber formation of self-assembling peptides EAK16-II and EAK16-IV.

    PubMed

    Hong, Yooseong; Legge, Raymond L; Zhang, S; Chen, P

    2003-01-01

    Atomic force microscopy (AFM) and axisymmetric drop shape analysis-profile (ASDA-P) were used to investigate the mechanism of self-assembly of peptides. The peptides chosen consisted of 16 alternating hydrophobic and hydrophilic amino acids, where the hydrophilic residues possess alternating negative and positive charges. Two types of peptides, AEAEAKAKAEAEAKAK (EAK16-II) and AEAEAEAEAKAKAKAK (EAK16-IV), were investigated in terms of nanostructure formation through self-assembly. The experimental results, which focused on the effects of the amino acid sequence and pH, show that the nanostructures formed by the peptides are dependent on the amino acid sequence and the pH of the solution. For pH conditions around neutrality, one of the peptides used in this study, EAK16-IV, forms globular assemblies and has lower surface tension at air-water interfaces than another peptide, EAK16-II, which forms fibrillar assemblies at the same pH. When the pH is lowered below 6.5 or raised above 7.5, there is a transition from globular to fibrillar structures for EAK16-IV, but EAK16-II does not show any structural transition. Surface tension measurements using ADSA-P showed different surface activities of peptides at air-water interfaces. EAK16-II does not show a significant difference in surface tension for the pH range between 4 and 9. However, EAK16-IV shows a noticeable decrease in surface tension at pH around neutrality, indicating that the formation of globular assemblies is related to the molecular hydrophobicity.

  16. Draft Genome Sequence of Lactobacillus delbrueckii subsp. bulgaricus CFL1, a Lactic Acid Bacterium Isolated from French Handcrafted Fermented Milk.

    PubMed

    Meneghel, Julie; Dugat-Bony, Eric; Irlinger, Françoise; Loux, Valentin; Vidal, Marie; Passot, Stéphanie; Béal, Catherine; Layec, Séverine; Fonseca, Fernanda

    2016-03-03

    Lactobacillus delbrueckii subsp. bulgaricus (L. bulgaricus) is a lactic acid bacterium widely used for the production of yogurt and cheeses. Here, we report the genome sequence of L. bulgaricus CFL1 to improve our knowledge on its stress-induced damages following production and end-use processes. Copyright © 2016 Meneghel et al.

  17. Axolotl hemoglobin: cDNA-derived amino acid sequences of two alpha globins and a beta globin from an adult Ambystoma mexicanum.

    PubMed

    Shishikura, Fumio; Takeuchi, Hiro-aki; Nagai, Takatoshi

    2005-11-01

    Erythrocytes of the adult axolotl, Ambystoma mexicanum, have multiple hemoglobins. We separated and purified two kinds of hemoglobin, termed major hemoglobin (Hb M) and minor hemoglobin (Hb m), from a five-year-old male by hydrophobic interaction column chromatography on Alkyl Superose. The hemoglobins have two distinct alpha type globin polypeptides (alphaM and alpham) and a common beta globin polypeptide, all of which were purified in FPLC on a reversed-phase column after S-pyridylethylation. The complete amino acid sequences of the three globin chains were determined separately using nucleotide sequencing with the assistance of protein sequencing. The mature globin molecules were composed of 141 amino acid residues for alphaM globin, 143 for alpham globin and 146 for beta globin. Comparing primary structures of the five kinds of axolotl globins, including two previously established alpha type globins from the same species, with other known globins of amphibians and representatives of other vertebrates, we constructed phylogenetic trees for amphibian hemoglobins and tetrapod hemoglobins. The molecular trees indicated that alphaM, alpham, beta and the previously known alpha major globin were adult types of globins and the other known alpha globin was a larval type. The existence of two to four more globins in the axolotl erythrocyte is predicted.

  18. Zn-metalloprotease sequences in extremophiles

    NASA Astrophysics Data System (ADS)

    Holden, T.; Dehipawala, S.; Golebiewska, U.; Cheung, E.; Tremberger, G., Jr.; Williams, E.; Schneider, P.; Gadura, N.; Lieberman, D.; Cheung, T.

    2010-09-01

    The Zn-metalloprotease family contains conserved amino acid structures such that the nucleotide fluctuation at the DNA level would exhibit correlated randomness as described by fractal dimension. A nucleotide sequence fractal dimension can be calculated from a numerical series consisting of the atomic numbers of each nucleotide. The structure's vibration modes can also be studied using a Gaussian Network Model. The vibration measure and fractal dimension values form a two-dimensional plot with a standard vector metric that can be used for comparison of structures. The preference for amino acid usage in extremophiles may suppress nucleotide fluctuations that could be analyzed in terms of fractal dimension and Shannon entropy. A protein level cold adaptation study of the thermolysin Zn-metalloprotease family using molecular dynamics simulation was reported recently and our results show that the associated nucleotide fluctuation suppression is consistent with a regression pattern generated from the sequences's fractal dimension and entropy values (R-square { 0.98, N =5). It was observed that cold adaptation selected for high entropy and low fractal dimension values. Extension to the Archaemetzincin M54 family in extremophiles reveals a similar regression pattern (R-square = 0.98, N = 6). It was observed that the metalloprotease sequences of extremely halophilic organisms possess high fractal dimension and low entropy values as compared with non-halophiles. The zinc atom is usually bonded to the histidine residue, which shows limited levels of vibration in the Gaussian Network Model. The variability of the fractal dimension and entropy for a given protein structure suggests that extremophiles would have evolved after mesophiles, consistent with the bias usage of non-prebiotic amino acids by extremophiles. It may be argued that extremophiles have the capacity to offer extinction protection during drastic changes in astrobiological environments.

  19. GAWK, a novel human pituitary polypeptide: isolation, immunocytochemical localization and complete amino acid sequence.

    PubMed

    Benjannet, S; Leduc, R; Lazure, C; Seidah, N G; Marcinkiewicz, M; Chrétien, M

    1985-01-16

    During the course of reverse-phase high pressure liquid chromatography (RP-HPLC) purification of a postulated big ACTH (1) from human pituitary gland extracts, a highly purified peptide bearing no resemblance to any known polypeptide was isolated. The complete sequence of this 74 amino acid polypeptide, called GAWK, has been determined. Search on a computer data bank on the possible homology to any known protein or fragment, using a mutation data matrix, failed to reveal any homology greater than 30%. An antibody produced against a synthetic fragment allowed us to detect several immunoreactive forms. The antisera also enabled us to localize the polypeptide, by immunocytochemistry, in the anterior lobe of the pituitary gland.

  20. Genome-Wide Association Study of Genetic Control of Seed Fatty Acid Biosynthesis in Brassica napus

    PubMed Central

    Gacek, Katarzyna; Bayer, Philipp E.; Bartkowiak-Broda, Iwona; Szala, Laurencja; Bocianowski, Jan; Edwards, David; Batley, Jacqueline

    2017-01-01

    Fatty acids and their composition in seeds determine oil value for nutritional or industrial purposes and also affect seed germination as well as seedling establishment. To better understand the genetic basis of seed fatty acid biosynthesis in oilseed rape (Brassica napus L.) we applied a genome-wide association study, using 91,205 single nucleotide polymorphisms (SNPs) characterized across a mapping population with high-resolution skim genotyping by sequencing (SkimGBS). We identified a cluster of loci on chromosome A05 associated with oleic and linoleic seed fatty acids. The delineated genomic region contained orthologs of the Arabidopsis thaliana genes known to play a role in regulation of seed fatty acid biosynthesis such as Fatty acyl-ACP thioesterase B (FATB) and Fatty Acid Desaturase (FAD5). This approach allowed us to identify potential functional genes regulating fatty acid composition in this important oil producing crop and demonstrates that this approach can be used as a powerful tool for dissecting complex traits for B. napus improvement programs. PMID:28163710

  1. Draft genome sequence of the oleaginous yeast Cryptococcus curvatus ATCC 20509

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Close, Dan; Ojumu, John O.

    Cryptococcus curvatus ATCC 20509 is a commonly used nonmodel oleaginous yeast capable of converting a variety of carbon sources into fatty acids. In addition, we present the draft genome sequence of this popular organism to provide a means for more in-depth studies of its fatty acid production potential.

  2. Draft genome sequence of the oleaginous yeast Cryptococcus curvatus ATCC 20509

    DOE PAGES

    Close, Dan; Ojumu, John O.

    2016-11-03

    Cryptococcus curvatus ATCC 20509 is a commonly used nonmodel oleaginous yeast capable of converting a variety of carbon sources into fatty acids. In addition, we present the draft genome sequence of this popular organism to provide a means for more in-depth studies of its fatty acid production potential.

  3. Cloning and sequencing of the allophycocyanin genes from Spirulina maxima (Cyanophyta)

    NASA Astrophysics Data System (ADS)

    Qin, Song; Hiroyuki, Kojima; Yoshikazu, Kawata; Shin-Ichi, Yano; Zeng, Cheng-Kui

    1998-03-01

    The genes coding for the α-and β-subunit of allophycocyanin ( apcA and apcB) from the cyanophyte Spirulina maxima were cloned and sequenced. The results revealed 44.4% of nucleotide sequence similarity and 30.4% of similarity of deduced amino acid sequence between them. The amino acid sequence identities between S. maxima and S. platensis are 99.4% for α subunit and 100% for β subunit.

  4. Sequencing of T-superfamily conotoxins from Conus virgo: pyroglutamic acid identification and disulfide arrangement by MALDI mass spectrometry.

    PubMed

    Mandal, Amit Kumar; Ramasamy, Mani Ramakrishnan Santhana; Sabareesh, Varatharajan; Openshaw, Matthew E; Krishnan, Kozhalmannom S; Balaram, Padmanabhan

    2007-08-01

    De novo mass spectrometric sequencing of two Conus peptides, Vi1359 and Vi1361, from the vermivorous cone snail Conus virgo, found off the southern Indian coast, is presented. The peptides, whose masses differ only by 2 Da, possess two disulfide bonds and an amidated C-terminus. Simple chemical modifications and enzymatic cleavage coupled with matrix assisted laser desorption ionization (MALDI) mass spectrometric analysis aided in establishing the sequences of Vi1359, ZCCITIPECCRI-NH(2), and Vi1361, ZCCPTMPECCRI-NH(2), which differ only at residues 4 and 6 (Z = pyroglutamic acid). The presence of the pyroglutamyl residue at the N-terminus was unambiguously identified by chemical hydrolysis of the cyclic amide, followed by esterification. The presence of Ile residues in both the peptides was confirmed from high-energy collision induced dissociation (CID) studies, using the observation of w(n)- and d(n)-ions as a diagnostic. Differential cysteine labeling, in conjunction with MALDI-MS/MS, permitted establishment of disulfide connectivity in both peptides as Cys2-Cys9 and Cys3-Cys10. The cysteine pattern clearly reveals that the peptides belong to the class of T-superfamily conotoxins, in particular the T-1 superfamily.

  5. Detection and quantification of Plasmodium falciparum in blood samples using quantitative nucleic acid sequence-based amplification.

    PubMed

    Schoone, G J; Oskam, L; Kroon, N C; Schallig, H D; Omar, S A

    2000-11-01

    A quantitative nucleic acid sequence-based amplification (QT-NASBA) assay for the detection of Plasmodium parasites has been developed. Primers and probes were selected on the basis of the sequence of the small-subunit rRNA gene. Quantification was achieved by coamplification of the RNA in the sample with one modified in vitro RNA as a competitor in a single-tube NASBA reaction. Parasite densities ranging from 10 to 10(8) Plasmodium falciparum parasites per ml could be demonstrated and quantified in whole blood. This is approximately 1,000 times more sensitive than conventional microscopy analysis of thick blood smears. Comparison of the parasite densities obtained by microscopy and QT-NASBA with 120 blood samples from Kenyan patients with clinical malaria revealed that for 112 of 120 (93%) of the samples results were within a 1-log difference. QT-NASBA may be especially useful for the detection of low parasite levels in patients with early-stage malaria and for the monitoring of the efficacy of drug treatment.

  6. CoSMoS: Conserved Sequence Motif Search in the proteome

    PubMed Central

    Liu, Xiao I; Korde, Neeraj; Jakob, Ursula; Leichert, Lars I

    2006-01-01

    Background With the ever-increasing number of gene sequences in the public databases, generating and analyzing multiple sequence alignments becomes increasingly time consuming. Nevertheless it is a task performed on a regular basis by researchers in many labs. Results We have now created a database called CoSMoS to find the occurrences and at the same time evaluate the significance of sequence motifs and amino acids encoded in the whole genome of the model organism Escherichia coli K12. We provide a precomputed set of multiple sequence alignments for each individual E. coli protein with all of its homologues in the RefSeq database. The alignments themselves, information about the occurrence of sequence motifs together with information on the conservation of each of the more than 1.3 million amino acids encoded in the E. coli genome can be accessed via the web interface of CoSMoS. Conclusion CoSMoS is a valuable tool to identify highly conserved sequence motifs, to find regions suitable for mutational studies in functional analyses and to predict important structural features in E. coli proteins. PMID:16433915

  7. Nucleic acid analysis using terminal-phosphate-labeled nucleotides

    DOEpatents

    Korlach, Jonas [Ithaca, NY; Webb, Watt W [Ithaca, NY; Levene, Michael [Ithaca, NY; Turner, Stephen [Ithaca, NY; Craighead, Harold G [Ithaca, NY; Foquet, Mathieu [Ithaca, NY

    2008-04-22

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  8. Sequence search on a supercomputer.

    PubMed

    Gotoh, O; Tagashira, Y

    1986-01-10

    A set of programs was developed for searching nucleic acid and protein sequence data bases for sequences similar to a given sequence. The programs, written in FORTRAN 77, were optimized for vector processing on a Hitachi S810-20 supercomputer. A search of a 500-residue protein sequence against the entire PIR data base Ver. 1.0 (1) (0.5 M residues) is carried out in a CPU time of 45 sec. About 4 min is required for an exhaustive search of a 1500-base nucleotide sequence against all mammalian sequences (1.2M bases) in Genbank Ver. 29.0. The CPU time is reduced to about a quarter with a faster version.

  9. Stabilization Effect of Amino Acid Side Chains in Peptide Assemblies on Graphite Studied by Scanning Tunneling Microscopy.

    PubMed

    Guo, Yuanyuan; Hou, Jingfei; Zhang, Xuemei; Yang, Yanlian; Wang, Chen

    2017-04-19

    An analysis is presented of the effects of amino acid side chains on peptide assemblies in ambient conditions on a graphite surface. The molecularly resolved assemblies of binary peptides are examined with scanning tunneling microscopy. A comparative analysis of the assembly structures reveals that the lamellae width has an appreciable dependence on the peptide sequence, which could be considered as a manifestation of a stabilizing effect of side-chain moieties of amino acids with high (phenylalanine) and low (alanine, asparagine, histidine and aspartic acid) propensities for aggregation. These amino acids are representative for the chemical structures involving the side chains of charged (histidine and aspartic acid), aromatic (phenylalanine), hydrophobic (alanine), and hydrophilic (asparagine) amino acids. These results might provide useful insight for understanding the effects of sequence on the assembly of surface-bound peptides. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  10. Studying the evolutionary relationships and phylogenetic trees of 21 groups of tRNA sequences based on complex networks.

    PubMed

    Wei, Fangping; Chen, Bowen

    2012-03-01

    To find out the evolutionary relationships among different tRNA sequences of 21 amino acids, 22 networks are constructed. One is constructed from whole tRNAs, and the other 21 networks are constructed from the tRNAs which carry the same amino acids. A new method is proposed such that the alignment scores of any two amino acids groups are determined by the average degree and the average clustering coefficient of their networks. The anticodon feature of isolated tRNA and the phylogenetic trees of 21 group networks are discussed. We find that some isolated tRNA sequences in 21 networks still connect with other tRNAs outside their group, which reflects the fact that those tRNAs might evolve by intercrossing among these 21 groups. We also find that most anticodons among the same cluster are only one base different in the same sites when S ≥ 70, and they stay in the same rank in the ladder of evolutionary relationships. Those observations seem to agree on that some tRNAs might mutate from the same ancestor sequences based on point mutation mechanisms.

  11. The amino acid sequences of carboxypeptidases I and II from Aspergillus niger and their stability in the presence of divalent cations.

    PubMed

    Svendsen, I; Dal Degan, F

    1998-09-08

    The amino acid sequences of serine carboxypeptidase I (CPD-I) and II (CPD-II), respectively, from Aspergillus niger have been determined by conventional Edman degradation of the reduced and vinylpyridinated enzymes and peptides hereof generated by cleavage with cyanogen bromide, iodobenzoic acid, glutamic acid cleaving enzyme, AspN-endoproteinase and EndoLysC proteinase. CPD-I consists of a single peptide chain of 471 amino acid residues, three disulfide bridges and nine N-glycosylated asparaginyl residues, while CPD-II consists of a single peptide chain of 481 amino acid residues, has three disulfide bridges, one free cysteinyl residue and nine glycosylated asparaginyl residues. The enzymes are closely related to carboxypeptidase S3 from Penicillium janthinellum. Both Ca2+ and Mg2+ stabilize CPD-I as well as CPD-II, at basic pH values, Ca2+ being most effective, while the divalent ions have no effect on the activity of the two enzymes.

  12. Quantitative thermodynamic predication of interactions between nucleic acid and non-nucleic acid species using Microsoft excel.

    PubMed

    Zou, Jiaqi; Li, Na

    2013-09-01

    Proper design of nucleic acid sequences is crucial for many applications. We have previously established a thermodynamics-based quantitative model to help design aptamer-based nucleic acid probes by predicting equilibrium concentrations of all interacting species. To facilitate customization of this thermodynamic model for different applications, here we present a generic and easy-to-use platform to implement the algorithm of the model with Microsoft(®) Excel formulas and VBA (Visual Basic for Applications) macros. Two Excel spreadsheets have been developed: one for the applications involving only nucleic acid species, the other for the applications involving both nucleic acid and non-nucleic acid species. The spreadsheets take the nucleic acid sequences and the initial concentrations of all species as input, guide the user to retrieve the necessary thermodynamic constants, and finally calculate equilibrium concentrations for all species in various bound and unbound conformations. The validity of both spreadsheets has been verified by comparing the modeling results with the experimental results on nucleic acid sequences reported in the literature. This Excel-based platform described here will allow biomedical researchers to rationalize the sequence design of nucleic acid probes using the thermodynamics-based modeling even without relevant theoretical and computational skills. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  13. Arrays of probes for positional sequencing by hybridization

    DOEpatents

    Cantor, Charles R [Boston, MA; Prezetakiewiczr, Marek [East Boston, MA; Smith, Cassandra L [Boston, MA; Sano, Takeshi [Waltham, MA

    2008-01-15

    This invention is directed to methods and reagents useful for sequencing nucleic acid targets utilizing sequencing by hybridization technology comprising probes, arrays of probes and methods whereby sequence information is obtained rapidly and efficiently in discrete packages. That information can be used for the detection, identification, purification and complete or partial sequencing of a particular target nucleic acid. When coupled with a ligation step, these methods can be performed under a single set of hybridization conditions. The invention also relates to the replication of probe arrays and methods for making and replicating arrays of probes which are useful for the large scale manufacture of diagnostic aids used to screen biological samples for specific target sequences. Arrays created using PCR technology may comprise probes with 5'- and/or 3'-overhangs.

  14. Integrating mRNA and Protein Sequencing Enables the Detection and Quantitative Profiling of Natural Protein Sequence Variants of Populus trichocarpa.

    PubMed

    Abraham, Paul E; Wang, Xiaojing; Ranjan, Priya; Nookaew, Intawat; Zhang, Bing; Tuskan, Gerald A; Hettich, Robert L

    2015-12-04

    Next-generation sequencing has transformed the ability to link genotypes to phenotypes and facilitates the dissection of genetic contribution to complex traits. However, it is challenging to link genetic variants with the perturbed functional effects on proteins encoded by such genes. Here we show how RNA sequencing can be exploited to construct genotype-specific protein sequence databases to assess natural variation in proteins, providing information about the molecular toolbox driving cellular processes. For this study, we used two natural genotypes selected from a recent genome-wide association study of Populus trichocarpa, an obligate outcrosser with tremendous phenotypic variation across the natural population. This strategy allowed us to comprehensively catalogue proteins containing single amino acid polymorphisms (SAAPs), as well as insertions and deletions. We profiled the frequency of 128 types of naturally occurring amino acid substitutions, including both expected (neutral) and unexpected (non-neutral) SAAPs, with a subset occurring in regions of the genome having strong polymorphism patterns consistent with recent positive and/or divergent selection. By zeroing in on the molecular signatures of these important regions that might have previously been uncharacterized, we now provide a high-resolution molecular inventory that should improve accessibility and subsequent identification of natural protein variants in future genotype-to-phenotype studies.

  15. Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform.

    PubMed

    Schirmer, Melanie; Ijaz, Umer Z; D'Amore, Rosalinda; Hall, Neil; Sloan, William T; Quince, Christopher

    2015-03-31

    With read lengths of currently up to 2 × 300 bp, high throughput and low sequencing costs Illumina's MiSeq is becoming one of the most utilized sequencing platforms worldwide. The platform is manageable and affordable even for smaller labs. This enables quick turnaround on a broad range of applications such as targeted gene sequencing, metagenomics, small genome sequencing and clinical molecular diagnostics. However, Illumina error profiles are still poorly understood and programs are therefore not designed for the idiosyncrasies of Illumina data. A better knowledge of the error patterns is essential for sequence analysis and vital if we are to draw valid conclusions. Studying true genetic variation in a population sample is fundamental for understanding diseases, evolution and origin. We conducted a large study on the error patterns for the MiSeq based on 16S rRNA amplicon sequencing data. We tested state-of-the-art library preparation methods for amplicon sequencing and showed that the library preparation method and the choice of primers are the most significant sources of bias and cause distinct error patterns. Furthermore we tested the efficiency of various error correction strategies and identified quality trimming (Sickle) combined with error correction (BayesHammer) followed by read overlapping (PANDAseq) as the most successful approach, reducing substitution error rates on average by 93%. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. A Novel Cylindrical Representation for Characterizing Intrinsic Properties of Protein Sequences.

    PubMed

    Yu, Jia-Feng; Dou, Xiang-Hua; Wang, Hong-Bo; Sun, Xiao; Zhao, Hui-Ying; Wang, Ji-Hua

    2015-06-22

    The composition and sequence order of amino acid residues are the two most important characteristics to describe a protein sequence. Graphical representations facilitate visualization of biological sequences and produce biologically useful numerical descriptors. In this paper, we propose a novel cylindrical representation by placing the 20 amino acid residue types in a circle and sequence positions along the z axis. This representation allows visualization of the composition and sequence order of amino acids at the same time. Ten numerical descriptors and one weighted numerical descriptor have been developed to quantitatively describe intrinsic properties of protein sequences on the basis of the cylindrical model. Their applications to similarity/dissimilarity analysis of nine ND5 proteins indicated that these numerical descriptors are more effective than several classical numerical matrices. Thus, the cylindrical representation obtained here provides a new useful tool for visualizing and charactering protein sequences. An online server is available at http://biophy.dzu.edu.cn:8080/CNumD/input.jsp .

  17. Primary structure of prostaglandin G/H synthase from sheep vesicular gland determined from the complementary DNA sequence.

    PubMed Central

    DeWitt, D L; Smith, W L

    1988-01-01

    Prostaglandin G/H synthase (8,11,14-icosatrienoate, hydrogen-donor:oxygen oxidoreductase, EC 1.14.99.1) catalyzes the first step in the formation of prostaglandins and thromboxanes, the conversion of arachidonic acid to prostaglandin endoperoxides G and H. This enzyme is the site of action of nonsteroidal anti-inflammatory drugs. We have isolated a 2.7-kilobase complementary DNA (cDNA) encompassing the entire coding region of prostaglandin G/H synthase from sheep vesicular glands. This cDNA, cloned from a lambda gt 10 library prepared from poly(A)+ RNA of vesicular glands, hybridizes with a single 2.75-kilobase mRNA species. The cDNA clone was selected using oligonucleotide probes modeled from amino acid sequences of tryptic peptides prepared from the purified enzyme. The full-length cDNA encodes a protein of 600 amino acids, including a signal sequence of 24 amino acids. Identification of the cDNA as coding for prostaglandin G/H synthase is based on comparison of amino acid sequences of seven peptides comprising 103 amino acids with the amino acid sequence deduced from the nucleotide sequence of the cDNA. The molecular weight of the unglycosylated enzyme lacking the signal peptide is 65,621. The synthase is a glycoprotein, and there are three potential sites for N-glycosylation, two of them in the amino-terminal half of the molecule. The serine reported to be acetylated by aspirin is at position 530, near the carboxyl terminus. There is no significant similarity between the sequence of the synthase and that of any other protein in amino acid or nucleotide sequence libraries, and a heme binding site(s) is not apparent from the amino acid sequence. The availability of a full-length cDNA clone coding for prostaglandin G/H synthase should facilitate studies of the regulation of expression of this enzyme and the structural features important for catalysis and for interaction with anti-inflammatory drugs. Images PMID:3125548

  18. Creation of a data base for sequences of ribosomal nucleic acids and detection of conserved restriction endonucleases sites through computerized processing.

    PubMed Central

    Patarca, R; Dorta, B; Ramirez, J L

    1982-01-01

    As part of a project pertaining the organization of ribosomal genes in Kinetoplastidae, we have created a data base for published sequences of ribosomal nucleic acids, with information in Spanish. As a first step in their processing, we have written a computer program which introduces the new feature of determining the length of the fragments produced after single or multiple digestion with any of the known restriction enzymes. With this information we have detected conserved SAU 3A sites: (i) at the 5' end of the 5.8S rRNA and at the 3' end of the small subunit rRNA, both included in similar larger sequences; (ii) in the 5.8S rRNA of vertebrates (a second one), which is not present in lower eukaryotes, showing a clear evolutive divergence; and, (iii) at the 5' terminal of the small subunit rRNA, included in a larger conserved sequence. The possible biological importance of these sequences is discussed. PMID:6278402

  19. Synthetic signal sequences that enable efficient secretory protein production in the yeast Kluyveromyces marxianus.

    PubMed

    Yarimizu, Tohru; Nakamura, Mikiko; Hoshida, Hisashi; Akada, Rinji

    2015-02-14

    Targeting of cellular proteins to the extracellular environment is directed by a secretory signal sequence located at the N-terminus of a secretory protein. These signal sequences usually contain an N-terminal basic amino acid followed by a stretch containing hydrophobic residues, although no consensus signal sequence has been identified. In this study, simple modeling of signal sequences was attempted using Gaussia princeps secretory luciferase (GLuc) in the yeast Kluyveromyces marxianus, which allowed comprehensive recombinant gene construction to substitute synthetic signal sequences. Mutational analysis of the GLuc signal sequence revealed that the GLuc hydrophobic peptide length was lower limit for effective secretion and that the N-terminal basic residue was indispensable. Deletion of the 16th Glu caused enhanced levels of secreted protein, suggesting that this hydrophilic residue defined the boundary of a hydrophobic peptide stretch. Consequently, we redesigned this domain as a repeat of a single hydrophobic amino acid between the N-terminal Lys and C-terminal Glu. Stretches consisting of Phe, Leu, Ile, or Met were effective for secretion but the number of residues affected secretory activity. A stretch containing sixteen consecutive methionine residues (M16) showed the highest activity; the M16 sequence was therefore utilized for the secretory production of human leukemia inhibitory factor protein in yeast, resulting in enhanced secreted protein yield. We present a new concept for the provision of secretory signal sequence ability in the yeast K. marxianus, determined by the number of residues of a single hydrophobic residue located between N-terminal basic and C-terminal acidic amino acid boundaries.

  20. KM+, a mannose-binding lectin from Artocarpus integrifolia: amino acid sequence, predicted tertiary structure, carbohydrate recognition, and analysis of the beta-prism fold.

    PubMed Central

    Rosa, J. C.; De Oliveira, P. S.; Garratt, R.; Beltramini, L.; Resing, K.; Roque-Barreira, M. C.; Greene, L. J.

    1999-01-01

    The complete amino acid sequence of the lectin KM+ from Artocarpus integrifolia (jackfruit), which contains 149 residues/mol, is reported and compared to those of other members of the Moraceae family, particularly that of jacalin, also from jackfruit, with which it shares 52% sequence identity. KM+ presents an acetyl-blocked N-terminus and is not posttranslationally modified by proteolytic cleavage as is the case for jacalin. Rather, it possesses a short, glycine-rich linker that unites the regions homologous to the alpha- and beta-chains of jacalin. The results of homology modeling implicate the linker sequence in sterically impeding rotation of the side chain of Asp141 within the binding site pocket. As a consequence, the aspartic acid is locked into a conformation adequate only for the recognition of equatorial hydroxyl groups on the C4 epimeric center (alpha-D-mannose, alpha-D-glucose, and their derivatives). In contrast, the internal cleavage of the jacalin chain permits free rotation of the homologous aspartic acid, rendering it capable of accepting hydrogen bonds from both possible hydroxyl configurations on C4. We suggest that, together with direct recognition of epimeric hydroxyls and the steric exclusion of disfavored ligands, conformational restriction of the lectin should be considered to be a new mechanism by which selectivity may be built into carbohydrate binding sites. Jacalin and KM+ adopt the beta-prism fold already observed in two unrelated protein families. Despite presenting little or no sequence similarity, an analysis of the beta-prism reveals a canonical feature repeatedly present in all such structures, which is based on six largely hydrophobic residues within a beta-hairpin containing two classic-type beta-bulges. We suggest the term beta-prism motif to describe this feature. PMID:10210179

  1. Effects of the amino acid sequence on thermal conduction through β-sheet crystals of natural silk protein.

    PubMed

    Zhang, Lin; Bai, Zhitong; Ban, Heng; Liu, Ling

    2015-11-21

    Recent experiments have discovered very different thermal conductivities between the spider silk and the silkworm silk. Decoding the molecular mechanisms underpinning the distinct thermal properties may guide the rational design of synthetic silk materials and other biomaterials for multifunctionality and tunable properties. However, such an understanding is lacking, mainly due to the complex structure and phonon physics associated with the silk materials. Here, using non-equilibrium molecular dynamics, we demonstrate that the amino acid sequence plays a key role in the thermal conduction process through β-sheets, essential building blocks of natural silks and a variety of other biomaterials. Three representative β-sheet types, i.e. poly-A, poly-(GA), and poly-G, are shown to have distinct structural features and phonon dynamics leading to different thermal conductivities. A fundamental understanding of the sequence effects may stimulate the design and engineering of polymers and biopolymers for desired thermal properties.

  2. The cDNA sequence of a neutral horseradish peroxidase.

    PubMed

    Bartonek-Roxå, E; Eriksson, H; Mattiasson, B

    1991-02-16

    A cDNA clone encoding a horseradish (Armoracia rusticana) peroxidase has been isolated and characterized. The cDNA contains 1378 nucleotides excluding the poly(A) tail and the deduced protein contains 327 amino acids which includes a 28 amino acid leader sequence. The predicted amino acid sequence is nine amino acids shorter than the major isoenzyme belonging to the horseradish peroxidase C group (HRP-C) and the sequence shows 53.7% identity with this isoenzyme. The described clone encodes nine cysteines of which eight correspond well with the cysteines found in HRP-C. Five potential N-glycosylation sites with the general sequence Asn-X-Thr/Ser are present in the deduced sequence. Compared to the earlier described HRP-C this is three glycosylation sites less. The shorter sequence and fewer N-glycosylation sites give the native isoenzyme a molecular weight of several thousands less than the horseradish peroxidase C isoenzymes. Comparison with the net charge value of HRP-C indicates that the described cDNA clone encodes a peroxidase which has either the same or a slightly less basic pI value, depending on whether the encoded protein is N-terminally blocked or not. This excludes the possibility that HRP-n could belong to either the HRP-A, -D or -E groups. The low sequence identity (53.7%) with HRP-C indicates that the described clone does not belong to the HRP-C isoenzyme group and comparison of the total amino acid composition with the HRP-B group does not place the described clone within this isoenzyme group. Our conclusion is that the described cDNA clone encodes a neutral horseradish peroxidase which belongs to a new, not earlier described, horseradish peroxidase group.

  3. Lactobacillus kefiri shows inter-strain variations in the amino acid sequence of the S-layer proteins.

    PubMed

    Malamud, Mariano; Carasi, Paula; Bronsoms, Sílvia; Trejo, Sebastián A; Serradell, María de Los Angeles

    2017-04-01

    The S-layer is a proteinaceous envelope constituted by subunits that self-assemble to form a two-dimensional lattice that covers the surface of different species of Bacteria and Archaea, and it could be involved in cell recognition of microbes among other several distinct functions. In this work, both proteomic and genomic approaches were used to gain knowledge about the sequences of the S-layer protein (SLPs) encoding genes expressed by six aggregative and sixteen non-aggregative strains of potentially probiotic Lactobacillus kefiri. Peptide mass fingerprint (PMF) analysis confirmed the identity of SLPs extracted from L. kefiri, and based on the homology with phylogenetically related species, primers located outside and inside the SLP-genes were employed to amplify genomic DNA. The O-glycosylation site SASSAS was found in all L. kefiri SLPs. Ten strains were selected for sequencing of the complete genes. The total length of the mature proteins varies from 492 to 576 amino acids, and all SLPs have a calculated pI between 9.37 and 9.60. The N-terminal region is relatively conserved and shows a high percentage of positively charged amino acids. Major differences among strains are found in the C-terminal region. Different groups could be distinguished regarding the mature SLPs and the similarities observed in the PMF spectra. Interestingly, SLPs of the aggregative strains are 100% homologous, although these strains were isolated from different kefir grains. This knowledge provides relevant data for better understanding of the mechanisms involved in SLPs functionality and could contribute to the development of products of biotechnological interest from potentially probiotic bacteria.

  4. Modular probes for enriching and detecting complex nucleic acid sequences

    NASA Astrophysics Data System (ADS)

    Wang, Juexiao Sherry; Yan, Yan Helen; Zhang, David Yu

    2017-12-01

    Complex DNA sequences are difficult to detect and profile, but are important contributors to human health and disease. Existing hybridization probes lack the capability to selectively bind and enrich hypervariable, long or repetitive sequences. Here, we present a generalized strategy for constructing modular hybridization probes (M-Probes) that overcomes these challenges. We demonstrate that M-Probes can tolerate sequence variations of up to 7 nt at prescribed positions while maintaining single nucleotide sensitivity at other positions. M-Probes are also shown to be capable of sequence-selectively binding a continuous DNA sequence of more than 500 nt. Furthermore, we show that M-Probes can detect genes with triplet repeats exceeding a programmed threshold. As a demonstration of this technology, we have developed a hybrid capture method to determine the exact triplet repeat expansion number in the Huntington's gene of genomic DNA using quantitative PCR.

  5. C-terminal amino acid residue loss for deprotonated peptide ions containing glutamic acid, aspartic acid, or serine residues at the C-terminus.

    PubMed

    Li, Zhong; Yalcin, Talat; Cassady, Carolyn J

    2006-07-01

    Deprotonated peptides containing C-terminal glutamic acid, aspartic acid, or serine residues were studied by sustained off-resonance irradiation collision-induced dissociation (SORI-CID) in a Fourier transform ion cyclotron resonance (FT-ICR) mass spectrometer with ion production by electrospray ionization (ESI). Additional studies were performed by post source decay (PSD) in a matrix-assisted laser desorption ionization/time-of-flight (MALDI/TOF) mass spectrometer. This work included both model peptides synthesized in our laboratory and bioactive peptides with more complex sequences. During SORI-CID and PSD, [M - H]- and [M - 2H]2- underwent an unusual cleavage corresponding to the elimination of the C-terminal residue. Two mechanisms are proposed to occur. They involve nucleophilic attack on the carbonyl carbon of the adjacent residue by either the carboxylate group of the C-terminus or the side chain carboxylate group of C-terminal glutamic acid and aspartic acid residues. To confirm the proposed mechanisms, AAAAAD was labelled by 18O specifically on the side chain of the aspartic acid residue. For peptides that contain multiple C-terminal glutamic acid residues, each of these residues can be sequentially eliminated from the deprotonated ions; a driving force may be the formation of a very stable pyroglutamatic acid neutral. For peptides with multiple aspartic acid residues at the C-terminus, aspartic acid residue loss is not sequential. For peptides with multiple serine residues at the C-terminus, C-terminal residue loss is sequential; however, abundant loss of other neutral molecules also occurs. In addition, the presence of basic residues (arginine or lysine) in the sequence has no effect on C-terminal residue elimination in the negative ion mode.

  6. Identifying a base in a nucleic acid

    DOEpatents

    Fodor, Stephen P. A.; Lipshutz, Robert J.; Huang, Xiaohua

    2005-02-08

    Devices and techniques for hybridization of nucleic acids and for determining the sequence of nucleic acids. Arrays of nucleic acids are formed by techniques, preferably high resolution, light-directed techniques. Positions of hybridization of a target nucleic acid are determined by, e.g., epifluorescence microscopy. Devices and techniques are proposed to determine the sequence of a target nucleic acid more efficiently and more quickly through such synthesis and detection techniques.

  7. Predicting membrane protein types by fusing composite protein sequence features into pseudo amino acid composition.

    PubMed

    Hayat, Maqsood; Khan, Asifullah

    2011-02-21

    Membrane proteins are vital type of proteins that serve as channels, receptors, and energy transducers in a cell. Prediction of membrane protein types is an important research area in bioinformatics. Knowledge of membrane protein types provides some valuable information for predicting novel example of the membrane protein types. However, classification of membrane protein types can be both time consuming and susceptible to errors due to the inherent similarity of membrane protein types. In this paper, neural networks based membrane protein type prediction system is proposed. Composite protein sequence representation (CPSR) is used to extract the features of a protein sequence, which includes seven feature sets; amino acid composition, sequence length, 2 gram exchange group frequency, hydrophobic group, electronic group, sum of hydrophobicity, and R-group. Principal component analysis is then employed to reduce the dimensionality of the feature vector. The probabilistic neural network (PNN), generalized regression neural network, and support vector machine (SVM) are used as classifiers. A high success rate of 86.01% is obtained using SVM for the jackknife test. In case of independent dataset test, PNN yields the highest accuracy of 95.73%. These classifiers exhibit improved performance using other performance measures such as sensitivity, specificity, Mathew's correlation coefficient, and F-measure. The experimental results show that the prediction performance of the proposed scheme for classifying membrane protein types is the best reported, so far. This performance improvement may largely be credited to the learning capabilities of neural networks and the composite feature extraction strategy, which exploits seven different properties of protein sequences. The proposed Mem-Predictor can be accessed at http://111.68.99.218/Mem-Predictor. Copyright © 2010 Elsevier Ltd. All rights reserved.

  8. Highly efficient chemical process to convert mucic acid into adipic acid and DFT studies of the mechanism of the rhenium-catalyzed deoxydehydration.

    PubMed

    Li, Xiukai; Wu, Di; Lu, Ting; Yi, Guangshun; Su, Haibin; Zhang, Yugen

    2014-04-14

    The production of bulk chemicals and fuels from renewable bio-based feedstocks is of significant importance for the sustainability of human society. Adipic acid, as one of the most-demanded drop-in chemicals from a bioresource, is used primarily for the large-volume production of nylon-6,6 polyamide. It is highly desirable to develop sustainable and environmentally friendly processes for the production of adipic acid from renewable feedstocks. However, currently there is no suitable bio-adipic acid synthesis process. Demonstrated herein is the highly efficient synthetic protocol for the conversion of mucic acid into adipic acid through the oxorhenium-complex-catalyzed deoxydehydration (DODH) reaction and subsequent Pt/C-catalyzed transfer hydrogenation. Quantitative yields (99 %) were achieved for the conversion of mucic acid into muconic acid and adipic acid either in separate sequences or in a one-step process. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  9. Studying long 16S rDNA sequences with ultrafast-metagenomic sequence classification using exact alignments (Kraken).

    PubMed

    Valenzuela-González, Fabiola; Martínez-Porchas, Marcel; Villalpando-Canchola, Enrique; Vargas-Albores, Francisco

    2016-03-01

    Ultrafast-metagenomic sequence classification using exact alignments (Kraken) is a novel approach to classify 16S rDNA sequences. The classifier is based on mapping short sequences to the lowest ancestor and performing alignments to form subtrees with specific weights in each taxon node. This study aimed to evaluate the classification performance of Kraken with long 16S rDNA random environmental sequences produced by cloning and then Sanger sequenced. A total of 480 clones were isolated and expanded, and 264 of these clones formed contigs (1352 ± 153 bp). The same sequences were analyzed using the Ribosomal Database Project (RDP) classifier. Deeper classification performance was achieved by Kraken than by the RDP: 73% of the contigs were classified up to the species or variety levels, whereas 67% of these contigs were classified no further than the genus level by the RDP. The results also demonstrated that unassembled sequences analyzed by Kraken provide similar or inclusively deeper information. Moreover, sequences that did not form contigs, which are usually discarded by other programs, provided meaningful information when analyzed by Kraken. Finally, it appears that the assembly step for Sanger sequences can be eliminated when using Kraken. Kraken cumulates the information of both sequence senses, providing additional elements for the classification. In conclusion, the results demonstrate that Kraken is an excellent choice for use in the taxonomic assignment of sequences obtained by Sanger sequencing or based on third generation sequencing, of which the main goal is to generate larger sequences. Copyright © 2016 Elsevier B.V. All rights reserved.

  10. Accurate prediction of hot spot residues through physicochemical characteristics of amino acid sequences.

    PubMed

    Chen, Peng; Li, Jinyan; Wong, Limsoon; Kuwahara, Hiroyuki; Huang, Jianhua Z; Gao, Xin

    2013-08-01

    Hot spot residues of proteins are fundamental interface residues that help proteins perform their functions. Detecting hot spots by experimental methods is costly and time-consuming. Sequential and structural information has been widely used in the computational prediction of hot spots. However, structural information is not always available. In this article, we investigated the problem of identifying hot spots using only physicochemical characteristics extracted from amino acid sequences. We first extracted 132 relatively independent physicochemical features from a set of the 544 properties in AAindex1, an amino acid index database. Each feature was utilized to train a classification model with a novel encoding schema for hot spot prediction by the IBk algorithm, an extension of the K-nearest neighbor algorithm. The combinations of the individual classifiers were explored and the classifiers that appeared frequently in the top performing combinations were selected. The hot spot predictor was built based on an ensemble of these classifiers and to work in a voting manner. Experimental results demonstrated that our method effectively exploited the feature space and allowed flexible weights of features for different queries. On the commonly used hot spot benchmark sets, our method significantly outperformed other machine learning algorithms and state-of-the-art hot spot predictors. The program is available at http://sfb.kaust.edu.sa/pages/software.aspx. Copyright © 2013 Wiley Periodicals, Inc.

  11. Molecular cloning and sequence analysis of full-length growth hormone cDNAs from six important economic fishes.

    PubMed

    Zhang, Jing-Nan; Song, Ping; Hu, Jia-Rui; Mo, Sai-Jun; Peng, Mao-Yu; Zhou, Wei; Zou, Ji-Xing; Hu, Yin-Chang

    2005-01-01

    In this study,the full-length cDNAs of GH (Growth Hormone) gene was isolated from six important economic fishes, Siniperca kneri, Epinephelus coioides, Monopterus albus, Silurus asotus, Misgurnus anguillicaudatus and Carassius auratus gibelio Bloch. It is the first time to clone these GH sequences except E. coioides GH. The lengths of the above cDNAs are as follows: 953 bp, 1 023 bp, 825 bp, 1 082 bp, 1 154 bp and 1 180 bp. Each sequence includes an ORF of about 600 bp which encodes a protein of about 200 amino acid: S. kneri, E. coioides and M. albus GHs of 204 amino acid, S. asotus GH of 200 amino acid, M. anguillicaudatus and C. auratus gibelio GHs of 210 amino acid. Then detailed sequence analysis of the six GHs with many other fish sequences was performed. The six sequences all showed high homology to other sequences, especially to sequences within the same order, and many conserved residues were identified, most localized in five domains. The phylogenetic trees (MP and NJ) of many fish GH ORF sequences (including the new six) with Amia calva as outgroup were generally resolved and largely congruent with the morphology-based tree though some incongruities were observed, suggesting GH ORF should be paid more attention to in teleostean phylogeny.

  12. International interlaboratory study comparing single organism 16S rRNA gene sequencing data: Beyond consensus sequence comparisons

    PubMed Central

    Olson, Nathan D.; Lund, Steven P.; Zook, Justin M.; Rojas-Cornejo, Fabiola; Beck, Brian; Foy, Carole; Huggett, Jim; Whale, Alexandra S.; Sui, Zhiwei; Baoutina, Anna; Dobeson, Michael; Partis, Lina; Morrow, Jayne B.

    2015-01-01

    This study presents the results from an interlaboratory sequencing study for which we developed a novel high-resolution method for comparing data from different sequencing platforms for a multi-copy, paralogous gene. The combination of PCR amplification and 16S ribosomal RNA gene (16S rRNA) sequencing has revolutionized bacteriology by enabling rapid identification, frequently without the need for culture. To assess variability between laboratories in sequencing 16S rRNA, six laboratories sequenced the gene encoding the 16S rRNA from Escherichia coli O157:H7 strain EDL933 and Listeria monocytogenes serovar 4b strain NCTC11994. Participants performed sequencing methods and protocols available in their laboratories: Sanger sequencing, Roche 454 pyrosequencing®, or Ion Torrent PGM®. The sequencing data were evaluated on three levels: (1) identity of biologically conserved position, (2) ratio of 16S rRNA gene copies featuring identified variants, and (3) the collection of variant combinations in a set of 16S rRNA gene copies. The same set of biologically conserved positions was identified for each sequencing method. Analytical methods using Bayesian and maximum likelihood statistics were developed to estimate variant copy ratios, which describe the ratio of nucleotides at each identified biologically variable position, as well as the likely set of variant combinations present in 16S rRNA gene copies. Our results indicate that estimated variant copy ratios at biologically variable positions were only reproducible for high throughput sequencing methods. Furthermore, the likely variant combination set was only reproducible with increased sequencing depth and longer read lengths. We also demonstrate novel methods for evaluating variable positions when comparing multi-copy gene sequence data from multiple laboratories generated using multiple sequencing technologies. PMID:27077030

  13. Acid Rain Study Guide.

    ERIC Educational Resources Information Center

    Hunger, Carolyn; And Others

    Acid rain is a complex, worldwide environmental problem. This study guide is intended to aid teachers of grades 4-12 to help their students understand what acid rain is, why it is a problem, and what possible solutions exist. The document contains specific sections on: (1) the various terms used in conjunction with acid rain (such as acid…

  14. Student-Created Definitions of Sequence Convergence: A Case Study

    ERIC Educational Resources Information Center

    Fisher, Brian

    2016-01-01

    This paper describes the development of an instructional sequence designed to allow students to reinvent the definition of sequence convergence in an introductory proof course. The sequence follows a heuristic of guided reinvention that encourages students to independently create their own mathematical definitions. This case study reports on how…

  15. Activity of human kallikrein-related peptidase 6 (KLK6) on substrates containing sequences of basic amino acids. Is it a processing protease?

    PubMed

    Silva, Roberta N; Oliveira, Lilian C G; Parise, Carolina B; Oliveira, Juliana R; Severino, Beatrice; Corvino, Angela; di Vaio, Paola; Temussi, Piero A; Caliendo, Giuseppe; Santagada, Vincenzo; Juliano, Luiz; Juliano, Maria A

    2017-05-01

    Human kallikrein 6 (KLK6) is highly expressed in the central nervous system and with elevated level in demyelinating disease. KLK6 has a very restricted specificity for arginine (R) and hydrolyses myelin basic protein, protein activator receptors and human ionotropic glutamate receptor subunits. Here we report a previously unreported activity of KLK6 on peptides containing clusters of basic amino acids, as in synthetic fluorogenic peptidyl-Arg-7-amino-4-carbamoylmethylcoumarin (peptidyl-ACC) peptides and FRET peptides in the format of Abz-peptidyl-Q-EDDnp (where Abz=ortho-aminobenzoic acid and Q-EDDnp=glutaminyl-N-(2,4-dinitrophenyl) ethylenediamine), in which pairs or sequences of basic amino acids (R or K) were introduced. Surprisingly, KLK6 hydrolyzed the fluorogenic peptides Bz-A-R ↓ R-ACC and Z-R ↓ R-MCA between the two R groups, resulting in non-fluorescent products. FRET peptides containing furin processing sequences of human MMP-14, nerve growth factor (NGF), Neurotrophin-3 (NT-3) and Neurotrophin-4 (NT-4) were cleaved by KLK6 at the same position expected by furin. Finally, KLK6 cleaved FRET peptides derived from human proenkephalin after the KR, the more frequent basic residues flanking enkephalins in human proenkephalin sequence. This result suggests the ability of KLK6 to release enkephalin from proenkephalin precursors and resembles furin a canonical processing proteolytic enzyme. Molecular models of peptides were built into the KLK6 structure and the marked preference of the cut between the two R of the examined peptides was related to the extended conformation of the substrates. Copyright © 2017 Elsevier B.V. All rights reserved.

  16. Method for nucleic acid hybridization using single-stranded DNA binding protein

    DOEpatents

    Tabor, Stanley; Richardson, Charles C.

    1996-01-01

    Method of nucleic acid hybridization for detecting the presence of a specific nucleic acid sequence in a population of different nucleic acid sequences using a nucleic acid probe. The nucleic acid probe hybridizes with the specific nucleic acid sequence but not with other nucleic acid sequences in the population. The method includes contacting a sample (potentially including the nucleic acid sequence) with the nucleic acid probe under hybridizing conditions in the presence of a single-stranded DNA binding protein provided in an amount which stimulates renaturation of a dilute solution (i.e., one in which the t.sub.1/2 of renaturation is longer than 3 weeks) of single-stranded DNA greater than 500 fold (i.e., to a t.sub.1/2 less than 60 min, preferably less than 5 min, and most preferably about 1 min.) in the absence of nucleotide triphosphates.

  17. Defining a Conformational Consensus Motif in Cotransin-Sensitive Signal Sequences: A Proteomic and Site-Directed Mutagenesis Study

    PubMed Central

    Klein, Wolfgang; Westendorf, Carolin; Schmidt, Antje; Conill-Cortés, Mercè; Rutz, Claudia; Blohs, Marcus; Beyermann, Michael; Protze, Jonas; Krause, Gerd; Krause, Eberhard; Schülein, Ralf

    2015-01-01

    The cyclodepsipeptide cotransin was described to inhibit the biosynthesis of a small subset of proteins by a signal sequence-discriminatory mechanism at the Sec61 protein-conducting channel. However, it was not clear how selective cotransin is, i.e. how many proteins are sensitive. Moreover, a consensus motif in signal sequences mediating cotransin sensitivity has yet not been described. To address these questions, we performed a proteomic study using cotransin-treated human hepatocellular carcinoma cells and the stable isotope labelling by amino acids in cell culture technique in combination with quantitative mass spectrometry. We used a saturating concentration of cotransin (30 micromolar) to identify also less-sensitive proteins and to discriminate the latter from completely resistant proteins. We found that the biosynthesis of almost all secreted proteins was cotransin-sensitive under these conditions. In contrast, biosynthesis of the majority of the integral membrane proteins was cotransin-resistant. Cotransin sensitivity of signal sequences was neither related to their length nor to their hydrophobicity. Instead, in the case of signal anchor sequences, we identified for the first time a conformational consensus motif mediating cotransin sensitivity. PMID:25806945

  18. The catalytic chain of human complement subcomponent C1r. Purification and N-terminal amino acid sequences of the major cyanogen bromide-cleavage fragments.

    PubMed

    Arlaud, G J; Gagnon, J; Porter, R R

    1982-01-01

    1. The a- and b-chains of reduced and alkylated human complement subcomponent C1r were separated by high-pressure gel-permeation chromatography and isolated in good yield and in pure form. 2. CNBr cleavage of C1r b-chain yielded eight major peptides, which were purified by gel filtration and high-pressure reversed-phase chromatography. As determined from the sum of their amino acid compositions, these peptides accounted for a minimum molecular weight of 28 000, close to the value 29 100 calculated from the whole b-chain. 3. N-Terminal sequence determinations of C1r b-chain and its CNBr-cleavage peptides allowed the identification of about two-thirds of the amino acids of C1r b-chain. From our results, and on the basis of homology with other serine proteinases, an alignment of the eight CNBr-cleavage peptides from C1r b-chain is proposed. 4. The residues forming the 'charge-relay' system of the active site of serine proteinases (His-57, Asp-102 and Ser-195 in the chymotrypsinogen numbering) are found in the corresponding regions of C1r b-chain, and the amino acid sequence around these residues has been determined. 5. The N-terminal sequence of C1r b-chain has been extended to residue 60 and reveals that C1r b-chain lacks the 'histidine loop', a disulphide bond that is present in all other known serine proteinases.

  19. Sequence fingerprints distinguish erroneous from correct predictions of intrinsically disordered protein regions.

    PubMed

    Saravanan, Konda Mani; Dunker, A Keith; Krishnaswamy, Sankaran

    2017-12-27

    More than 60 prediction methods for intrinsically disordered proteins (IDPs) have been developed over the years, many of which are accessible on the World Wide Web. Nearly, all of these predictors give balanced accuracies in the ~65%-~80% range. Since predictors are not perfect, further studies are required to uncover the role of amino acid residues in native IDP as compared to predicted IDP regions. In the present work, we make use of sequences of 100% predicted IDP regions, false positive disorder predictions, and experimentally determined IDP regions to distinguish the characteristics of native versus predicted IDP regions. A higher occurrence of asparagine is observed in sequences of native IDP regions but not in sequences of false positive predictions of IDP regions. The occurrences of certain combinations of amino acids at the pentapeptide level provide a distinguishing feature in the IDPs with respect to globular proteins. The distinguishing features presented in this paper provide insights into the sequence fingerprints of amino acid residues in experimentally determined as compared to predicted IDP regions. These observations and additional work along these lines should enable the development of improvements in the accuracy of disorder prediction algorithm.

  20. Draft Genome Sequence of Sporolactobacillus inulinus Strain CASD, an Efficient d-Lactic Acid-Producing Bacterium with High-Concentration Lactate Tolerance Capability

    PubMed Central

    Yu, Bo; Su, Fei; Wang, Limin; Xu, Ke; Zhao, Bo; Xu, Ping

    2011-01-01

    Sporolactobacillus inulinus CASD is an efficient d-lactic acid producer with high optical purity. Here we report for the first time the draft genome sequence of S. inulinus (2,930,096 bp). The large number of annotated two-component system genes makes it possible to explore the mechanism of extraordinary lactate tolerance of S. inulinus CASD. PMID:21952540

  1. Draft genome sequence of Sporolactobacillus inulinus strain CASD, an efficient D-lactic acid-producing bacterium with high-concentration lactate tolerance capability.

    PubMed

    Yu, Bo; Su, Fei; Wang, Limin; Xu, Ke; Zhao, Bo; Xu, Ping

    2011-10-01

    Sporolactobacillus inulinus CASD is an efficient D-lactic acid producer with high optical purity. Here we report for the first time the draft genome sequence of S. inulinus (2,930,096 bp). The large number of annotated two-component system genes makes it possible to explore the mechanism of extraordinary lactate tolerance of S. inulinus CASD.

  2. FASMA: a service to format and analyze sequences in multiple alignments.

    PubMed

    Costantini, Susan; Colonna, Giovanni; Facchiano, Angelo M

    2007-12-01

    Multiple sequence alignments are successfully applied in many studies for under- standing the structural and functional relations among single nucleic acids and protein sequences as well as whole families. Because of the rapid growth of sequence databases, multiple sequence alignments can often be very large and difficult to visualize and analyze. We offer a new service aimed to visualize and analyze the multiple alignments obtained with different external algorithms, with new features useful for the comparison of the aligned sequences as well as for the creation of a final image of the alignment. The service is named FASMA and is available at http://bioinformatica.isa.cnr.it/FASMA/.

  3. A Glutamic Acid-Producing Lactic Acid Bacteria Isolated from Malaysian Fermented Foods

    PubMed Central

    Zareian, Mohsen; Ebrahimpour, Afshin; Bakar, Fatimah Abu; Mohamed, Abdul Karim Sabo; Forghani, Bita; Ab-Kadir, Mohd Safuan B.; Saari, Nazamid

    2012-01-01

    l-glutamaic acid is the principal excitatory neurotransmitter in the brain and an important intermediate in metabolism. In the present study, lactic acid bacteria (218) were isolated from six different fermented foods as potent sources of glutamic acid producers. The presumptive bacteria were tested for their ability to synthesize glutamic acid. Out of the 35 strains showing this capability, strain MNZ was determined as the highest glutamic-acid producer. Identification tests including 16S rRNA gene sequencing and sugar assimilation ability identified the strain MNZ as Lactobacillus plantarum. The characteristics of this microorganism related to its glutamic acid-producing ability, growth rate, glucose consumption and pH profile were studied. Results revealed that glutamic acid was formed inside the cell and excreted into the extracellular medium. Glutamic acid production was found to be growth-associated and glucose significantly enhanced glutamic acid production (1.032 mmol/L) compared to other carbon sources. A concentration of 0.7% ammonium nitrate as a nitrogen source effectively enhanced glutamic acid production. To the best of our knowledge this is the first report of glutamic acid production by lactic acid bacteria. The results of this study can be further applied for developing functional foods enriched in glutamic acid and subsequently γ-amino butyric acid (GABA) as a bioactive compound. PMID:22754309

  4. A robust and cost-effective approach to sequence and analyze complete genomes of small RNA viruses

    USDA-ARS?s Scientific Manuscript database

    Background: Next-generation sequencing (NGS) allows ultra-deep sequencing of nucleic acids. The use of sequence-independent amplification of viral nucleic acids without utilization of target-specific primers provides advantages over traditional sequencing methods and allows detection of unsuspected ...

  5. BGL7 beta-glucosidase and nucleic acids encoding the same

    DOEpatents

    Dunn-Coleman, Nigel; Ward, Michael

    2013-01-29

    The present invention provides a novel .beta.-glucosidase nucleic acid sequence, designated bgl7, and the corresponding BGL7 amino acid sequence. The invention also provides expression vectors and host cells comprising a nucleic acid sequence encoding BGL7, recombinant BGL7 proteins and methods for producing the same.

  6. BGL6 .beta.-glucosidase and nucleic acids encoding the same

    DOEpatents

    Dunn-Coleman, Nigel; Ward, Michael

    2012-10-02

    The present invention provides a novel .beta.-glucosidase nucleic acid sequence, designated bgl6, and the corresponding BGL6 amino acid sequence. The invention also provides expression vectors and host cells comprising a nucleic acid sequence encoding BGL6, recombinant BGL6 proteins and methods for producing the same.

  7. BGL5 .beta.-glucosidase and nucleic acids encoding the same

    DOEpatents

    Dunn-Coleman, Nigel; Goedegebuur, Frits; Ward, Michael; Yao, Jian

    2006-02-28

    The present invention provides a novel .beta.-glucosidase nucleic acid sequence, designated bgl5, and the corresponding BGL5 amino acid sequence. The invention also provides expression vectors and host cells comprising a nucleic acid sequence encoding BGL5, recombinant BGL5 proteins and methods for producing the same.

  8. BGL5 .beta.-glucosidase and nucleic acids encoding the same

    DOEpatents

    Dunn-Coleman, Nigel [Los Gatos, CA; Goedegebuur, Frits [Vlaardingen, NL; Ward, Michael [San Francisco, CA; Yao, Jian [Sunnyvale, CA

    2008-03-18

    The present invention provides a novel .beta.-glucosidase nucleic acid sequence, designated bgl5, and the corresponding BGL5 amino acid sequence. The invention also provides expression vectors and host cells comprising a nucleic acid sequence encoding BGL5, recombinant BGL5 proteins and methods for producing the same.

  9. BGL3 beta-glucosidase and nucleic acids encoding the same

    DOEpatents

    Dunn-Coleman, Nigel; Goedegebuur, Frits; Ward, Michael; Yao, Jian

    2007-09-25

    The present invention provides a novel .beta.-glucosidase nucleic acid sequence, designated bgl3, and the corresponding BGL3 amino acid sequence. The invention also provides expression vectors and host cells comprising a nucleic acid sequence encoding BGL3, recombinant BGL3 proteins and methods for producing the same.

  10. BGL3 beta-glucosidase and nucleic acids encoding the same

    DOEpatents

    Dunn-Coleman, Nigel [Los Gatos, CA; Goedegebuur, Frits [Vlaardingen, NL; Ward, Michael [San Francisco, CA; Yao, Jian [Sunnyvale, CA

    2008-04-01

    The present invention provides a novel .beta.-glucosidase nucleic acid sequence, designated bgl3, and the corresponding BGL3 amino acid sequence. The invention also provides expression vectors and host cells comprising a nucleic acid sequence encoding BGL3, recombinant BGL3 proteins and methods for producing the same.

  11. BGL4 beta-glucosidase and nucleic acids encoding the same

    DOEpatents

    Dunn-Coleman, Nigel [Los Gatos, CA; Goedegebuur, Frits [Vlaardingen, NL; Ward, Michael [San Francisco, CA; Yao, Jian [Sunnyvale, CA

    2011-12-06

    The present invention provides a novel .beta.-glucosidase nucleic acid sequence, designated bgl4, and the corresponding BGL4 amino acid sequence. The invention also provides expression vectors and host cells comprising a nucleic acid sequence encoding BGL4, recombinant BGL4 proteins and methods for producing the same.

  12. BGL4 .beta.-glucosidase and nucleic acids encoding the same

    DOEpatents

    Dunn-Coleman, Nigel; Goedegebuur, Frits; Ward, Michael; Yao, Jian

    2006-05-16

    The present invention provides a novel .beta.-glucosidase nucleic acid sequence, designated bgl4, and the corresponding BGL4 amino acid sequence. The invention also provides expression vectors and host cells comprising a nucleic acid sequence encoding BGL4, recombinant BGL4 proteins and methods for producing the same.

  13. BGL3 beta-glucosidase and nucleic acids encoding the same

    DOEpatents

    Dunn-Coleman, Nigel [Los Gatos, CA; Goedegebuur, Frits [Vlaardingen, NL; Ward, Michael [San Francisco, CA; Yao, Jian [Sunnyvale, CA

    2011-06-14

    The present invention provides a novel .beta.-glucosidase nucleic acid sequence, designated bgl3, and the corresponding BGL3 amino acid sequence. The invention also provides expression vectors and host cells comprising a nucleic acid sequence encoding BGL3, recombinant BGL3 proteins and methods for producing the same.

  14. BGL6 beta-glucosidase and nucleic acids encoding the same

    DOEpatents

    Dunn-Coleman, Nigel [Los Gatos, CA; Ward, Michael [San Francisco, CA

    2009-09-01

    The present invention provides a novel .beta.-glucosidase nucleic acid sequence, designated bgl6, and the corresponding BGL6 amino acid sequence. The invention also provides expression vectors and host cells comprising a nucleic acid sequence encoding BGL6, recombinant BGL6 proteins and methods for producing the same.

  15. BGL3 beta-glucosidase and nucleic acids encoding the same

    DOEpatents

    Dunn-Coleman, Nigel; Goedegebuur, Frits; Ward, Michael; Yao, Jian

    2012-10-30

    The present invention provides a novel .beta.-glucosidase nucleic acid sequence, designated bgl3, and the corresponding BGL3 amino acid sequence. The invention also provides expression vectors and host cells comprising a nucleic acid sequence encoding BGL3, recombinant BGL3 proteins and methods for producing the same.

  16. BGL4 beta-glucosidase and nucleic acids encoding the same

    DOEpatents

    Dunn-Coleman, Nigel [Los Gatos, CA; Goedegebuur, Frits [Vlaardingen, NL; Ward, Michael [San Francisco, CA; Yao, Jian [Sunnyvale, CA

    2008-01-22

    The present invention provides a novel .beta.-glucosidase nucleic acid sequence, designated bgl4, and the corresponding BGL4 amino acid sequence. The invention also provides expression vectors and host cells comprising a nucleic acid sequence encoding BGL4, recombinant BGL4 proteins and methods for producing the same.

  17. BGL6 beta-glucosidase and nucleic acids encoding the same

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dunn-Coleman, Nigel; Ward, Michael

    The present invention provides a novel .beta.-glucosidase nucleic acid sequence, designated bgl6, and the corresponding BGL6 amino acid sequence. The invention also provides expression vectors and host cells comprising a nucleic acid sequence encoding BGL6, recombinant BGL6 proteins and methods for producing the same.

  18. BGL6 beta-glucosidase and nucleic acids encoding the same

    DOEpatents

    Dunn-Coleman, Nigel; Ward, Michael

    2014-03-04

    The present invention provides a novel .beta.-glucosidase nucleic acid sequence, designated bgl6, and the corresponding BGL6 amino acid sequence. The invention also provides expression vectors and host cells comprising a nucleic acid sequence encoding BGL6, recombinant BGL6 proteins and methods for producing the same.

  19. BGL7 beta-glucosidase and nucleic acids encoding the same

    DOEpatents

    Dunn-Coleman, Nigel; Ward, Michael

    2015-04-14

    The present invention provides a novel .beta.-glucosidase nucleic acid sequence, designated bgl7, and the corresponding BGL7 amino acid sequence. The invention also provides expression vectors and host cells comprising a nucleic acid sequence encoding BGL7, recombinant BGL7 proteins and methods for producing the same.

  20. BGL7 beta-glucosidase and nucleic acids encoding the same

    DOEpatents

    Dunn-Coleman, Nigel; Ward, Michael

    2014-03-25

    The present invention provides a novel .beta.-glucosidase nucleic acid sequence, designated bgl7, and the corresponding BGL7 amino acid sequence. The invention also provides expression vectors and host cells comprising a nucleic acid sequence encoding BGL7, recombinant BGL7 proteins and methods for producing the same.

  1. BGL6 beta-glucosidase and nucleic acids encoding the same

    DOEpatents

    Dunn-Coleman, Nigel; Ward, Michael

    2015-08-11

    The present invention provides a novel .beta.-glucosidase nucleic acid sequence, designated bgl6, and the corresponding BGL6 amino acid sequence. The invention also provides expression vectors and host cells comprising a nucleic acid sequence encoding BGL6, recombinant BGL6 proteins and methods for producing the same.

  2. Striking similarities in amino acid sequence among nonstructural proteins encoded by RNA viruses that have dissimilar genomic organization.

    PubMed Central

    Haseloff, J; Goelet, P; Zimmern, D; Ahlquist, P; Dasgupta, R; Kaesberg, P

    1984-01-01

    The plant viruses alfalfa mosaic virus (AMV) and brome mosaic virus (BMV) each divide their genetic information among three RNAs while tobacco mosaic virus (TMV) contains a single genomic RNA. Amino acid sequence comparisons suggest that the single proteins encoded by AMV RNA 1 and BMV RNA 1 and by AMV RNA 2 and BMV RNA 2 are related to the NH2-terminal two-thirds and the COOH-terminal one-third, respectively, of the largest protein encoded by TMV. Separating these two domains in the TMV RNA sequence is an amber termination codon, whose partial suppression allows translation of the downstream domain. Many of the residues that the TMV read-through domain and the segmented plant viruses have in common are also conserved in a read-through domain found in the nonstructural polyprotein of the animal alphaviruses Sindbis and Middelburg. We suggest that, despite substantial differences in gene organization and expression, all of these viruses use related proteins for common functions in RNA replication. Reassortment of functional modules of coding and regulatory sequence from preexisting viral or cellular sources, perhaps via RNA recombination, may be an important mechanism in RNA virus evolution. PMID:6611550

  3. Neutrality and evolvability of designed protein sequences

    NASA Astrophysics Data System (ADS)

    Bhattacherjee, Arnab; Biswas, Parbati

    2010-07-01

    The effect of foldability on protein’s evolvability is analyzed by a two-prong approach consisting of a self-consistent mean-field theory and Monte Carlo simulations. Theory and simulation models representing protein sequences with binary patterning of amino acid residues compatible with a particular foldability criteria are used. This generalized foldability criterion is derived using the high temperature cumulant expansion approximating the free energy of folding. The effect of cumulative point mutations on these designed proteins is studied under neutral condition. The robustness, protein’s ability to tolerate random point mutations is determined with a selective pressure of stability (ΔΔG) for the theory designed sequences, which are found to be more robust than that of Monte Carlo and mean-field-biased Monte Carlo generated sequences. The results show that this foldability criterion selects viable protein sequences more effectively compared to the Monte Carlo method, which has a marked effect on how the selective pressure shapes the evolutionary sequence space. These observations may impact de novo sequence design and its applications in protein engineering.

  4. Characterization of relative abundance of lactic acid bacteria species in French organic sourdough by cultural, qPCR and MiSeq high-throughput sequencing methods.

    PubMed

    Michel, Elisa; Monfort, Clarisse; Deffrasnes, Marion; Guezenec, Stéphane; Lhomme, Emilie; Barret, Matthieu; Sicard, Delphine; Dousset, Xavier; Onno, Bernard

    2016-12-19

    In order to contribute to the description of sourdough LAB composition, MiSeq sequencing and qPCR methods were performed in association with cultural methods. A panel of 16 French organic bakers and farmer-bakers were selected for this work. The lactic acid bacteria (LAB) diversity of their organic sourdoughs was investigated quantitatively and qualitatively combining (i) Lactobacillus sanfranciscensis-specific qPCR, (ii) global sequencing with MiSeq Illumina technology and (iii) molecular isolates identification. In addition, LAB and yeast enumeration, pH, Total Titratable Acidity, organic acids and bread specific volume were analyzed. Microbial and physico-chemical data were statistically treated by Principal Component Analysis (PCA) and Hierarchical Ascendant Classification (HAC). Total yeast counts were 6 log 10 to 7.6 log 10 CFU/g while LAB counts varied from 7.2 log 10 to 9.6 log 10 CFU/g. Values obtained by L. sanfranciscensis-specific qPCR were estimated between 7.2 and 10.3 log 10 CFU/g, except for one sample at 4.4 log 10 CFU/g. HAC and PCA clustered the sixteen sourdoughs into three classes described by their variables but without links to bakers' practices. L. sanfranciscensis was the dominant species in 13 of the 16 sourdoughs analyzed by Next Generation Sequencing (NGS), by the culture dependent method this species was dominant only in only 10 samples. Based on isolates identification, LAB diversity was higher for 7 sourdoughs with the recovery of L. curvatus, L. brevis, L. heilongjiangensis, L. xiangfangensis, L. koreensis, L. pontis, Weissella sp. and Pediococcus pentosaceus, as the most representative species. L. koreensis, L. heilongjiangensis and L. xiangfangensis were identified in traditional Asian food and here for the first time as dominant in organic sourdough. This study highlighted that L. sanfranciscensis was not the major species in 6/16 sourdough samples and that a relatively high LAB diversity can be observed in French organic

  5. Genotyping-by-Sequencing-Based Investigation of the Genetic Architecture Responsible for a ∼Sevenfold Increase in Soybean Seed Stearic Acid.

    PubMed

    Heim, Crystal B; Gillman, Jason D

    2017-01-05

    Soybean oil is highly unsaturated but oxidatively unstable, rendering it nonideal for food applications. Until recently, the majority of soybean oil underwent partial chemical hydrogenation, which produces trans fats as an unavoidable consequence. Dietary intake of trans fats and most saturated fats are conclusively linked to negative impacts on cholesterol levels and cardiovascular health. Two major soybean oil breeding targets are: (1) to reduce or eliminate the need for chemical hydrogenation, and (2) to replace the functional properties of partially hydrogenated soybean oil. One potential solution is the elevation of seed stearic acid, a saturated fat which has no negative impacts on cardiovascular health, from 3 to 4% in typical cultivars to > 20% of the seed oil. We performed QTL analysis of a population developed by crossing two mutant lines, one with a missense mutation affecting a stearoyl-acyl-carrier protein desaturase gene resulting in ∼11% seed stearic acid crossed to another mutant, A6, which has 24-28% seed stearic acid. Genotyping-by-sequencing (GBS)-based QTL mapping identified 21 minor and major effect QTL for six seed oil related traits and plant height. The inheritance of a large genomic deletion affecting chromosome 14 is the basis for largest effect QTL, resulting in ∼18% seed stearic acid. This deletion contains SACPD-C and another gene(s); loss of both genes boosts seed stearic acid levels to ≥ 18%. Unfortunately, this genomic deletion has been shown in previous studies to be inextricably correlated with reduced seed yield. Our results will help inform and guide ongoing breeding efforts to improve soybean oil oxidative stability. Copyright © 2017 Heim and Gillman.

  6. MIPS: a database for genomes and protein sequences.

    PubMed

    Mewes, H W; Frishman, D; Güldener, U; Mannhaupt, G; Mayer, K; Mokrejs, M; Morgenstern, B; Münsterkötter, M; Rudd, S; Weil, B

    2002-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz-Netzwerk Bioinformatik) networks. The Arabidospsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91-93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155-158; Barker et al. (2001) Nucleic Acids Res., 29, 29-32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de).

  7. Identification of a nuclear localization sequence in the polyomavirus capsid protein VP2

    NASA Technical Reports Server (NTRS)

    Chang, D.; Haynes, J. I. 2nd; Brady, J. N.; Consigli, R. A.; Spooner, B. S. (Principal Investigator)

    1992-01-01

    A nuclear localization signal (NLS) has been identified in the C-terminal (Glu307-Glu-Asp-Gly-Pro-Gln-Lys-Lys-Lys-Arg-Arg-Leu318) amino acid sequence of the polyomavirus minor capsid protein VP2. The importance of this amino acid sequence for nuclear transport of newly synthesized VP2 was demonstrated by a genetic "subtractive" study using the constructs pSG5VP2 (expressing full-length VP2) and pSG5 delta 3VP2 (expressing truncated VP2, lacking amino acids Glu307-Leu318). These constructs were transfected into COS-7 cells, and the intracellular localization of the VP2 protein was determined by indirect immunofluorescence. These studies revealed that the full-length VP2 was localized in the nucleus, while the truncated VP2 protein was localized in the cytoplasm and not transported to the nucleus. A biochemical "additive" approach was also used to determine whether this sequence could target nonnuclear proteins to the nucleus. A synthetic peptide identical to VP2 amino acids Glu307-Leu318 was cross-linked to the nonnuclear proteins bovine serum albumin (BSA) or immunoglobulin G (IgG). The conjugates were then labeled with fluorescein isothiocyanate and microinjected into the cytoplasm of NIH 3T6 cells. Both conjugates localized in the nucleus of the microinjected cells, whereas unconjugated BSA and IgG remained in the cytoplasm. Taken together, these genetic subtractive and biochemical additive approaches have identified the C-terminal sequence of polyoma-virus VP2 (containing amino acids Glu307-Leu318) as the NLS of this protein.

  8. Detection of nucleic acids by multiple sequential invasive cleavages

    DOEpatents

    Hall, Jeff G.; Lyamichev, Victor I.; Mast, Andrea L.; Brow, Mary Ann D.

    1999-01-01

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge. The present invention also provides methods for the detection of non-target cleavage products via the formation of a complete and activated protein binding region. The invention further provides sensitive and specific methods for the detection of human cytomegalovirus nucleic acid in a sample.

  9. Detection of nucleic acids by multiple sequential invasive cleavages

    DOEpatents

    Hall, Jeff G; Lyamichev, Victor I; Mast, Andrea L; Brow, Mary Ann D

    2012-10-16

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge. The present invention also provides methods for the detection of non-target cleavage products via the formation of a complete and activated protein binding region. The invention further provides sensitive and specific methods for the detection of human cytomegalovirus nucleic acid in a sample.

  10. Draft Genome Sequence, and a Sequence-Defined Genetic Linkage Map of the Legume Crop Species Lupinus angustifolius L

    PubMed Central

    Zheng, Zequn; Zhang, Qisen; Zhou, Gaofeng; Sweetingham, Mark W.; Howieson, John G.; Li, Chengdao

    2013-01-01

    Lupin (Lupinus angustifolius L.) is the most recently domesticated crop in major agricultural cultivation. Its seeds are high in protein and dietary fibre, but low in oil and starch. Medical and dietetic studies have shown that consuming lupin-enriched food has significant health benefits. We report the draft assembly from a whole genome shotgun sequencing dataset for this legume species with 26.9x coverage of the genome, which is predicted to contain 57,807 genes. Analysis of the annotated genes with metabolic pathways provided a partial understanding of some key features of lupin, such as the amino acid profile of storage proteins in seeds. Furthermore, we applied the NGS-based RAD-sequencing technology to obtain 8,244 sequence-defined markers for anchoring the genomic sequences. A total of 4,214 scaffolds from the genome sequence assembly were aligned into the genetic map. The combination of the draft assembly and a sequence-defined genetic map made it possible to locate and study functional genes of agronomic interest. The identification of co-segregating SNP markers, scaffold sequences and gene annotation facilitated the identification of a candidate R gene associated with resistance to the major lupin disease anthracnose. We demonstrated that the combination of medium-depth genome sequencing and a high-density genetic linkage map by application of NGS technology is a cost-effective approach to generating genome sequence data and a large number of molecular markers to study the genomics, genetics and functional genes of lupin, and to apply them to molecular plant breeding. This strategy does not require prior genome knowledge, which potentiates its application to a wide range of non-model species. PMID:23734219

  11. Draft genome sequence, and a sequence-defined genetic linkage map of the legume crop species Lupinus angustifolius L.

    PubMed

    Yang, Huaan; Tao, Ye; Zheng, Zequn; Zhang, Qisen; Zhou, Gaofeng; Sweetingham, Mark W; Howieson, John G; Li, Chengdao

    2013-01-01

    Lupin (Lupinus angustifolius L.) is the most recently domesticated crop in major agricultural cultivation. Its seeds are high in protein and dietary fibre, but low in oil and starch. Medical and dietetic studies have shown that consuming lupin-enriched food has significant health benefits. We report the draft assembly from a whole genome shotgun sequencing dataset for this legume species with 26.9x coverage of the genome, which is predicted to contain 57,807 genes. Analysis of the annotated genes with metabolic pathways provided a partial understanding of some key features of lupin, such as the amino acid profile of storage proteins in seeds. Furthermore, we applied the NGS-based RAD-sequencing technology to obtain 8,244 sequence-defined markers for anchoring the genomic sequences. A total of 4,214 scaffolds from the genome sequence assembly were aligned into the genetic map. The combination of the draft assembly and a sequence-defined genetic map made it possible to locate and study functional genes of agronomic interest. The identification of co-segregating SNP markers, scaffold sequences and gene annotation facilitated the identification of a candidate R gene associated with resistance to the major lupin disease anthracnose. We demonstrated that the combination of medium-depth genome sequencing and a high-density genetic linkage map by application of NGS technology is a cost-effective approach to generating genome sequence data and a large number of molecular markers to study the genomics, genetics and functional genes of lupin, and to apply them to molecular plant breeding. This strategy does not require prior genome knowledge, which potentiates its application to a wide range of non-model species.

  12. Pseudomonas sp. strain CA5 (a selenite-reducing bacterium) 16S rRNA gene complete sequence. National Institute of Health, National Center for Biotechnology Information, GenBank sequence. Accession FJ422810.1.

    USDA-ARS?s Scientific Manuscript database

    This study used 1321 base pair 16S rRNA gene sequence methods to confirm the phylogenetic position of a soil isolate as a bacterium belonging to the genus Pesudomonas sp. Morphological, biochemical characteristics, and fatty acid profiles are consistent with the 16S rRNA gene sequence identification...

  13. Asparagine-linked oligosaccharides present on a non-consensus amino acid sequence in the CH1 domain of human antibodies.

    PubMed

    Valliere-Douglass, John F; Kodama, Paul; Mujacic, Mirna; Brady, Lowell J; Wang, Wes; Wallace, Alison; Yan, Boxu; Reddy, Pranhitha; Treuheit, Michael J; Balland, Alain

    2009-11-20

    We report that N-linked oligosaccharide structures can be present on an asparagine residue not adhering to the consensus site motif NX(S/T), where X is not proline, described in the literature. We have observed oligosaccharides on a non-consensus asparaginyl residue in the C(H)1 constant domain of IgG1 and IgG2 antibodies. The initial findings were obtained from characterization of charge variant populations evident in a recombinant human antibody of the IgG2 subclass. HPLC-MS results indicated that cation-exchange chromatography acidic variant populations were enriched in antibody with a second glycosylation site, in addition to the well documented canonical glycosylation site located in the C(H)2 domain. Subsequent tryptic and chymotryptic peptide map data indicated that the second glycosylation site was associated with the amino acid sequence TVSWN(162)SGAL in the C(H)1 domain of the antibody. This highly atypical modification is present at levels of 0.5-2.0% on most of the recombinant antibodies that have been tested and has also been observed in IgG1 antibodies derived from human donors. Site-directed mutagenesis of the C(H)1 domain sequence in a recombinant-human IgG1 antibody resulted in an increase in non-consensus glycosylation to 3.15%, a greater than 4-fold increase over the level observed in the wild type, by changing the -1 and +1 amino acids relative to the asparagine residue at position 162. We believe that further understanding of the phenomenon of non-consensus glycosylation can be used to gain fundamental insights into the fidelity of the cellular glycosylation machinery.

  14. Human jagged polypeptide, encoding nucleic acids and methods of use

    DOEpatents

    Li, Linheng; Hood, Leroy

    2000-01-01

    The present invention provides an isolated polypeptide exhibiting substantially the same amino acid sequence as JAGGED, or an active fragment thereof, provided that the polypeptide does not have the amino acid sequence of SEQ ID NO:5 or SEQ ID NO:6. The invention further provides an isolated nucleic acid molecule containing a nucleotide sequence encoding substantially the same amino acid sequence as JAGGED, or an active fragment thereof, provided that the nucleotide sequence does not encode the amino acid sequence of SEQ ID NO:5 or SEQ ID NO:6. Also provided herein is a method of inhibiting differentiation of hematopoietic progenitor cells by contacting the progenitor cells with an isolated JAGGED polypeptide, or active fragment thereof. The invention additionally provides a method of diagnosing Alagille Syndrome in an individual. The method consists of detecting an Alagille Syndrome disease-associated mutation linked to a JAGGED locus.

  15. Sequence Analysis and Domain Motifs in the Porcine Skin Decorin Glycosaminoglycan Chain*

    PubMed Central

    Zhao, Xue; Yang, Bo; Solakylidirim, Kemal; Joo, Eun Ji; Toida, Toshihiko; Higashi, Kyohei; Linhardt, Robert J.; Li, Lingyun

    2013-01-01

    Decorin proteoglycan is comprised of a core protein containing a single O-linked dermatan sulfate/chondroitin sulfate glycosaminoglycan (GAG) chain. Although the sequence of the decorin core protein is determined by the gene encoding its structure, the structure of its GAG chain is determined in the Golgi. The recent application of modern MS to bikunin, a far simpler chondroitin sulfate proteoglycans, suggests that it has a single or small number of defined sequences. On this basis, a similar approach to sequence the decorin of porcine skin much larger and more structurally complex dermatan sulfate/chondroitin sulfate GAG chain was undertaken. This approach resulted in information on the consistency/variability of its linkage region at the reducing end of the GAG chain, its iduronic acid-rich domain, glucuronic acid-rich domain, and non-reducing end. A general motif for the porcine skin decorin GAG chain was established. A single small decorin GAG chain was sequenced using MS/MS analysis. The data obtained in the study suggest that the decorin GAG chain has a small or a limited number of sequences. PMID:23423381

  16. Whole-Exome Sequencing in a South American Cohort Links ALDH1A3, FOXN1 and Retinoic Acid Regulation Pathways to Autism Spectrum Disorders.

    PubMed

    Moreno-Ramos, Oscar A; Olivares, Ana María; Haider, Neena B; de Autismo, Liga Colombiana; Lattig, María Claudia

    2015-01-01

    Autism spectrum disorders (ASDs) are a range of complex neurodevelopmental conditions principally characterized by dysfunctions linked to mental development. Previous studies have shown that there are more than 1000 genes likely involved in ASD, expressed mainly in brain and highly interconnected among them. We applied whole exome sequencing in Colombian-South American trios. Two missense novel SNVs were found in the same child: ALDH1A3 (RefSeq NM_000693: c.1514T>C (p.I505T)) and FOXN1 (RefSeq NM_003593: c.146C>T (p.S49L)). Gene expression studies reveal that Aldh1a3 and Foxn1 are expressed in ~E13.5 mouse embryonic brain, as well as in adult piriform cortex (PC; ~P30). Conserved Retinoic Acid Response Elements (RAREs) upstream of human ALDH1A3 and FOXN1 and in mouse Aldh1a3 and Foxn1 genes were revealed using bioinformatic approximation. Chromatin immunoprecipitation (ChIP) assay using Retinoid Acid Receptor B (Rarb) as the immunoprecipitation target suggests RA regulation of Aldh1a3 and Foxn1 in mice. Our results frame a possible link of RA regulation in brain to ASD etiology, and a feasible non-additive effect of two apparently unrelated variants in ALDH1A3 and FOXN1 recognizing that every result given by next generation sequencing should be cautiously analyzed, as it might be an incidental finding.

  17. Whole-Exome Sequencing in a South American Cohort Links ALDH1A3, FOXN1 and Retinoic Acid Regulation Pathways to Autism Spectrum Disorders

    PubMed Central

    Moreno-Ramos, Oscar A.; Olivares, Ana María; Haider, Neena B.; de Autismo, Liga Colombiana; Lattig, María Claudia

    2015-01-01

    Autism spectrum disorders (ASDs) are a range of complex neurodevelopmental conditions principally characterized by dysfunctions linked to mental development. Previous studies have shown that there are more than 1000 genes likely involved in ASD, expressed mainly in brain and highly interconnected among them. We applied whole exome sequencing in Colombian—South American trios. Two missense novel SNVs were found in the same child: ALDH1A3 (RefSeq NM_000693: c.1514T>C (p.I505T)) and FOXN1 (RefSeq NM_003593: c.146C>T (p.S49L)). Gene expression studies reveal that Aldh1a3 and Foxn1 are expressed in ~E13.5 mouse embryonic brain, as well as in adult piriform cortex (PC; ~P30). Conserved Retinoic Acid Response Elements (RAREs) upstream of human ALDH1A3 and FOXN1 and in mouse Aldh1a3 and Foxn1 genes were revealed using bioinformatic approximation. Chromatin immunoprecipitation (ChIP) assay using Retinoid Acid Receptor B (Rarb) as the immunoprecipitation target suggests RA regulation of Aldh1a3 and Foxn1 in mice. Our results frame a possible link of RA regulation in brain to ASD etiology, and a feasible non-additive effect of two apparently unrelated variants in ALDH1A3 and FOXN1 recognizing that every result given by next generation sequencing should be cautiously analyzed, as it might be an incidental finding. PMID:26352270

  18. Studies of a Biochemical Factory: Tomato Trichome Deep Expressed Sequence Tag Sequencing and Proteomics1[W][OA

    PubMed Central

    Schilmiller, Anthony L.; Miner, Dennis P.; Larson, Matthew; McDowell, Eric; Gang, David R.; Wilkerson, Curtis; Last, Robert L.

    2010-01-01

    Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces β-caryophyllene and α-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells. PMID:20431087

  19. Terminal region sequence variations in variola virus DNA.

    PubMed

    Massung, R F; Loparev, V N; Knight, J C; Totmenin, A V; Chizhikov, V E; Parsons, J M; Safronov, P F; Gutorov, V V; Shchelkunov, S N; Esposito, J J

    1996-07-15

    Genome DNA terminal region sequences were determined for a Brazilian alastrim variola minor virus strain Garcia-1966 that was associated with an 0.8% case-fatality rate and African smallpox strains Congo-1970 and Somalia-1977 associated with variola major (9.6%) and minor (0.4%) mortality rates, respectively. A base sequence identity of > or = 98.8% was determined after aligning 30 kb of the left- or right-end region sequences with cognate sequences previously determined for Asian variola major strains India-1967 (31% death rate) and Bangladesh-1975 (18.5% death rate). The deduced amino acid sequences of putative proteins of > or = 65 amino acids also showed relatively high identity, although the Asian and African viruses were clearly more related to each other than to alastrim virus. Alastrim virus contained only 10 of 70 proteins that were 100% identical to homologs in Asian strains, and 7 alastrim-specific proteins were noted.

  20. WebLogo: A Sequence Logo Generator

    PubMed Central

    Crooks, Gavin E.; Hon, Gary; Chandonia, John-Marc; Brenner, Steven E.

    2004-01-01

    WebLogo generates sequence logos, graphical representations of the patterns within a multiple sequence alignment. Sequence logos provide a richer and more precise description of sequence similarity than consensus sequences and can rapidly reveal significant features of the alignment otherwise difficult to perceive. Each logo consists of stacks of letters, one stack for each position in the sequence. The overall height of each stack indicates the sequence conservation at that position (measured in bits), whereas the height of symbols within the stack reflects the relative frequency of the corresponding amino or nucleic acid at that position. WebLogo has been enhanced recently with additional features and options, to provide a convenient and highly configurable sequence logo generator. A command line interface and the complete, open WebLogo source code are available for local installation and customization. PMID:15173120

  1. History of retinoic acid receptors.

    PubMed

    Benbrook, Doris M; Chambon, Pierre; Rochette-Egly, Cécile; Asson-Batres, Mary Ann

    2014-01-01

    The discovery of retinoic acid receptors arose from research into how vitamins are essential for life. Early studies indicated that Vitamin A was metabolized into an active factor, retinoic acid (RA), which regulates RNA and protein expression in cells. Each step forward in our understanding of retinoic acid in human health was accomplished by the development and application of new technologies. Development cDNA cloning techniques and discovery of nuclear receptors for steroid hormones provided the basis for identification of two classes of retinoic acid receptors, RARs and RXRs, each of which has three isoforms, α, β and ɣ. DNA manipulation and crystallographic studies revealed that the receptors contain discrete functional domains responsible for binding to DNA, ligands and cofactors. Ligand binding was shown to induce conformational changes in the receptors that cause release of corepressors and recruitment of coactivators to create functional complexes that are bound to consensus promoter DNA sequences called retinoic acid response elements (RAREs) and that cause opening of chromatin and transcription of adjacent genes. Homologous recombination technology allowed the development of mice lacking expression of retinoic acid receptors, individually or in various combinations, which demonstrated that the receptors exhibit vital, but redundant, functions in fetal development and in vision, reproduction, and other functions required for maintenance of adult life. More recent advancements in sequencing and proteomic technologies reveal the complexity of retinoic acid receptor involvement in cellular function through regulation of gene expression and kinase activity. Future directions will require systems biology approaches to decipher how these integrated networks affect human stem cells, health, and disease.

  2. Cloning and sequence analysis of Hemonchus contortus HC58cDNA.

    PubMed

    Muleke, Charles I; Ruofeng, Yan; Lixin, Xu; Xinwen, Bo; Xiangrui, Li

    2007-06-01

    The complete coding sequence of Hemonchus contortus HC58cDNA was generated by rapid amplification of cDNA ends and polymerase chain reaction using primers based on the 5' and 3' ends of the parasite mRNA, accession no. AF305964. The HC58cDNA gene was 851 bp long, with open reading frame of 717 bp, precursors to 239 amino acids coding for approximately 27 kDa protein. Analysis of amino acid sequence revealed conserved residues of cysteine, histidine, asparagine, occluding loop pattern, hemoglobinase motif and glutamine of the oxyanion hole characteristic of cathepsin B like proteases (CBL). Comparison of the predicted amino acid sequences showed the protein shared 33.5-58.7% identity to cathepsin B homologues in the papain clan CA family (family C1). Phylogenetic analysis revealed close evolutionary proximity of the protein sequence to counterpart sequences in the CBL, suggesting that HC58cDNA was a member of the papain family.

  3. Sequence determination and analysis of the NSs genes of two tospoviruses.

    PubMed

    Hallwass, Mariana; Leastro, Mikhail O; Lima, Mirtes F; Inoue-Nagata, Alice K; Resende, Renato O

    2012-03-01

    The tospoviruses groundnut ringspot virus (GRSV) and zucchini lethal chlorosis virus (ZLCV) cause severe losses in many crops, especially in solanaceous and cucurbit species. In this study, the non-structural NSs gene and the 5'UTRs of these two biologically distinct tospoviruses were cloned and sequenced. The NSs sequence of GRSV and ZLCV were both 1,404 nucleotides long. Pairwise comparison showed that the NSs amino acid sequence of GRSV shared 69.6% identity with that of ZLCV and 75.9% identity with that of TSWV, while the NSs sequence of ZLCV and TSWV shared 67.9% identity. Phylogenetic analysis based on NSs sequences confirmed that these viruses cluster in the American clade.

  4. Identification, characterization, and utilization of genome-wide simple sequence repeats to identify a QTL for acidity in apple

    PubMed Central

    2012-01-01

    Background Apple is an economically important fruit crop worldwide. Developing a genetic linkage map is a critical step towards mapping and cloning of genes responsible for important horticultural traits in apple. To facilitate linkage map construction, we surveyed and characterized the distribution and frequency of perfect microsatellites in assembled contig sequences of the apple genome. Results A total of 28,538 SSRs have been identified in the apple genome, with an overall density of 40.8 SSRs per Mb. Di-nucleotide repeats are the most frequent microsatellites in the apple genome, accounting for 71.9% of all microsatellites. AT/TA repeats are the most frequent in genomic regions, accounting for 38.3% of all the G-SSRs, while AG/GA dimers prevail in transcribed sequences, and account for 59.4% of all EST-SSRs. A total set of 310 SSRs is selected to amplify eight apple genotypes. Of these, 245 (79.0%) are found to be polymorphic among cultivars and wild species tested. AG/GA motifs in genomic regions have detected more alleles and higher PIC values than AT/TA or AC/CA motifs. Moreover, AG/GA repeats are more variable than any other dimers in apple, and should be preferentially selected for studies, such as genetic diversity and linkage map construction. A total of 54 newly developed apple SSRs have been genetically mapped. Interestingly, clustering of markers with distorted segregation is observed on linkage groups 1, 2, 10, 15, and 16. A QTL responsible for malic acid content of apple fruits is detected on linkage group 8, and accounts for ~13.5% of the observed phenotypic variation. Conclusions This study demonstrates that di-nucleotide repeats are prevalent in the apple genome and that AT/TA and AG/GA repeats are the most frequent in genomic and transcribed sequences of apple, respectively. All SSR motifs identified in this study as well as those newly mapped SSRs will serve as valuable resources for pursuing apple genetic studies, aiding the apple breeding

  5. Comparative studies of rat recombinant purple acid phosphatase and bone tartrate-resistant acid phosphatase.

    PubMed Central

    Ek-Rylander, B; Barkhem, T; Ljusberg, J; Ohman, L; Andersson, K K; Andersson, G

    1997-01-01

    The tartrate-resistant acid phosphatase (TRAP) of rat osteoclasts has been shown to exhibit high (85-94%) identity at the amino acid sequence level with the purple acid phosphatase (PAP) from bovine spleen and with pig uteroferrin. These iron-containing purple enzymes contain a binuclear iron centre, with a tyrosinate-to-Fe(III) charge-transfer transition responsible for the purple colour. In the present study, production of rat osteoclast TRAP could be achieved at a level of 4.3 mg/litre of medium using a baculovirus expression system. The enzyme was purified to apparent homogeneity using a combination of cation-exchange, hydrophobic-interaction, lectin-affinity and gel-permeation chromatography steps. The protein as isolated had a purple colour, a specific activity of 428 units/mg of protein and consisted of the single-chain form of molecular mass 34 kDa, with only trace amounts of proteolytically derived subunits. The recombinant enzyme had the ability to dephosphorylate bone matrix phosphoproteins, as previously shown for bone TRAP. Light absorption spectroscopy of the isolated purple enzyme showed a lambda max at 544 nm, which upon reduction with ascorbic acid changed to 515 nm, concomitant with the transition to a pink colour. EPR spectroscopic analysis of the reduced enzyme at 3.6 K revealed a typical mu-hydr(oxo)-bridged mixed-valent Fe(II)Fe(III) signal with g-values at 1.96, 1.74 and 1.60, proving that recombinant rat TRAP belongs to the family of PAPs. To validate the use of recombinant PAP in substituting for the rat bone counterpart in functional studies, various comparative studies were carried out. The enzyme isolated from bone exhibited a lower K(m) for p-nitrophenyl phosphate and was slightly more sensitive to PAP inhibitors such as molybdate, tungstate, arsenate and phosphate. In contrast with the recombinant enzyme, TRAP from bone was isolated predominantly as the proteolytically cleaved, two-subunit, form. Both the recombinant enzyme and rat

  6. Methods for making nucleotide probes for sequencing and synthesis

    DOEpatents

    Church, George M; Zhang, Kun; Chou, Joseph

    2014-07-08

    Compositions and methods for making a plurality of probes for analyzing a plurality of nucleic acid samples are provided. Compositions and methods for analyzing a plurality of nucleic acid samples to obtain sequence information in each nucleic acid sample are also provided.

  7. Replica amplification of nucleic acid arrays

    DOEpatents

    Church, George M.; Mitra, Robi D.

    2010-08-31

    Disclosed are improved methods of making and using immobilized arrays of nucleic acids, particularly methods for producing replicas of such arrays. Included are methods for producing high density arrays of nucleic acids and replicas of such arrays, as well as methods for preserving the resolution of arrays through rounds of replication. Also included are methods which take advantage of the availability of replicas of arrays for increased sensitivity in detection of sequences on arrays. Improved methods of sequencing nucleic acids immobilized on arrays utilizing single copies of arrays and methods taking further advantage of the availability of replicas of arrays are disclosed. The improvements lead to higher fidelity and longer read lengths of sequences immobilized on arrays. Methods are also disclosed which improve the efficiency of multiplex PCR using arrays of immobilized nucleic acids.

  8. Sequence-based screening for self-sufficient P450 monooxygenase from a metagenome library.

    PubMed

    Kim, B S; Kim, S Y; Park, J; Park, W; Hwang, K Y; Yoon, Y J; Oh, W K; Kim, B Y; Ahn, J S

    2007-05-01

    Cytochrome P450 monooxygenases (CYPs) are useful catalysts for oxidation reactions. Self-sufficient CYPs harbour a reductive domain covalently connected to a P450 domain and are known for their robust catalytic activity with great potential as biocatalysts. In an effort to expand genetic sources of self-sufficient CYPs, we devised a sequence-based screening system to identify them in a soil metagenome. We constructed a soil metagenome library and performed sequence-based screening for self-sufficient CYP genes. A new CYP gene, syk181, was identified from the metagenome library. Phylogenetic analysis revealed that SYK181 formed a distinct phylogenic line with 46% amino-acid-sequence identity to CYP102A1 which has been extensively studied as a fatty acid hydroxylase. The heterologously expressed SYK181 showed significant hydroxylase activity towards naphthalene and phenanthrene as well as towards fatty acids. Sequence-based screening of metagenome libraries is expected to be a useful approach for searching self-sufficient CYP genes. The translated product of syk181 shows self-sufficient hydroxylase activity towards fatty acids and aromatic compounds. SYK181 is the first self-sufficient CYP obtained directly from a metagenome library. The genetic and biochemical information on SYK181 are expected to be helpful for engineering self-sufficient CYPs with broader catalytic activities towards various substrates, which would be useful for bioconversion of natural products and biodegradation of organic chemicals.

  9. Deep sequencing of the Mexican avocado transcriptome, an ancient angiosperm with a high content of fatty acids.

    PubMed

    Ibarra-Laclette, Enrique; Méndez-Bravo, Alfonso; Pérez-Torres, Claudia Anahí; Albert, Victor A; Mockaitis, Keithanne; Kilaru, Aruna; López-Gómez, Rodolfo; Cervantes-Luevano, Jacob Israel; Herrera-Estrella, Luis

    2015-08-13

    Avocado (Persea americana) is an economically important tropical fruit considered to be a good source of fatty acids. Despite its importance, the molecular and cellular characterization of biochemical and developmental processes in avocado is limited due to the lack of transcriptome and genomic information. The transcriptomes of seeds, roots, stems, leaves, aerial buds and flowers were determined using different sequencing platforms. Additionally, the transcriptomes of three different stages of fruit ripening (pre-climacteric, climacteric and post-climacteric) were also analyzed. The analysis of the RNAseqatlas presented here reveals strong differences in gene expression patterns between different organs, especially between root and flower, but also reveals similarities among the gene expression patterns in other organs, such as stem, leaves and aerial buds (vegetative organs) or seed and fruit (storage organs). Important regulators, functional categories, and differentially expressed genes involved in avocado fruit ripening were identified. Additionally, to demonstrate the utility of the avocado gene expression atlas, we investigated the expression patterns of genes implicated in fatty acid metabolism and fruit ripening. A description of transcriptomic changes occurring during fruit ripening was obtained in Mexican avocado, contributing to a dynamic view of the expression patterns of genes involved in fatty acid biosynthesis and the fruit ripening process.

  10. Mutant fatty acid desaturase

    DOEpatents

    Shanklin, John; Cahoon, Edgar B.

    2004-02-03

    The present invention relates to a method for producing mutants of a fatty acid desaturase having a substantially increased activity towards fatty acid substrates with chains containing fewer than 18 carbons relative to an unmutagenized precursor desaturase having an 18 carbon atom chain length substrate specificity. The method involves inducing one or more mutations in the nucleic acid sequence encoding the precursor desaturase, transforming the mutated sequence into an unsaturated fatty acid auxotroph cell such as MH13 E. coli, culturing the cells in the absence of supplemental unsaturated fatty acids, thereby selecting for recipient cells which have received and which express a mutant fatty acid desaturase with an elevated specificity for fatty acid substrates having chain lengths of less than 18 carbon atoms. A variety of mutants having 16 or fewer carbon atom chain length substrate specificities are produced by this method. Mutant desaturases produced by this method can be introduced via expression vectors into prokaryotic and eukaryotic cells and can also be used in the production of transgenic plants which may be used to produce specific fatty acid products.

  11. Acidity and complex formation studies of 3-(adenine-9-yl)-propionic and 3-(thymine-1-yl)-propionic acids in ethanol-water media

    NASA Astrophysics Data System (ADS)

    Hammud, Hassan H.; El Shazly, Shawky; Sonji, Ghassan; Sonji, Nada; Bouhadir, Kamal H.

    2015-05-01

    The ligands 3-(adenine-9-yl)propionic acid (AA) and 3-(thymine-1-yl)propionic acid (TA) were prepared by N9-alkylation of adenine and N1-alkylation of thymine with ethylacrylate in presence of a base catalyst, followed by acid hydrolysis of the formed ethyl esters to give the corresponding propionic acid derivatives. The products were characterized by spectral methods (FTIR, 1H NMR and 13C NMR), which confirm their structures. The dissociation constants of ligands, were potentiometrically determined in 0.3 M KCl at 20-50 °C temperature range. The work was extended to study complexation behavior of AA and TA with various biologically important divalent metal ions (Co2+, Ni2+, Cu2+, Zn2+, Cd2+, Mn2+ and Pb2+) in 50% v/v water-ethanol medium at four different temperatures, keeping ionic strength constant (0.3 M KCl). The order of the stability constants of the formed complexes decreases in the sequence Cu2+ > Pb2+ > Zn2+ > Ni2+ > Co2+ > Mn2+ > Cd2+ for both ligands. The effect of temperature was also studied and the corresponding thermodynamic functions (ΔG, ΔH, ΔS) were derived and discussed. The formation of metal complexes has been found to be spontaneous, and the stability constants were dependant markedly on the basicity of the ligands.

  12. Molecular Simulations of Sequence-Specific Association of Transmembrane Proteins in Lipid Bilayers

    NASA Astrophysics Data System (ADS)

    Doxastakis, Manolis; Prakash, Anupam; Janosi, Lorant

    2011-03-01

    Association of membrane proteins is central in material and information flow across the cellular membranes. Amino-acid sequence and the membrane environment are two critical factors controlling association, however, quantitative knowledge on such contributions is limited. In this work, we study the dimerization of helices in lipid bilayers using extensive parallel Monte Carlo simulations with recently developed algorithms. The dimerization of Glycophorin A is examined employing a coarse-grain model that retains a level of amino-acid specificity, in three different phospholipid bilayers. Association is driven by a balance of protein-protein and lipid-induced interactions with the latter playing a major role at short separations. Following a different approach, the effect of amino-acid sequence is studied using the four transmembrane domains of the epidermal growth factor receptor family in identical lipid environments. Detailed characterization of dimer formation and estimates of the free energy of association reveal that these helices present significant affinity to self-associate with certain dimers forming non-specific interfaces.

  13. A comparison of anaerobic 2, 4-dichlorophenoxy acetic acid degradation in single-fed and sequencing batch reactor systems

    NASA Astrophysics Data System (ADS)

    Elefsiniotis, P.; Wareham, D. G.; Fongsatitukul, P.

    2017-08-01

    This paper compares the practical limits of 2, 4-dichlorophenoxy acetic acid (2,4-D) degradation that can be obtained in two laboratory-scale anaerobic digestion systems; namely, a sequencing batch reactor (SBR) and a single-fed batch reactor (SFBR) system. The comparison involved synthesizing a decade of research conducted by the lead author and drawing summative conclusions about the ability of each system to accommodate industrial-strength concentrations of 2,4-D. In the main, 2 L liquid volume anaerobic SBRs were used with glucose as a supplemental carbon source for both acid-phase and two-phase conditions. Volatile fatty acids however were used as a supplemental carbon source for the methanogenic SBRs. The anaerobic SBRs were operated at an hydraulic retention time of 48 hours, while being subjected to increasing concentrations of 2,4-D. The SBRs were able to degrade between 130 and 180 mg/L of 2,4-D depending upon whether they were operated in the acid-phase or two-phase regime. The methanogenic-only phase did not achieve 2,4-D degradation however this was primarily attributed to difficulties with obtaining a sufficiently long SRT. For the two-phase SFBR system, 3.5 L liquid-volume digesters were used and no difficulty was experienced with degrading 100 % of the 2,4-D concentration applied (300 mg/L).

  14. MIPS: a database for genomes and protein sequences

    PubMed Central

    Mewes, H. W.; Frishman, D.; Güldener, U.; Mannhaupt, G.; Mayer, K.; Mokrejs, M.; Morgenstern, B.; Münsterkötter, M.; Rudd, S.; Weil, B.

    2002-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz–Netzwerk Bioinformatik) networks. The Arabidospsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91–93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155–158; Barker et al. (2001) Nucleic Acids Res., 29, 29–32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de). PMID:11752246

  15. Chirality- and sequence-selective successive self-sorting via specific homo- and complementary-duplex formations

    PubMed Central

    Makiguchi, Wataru; Tanabe, Junki; Yamada, Hidekazu; Iida, Hiroki; Taura, Daisuke; Ousaka, Naoki; Yashima, Eiji

    2015-01-01

    Self-recognition and self-discrimination within complex mixtures are of fundamental importance in biological systems, which entirely rely on the preprogrammed monomer sequences and homochirality of biological macromolecules. Here we report artificial chirality- and sequence-selective successive self-sorting of chiral dimeric strands bearing carboxylic acid or amidine groups joined by chiral amide linkers with different sequences through homo- and complementary-duplex formations. A mixture of carboxylic acid dimers linked by racemic-1,2-cyclohexane bis-amides with different amide sequences (NHCO or CONH) self-associate to form homoduplexes in a completely sequence-selective way, the structures of which are different from each other depending on the linker amide sequences. The further addition of an enantiopure amide-linked amidine dimer to a mixture of the racemic carboxylic acid dimers resulted in the formation of a single optically pure complementary duplex with a 100% diastereoselectivity and complete sequence specificity stabilized by the amidinium–carboxylate salt bridges, leading to the perfect chirality- and sequence-selective duplex formation. PMID:26051291

  16. Comparative study of the effectiveness and limitations of current methods for detecting sequence coevolution.

    PubMed

    Mao, Wenzhi; Kaya, Cihan; Dutta, Anindita; Horovitz, Amnon; Bahar, Ivet

    2015-06-15

    With rapid accumulation of sequence data on several species, extracting rational and systematic information from multiple sequence alignments (MSAs) is becoming increasingly important. Currently, there is a plethora of computational methods for investigating coupled evolutionary changes in pairs of positions along the amino acid sequence, and making inferences on structure and function. Yet, the significance of coevolution signals remains to be established. Also, a large number of false positives (FPs) arise from insufficient MSA size, phylogenetic background and indirect couplings. Here, a set of 16 pairs of non-interacting proteins is thoroughly examined to assess the effectiveness and limitations of different methods. The analysis shows that recent computationally expensive methods designed to remove biases from indirect couplings outperform others in detecting tertiary structural contacts as well as eliminating intermolecular FPs; whereas traditional methods such as mutual information benefit from refinements such as shuffling, while being highly efficient. Computations repeated with 2,330 pairs of protein families from the Negatome database corroborated these results. Finally, using a training dataset of 162 families of proteins, we propose a combined method that outperforms existing individual methods. Overall, the study provides simple guidelines towards the choice of suitable methods and strategies based on available MSA size and computing resources. Software is freely available through the Evol component of ProDy API. © The Author 2015. Published by Oxford University Press.

  17. Hydroxamic acids as weak base indicators: protonation in strong acid media.

    PubMed

    García, B; Ibeas, S; Hoyuelos, F J; Leal, J M; Secco, F; Venturini, M

    2001-11-30

    The protonation equilibria of N-phenylbenzohydroxamic, benzohydroxamic, salicylhydroxamic, and N-p-tolylcinnamohydroxamic acids have been studied at 25 degrees C in concentrated sulfuric, hydrochloric, and perchloric acid media; the UV-vis spectral measurements were analyzed using the Hammett equation and the Bunnett-Olsen and excess acidity methods. The medium effects observed in the UV spectral curves were corrected with the Cox-Yates and vector analysis methods. The H(A) acidity function based on benzamides provided the best results. The range of variation of the solvation coefficient m is similar to that of amides, this indicating similar solvation requirements for amides and hydroxamic acids. For the same substrate, the observed variations of pK(BH)(+) with the mineral acid used was justified by formation of solvent-separated ion pairs; for the same mineral acid, the observed changes in pK(BH)(+) can be explained by the solvation of BH(+). The change of the pK(BH)(+) values was in reasonably good agreement with the sequence of the catalytic efficiency of the mineral acids used, HCl > H(2)SO(4) > HClO(4).

  18. Lyophilisation and concentration of chitosan/siRNA polyplexes: Influence of buffer composition, oligonucleotide sequence, and hyaluronic acid coating.

    PubMed

    Veilleux, Daniel; Gopalakrishna Panicker, Rajesh Krishnan; Chevrier, Anik; Biniecki, Kristof; Lavertu, Marc; Buschmann, Michael D

    2018-02-15

    Chitosan (CS)/siRNA polyplexes have great therapeutic potential for treating multiple diseases by gene silencing. However, clinical application of this technology requires the development of concentrated, hemocompatible, pH neutral formulations for safe and efficient administration. In this study we evaluate physicochemical properties of chitosan polyplexes in various buffers at increasing ionic strengths, to identify conditions for freeze-drying and rehydration at higher doses of uncoated or hyaluronic acid (HA)-coated polyplexes while maintaining physiological compatibility. Optimized formulations are used to evaluate the impact of the siRNA/oligonucleotide sequence on polyplex physicochemical properties, and to measure their in vitro silencing efficiency, cytotoxicity, and hemocompatibility. Specific oligonucleotide sequences influence polyplex physical properties at low N:P ratios, as well as their stability during freeze-drying. Nanoparticles display greater stability for oligodeoxynucleotides ODN vs siRNA; AT-rich vs GC-rich; and overhangs vs blunt ends. Using this knowledge, various CS/siRNA polyplexes are prepared with and without HA coating, freeze-dried and rehydrated at increased concentrations using reduced rehydration volumes. These polyplexes are non-cytotoxic and preserve silencing activity even after rehydration to 20-fold their initial concentration, while HA-coated polyplexes at pH∼7 also displayed increased hemocompatibility. These concentrated formulations represent a critical step towards clinical development of chitosan-based oligonucleotide intravenous delivery systems. Copyright © 2017 Elsevier Inc. All rights reserved.

  19. Detection of nucleic acids by multiple sequential invasive cleavages 02

    DOEpatents

    Hall, Jeff G.; Lyamichev, Victor I.; Mast, Andrea L.; Brow, Mary Ann D.

    2002-01-01

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The structure-specific nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge. The present invention also provides methods for the detection of non-target cleavage products via the formation of a complete and activated protein binding region. The invention further provides sensitive and specific methods for the detection of human cytomegalovirus nucleic acid in a sample.

  20. Computational analysis of sequence selection mechanisms.

    PubMed

    Meyerguz, Leonid; Grasso, Catherine; Kleinberg, Jon; Elber, Ron

    2004-04-01

    Mechanisms leading to gene variations are responsible for the diversity of species and are important components of the theory of evolution. One constraint on gene evolution is that of protein foldability; the three-dimensional shapes of proteins must be thermodynamically stable. We explore the impact of this constraint and calculate properties of foldable sequences using 3660 structures from the Protein Data Bank. We seek a selection function that receives sequences as input, and outputs survival probability based on sequence fitness to structure. We compute the number of sequences that match a particular protein structure with energy lower than the native sequence, the density of the number of sequences, the entropy, and the "selection" temperature. The mechanism of structure selection for sequences longer than 200 amino acids is approximately universal. For shorter sequences, it is not. We speculate on concrete evolutionary mechanisms that show this behavior.

  1. Retention of nucleic acids in ion-pair reversed-phase high-performance liquid chromatography depends not only on base composition but also on base sequence.

    PubMed

    Qiao, Jun-Qin; Liang, Chao; Wei, Lan-Chun; Cao, Zhao-Ming; Lian, Hong-Zhen

    2016-12-01

    The study on nucleic acid retention in ion-pair reversed-phase high-performance liquid chromatography mainly focuses on size-dependence, however, other factors influencing retention behaviors have not been comprehensively clarified up to date. In this present work, the retention behaviors of oligonucleotides and double-stranded DNAs were investigated on silica-based C 18 stationary phase by ion-pair reversed-phase high-performance liquid chromatography. It is found that the retention of oligonucleotides was influenced by base composition and base sequence as well as size, and oligonucleotides prone to self-dimerization have weaker retention than those not prone to self-dimerization but with the same base composition. However, homo-oligonucleotides are suitable for the size-dependent separation as a special case of oligonucleotides. For double-stranded DNAs, the retention is also influenced by base composition and base sequence, as well as size. This may be attributed to the interaction of exposed bases in major or minor grooves with the hydrophobic alky chains of stationary phase. In addition, no specific influence of guanine and cytosine content was confirmed on retention of double-stranded DNAs. Notably, the space effect resulted from the stereostructure of nucleic acids also influences the retention behavior in ion-pair reversed-phase high-performance liquid chromatography. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  2. [Apply fourier transform infrared spectra coupled with two-dimensional correlation analysis to study the evolution of humic acids during composting].

    PubMed

    Bu, Gui-jun; Yu, Jing; Di, Hui-hui; Luo, Shi-jia; Zhou, Da-zhai; Xiao, Qiang

    2015-02-01

    The composition and structure of humic acids formed during composting play an important influence on the quality and mature of compost. In order to explore the composition and evolution mechanism, municipal solid wastes were collected to compost and humic and fulvic acids were obtained from these composted municipal solid wastes. Furthermore, fourier transform infrared spectra and two-dimensional correlation analysis were applied to study the composition and transformation of humic and fulvic acids during composting. The results from fourier transform infrared spectra showed that, the composition of humic acids was complex, and several absorbance peaks were observed at 2917-2924, 2844-2852, 2549, 1662, 1622, 1566, 1454, 1398, 1351, 990-1063, 839 and 711 cm(-1). Compared to humic acids, the composition of fulvci acids was simple, and only three peaks were detected at 1725, 1637 and 990 cm(-1). The appearance of these peaks showed that both humic and fulvic acids comprised the benzene originated from lignin and the polysaccharide. In addition, humic acids comprised a large number of aliphatic and protein which were hardly detected in fulvic acids. Aliphatic, polysaccharide, protein and lignin all were degraded during composting, however, the order of degradation was different between humic and fulvci acids. The result from two-dimensional correlation analysis showed that, organic compounds in humic acids were degraded in the following sequence: aliphatic> protein> polysaccharide and lignin, while that in fulvic acids was as following: protein> polysaccharide and aliphatic. A large number of carboxyl, alcohols and ethers were formed during the degradation process, and the carboxyl was transformed into carbonates. It can be concluded that, fourier transform infrared spectra coupled with two-dimensional correlation analysis not only can analyze the function group composition of humic substances, but also can characterize effectively the degradation sequence of these

  3. Cloning and characterization of the hamster and guinea pig nicotinic acid receptors.

    PubMed

    Torhan, April Smith; Cheewatrakoolpong, Boonlert; Kwee, Lia; Greenfeder, Scott

    2007-09-01

    In this study, we present the identification and characterization of hamster and guinea pig nicotinic acid receptors. The hamster receptor shares approximately 80-90% identity with the nucleotide and amino acid sequences of human, mouse, and rat receptors. The guinea pig receptor shares 76-80% identity with the nucleotide and amino acid sequences of these other species. [(3)H]nicotinic acid binding affinity at guinea pig and hamster receptors is similar to that in human (dissociation constant = 121 nM for guinea pig, 72 nM for hamster, and 74 nM for human), as are potencies of nicotinic acid analogs in competition binding studies. Inhibition of forskolin-stimulated cAMP production by nicotinic acid and related analogs is also similar to the activity in the human receptor. Analysis of mRNA tissue distribution for the hamster and guinea pig nicotinic acid receptors shows expression across a number of tissues, with higher expression in adipose, lung, skeletal muscle, spleen, testis, and ovary.

  4. Study design requirements for RNA sequencing-based breast cancer diagnostics.

    PubMed

    Mer, Arvind Singh; Klevebring, Daniel; Grönberg, Henrik; Rantalainen, Mattias

    2016-02-01

    Sequencing-based molecular characterization of tumors provides information required for individualized cancer treatment. There are well-defined molecular subtypes of breast cancer that provide improved prognostication compared to routine biomarkers. However, molecular subtyping is not yet implemented in routine breast cancer care. Clinical translation is dependent on subtype prediction models providing high sensitivity and specificity. In this study we evaluate sample size and RNA-sequencing read requirements for breast cancer subtyping to facilitate rational design of translational studies. We applied subsampling to ascertain the effect of training sample size and the number of RNA sequencing reads on classification accuracy of molecular subtype and routine biomarker prediction models (unsupervised and supervised). Subtype classification accuracy improved with increasing sample size up to N = 750 (accuracy = 0.93), although with a modest improvement beyond N = 350 (accuracy = 0.92). Prediction of routine biomarkers achieved accuracy of 0.94 (ER) and 0.92 (Her2) at N = 200. Subtype classification improved with RNA-sequencing library size up to 5 million reads. Development of molecular subtyping models for cancer diagnostics requires well-designed studies. Sample size and the number of RNA sequencing reads directly influence accuracy of molecular subtyping. Results in this study provide key information for rational design of translational studies aiming to bring sequencing-based diagnostics to the clinic.

  5. Study Sequence Matters for the Inductive Learning of Cognitive Concepts

    ERIC Educational Resources Information Center

    Sana, Faria; Yan, Veronica X.; Kim, Joseph A.

    2017-01-01

    The sequence in which problems of different concepts are studied during instruction impacts concept learning. For example, several problems of a given concept can be studied together (blocking) or several problems of different concepts can be studied together (interleaving). In the current study, we demonstrate that the 2 sequences impact concept…

  6. Method of Identifying a Base in a Nucleic Acid

    DOEpatents

    Fodor, Stephen P. A.; Lipshutz, Robert J.; Huang, Xiaohua

    1999-01-01

    Devices and techniques for hybridization of nucleic acids and for determining the sequence of nucleic acids. Arrays of nucleic acids are formed by techniques, preferably high resolution, light-directed techniques. Positions of hybridization of a target nucleic acid are determined by, e.g., epifluorescence microscopy. Devices and techniques are proposed to determine the sequence of a target nucleic acid more efficiently and more quickly through such synthesis and detection techniques.

  7. Molecular beacon sequence design algorithm.

    PubMed

    Monroe, W Todd; Haselton, Frederick R

    2003-01-01

    A method based on Web-based tools is presented to design optimally functioning molecular beacons. Molecular beacons, fluorogenic hybridization probes, are a powerful tool for the rapid and specific detection of a particular nucleic acid sequence. However, their synthesis costs can be considerable. Since molecular beacon performance is based on its sequence, it is imperative to rationally design an optimal sequence before synthesis. The algorithm presented here uses simple Microsoft Excel formulas and macros to rank candidate sequences. This analysis is carried out using mfold structural predictions along with other free Web-based tools. For smaller laboratories where molecular beacons are not the focus of research, the public domain algorithm described here may be usefully employed to aid in molecular beacon design.

  8. Nucleic Acid Detection Methods

    DOEpatents

    Smith, Cassandra L.; Yaar, Ron; Szafranski, Przemyslaw; Cantor, Charles R.

    1998-05-19

    The invention relates to methods for rapidly determining the sequence and/or length a target sequence. The target sequence may be a series of known or unknown repeat sequences which are hybridized to an array of probes. The hybridized array is digested with a single-strand nuclease and free 3'-hydroxyl groups extended with a nucleic acid polymerase. Nuclease cleaved heteroduplexes can be easily distinguish from nuclease uncleaved heteroduplexes by differential labeling. Probes and target can be differentially labeled with detectable labels. Matched target can be detected by cleaving resulting loops from the hybridized target and creating free 3-hydroxyl groups. These groups are recognized and extended by polymerases added into the reaction system which also adds or releases one label into solution. Analysis of the resulting products using either solid phase or solution. These methods can be used to detect characteristic nucleic acid sequences, to determine target sequence and to screen for genetic defects and disorders. Assays can be conducted on solid surfaces allowing for multiple reactions to be conducted in parallel and, if desired, automated.

  9. Nucleic acid detection methods

    DOEpatents

    Smith, C.L.; Yaar, R.; Szafranski, P.; Cantor, C.R.

    1998-05-19

    The invention relates to methods for rapidly determining the sequence and/or length a target sequence. The target sequence may be a series of known or unknown repeat sequences which are hybridized to an array of probes. The hybridized array is digested with a single-strand nuclease and free 3{prime}-hydroxyl groups extended with a nucleic acid polymerase. Nuclease cleaved heteroduplexes can be easily distinguish from nuclease uncleaved heteroduplexes by differential labeling. Probes and target can be differentially labeled with detectable labels. Matched target can be detected by cleaving resulting loops from the hybridized target and creating free 3-hydroxyl groups. These groups are recognized and extended by polymerases added into the reaction system which also adds or releases one label into solution. Analysis of the resulting products using either solid phase or solution. These methods can be used to detect characteristic nucleic acid sequences, to determine target sequence and to screen for genetic defects and disorders. Assays can be conducted on solid surfaces allowing for multiple reactions to be conducted in parallel and, if desired, automated. 18 figs.

  10. CDNA encoding a polypeptide including a hevein sequence

    DOEpatents

    Raikhel, Natasha V.; Broekaert, Willem F.; Chua, Nam-Hai; Kush, Anil

    1995-03-21

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a putative signal sequence of 17 amino acid residues followed by a 187 amino acid polypeptide. The amino-terminal region (43 amino acids) is identical to hevein and shows homology to several chitin-binding proteins and to the amino-termini of wound-induced genes in potato and poplar. The carboxyl-terminal portion of the polypeptide (144 amino acids) is 74-79% homologous to the carboxyl-terminal region of wound-inducible genes of potato. Wounding, as well as application of the plant hormones abscisic acid and ethylene, resulted in accumulation of hevein transcripts in leaves, stems and latex, but not in roots, as shown by using the cDNA as a probe. A fusion protein was produced in E. coli from the protein of the present invention and maltose binding protein produced by the E. coli.

  11. Interaction of recombinant human epidermal growth factor with phospholipid vesicles. A steady-state and time-resolved fluorescence study of the bis-tryptophan sequence (Trp49-Trp50).

    PubMed

    Li De La Sierra, I M; Vincent, M; Padron, G; Gallay, J

    1992-01-01

    The interaction of recombinant human epidermal growth factor with small unilamellar phospholipid vesicles was studied by steady-state and time-resolved fluorescence of the bis-tryptophan sequence (Trp49-Trp50). Steady-state anisotropy measurements demonstrate that strong binding occurred with small unilamellar vesicles made up of acidic phospholipids at acidic pH only (pH < or = 4.7). An apparent stoichiometry for 1,2-dimyristoyl-sn-phosphoglycerol of about 12 phospholipid molecules per molecule of human epidermal growth factor was estimated. The binding appears to be more efficient at temperatures above the gel to liquid-crystalline phase transition. The conformation and the environment of the Trp-Trp sequence are not greatly modified after binding, as judged from the invariance of the excited state lifetime distribution and from that of the fast processes affecting the anisotropy decay. This suggests that the Trp-Trp sequence is not embedded within the bilayer, in contrast to the situation in surfactant micelles (Mayo et al. 1987; Kohda and Inigaki 1992).

  12. Genome Sequence of Lactobacillus saerimneri 30a (Formerly Lactobacillus sp. Strain 30a), a Reference Lactic Acid Bacterium Strain Producing Biogenic Amines

    PubMed Central

    Romano, Andrea; Trip, Hein; Campbell-Sills, Hugo; Bouchez, Olivier; Sherman, David; Lolkema, Juke S.

    2013-01-01

    Lactobacillus sp. strain 30a (Lactobacillus saerimneri) produces the biogenic amines histamine, putrescine, and cadaverine by decarboxylating their amino acid precursors. We report its draft genome sequence (1,634,278 bases, 42.6% G+C content) and the principal findings from its annotation, which might shed light onto the enzymatic machineries that are involved in its production of biogenic amines. PMID:23405290

  13. Species specific identification of spore-producing microbes using the gene sequence of small acid-soluble spore coat proteins for amplification based diagnostics

    DOEpatents

    McKinney, Nancy

    2002-01-01

    PCR (polymerase chain reaction) primers for the detection of certain Bacillus species, such as Bacillus anthracis. The primers specifically amplify only DNA found in the target species and can distinguish closely related species. Species-specific PCR primers for Bacillus anthracis, Bacillus globigii and Clostridium perfringens are disclosed. The primers are directed to unique sequences within sasp (small acid soluble protein) genes.

  14. Application of 2D graphic representation of protein sequence based on Huffman tree method.

    PubMed

    Qi, Zhao-Hui; Feng, Jun; Qi, Xiao-Qin; Li, Ling

    2012-05-01

    Based on Huffman tree method, we propose a new 2D graphic representation of protein sequence. This representation can completely avoid loss of information in the transfer of data from a protein sequence to its graphic representation. The method consists of two parts. One is about the 0-1 codes of 20 amino acids by Huffman tree with amino acid frequency. The amino acid frequency is defined as the statistical number of an amino acid in the analyzed protein sequences. The other is about the 2D graphic representation of protein sequence based on the 0-1 codes. Then the applications of the method on ten ND5 genes and seven Escherichia coli strains are presented in detail. The results show that the proposed model may provide us with some new sights to understand the evolution patterns determined from protein sequences and complete genomes. Copyright © 2012 Elsevier Ltd. All rights reserved.

  15. De novo peptide sequencing by deep learning

    PubMed Central

    Tran, Ngoc Hieu; Zhang, Xianglilan; Xin, Lei; Shan, Baozhen; Li, Ming

    2017-01-01

    De novo peptide sequencing from tandem MS data is the key technology in proteomics for the characterization of proteins, especially for new sequences, such as mAbs. In this study, we propose a deep neural network model, DeepNovo, for de novo peptide sequencing. DeepNovo architecture combines recent advances in convolutional neural networks and recurrent neural networks to learn features of tandem mass spectra, fragment ions, and sequence patterns of peptides. The networks are further integrated with local dynamic programming to solve the complex optimization task of de novo sequencing. We evaluated the method on a wide variety of species and found that DeepNovo considerably outperformed state of the art methods, achieving 7.7–22.9% higher accuracy at the amino acid level and 38.1–64.0% higher accuracy at the peptide level. We further used DeepNovo to automatically reconstruct the complete sequences of antibody light and heavy chains of mouse, achieving 97.5–100% coverage and 97.2–99.5% accuracy, without assisting databases. Moreover, DeepNovo is retrainable to adapt to any sources of data and provides a complete end-to-end training and prediction solution to the de novo sequencing problem. Not only does our study extend the deep learning revolution to a new field, but it also shows an innovative approach in solving optimization problems by using deep learning and dynamic programming. PMID:28720701

  16. GASP: Gapped Ancestral Sequence Prediction for proteins

    PubMed Central

    Edwards, Richard J; Shields, Denis C

    2004-01-01

    Background The prediction of ancestral protein sequences from multiple sequence alignments is useful for many bioinformatics analyses. Predicting ancestral sequences is not a simple procedure and relies on accurate alignments and phylogenies. Several algorithms exist based on Maximum Parsimony or Maximum Likelihood methods but many current implementations are unable to process residues with gaps, which may represent insertion/deletion (indel) events or sequence fragments. Results Here we present a new algorithm, GASP (Gapped Ancestral Sequence Prediction), for predicting ancestral sequences from phylogenetic trees and the corresponding multiple sequence alignments. Alignments may be of any size and contain gaps. GASP first assigns the positions of gaps in the phylogeny before using a likelihood-based approach centred on amino acid substitution matrices to assign ancestral amino acids. Important outgroup information is used by first working down from the tips of the tree to the root, using descendant data only to assign probabilities, and then working back up from the root to the tips using descendant and outgroup data to make predictions. GASP was tested on a number of simulated datasets based on real phylogenies. Prediction accuracy for ungapped data was similar to three alternative algorithms tested, with GASP performing better in some cases and worse in others. Adding simple insertions and deletions to the simulated data did not have a detrimental effect on GASP accuracy. Conclusions GASP (Gapped Ancestral Sequence Prediction) will predict ancestral sequences from multiple protein alignments of any size. Although not as accurate in all cases as some of the more sophisticated maximum likelihood approaches, it can process a wide range of input phylogenies and will predict ancestral sequences for gapped and ungapped residues alike. PMID:15350199

  17. Comparative analysis of the prion protein gene sequences in African lion.

    PubMed

    Wu, Chang-De; Pang, Wan-Yong; Zhao, De-Ming

    2006-10-01

    The prion protein gene of African lion (Panthera Leo) was first cloned and polymorphisms screened. The results suggest that the prion protein gene of eight African lions is highly homogenous. The amino acid sequences of the prion protein (PrP) of all samples tested were identical. Four single nucleotide polymorphisms (C42T, C81A, C420T, T600C) in the prion protein gene (Prnp) of African lion were found, but no amino acid substitutions. Sequence analysis showed that the higher homology is observed to felis catus AF003087 (96.7%) and to sheep number M31313.1 (96.2%) Genbank accessed. With respect to all the mammalian prion protein sequences compared, the African lion prion protein sequence has three amino acid substitutions. The homology might in turn affect the potential intermolecular interactions critical for cross species transmission of prion disease.

  18. The Biomolecule Sequencer Project: Nanopore Sequencing as a Dual-Use Tool for Crew Health and Astrobiology Investigations

    NASA Technical Reports Server (NTRS)

    John, K. K.; Botkin, D. S.; Burton, A. S.; Castro-Wallace, S. L.; Chaput, J. D.; Dworkin, J. P.; Lehman, N.; Lupisella, M. L.; Mason, C. E.; Smith, D. J.; hide

    2016-01-01

    Human missions to Mars will fundamentally transform how the planet is explored, enabling new scientific discoveries through more sophisticated sample acquisition and processing than can currently be implemented in robotic exploration. The presence of humans also poses new challenges, including ensuring astronaut safety and health and monitoring contamination. Because the capability to transfer materials to Earth will be extremely limited, there is a strong need for in situ diagnostic capabilities. Nucleotide sequencing is a particularly powerful tool because it can be used to: (1) mitigate microbial risks to crew by allowing identification of microbes in water, in air, and on surfaces; (2) identify optimal treatment strategies for infections that arise in crew members; and (3) track how crew members, microbes, and mission-relevant organisms (e.g., farmed plants) respond to conditions on Mars through transcriptomic and genomic changes. Sequencing would also offer benefits for science investigations occurring on the surface of Mars by permitting identification of Earth-derived contamination in samples. If Mars contains indigenous life, and that life is based on nucleic acids or other closely related molecules, sequencing would serve as a critical tool for the characterization of those molecules. Therefore, spaceflight-compatible nucleic acid sequencing would be an important capability for both crew health and astrobiology exploration. Advances in sequencing technology on Earth have been driven largely by needs for higher throughput and read accuracy. Although some reduction in size has been achieved, nearly all commercially available sequencers are not compatible with spaceflight due to size, power, and operational requirements. Exceptions are nanopore-based sequencers that measure changes in current caused by DNA passing through pores; these devices are inherently much smaller and require significantly less power than sequencers using other detection methods

  19. Reference System of DNA and Protein Sequences on CD-ROM

    NASA Astrophysics Data System (ADS)

    Nasu, Hisanori; Ito, Toshiaki

    DNASIS-DBREF31 is a database for DNA and Protein sequences in the form of optical Compact Disk (CD) ROM, developed and commercialized by Hitachi Software Engineering Co., Ltd. Both nucleic acid base sequences and protein amino acid sequences can be retrieved from a single CD-ROM. Existing database is offered in the form of on-line service, floppy disks, or magnetic tape, all of which have some problems or other, such as usability or storage capacity. DNASIS-DBREF31 newly adopt a CD-ROM as a database device to realize a mass storage and personal use of the database.

  20. Spreadsheet macros for coloring sequence alignments.

    PubMed

    Haygood, M G

    1993-12-01

    This article describes a set of Microsoft Excel macros designed to color amino acid and nucleotide sequence alignments for review and preparation of visual aids. The colored alignments can then be modified to emphasize features of interest. Procedures for importing and coloring sequences are described. The macro file adds a new menu to the menu bar containing sequence-related commands to enable users unfamiliar with Excel to use the macros more readily. The macros were designed for use with Macintosh computers but will also run with the DOS version of Excel.

  1. Insights into the sequence parameters for halophilic adaptation.

    PubMed

    Nath, Abhigyan

    2016-03-01

    The sequence parameters for halophilic adaptation are still not fully understood. To understand the molecular basis of protein hypersaline adaptation, a detailed analysis is carried out, and investigated the likely association of protein sequence attributes to halophilic adaptation. A two-stage strategy is implemented, where in the first stage a supervised machine learning classifier is build, giving an overall accuracy of 86 % on stratified tenfold cross validation and 90 % on blind testing set, which are better than the previously reported results. The second stage consists of statistical analysis of sequence features and possible extraction of halophilic molecular signatures. The results of this study showed that, halophilic proteins are characterized by lower average charge, lower K content, and lower S content. A statistically significant preference/avoidance list of sequence parameters is also reported giving insights into the molecular basis of halophilic adaptation. D, Q, E, H, P, T, V are significantly preferred while N, C, I, K, M, F, S are significantly avoided. Among amino acid physicochemical groups, small, polar, charged, acidic and hydrophilic groups are preferred over other groups. The halophilic proteins also showed a preference for higher average flexibility, higher average polarity and avoidance for higher average positive charge, average bulkiness and average hydrophobicity. Some interesting trends observed in dipeptide counts are also reported. Further a systematic statistical comparison is undertaken for gaining insights into the sequence feature distribution in different residue structural states. The current analysis may facilitate the understanding of the mechanism of halophilic adaptation clearer, which can be further used for rational design of halophilic proteins.

  2. A statistical physics perspective on alignment-independent protein sequence comparison.

    PubMed

    Chattopadhyay, Amit K; Nasiev, Diar; Flower, Darren R

    2015-08-01

    Within bioinformatics, the textual alignment of amino acid sequences has long dominated the determination of similarity between proteins, with all that implies for shared structure, function and evolutionary descent. Despite the relative success of modern-day sequence alignment algorithms, so-called alignment-free approaches offer a complementary means of determining and expressing similarity, with potential benefits in certain key applications, such as regression analysis of protein structure-function studies, where alignment-base similarity has performed poorly. Here, we offer a fresh, statistical physics-based perspective focusing on the question of alignment-free comparison, in the process adapting results from 'first passage probability distribution' to summarize statistics of ensemble averaged amino acid propensity values. In this article, we introduce and elaborate this approach. © The Author 2015. Published by Oxford University Press.

  3. Lactic acid production from potato peel waste by anaerobic sequencing batch fermentation using undefined mixed culture.

    PubMed

    Liang, Shaobo; McDonald, Armando G; Coats, Erik R

    2015-11-01

    Lactic acid (LA) is a necessary industrial feedstock for producing the bioplastic, polylactic acid (PLA), which is currently produced by pure culture fermentation of food carbohydrates. This work presents an alternative to produce LA from potato peel waste (PPW) by anaerobic fermentation in a sequencing batch reactor (SBR) inoculated with undefined mixed culture from a municipal wastewater treatment plant. A statistical design of experiments approach was employed using set of 0.8L SBRs using gelatinized PPW at a solids content range from 30 to 50 g L(-1), solids retention time of 2-4 days for yield and productivity optimization. The maximum LA production yield of 0.25 g g(-1) PPW and highest productivity of 125 mg g(-1) d(-1) were achieved. A scale-up SBR trial using neat gelatinized PPW (at 80 g L(-1) solids content) at the 3 L scale was employed and the highest LA yield of 0.14 g g(-1) PPW and a productivity of 138 mg g(-1) d(-1) were achieved with a 1 d SRT. Copyright © 2015 Elsevier Ltd. All rights reserved.

  4. Sequence stratigraphic analysis of Cenomanian greenhouse palaeosols: A case study from southern Patagonia, Argentina

    NASA Astrophysics Data System (ADS)

    Varela, Augusto N.; Veiga, Gonzalo D.; Poiré, Daniel G.

    2012-10-01

    The aim of this contribution is to analyse extrinsic (i.e., tectonics, climate and eustasy) and intrinsic (i.e., palaeotopography, palaeodrainage and relative sedimentation rates) factors that controlled palaeosol development in the Cenomanian Mata Amarilla Formation (Austral foreland basin, southwestern Patagonia, Argentina). Detailed sedimentological logs, facies analysis, pedofeatures and palaeosol horizon identification led to the definition of six pedotypes, which represent Histosols, acid sulphate Histosols, Vertisols, hydromorphic Vertisols, Inceptisols and vertic Alfisols. Small- and large-scale changes in palaeosol development were recognised throughout the units. Small-scale or high-frequency variations, identified within the middle section are represented by the lateral and vertical superimposition of Inceptisols, Vertisols and hydromorphic Vertisols. Lateral changes are interpreted as the result of intrinsic factors to the depositional systems, such as the relative position within the floodplain and the distance from the main channels, that condition the nature of parent material, the sedimentation rate and eventually the palaeotopographic position. Vertical stacking of different soil types is linked to avulsion processes and the relatively abrupt change in the distance to main channels as the system aggraded. The large-scale or low-frequency vertical variations in palaeosol type occurring in the Mata Amarilla Formation are related to long-term changes in depositional environments. The lower and upper sections of the studied logs are characterised by Histosols and acid sulphate Histosols, and few hydromorphic Vertisols associated with low-gradient coastal environments (i.e., lagoons, estuaries and distal fluvial systems). At the lower boundary of the middle section, a thick palaeosol succession composed of vertic Alfisols occurs. The rest of the middle section is characterised by Vertisols, hydromorphic Vertisols and Inceptisols occurring on distal and

  5. Transcriptional Response to Lactic Acid Stress in the Hybrid Yeast Zygosaccharomyces parabailii

    PubMed Central

    2017-01-01

    ABSTRACT Lactic acid has a wide range of applications starting from its undissociated form, and its production using cell factories requires stress-tolerant microbial hosts. The interspecies hybrid yeast Zygosaccharomyces parabailii has great potential to be exploited as a novel host for lactic acid production, due to high organic acid tolerance at low pH and a fermentative metabolism with a high growth rate. Here we used mRNA sequencing (RNA-seq) to analyze Z. parabailii's transcriptional response to lactic acid added exogenously, and we explore the biological mechanisms involved in tolerance. Z. parabailii contains two homeologous copies of most genes. Under lactic acid stress, the two genes in each homeolog pair tend to diverge in expression to a significantly greater extent than under control conditions, indicating that stress tolerance is facilitated by interactions between the two gene sets in the hybrid. Lactic acid induces downregulation of genes related to cell wall and plasma membrane functions, possibly altering the rate of diffusion of lactic acid into cells. Genes related to iron transport and redox processes were upregulated, suggesting an important role for respiratory functions and oxidative stress defense. We found differences in the expression profiles of genes putatively regulated by Haa1 and Aft1/Aft2, previously described as lactic acid responsive in Saccharomyces cerevisiae. Furthermore, formate dehydrogenase (FDH) genes form a lactic acid-responsive gene family that has been specifically amplified in Z. parabailii in comparison to other closely related species. Our study provides a useful starting point for the engineering of Z. parabailii as a host for lactic acid production. IMPORTANCE Hybrid yeasts are important in biotechnology because of their tolerance to harsh industrial conditions. The molecular mechanisms of tolerance can be studied by analyzing differential gene expression under conditions of interest and relating gene expression

  6. Cloning and sequence analysis demonstrate the chromate reduction ability of a novel chromate reductase gene from Serratia sp.

    PubMed

    Deng, Peng; Tan, Xiaoqing; Wu, Ying; Bai, Qunhua; Jia, Yan; Xiao, Hong

    2015-03-01

    The ChrT gene encodes a chromate reductase enzyme which catalyzes the reduction of Cr(VI). The chromate reductase is also known as flavin mononucleotide (FMN) reductase (FMN_red). The aim of the present study was to clone the full-length ChrT DNA from Serratia sp. CQMUS2 and analyze the deduced amino acid sequence and three-dimensional structure. The putative ChrT gene fragment of Serratia sp. CQMUS2 was isolated by polymerase chain reaction (PCR), according to the known FMN_red gene sequence from Serratia sp. AS13. The flanking sequences of the ChrT gene were obtained by high efficiency TAIL-PCR, while the full-length gene of ChrT was cloned in Escherichia coli for subsequent sequencing. The nucleotide sequence of ChrT was submitted onto GenBank under the accession number, KF211434. Sequence analysis of the gene and amino acids was conducted using the Basic Local Alignment Search Tool, and open reading frame (ORF) analysis was performed using ORF Finder software. The ChrT gene was found to be an ORF of 567 bp that encodes a 188-amino acid enzyme with a calculated molecular weight of 20.4 kDa. In addition, the ChrT protein was hypothesized to be an NADPH-dependent FMN_red and a member of the flavodoxin-2 superfamily. The amino acid sequence of ChrT showed high sequence similarity to the FMN reductase genes of Klebsiella pneumonia and Raoultella ornithinolytica , which belong to the flavodoxin-2 superfamily. Furthermore, ChrT was shown to have a 85.6% similarity to the three-dimensional structure of Escherichia coli ChrR, sharing four common enzyme active sites for chromate reduction. Therefore, ChrT gene cloning and protein structure determination demonstrated the ability of the gene for chromate reduction. The results of the present study provide a basis for further studies on ChrT gene expression and protein function.

  7. Cloning and sequence analysis demonstrate the chromate reduction ability of a novel chromate reductase gene from Serratia sp

    PubMed Central

    DENG, PENG; TAN, XIAOQING; WU, YING; BAI, QUNHUA; JIA, YAN; XIAO, HONG

    2015-01-01

    The ChrT gene encodes a chromate reductase enzyme which catalyzes the reduction of Cr(VI). The chromate reductase is also known as flavin mononucleotide (FMN) reductase (FMN_red). The aim of the present study was to clone the full-length ChrT DNA from Serratia sp. CQMUS2 and analyze the deduced amino acid sequence and three-dimensional structure. The putative ChrT gene fragment of Serratia sp. CQMUS2 was isolated by polymerase chain reaction (PCR), according to the known FMN_red gene sequence from Serratia sp. AS13. The flanking sequences of the ChrT gene were obtained by high efficiency TAIL-PCR, while the full-length gene of ChrT was cloned in Escherichia coli for subsequent sequencing. The nucleotide sequence of ChrT was submitted onto GenBank under the accession number, KF211434. Sequence analysis of the gene and amino acids was conducted using the Basic Local Alignment Search Tool, and open reading frame (ORF) analysis was performed using ORF Finder software. The ChrT gene was found to be an ORF of 567 bp that encodes a 188-amino acid enzyme with a calculated molecular weight of 20.4 kDa. In addition, the ChrT protein was hypothesized to be an NADPH-dependent FMN_red and a member of the flavodoxin-2 superfamily. The amino acid sequence of ChrT showed high sequence similarity to the FMN reductase genes of Klebsiella pneumonia and Raoultella ornithinolytica, which belong to the flavodoxin-2 superfamily. Furthermore, ChrT was shown to have a 85.6% similarity to the three-dimensional structure of Escherichia coli ChrR, sharing four common enzyme active sites for chromate reduction. Therefore, ChrT gene cloning and protein structure determination demonstrated the ability of the gene for chromate reduction. The results of the present study provide a basis for further studies on ChrT gene expression and protein function. PMID:25667630

  8. High-Throughput Next-Generation Sequencing of Polioviruses

    PubMed Central

    Montmayeur, Anna M.; Schmidt, Alexander; Zhao, Kun; Magaña, Laura; Iber, Jane; Castro, Christina J.; Chen, Qi; Henderson, Elizabeth; Ramos, Edward; Shaw, Jing; Tatusov, Roman L.; Dybdahl-Sissoko, Naomi; Endegue-Zanga, Marie Claire; Adeniji, Johnson A.; Oberste, M. Steven; Burns, Cara C.

    2016-01-01

    ABSTRACT The poliovirus (PV) is currently targeted for worldwide eradication and containment. Sanger-based sequencing of the viral protein 1 (VP1) capsid region is currently the standard method for PV surveillance. However, the whole-genome sequence is sometimes needed for higher resolution global surveillance. In this study, we optimized whole-genome sequencing protocols for poliovirus isolates and FTA cards using next-generation sequencing (NGS), aiming for high sequence coverage, efficiency, and throughput. We found that DNase treatment of poliovirus RNA followed by random reverse transcription (RT), amplification, and the use of the Nextera XT DNA library preparation kit produced significantly better results than other preparations. The average viral reads per total reads, a measurement of efficiency, was as high as 84.2% ± 15.6%. PV genomes covering >99 to 100% of the reference length were obtained and validated with Sanger sequencing. A total of 52 PV genomes were generated, multiplexing as many as 64 samples in a single Illumina MiSeq run. This high-throughput, sequence-independent NGS approach facilitated the detection of a diverse range of PVs, especially for those in vaccine-derived polioviruses (VDPV), circulating VDPV, or immunodeficiency-related VDPV. In contrast to results from previous studies on other viruses, our results showed that filtration and nuclease treatment did not discernibly increase the sequencing efficiency of PV isolates. However, DNase treatment after nucleic acid extraction to remove host DNA significantly improved the sequencing results. This NGS method has been successfully implemented to generate PV genomes for molecular epidemiology of the most recent PV isolates. Additionally, the ability to obtain full PV genomes from FTA cards will aid in facilitating global poliovirus surveillance. PMID:27927929

  9. Prediction of glutathionylation sites in proteins using minimal sequence information and their experimental validation.

    PubMed

    Pal, Debojyoti; Sharma, Deepak; Kumar, Mukesh; Sandur, Santosh K

    2016-09-01

    S-glutathionylation of proteins plays an important role in various biological processes and is known to be protective modification during oxidative stress. Since, experimental detection of S-glutathionylation is labor intensive and time consuming, bioinformatics based approach is a viable alternative. Available methods require relatively longer sequence information, which may prevent prediction if sequence information is incomplete. Here, we present a model to predict glutathionylation sites from pentapeptide sequences. It is based upon differential association of amino acids with glutathionylated and non-glutathionylated cysteines from a database of experimentally verified sequences. This data was used to calculate position dependent F-scores, which measure how a particular amino acid at a particular position may affect the likelihood of glutathionylation event. Glutathionylation-score (G-score), indicating propensity of a sequence to undergo glutathionylation, was calculated using position-dependent F-scores for each amino-acid. Cut-off values were used for prediction. Our model returned an accuracy of 58% with Matthew's correlation-coefficient (MCC) value of 0.165. On an independent dataset, our model outperformed the currently available model, in spite of needing much less sequence information. Pentapeptide motifs having high abundance among glutathionylated proteins were identified. A list of potential glutathionylation hotspot sequences were obtained by assigning G-scores and subsequent Protein-BLAST analysis revealed a total of 254 putative glutathionable proteins, a number of which were already known to be glutathionylated. Our model predicted glutathionylation sites in 93.93% of experimentally verified glutathionylated proteins. Outcome of this study may assist in discovering novel glutathionylation sites and finding candidate proteins for glutathionylation.

  10. BONSAI Garden: Parallel knowledge discovery system for amino acid sequences

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shoudai, T.; Miyano, S.; Shinohara, A.

    1995-12-31

    We have developed a machine discovery system BON-SAI which receives positive and negative examples as inputs and produces as a hypothesis a pair of a decision tree over regular patterns and an alphabet indexing. This system has succeeded in discovering reasonable knowledge on transmembrane domain sequences and signal peptide sequences by computer experiments. However, when several kinds of sequences axe mixed in the data, it does not seem reasonable for a single BONSAI system to find a hypothesis of a reasonably small size with high accuracy. For this purpose, we have designed a system BONSAI Garden, in which several BONSAI`smore » and a program called Gardener run over a network in parallel, to partition the data into some number of classes together with hypotheses explaining these classes accurately.« less

  11. Bov-tA short interspersed nucleotide element sequences in circulating nucleic acids from sera of cattle with bovine spongiform encephalopathy (BSE) and sera of cattle exposed to BSE.

    PubMed

    Schütz, Ekkehard; Urnovitz, Howard B; Iakoubov, Leonid; Schulz-Schaeffer, Walter; Wemheuer, Wilhelm; Brenig, Bertram

    2005-07-01

    Circulating nucleic acids (CNA) are known to be enriched in repetitive DNA sequences in humans. Here, bovine sera CNA were analyzed to determine if cell stress-related short interspersed nucleotide elements (SINEs) could be detected in sera from cattle associated with bovine spongiform encephalopathy (BSE). Nucleic acids were extracted, amplified, cloned, and sequenced from the sera of protease-resistant prion protein (PrP(res))-positive cattle (n = 2) and sera from BSE-cohort cows (n = 6); 150 out of 163 clones revealed the presence of, on average, an 80-bp sequence from the 3' region of Bov-tA SINE. A PCR protocol was developed that differentially identified SINE-associated CNA in BSE-exposed versus normal cattle. CNA were extracted from a serum vesicular fraction after controlled blood collection and processing procedures. Sera from four confirmed cases of BSE, 137 BSE-exposed cohort animals associated with eight confirmed BSE cases, and 845 healthy, PrP(res)-negative control cows were tested. All four sera from confirmed BSE cases were repeatedly reactive in the assay. BSE-exposed cohorts had a 100-fold higher occurrence of repeatedly reactive individuals per cohort (average = 63%; range = 33% to 91%), compared to healthy controls (average = 0.6%; P < 0.001). This study shows that BSE-confirmed and cohort animals possess a unique profile of SINE-associated serum CNA that can be utilized as a marker that highly correlates to BSE exposure.

  12. Bov-tA Short Interspersed Nucleotide Element Sequences in Circulating Nucleic Acids from Sera of Cattle with Bovine Spongiform Encephalopathy (BSE) and Sera of Cattle Exposed to BSE

    PubMed Central

    Schütz, Ekkehard; Urnovitz, Howard B.; Iakoubov, Leonid; Schulz-Schaeffer, Walter; Wemheuer, Wilhelm; Brenig, Bertram

    2005-01-01

    Circulating nucleic acids (CNA) are known to be enriched in repetitive DNA sequences in humans. Here, bovine sera CNA were analyzed to determine if cell stress-related short interspersed nucleotide elements (SINEs) could be detected in sera from cattle associated with bovine spongiform encephalopathy (BSE). Nucleic acids were extracted, amplified, cloned, and sequenced from the sera of protease-resistant prion protein (PrPres)-positive cattle (n = 2) and sera from BSE-cohort cows (n = 6); 150 out of 163 clones revealed the presence of, on average, an 80-bp sequence from the 3′ region of Bov-tA SINE. A PCR protocol was developed that differentially identified SINE-associated CNA in BSE-exposed versus normal cattle. CNA were extracted from a serum vesicular fraction after controlled blood collection and processing procedures. Sera from four confirmed cases of BSE, 137 BSE-exposed cohort animals associated with eight confirmed BSE cases, and 845 healthy, PrPres-negative control cows were tested. All four sera from confirmed BSE cases were repeatedly reactive in the assay. BSE-exposed cohorts had a 100-fold higher occurrence of repeatedly reactive individuals per cohort (average = 63%; range = 33% to 91%), compared to healthy controls (average = 0.6%; P < 0.001). This study shows that BSE-confirmed and cohort animals possess a unique profile of SINE-associated serum CNA that can be utilized as a marker that highly correlates to BSE exposure. PMID:16002628

  13. Sequence Alignment to Predict Across Species Susceptibility ...

    EPA Pesticide Factsheets

    Conservation of a molecular target across species can be used as a line-of-evidence to predict the likelihood of chemical susceptibility. The web-based Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool was developed to simplify, streamline, and quantitatively assess protein sequence/structural similarity across taxonomic groups as a means to predict relative intrinsic susceptibility. The intent of the tool is to allow for evaluation of any potential protein target, so it is amenable to variable degrees of protein characterization, depending on available information about the chemical/protein interaction and the molecular target itself. To allow for flexibility in the analysis, a layered strategy was adopted for the tool. The first level of the SeqAPASS analysis compares primary amino acid sequences to a query sequence, calculating a metric for sequence similarity (including detection of candidate orthologs), the second level evaluates sequence similarity within selected domains (e.g., ligand-binding domain, DNA binding domain), and the third level of analysis compares individual amino acid residue positions identified as being of importance for protein conformation and/or ligand binding upon chemical perturbation. Each level of the SeqAPASS analysis provides increasing evidence to apply toward rapid, screening-level assessments of probable cross species susceptibility. Such analyses can support prioritization of chemicals for further ev

  14. Length and sequence dependence in the association of Huntingtin protein with lipid membranes

    NASA Astrophysics Data System (ADS)

    Jawahery, Sudi; Nagarajan, Anu; Matysiak, Silvina

    2013-03-01

    There is a fundamental gap in our understanding of how aggregates of mutant Huntingtin protein (htt) with overextended polyglutamine (polyQ) sequences gain the toxic properties that cause Huntington's disease (HD). Experimental studies have shown that the most important step associated with toxicity is the binding of mutant htt aggregates to lipid membranes. Studies have also shown that flanking amino acid sequences around the polyQ sequence directly affect interactions with the lipid bilayer, and that polyQ sequences of greater than 35 glutamine repeats in htt are a characteristic of HD. The key steps that determine how flanking sequences and polyQ length affect the structure of lipid bilayers remain unknown. In this study, we use atomistic molecular dynamics simulations to study the interactions between lipid membranes of varying compositions and polyQ peptides of varying lengths and flanking sequences. We find that overextended polyQ interactions do cause deformation in model membranes, and that the flanking sequences do play a role in intensifying this deformation by altering the shape of the affected regions.

  15. Opsin cDNA sequences of a UV and green rhodopsin of the satyrine butterfly Bicyclus anynana.

    PubMed

    Vanhoutte, K J A; Eggen, B J L; Janssen, J J M; Stavenga, D G

    2002-11-01

    The cDNAs of an ultraviolet (UV) and long-wavelength (LW) (green) absorbing rhodopsin of the bush brown Bicyclus anynana were partially identified. The UV sequence, encoding 377 amino acids, is 76-79% identical to the UV sequences of the papilionids Papilio glaucus and Papilio xuthus and the moth Manduca sexta. A dendrogram derived from aligning the amino acid sequences reveals an equidistant position of Bicyclus between Papilio and Manduca. The sequence of the green opsin cDNA fragment, which encodes 242 amino acids, represents six of the seven transmembrane regions. At the amino acid level, this fragment is more than 80% identical to the corresponding LW opsin sequences of Dryas, Heliconius, Papilio (rhodopsin 2) and Manduca. Whereas three LW absorbing rhodopsins were identified in the papilionid butterflies, only one green opsin was found in B. anynana.

  16. The cDNA-derived amino acid sequence of hemoglobin II from Lucina pectinata.

    PubMed

    Torres-Mercado, Elineth; Renta, Jessicca Y; Rodríguez, Yolanda; López-Garriga, Juan; Cadilla, Carmen L

    2003-11-01

    Hemoglobin II from the clam Lucina pectinata is an oxygen-reactive protein with a unique structural organization in the heme pocket involving residues Gln65 (E7), Tyr30 (B10), Phe44 (CD1), and Phe69 (E11). We employed the reverse transcriptase-polymerase chain reaction (RT-PCR) and methods to synthesize various cDNA(HbII). An initial 300-bp cDNA clone was amplified from total RNA by RT-PCR using degenerate oligonucleotides. Gene-specific primers derived from the HbII-partial cDNA sequence were used to obtain the 5' and 3' ends of the cDNA by RACE. The length of the HbII cDNA, estimated from overlapping clones, was approximately 2114 bases. Northern blot analysis revealed that the mRNA size of HbII agrees with the estimated size using cDNA data. The coding region of the full-length HbII cDNA codes for 151 amino acids. The calculated molecular weight of HbII, including the heme group and acetylated N-terminal residue, is 17,654.07 Da.

  17. Sequence-Specific Recognition of DNA by Proteins: Binding Motifs Discovered Using a Novel Statistical/Computational Analysis

    PubMed Central

    Jakubec, David; Laskowski, Roman A.; Vondrasek, Jiri

    2016-01-01

    Decades of intensive experimental studies of the recognition of DNA sequences by proteins have provided us with a view of a diverse and complicated world in which few to no features are shared between individual DNA-binding protein families. The originally conceived direct readout of DNA residue sequences by amino acid side chains offers very limited capacity for sequence recognition, while the effects of the dynamic properties of the interacting partners remain difficult to quantify and almost impossible to generalise. In this work we investigated the energetic characteristics of all DNA residue—amino acid side chain combinations in the conformations found at the interaction interface in a very large set of protein—DNA complexes by the means of empirical potential-based calculations. General specificity-defining criteria were derived and utilised to look beyond the binding motifs considered in previous studies. Linking energetic favourability to the observed geometrical preferences, our approach reveals several additional amino acid motifs which can distinguish between individual DNA bases. Our results remained valid in environments with various dielectric properties. PMID:27384774

  18. Probe kit for identifying a base in a nucleic acid

    DOEpatents

    Fodor, Stephen P. A.; Lipshutz, Robert J.; Huang, Xiaohua

    2001-01-01

    Devices and techniques for hybridization of nucleic acids and for determining the sequence of nucleic acids. Arrays of nucleic acids are formed by techniques, preferably high resolution, light-directed techniques. Positions of hybridization of a target nucleic acid are determined by, e.g., epifluorescence microscopy. Devices and techniques are proposed to determine the sequence of a target nucleic acid more efficiently and more quickly through such synthesis and detection techniques.

  19. STUDIES ON ORGANIC ACID METABOLISM,

    DTIC Science & Technology

    Lipoic acid metabolism: The acetyl and succinyl thio esters of civinyl dimercapto were prepared by chemical and enzymatic means. The oxidation...reduction reactions of the disulfide-dimercapto groups in pyrimidine nucleotide-linked reactions were explored in the initial lipoic acid assay organiam...disulfide couple. The studies appeared to indicate a bound form of lipoic acid to be the coenzyme, and suggested that an amide or possibly another

  20. Integrating mRNA and protein sequencing enables the detection and quantitative profiling of natural protein sequence variants of Populus trichocarpa

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Abraham, Paul E.; Wang, Xiaojing; Ranjan, Priya

    The availability of next-generation sequencing technologies has rapidly transformed our ability to link genotypes to phenotypes, and as such, promises to facilitate the dissection of genetic contribution to complex traits. Although discoveries of genetic associations will further our understanding of biology, once candidate variants have been identified, investigators are faced with the challenge of characterizing the functional effects on proteins encoded by such genes. Here we show how next-generation RNA sequencing data can be exploited to construct genotype-specific protein sequence databases, which provide a clearer picture of the molecular toolbox underlying cellular and organismal processes and their variation in amore » natural population. For this study, we used two individual genotypes (DENA-17-3 and VNDL-27-4) from a recent genome wide association (GWA) study of Populus trichocarpa, an obligate outcrosser that exhibits tremendous phenotypic variation across the natural population. This strategy allowed us to comprehensively catalogue proteins containing single amino acid polymorphisms (SAAPs) and insertions and deletions (INDELS). Based on large-scale identification of SAAPs, we profiled the frequency of 128 types of naturally occurring amino acid substitutions, with a subset of SAAPs occurring in regions of the genome having strong polymorphism patterns consistent with recent positive and/or divergent selection. In addition, we were able to explore the diploid landscape of Populus at the proteome-level, allowing the characterization of heterozygous variants.« less

  1. Integrating mRNA and protein sequencing enables the detection and quantitative profiling of natural protein sequence variants of Populus trichocarpa

    DOE PAGES

    Abraham, Paul E.; Wang, Xiaojing; Ranjan, Priya; ...

    2015-10-20

    The availability of next-generation sequencing technologies has rapidly transformed our ability to link genotypes to phenotypes, and as such, promises to facilitate the dissection of genetic contribution to complex traits. Although discoveries of genetic associations will further our understanding of biology, once candidate variants have been identified, investigators are faced with the challenge of characterizing the functional effects on proteins encoded by such genes. Here we show how next-generation RNA sequencing data can be exploited to construct genotype-specific protein sequence databases, which provide a clearer picture of the molecular toolbox underlying cellular and organismal processes and their variation in amore » natural population. For this study, we used two individual genotypes (DENA-17-3 and VNDL-27-4) from a recent genome wide association (GWA) study of Populus trichocarpa, an obligate outcrosser that exhibits tremendous phenotypic variation across the natural population. This strategy allowed us to comprehensively catalogue proteins containing single amino acid polymorphisms (SAAPs) and insertions and deletions (INDELS). Based on large-scale identification of SAAPs, we profiled the frequency of 128 types of naturally occurring amino acid substitutions, with a subset of SAAPs occurring in regions of the genome having strong polymorphism patterns consistent with recent positive and/or divergent selection. In addition, we were able to explore the diploid landscape of Populus at the proteome-level, allowing the characterization of heterozygous variants.« less

  2. Polymorphism in the Eruption Sequence of Primary Dentition: A Cross-sectional Study

    PubMed Central

    Bhojraj, Nandlal; Narayanappa

    2017-01-01

    Introduction Primary teeth have shown wide variations in their eruption time among different population. Population specific eruption ages are provided as mean with standard deviations or median ages with its percentile range. This alone will be insufficient for prediction of tooth eruption sequence because they provide no information on the frequency of sequence variation within the pairs of teeth. Norms of polymorphic variation in the eruption sequence can be more useful. Aim This study aims at providing norms for the sequence polymorphism in primary teeth among the children of Mysore population. Materials and Methods A cross-sectional study was designed with 1392 children, recruited from December 2015 to June 2016 by simple random sampling method. Tooth was recorded as present or absent. Across the entire possible intra quadrant tooth pair, cases of present-present, absent-absent, present-absent and absent-present and were counted and computed as percentages. Results Sequence polymorphisms were more common in 82-84 pairs of teeth. Significant polymorphic reverse sequence was observed in 52-54 (9%), 82-84 (35%) in males and 82-84 (18%) in females. There was no polymorphism in maxillary arch in females. Conclusion The present study provides the baseline data values for sequence variation in primary teeth eruption. To the best of investigators knowledge, there are no previous studies describing the sequence polymorphism in primary teeth in Indian population. The results of this study helps in assessment of eruption sequence problems in paediatric dentistry and in evaluation and prediction of tooth eruption sequence in individual child. PMID:28658912

  3. Variability of the protein sequences of lcrV between epidemic and atypical rhamnose-positive strains of Yersinia pestis.

    PubMed

    Anisimov, Andrey P; Panfertsev, Evgeniy A; Svetoch, Tat'yana E; Dentovskaya, Svetlana V

    2007-01-01

    Sequencing of lcrV genes and comparison of the deduced amino acid sequences from ten Y. pestis strains belonging mostly to the group of atypical rhamnose-positive isolates (non-pestis subspecies or pestoides group) showed that the LcrV proteins analyzed could be classified into five sequence types. This classification was based on major amino acid polymorphisms among LcrV proteins in the four "hot points" of the protein sequences. Some additional minor polymorphisms were found throughout these sequence types. The "hot points" corresponded to amino acids 18 (Lys --> Asn), 72 (Lys --> Arg), 273 (Cys --> Ser), and 324-326 (Ser-Gly-Lys --> Arg) in the LcrV sequence of the reference Y. pestis strain CO92. One possible explanation for polymorphism in amino acid sequences of LcrV among different strains is that strain-specific variation resulted from adaptation of the plague pathogen to different rodent and lagomorph hosts.

  4. A complete Neandertal mitochondrial genome sequence determined by high-throughput sequencing

    PubMed Central

    Green, Richard E.; Malaspinas, Anna-Sapfo; Krause, Johannes; Briggs, Adrian W.; Johnson, Philip L. F.; Uhler, Caroline; Meyer, Matthias; Good, Jeffrey M.; Maricic, Tomislav; Stenzel, Udo; Prüfer, Kay; Siebauer, Michael; Burbano, Hernán A.; Ronan, Michael; Rothberg, Jonathan M.; Egholm, Michael; Rudan, Pavao; Brajković, Dejana; Kućan, Željko; Gušić, Ivan; Wikström, Mårten; Laakkonen, Liisa; Kelso, Janet; Slatkin, Montgomery; Pääbo, Svante

    2008-01-01

    Summary A complete mitochondrial (mt) genome sequence was reconstructed from a 38,000-year-old Neandertal individual using 8,341 mtDNA sequences identified among 4.8 Gb of DNA generated from ~0.3 grams of bone. Analysis of the assembled sequence unequivocally establishes that the Neandertal mtDNA falls outside the variation of extant human mtDNAs and allows an estimate of the divergence date between the two mtDNA lineages of 660,000±140,000 years. Of the 13 proteins encoded in the mtDNA, subunit 2 of cytochrome c oxidase of the mitochondrial electron transport chain has experienced the largest number of amino acid substitutions in human ancestors since the separation from Neandertals. There is evidence that purifying selection in the Neandertal mtDNA was reduced compared to other primate lineages suggesting that the effective population size of Neandertals was small. PMID:18692465

  5. An endogenous factor enhances ferulic acid decarboxylation catalyzed by phenolic acid decarboxylase from Candida guilliermondii

    PubMed Central

    2012-01-01

    The gene for a eukaryotic phenolic acid decarboxylase of Candida guilliermondii was cloned, sequenced, and expressed in Escherichia coli for the first time. The structural gene contained an open reading frame of 504 bp, corresponding to 168 amino acids with a calculated molecular mass of 19,828 Da. The deduced amino sequence exhibited low similarity to those of functional phenolic acid decarboxylases previously reported from bacteria with 25-39% identity and to those of PAD1 and FDC1 proteins from Saccharomyces cerevisiae with less than 14% identity. The C. guilliermondii phenolic acid decarboxylase converted the main substrates ferulic acid and p-coumaric acid to the respective corresponding products. Surprisingly, the ultrafiltrate (Mr 10,000-cut-off) of the cell-free extract of C. guilliermondii remarkably activated the ferulic acid decarboxylation by the purified enzyme, whereas it was almost without effect on the p-coumaric acid decarboxylation. Gel-filtration chromatography of the ultrafiltrate suggested that an endogenous amino thiol-like compound with a molecular weight greater than Mr 1,400 was responsible for the activation. PMID:22217315

  6. Nucleotide sequences of Japanese isolates of citrus vein enation virus.

    PubMed

    Nakazono-Nagaoka, Eiko; Fujikawa, Takashi; Iwanami, Toru

    2017-03-01

    The genomic sequences of five Japanese isolates of citrus vein enation virus (CVEV) isolates that induce vein enation were determined and compared with that of the Spanish isolate VE-1. The nucleotide sequences of all Japanese isolates were 5,983 nt in length. The genomic RNA of Japanese isolates had five potential open reading frames (ORF 0, ORF 1, ORF 2, ORF 3, and ORF 5) in the positive-sense strand. The nucleotide sequence identity among the Japanese isolates and Spanish isolate VE-1 ranged from 98.0% to 99.8%. Comparison of the partial amino acid sequences of ten Japanese isolates and three Spanish isolates suggested that four amino acid residues, at positions of 83, 104, and 113 in ORF 2 and position 41 in ORF 5, might be unique to some Japanese isolates.

  7. A single amino acid change, Q114R, in the cleavage-site sequence of Newcastle disease virus fusion protein attenuates viral replication and pathogenicity.

    PubMed

    Samal, Sweety; Kumar, Sachin; Khattar, Sunil K; Samal, Siba K

    2011-10-01

    A key determinant of Newcastle disease virus (NDV) virulence is the amino acid sequence at the fusion (F) protein cleavage site. The NDV F protein is synthesized as an inactive precursor, F(0), and is activated by proteolytic cleavage between amino acids 116 and 117 to produce two disulfide-linked subunits, F(1) and F(2). The consensus sequence of the F protein cleavage site of virulent [(112)(R/K)-R-Q-(R/K)-R↓F-I(118)] and avirulent [(112)(G/E)-(K/R)-Q-(G/E)-R↓L-I(118)] strains contains a conserved glutamine residue at position 114. Recently, some NDV strains from Africa and Madagascar were isolated from healthy birds and have been reported to contain five basic residues (R-R-R-K-R↓F-I/V or R-R-R-R-R↓F-I/V) at the F protein cleavage site. In this study, we have evaluated the role of this conserved glutamine residue in the replication and pathogenicity of NDV by using the moderately pathogenic Beaudette C strain and by making Q114R, K115R and I118V mutants of the F protein in this strain. Our results showed that changing the glutamine to a basic arginine residue reduced viral replication and attenuated the pathogenicity of the virus in chickens. The pathogenicity was further reduced when the isoleucine at position 118 was substituted for valine.

  8. Development of chemiluminescent probe hybridization, RT-PCR and nucleic acid cycle sequencing assays of Sabin type 3 isolates to identify base pair 472 Sabin type 3 mutants associated with vaccine associated paralytic poliomyelitis.

    PubMed

    Old, M O; Logan, L H; Maldonado, Y A

    1997-11-01

    Sabin type 3 polio vaccine virus is the most common cause of poliovaccine associated paralytic poliomyelitis. Vaccine associated paralytic poliomyelitis cases have been associated with Sabin type 3 revertants containing a single U to C substitution at bp 472 of Sabin type 3. A rapid method of identification of Sabin type 3 bp 472 mutants is described. An enterovirus group-specific probe for use in a chemiluminescent dot blot hybridization assay was developed to identify enterovirus positive viral lysates. A reverse transcription-polymerase chain reaction (RT-PCR) assay producing a 319 bp PCR product containing the Sabin type 3 bp 472 mutation site was then employed to identify Sabin type 3 isolates. Chemiluminescent nucleic acid cycle sequencing of the purified 319 bp PCR product was then employed to identify nucleic acid sequences at bp 472. The enterovirus group probe hybridization procedure and isolation of the Sabin type 3 PCR product were highly sensitive and specific; nucleic acid cycle sequencing corresponded to the known sequence of stock Sabin type 3 isolates. These methods will be used to identify the Sabin type 3 reversion rate from sequential stool samples of infants obtained after the first and second doses of oral poliovirus vaccine.

  9. TALEN-mediated targeted mutagenesis of fatty acid desaturase 2 (FAD2) in peanut (Arachis hypogaea L.) promotes the accumulation of oleic acid.

    PubMed

    Wen, Shijie; Liu, Hao; Li, Xingyu; Chen, Xiaoping; Hong, Yanbin; Li, Haifen; Lu, Qing; Liang, Xuanqiang

    2018-05-01

    A first creation of high oleic acid peanut varieties by using transcription activator-like effecter nucleases (TALENs) mediated targeted mutagenesis of Fatty Acid Desaturase 2 (FAD2). Transcription activator like effector nucleases (TALENs), which allow the precise editing of DNA, have already been developed and applied for genome engineering in diverse organisms. However, they are scarcely used in higher plant study and crop improvement, especially in allopolyploid plants. In the present study, we aimed to create targeted mutagenesis by TALENs in peanut. Targeted mutations in the conserved coding sequence of Arachis hypogaea fatty acid desaturase 2 (AhFAD2) were created by TALENs. Genetic stability of AhFAD2 mutations was identified by DNA sequencing in up to 9.52 and 4.11% of the regeneration plants at two different targeted sites, respectively. Mutation frequencies among AhFAD2 mutant lines were significantly correlated to oleic acid accumulation. Genetically, stable individuals of positive mutant lines displayed a 0.5-2 fold increase in the oleic acid content compared with non-transgenic controls. This finding suggested that TALEN-mediated targeted mutagenesis could increase the oleic acid content in edible peanut oil. Furthermore, this was the first report on peanut genome editing event, and the obtained high oleic mutants could serve for peanut breeding project.

  10. Plant fatty acid hydroxylase

    DOEpatents

    Somerville, Chris; van de Loo, Frank

    2000-01-01

    The present invention relates to the identification of nucleic acid sequences and constructs, and methods related thereto, and the use of these sequences and constructs to produce genetically modified plants for the purpose of altering the composition of plant oils, waxes and related compounds.

  11. Investigation of the protein osteocalcin of Camelops hesternus: Sequence, structure and phylogenetic implications

    NASA Astrophysics Data System (ADS)

    Humpula, James F.; Ostrom, Peggy H.; Gandhi, Hasand; Strahler, John R.; Walker, Angela K.; Stafford, Thomas W.; Smith, James J.; Voorhies, Michael R.; George Corner, R.; Andrews, Phillip C.

    2007-12-01

    Ancient DNA sequences offer an extraordinary opportunity to unravel the evolutionary history of ancient organisms. Protein sequences offer another reservoir of genetic information that has recently become tractable through the application of mass spectrometric techniques. The extent to which ancient protein sequences resolve phylogenetic relationships, however, has not been explored. We determined the osteocalcin amino acid sequence from the bone of an extinct Camelid (21 ka, Camelops hesternus) excavated from Isleta Cave, New Mexico and three bones of extant camelids: bactrian camel ( Camelus bactrianus); dromedary camel ( Camelus dromedarius) and guanaco ( Llama guanacoe) for a diagenetic and phylogenetic assessment. There was no difference in sequence among the four taxa. Structural attributes observed in both modern and ancient osteocalcin include a post-translation modification, Hyp 9, deamidation of Gln 35 and Gln 39, and oxidation of Met 36. Carbamylation of the N-terminus in ancient osteocalcin may result in blockage and explain previous difficulties in sequencing ancient proteins via Edman degradation. A phylogenetic analysis using osteocalcin sequences of 25 vertebrate taxa was conducted to explore osteocalcin protein evolution and the utility of osteocalcin sequences for delineating phylogenetic relationships. The maximum likelihood tree closely reflected generally recognized taxonomic relationships. For example, maximum likelihood analysis recovered rodents, birds and, within hominins, the Homo-Pan-Gorilla trichotomy. Within Artiodactyla, character state analysis showed that a substitution of Pro 4 for His 4 defines the Capra-Ovis clade within Artiodactyla. Homoplasy in our analysis indicated that osteocalcin evolution is not a perfect indicator of species evolution. Limited sequence availability prevented assigning functional significance to sequence changes. Our preliminary analysis of osteocalcin evolution represents an initial step towards a

  12. cDNA encoding a polypeptide including a hevein sequence

    DOEpatents

    Raikhel, Natasha V.; Broekaert, Willem F.; Chua, Nam-Hai; Kush, Anil

    1999-05-04

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a putative signal sequence of 17 amino acid residues followed by a 187 amino acid polypeptide. The amino-terminal region (43 amino acids) is identical to hevein and shows homology to several chitin-binding proteins and to the amino-termini of wound-induced genes in potato and poplar. The carboxyl-terminal portion of the polypeptide (144 amino acids) is 74-79% homologous to the carboxyl-terminal region of wound-inducible genes of potato. Wounding, as well as application of the plant hormones abscisic acid and ethylene, resulted in accumulation of hevein transcripts in leaves, stems and latex, but not in roots, as shown by using the cDNA as a probe. A fusion protein was produced in E. coli from the protein of the present invention and maltose binding protein produced by the E. coli.

  13. cDNA encoding a polypeptide including a hevein sequence

    DOEpatents

    Raikhel, N.V.; Broekaert, W.F.; Chua, N.H.; Kush, A.

    1999-05-04

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a putative signal sequence of 17 amino acid residues followed by a 187 amino acid polypeptide. The amino-terminal region (43 amino acids) is identical to hevein and shows homology to several chitin-binding proteins and to the amino-termini of wound-induced genes in potato and poplar. The carboxyl-terminal portion of the polypeptide (144 amino acids) is 74--79% homologous to the carboxyl-terminal region of wound-inducible genes of potato. Wounding, as well as application of the plant hormones abscisic acid and ethylene, resulted in accumulation of hevein transcripts in leaves, stems and latex, but not in roots, as shown by using the cDNA as a probe. A fusion protein was produced in E. coli from the protein of the present invention and maltose binding protein produced by the E. coli. 12 figs.

  14. cDNA encoding a polypeptide including a hevein sequence

    DOEpatents

    Raikhel, N.V.; Broekaert, W.F.; Chua, N.H.; Kush, A.

    1995-03-21

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1,018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a putative signal sequence of 17 amino acid residues followed by a 187 amino acid polypeptide. The amino-terminal region (43 amino acids) is identical to hevein and shows homology to several chitin-binding proteins and to the amino-termini of wound-induced genes in potato and poplar. The carboxyl-terminal portion of the polypeptide (144 amino acids) is 74--79% homologous to the carboxyl-terminal region of wound-inducible genes of potato. Wounding, as well as application of the plant hormones abscisic acid and ethylene, resulted in accumulation of hevein transcripts in leaves, stems and latex, but not in roots, as shown by using the cDNA as a probe. A fusion protein was produced in E. coli from the protein of the present invention and maltose binding protein produced by the E. coli. 11 figures.

  15. The complete amino acid sequence of echinoidin, a lectin from the coelomic fluid of the sea urchin Anthocidaris crassispina. Homologies with mammalian and insect lectins.

    PubMed

    Giga, Y; Ikai, A; Takahashi, K

    1987-05-05

    The complete amino acid sequence of echinoidin, the proposed name for a lectin from the coelomic fluid of the sea urchin Anthocidaris crassispina, has been determined by sequencing the peptides obtained from tryptic, Staphylococcus aureus V8 protease, chymotryptic, and thermolysin digestions. Echinoidin is a multimeric protein (Giga, Y., Sutoh, K., and Ikai, A. (1985) Biochemistry 24, 4461-4467) whose subunit consists of a total of 147 amino acid residues and one carbohydrate chain attached to Ser38. The molecular weight of the polypeptide without carbohydrate was calculated to be 16,671. Each polypeptide chain contains seven half-cystines, and six of them form three disulfide bonds in the single polypeptide chain (Cys3-Cys14, Cys31-Cys141, and Cys116-Cys132), while Cys2 is involved in an interpolypeptide disulfide linkage. From secondary structure prediction by the method of Chou and Fasman (Chou, P. Y., and Fasman, G. D. (1974) Biochemistry 13, 211-222) the protein appears to be rich in beta-sheet and beta-turn structures and poor in alpha-helical structure. The sequence of the COOH-terminal half of echinoidin is highly homologous to those of the COOH-terminal carbohydrate recognition portions of rat liver mannose-binding protein and several other hepatic lectins. This COOH-terminal region of echinoidin is also homologous to the central portion of the lectin from the flesh fly Sarcophaga peregrina. Moreover, echinoidin contains an Arg-Gly-Asp sequence which has been proposed to be a basic functional unit in cellular recognition proteins.

  16. Typing of canine parvovirus isolates using mini-sequencing based single nucleotide polymorphism analysis.

    PubMed

    Naidu, Hariprasad; Subramanian, B Mohana; Chinchkar, Shankar Ramchandra; Sriraman, Rajan; Rana, Samir Kumar; Srinivasan, V A

    2012-05-01

    The antigenic types of canine parvovirus (CPV) are defined based on differences in the amino acids of the major capsid protein VP2. Type specificity is conferred by a limited number of amino acid changes and in particular by few nucleotide substitutions. PCR based methods are not particularly suitable for typing circulating variants which differ in a few specific nucleotide substitutions. Assays for determining SNPs can detect efficiently nucleotide substitutions and can thus be adapted to identify CPV types. In the present study, CPV typing was performed by single nucleotide extension using the mini-sequencing technique. A mini-sequencing signature was established for all the four CPV types (CPV2, 2a, 2b and 2c) and feline panleukopenia virus. The CPV typing using the mini-sequencing reaction was performed for 13 CPV field isolates and the two vaccine strains available in our repository. All the isolates had been typed earlier by full-length sequencing of the VP2 gene. The typing results obtained from mini-sequencing matched completely with that of sequencing. Typing could be achieved with less than 100 copies of standard plasmid DNA constructs or ≤10¹ FAID₅₀ of virus by mini-sequencing technique. The technique was also efficient for detecting multiple types in mixed infections. Copyright © 2012 Elsevier B.V. All rights reserved.

  17. Optical resolution of phenylthiohydantoin-amino acids by capillary electrophoresis and identification of the phenylthiohydantoin-D-amino acid residue of [D-Ala2]-methionine enkephalin.

    PubMed

    Kurosu, Y; Murayama, K; Shindo, N; Shisa, Y; Ishioka, N

    1996-11-01

    This is an initial report to propose a protein sequence analysis system with DL differentiation using capillary electrophoresis (CE). This system consists of a protein sequencer and a CE system. After fractionation of phenyl-thiohydantoin (PTH)-amino acids using a protein sequencer, optical resolution for each PTH-amino acid is performed by CE using some chiral selectors such as digitonin, beta-escin and others. As a model peptide, [D-Ala2]-methionine enkephalin (L-Tyr-D-Ala-Gly-L-Phe-L-Met), was used and the sequence with DL differentiation was determined, with the exception of the fourth amino acid, L-Phe, using our proposed system.

  18. Comparison of complete genome sequences of dog rabies viruses isolated from China and Mexico reveals key amino acid changes that may be associated with virus replication and virulence.

    PubMed

    Yu, Fulai; Zhang, Guoqing; Zhong, Xiangfu; Han, Na; Song, Yunfeng; Zhao, Ling; Cui, Min; Rayner, Simon; Fu, Zhen F

    2014-07-01

    Rabies is a global problem, but its impact and prevalence vary across different regions. In some areas, such as parts of Africa and Asia, the virus is prevalent in the domestic dog population, leading to epidemic waves and large numbers of human fatalities. In other regions, such as the Americas, the virus predominates in wildlife and bat populations, with sporadic spillover into domestic animals. In this work, we attempted to investigate whether these distinct environments led to selective pressures that result in measurable changes within the genome at the amino acid level. To this end, we collected and sequenced the full genome of two isolates from divergent environments. The first isolate (DRV-AH08) was from China, where the virus is present in the dog population and the country is experiencing a serious epidemic. The second isolate (DRV-Mexico) was taken from Mexico, where the virus is present in both wildlife and domestic dog populations, but at low levels as a consequence of an effective vaccination program. We then combined and compared these with other full genome sequences to identify distinct amino acid changes that might be associated with environment. Phylogenetic analysis identified strain DRV-AH08 as belonging to the China-I lineage, which has emerged to become the dominant lineage in the current epidemic. The Mexico strain was placed in the D11 Mexico lineage, associated with the West USA-Mexico border clade. Amino acid sequence analysis identified only 17 amino acid differences in the N, G and L proteins. These differences may be associated with virus replication and virulence-for example, the short incubation period observed in the current epidemic in China.

  19. FASH: A web application for nucleotides sequence search.

    PubMed

    Veksler-Lublinksy, Isana; Barash, Danny; Avisar, Chai; Troim, Einav; Chew, Paul; Kedem, Klara

    2008-05-27

    : FASH (Fourier Alignment Sequence Heuristics) is a web application, based on the Fast Fourier Transform, for finding remote homologs within a long nucleic acid sequence. Given a query sequence and a long text-sequence (e.g, the human genome), FASH detects subsequences within the text that are remotely-similar to the query. FASH offers an alternative approach to Blast/Fasta for querying long RNA/DNA sequences. FASH differs from these other approaches in that it does not depend on the existence of contiguous seed-sequences in its initial detection phase. The FASH web server is user friendly and very easy to operate. FASH can be accessed athttps://fash.bgu.ac.il:8443/fash/default.jsp (secured website).

  20. [Comparative genomics and evolutionary analysis of CRISPR loci in acetic acid bacteria].

    PubMed

    Xia, Kai; Liang, Xin-le; Li, Yu-dong

    2015-12-01

    The clustered regularly interspaced short palindromic repeat (CRISPR) is a widespread adaptive immunity system that exists in most archaea and many bacteria against foreign DNA, such as phages, viruses and plasmids. In general, CRISPR system consists of direct repeat, leader, spacer and CRISPR-associated sequences. Acetic acid bacteria (AAB) play an important role in industrial fermentation of vinegar and bioelectrochemistry. To investigate the polymorphism and evolution pattern of CRISPR loci in acetic acid bacteria, bioinformatic analyses were performed on 48 species from three main genera (Acetobacter, Gluconacetobacter and Gluconobacter) with whole genome sequences available from the NCBI database. The results showed that the CRISPR system existed in 32 species of the 48 strains studied. Most of the CRISPR-Cas system in AAB belonged to type I CRISPR-Cas system (subtype E and C), but type II CRISPR-Cas system which contain cas9 gene was only found in the genus Acetobacter and Gluconacetobacter. The repeat sequences of some CRISPR were highly conserved among species from different genera, and the leader sequences of some CRISPR possessed conservative motif, which was associated with regulated promoters. Moreover, phylogenetic analysis of cas1 demonstrated that they were suitable for classification of species. The conservation of cas1 genes was associated with that of repeat sequences among different strains, suggesting they were subjected to similar functional constraints. Moreover, the number of spacer was positively correlated with the number of prophages and insertion sequences, indicating the acetic acid bacteria were continually invaded by new foreign DNA. The comparative analysis of CRISR loci in acetic acid bacteria provided the basis for investigating the molecular mechanism of different acetic acid tolerance and genome stability in acetic acid bacteria.

  1. New partial sequences of phosphoenolpyruvate carboxylase as molecular phylogenetic markers.

    PubMed

    Gehrig, H; Heute, V; Kluge, M

    2001-08-01

    To better understand the evolution of the enzyme phosphoenolpyruvate carboxylase (PEPC) and to test its versatility as a molecular character in phylogenetic and taxonomic studies, we have characterized and compared 70 new partial PEPC nucleotide and amino acid sequences (about 1100 bp of the 3' side of the gene) from 50 plant species (24 species of Bryophyta, 1 of Pteridophyta, and 25 of Spermatophyta). Together with previously published data, the new set of sequences allowed us to construct the up to now most complete phylogenetic tree of PEPC, where the PEPC sequences cluster according to both the taxonomic positions of the donor plants and the assumed specific function of the PEPC isoforms. Altogether, the study further strengthens the view that PEPC sequences can provide interesting information for the reconstruction of phylogenetic relations between organisms and metabolic pathways. To avoid confusion in future discussion, we propose a new nomenclature for the denotation of PEPC isoforms. Copyright 2001 Academic Press.

  2. Identification of Delta5-fatty acid desaturase from the cellular slime mold dictyostelium discoideum.

    PubMed

    Saito, T; Ochiai, H

    1999-10-01

    cDNA fragments putatively encoding amino acid sequences characteristic of the fatty acid desaturase were obtained using expressed sequence tag (EST) information of the Dictyostelium cDNA project. Using this sequence, we have determined the cDNA sequence and genomic sequence of a desaturase. The cloned cDNA is 1489 nucleotides long and the deduced amino acid sequence comprised 464 amino acid residues containing an N-terminal cytochrome b5 domain. The whole sequence was 38.6% identical to the initially identified Delta5-desaturase of Mortierella alpina. We have confirmed its function as Delta5-desaturase by over expression mutation in D. discoideum and also the gain of function mutation in the yeast Saccharomyces cerevisiae. Analysis of the lipids from transformed D. discoideum and yeast demonstrated the accumulation of Delta5-desaturated products. This is the first report concering fatty acid desaturase in cellular slime molds.

  3. Phenylketonuria and Gut Microbiota: A Controlled Study Based on Next-Generation Sequencing

    PubMed Central

    Pinheiro de Oliveira, Felipe; Mendes, Roberta Hack; Dobbler, Priscila Thiago; Mai, Volker; Pylro, Victor Salter; Waugh, Sheldon G; Vairo, Filippo; Refosco, Lilia Farret; Schwartz, Ida Vanessa Doederlein

    2016-01-01

    Phenylketonuria (PKU) is an inborn error of metabolism associated with high blood levels of phenylalanine (Phe). A Phe-restricted diet supplemented with L-amino acids is the main treatment strategy for this disease; if started early, most neurological abnormalities can be prevented. The healthy human gut contains trillions of commensal bacteria, often referred to as the gut microbiota. The composition of the gut microbiota is known to be modulated by environmental factors, including diet. In this study, we compared the gut microbiota of 8 PKU patients on Phe-restricted dietary treatment with that of 10 healthy individuals. The microbiota were characterized by 16S rRNA sequencing using the Ion Torrent™ platform. The most dominant phyla detected in both groups were Bacteroidetes and Firmicutes. PKU patients showed reduced abundance of the Clostridiaceae, Erysipelotrichaceae, and Lachnospiraceae families, Clostridiales class, Coprococcus, Dorea, Lachnospira, Odoribacter, Ruminococcus and Veillonella genera, and enrichment of Prevotella, Akkermansia, and Peptostreptococcaceae. Microbial function prediction suggested significant differences in starch/glucose and amino acid metabolism between PKU patients and controls. Together, our results suggest the presence of distinct taxonomic groups within the gut microbiome of PKU patients, which may be modulated by their plasma Phe concentration. Whether our findings represent an effect of the disease itself, or a consequence of the modified diet is unclear. PMID:27336782

  4. Phenylketonuria and Gut Microbiota: A Controlled Study Based on Next-Generation Sequencing.

    PubMed

    Pinheiro de Oliveira, Felipe; Mendes, Roberta Hack; Dobbler, Priscila Thiago; Mai, Volker; Pylro, Victor Salter; Waugh, Sheldon G; Vairo, Filippo; Refosco, Lilia Farret; Roesch, Luiz Fernando Würdig; Schwartz, Ida Vanessa Doederlein

    2016-01-01

    Phenylketonuria (PKU) is an inborn error of metabolism associated with high blood levels of phenylalanine (Phe). A Phe-restricted diet supplemented with L-amino acids is the main treatment strategy for this disease; if started early, most neurological abnormalities can be prevented. The healthy human gut contains trillions of commensal bacteria, often referred to as the gut microbiota. The composition of the gut microbiota is known to be modulated by environmental factors, including diet. In this study, we compared the gut microbiota of 8 PKU patients on Phe-restricted dietary treatment with that of 10 healthy individuals. The microbiota were characterized by 16S rRNA sequencing using the Ion Torrent™ platform. The most dominant phyla detected in both groups were Bacteroidetes and Firmicutes. PKU patients showed reduced abundance of the Clostridiaceae, Erysipelotrichaceae, and Lachnospiraceae families, Clostridiales class, Coprococcus, Dorea, Lachnospira, Odoribacter, Ruminococcus and Veillonella genera, and enrichment of Prevotella, Akkermansia, and Peptostreptococcaceae. Microbial function prediction suggested significant differences in starch/glucose and amino acid metabolism between PKU patients and controls. Together, our results suggest the presence of distinct taxonomic groups within the gut microbiome of PKU patients, which may be modulated by their plasma Phe concentration. Whether our findings represent an effect of the disease itself, or a consequence of the modified diet is unclear.

  5. Plant fatty acid hydroxylases

    DOEpatents

    Somerville, Chris; Broun, Pierre; van de Loo, Frank

    2001-01-01

    This invention relates to plant fatty acyl hydroxylases. Methods to use conserved amino acid or nucleotide sequences to obtain plant fatty acyl hydroxylases are described. Also described is the use of cDNA clones encoding a plant hydroxylase to produce a family of hydroxylated fatty acids in transgenic plants. In addition, the use of genes encoding fatty acid hydroxylases or desaturases to alter the level of lipid fatty acid unsaturation in transgenic plants is described.

  6. Cloning and sequence analysis of the invertase gene INV 1 from the yeast Pichia anomala.

    PubMed

    Pérez, J A; Rodríguez, J; Rodríguez, L; Ruiz, T

    1996-02-01

    A genomic library from the yeast Pichia anomala has been constructed and employed to clone the gene encoding the sucrose-hydrolysing enzyme invertase by complementation of a sucrose non-fermenting mutant of Saccharomyces cerevisiae. The cloned gene, INV1, was sequenced and found to encode a polypeptide of 550 amino acids which contained a 22 amino-acid signal sequence and ten potential glycosylation sites. The amino-acid sequence shows significant identity with other yeast invertases and also with Kluyveromyces marxianus inulinase, a yeast beta-fructofuranosidase which has a different substrate specificity. The nucleotide sequences of the 5' and 3' non-coding regions were found to contain several consensus motifs probably involved in the initiation and termination of gene transcription.

  7. Sequence, distribution and chromosomal context of class I and class II pilin genes of Neisseria meningitidis identified in whole genome sequences

    PubMed Central

    2014-01-01

    Background Neisseria meningitidis expresses type four pili (Tfp) which are important for colonisation and virulence. Tfp have been considered as one of the most variable structures on the bacterial surface due to high frequency gene conversion, resulting in amino acid sequence variation of the major pilin subunit (PilE). Meningococci express either a class I or a class II pilE gene and recent work has indicated that class II pilins do not undergo antigenic variation, as class II pilE genes encode conserved pilin subunits. The purpose of this work was to use whole genome sequences to further investigate the frequency and variability of the class II pilE genes in meningococcal isolate collections. Results We analysed over 600 publically available whole genome sequences of N. meningitidis isolates to determine the sequence and genomic organization of pilE. We confirmed that meningococcal strains belonging to a limited number of clonal complexes (ccs, namely cc1, cc5, cc8, cc11 and cc174) harbour a class II pilE gene which is conserved in terms of sequence and chromosomal context. We also identified pilS cassettes in all isolates with class II pilE, however, our analysis indicates that these do not serve as donor sequences for pilE/pilS recombination. Furthermore, our work reveals that the class II pilE locus lacks the DNA sequence motifs that enable (G4) or enhance (Sma/Cla repeat) pilin antigenic variation. Finally, through analysis of pilin genes in commensal Neisseria species we found that meningococcal class II pilE genes are closely related to pilE from Neisseria lactamica and Neisseria polysaccharea, suggesting horizontal transfer among these species. Conclusions Class II pilins can be defined by their amino acid sequence and genomic context and are present in meningococcal isolates which have persisted and spread globally. The absence of G4 and Sma/Cla sequences adjacent to the class II pilE genes is consistent with the lack of pilin subunit variation in these

  8. Discovery of a novel iflavirus sequence in the eastern paralysis tick Ixodes holocyclus.

    PubMed

    O'Brien, Caitlin A; Hall-Mendelin, Sonja; Hobson-Peters, Jody; Deliyannis, Georgia; Allen, Andy; Lew-Tabor, Ala; Rodriguez-Valle, Manuel; Barker, Dayana; Barker, Stephen C; Hall, Roy A

    2018-05-11

    Ixodes holocyclus, the eastern paralysis tick, is a significant parasite in Australia in terms of animal and human health. However, very little is known about its virome. In this study, next-generation sequencing of I. holocyclus salivary glands yielded a full-length genome sequence which phylogenetically groups with viruses classified in the Iflaviridae family and shares 45% amino acid similarity with its closest relative Bole hyalomma asiaticum virus 1. The sequence of this virus, provisionally named Ixodes holocyclus iflavirus (IhIV) has been identified in tick populations from northern New South Wales and Queensland, Australia and represents the first virus sequence reported from I. holocyclus.

  9. Amino Acid Racemization and the Preservation of Ancient DNA

    NASA Technical Reports Server (NTRS)

    Poinar, Hendrik N.; Hoss, Matthias

    1996-01-01

    The extent of racemization of aspartic acid, alanine, and leucine provides criteria for assessing whether ancient tissue samples contain endogenous DNA. In samples in which the D/L ratio of aspartic acid exceeds 0.08, ancient DNA sequences could not be retrieved. Paleontological finds from which DNA sequences purportedly millions of years old have been reported show extensive racemization, and the amino acids present are mainly contaminates. An exception is the amino acids in some insects preserved in amber.

  10. Catalytic behavior of perchloric acid on silica mesoporous SBA-15 as a green heterogeneous Bronsted acid in heterocyclic multicomponent reactions

    NASA Astrophysics Data System (ADS)

    Baghban, Ali; Doustkhah, Esmail; Rostamnia, Sadegh

    2018-04-01

    Catalytic behavior of perchloric acid when supported to mesoporous silica SBA-15 (SBA-15/HClO4) was investigated as a heterogeneous Bronsted acid. Its reactivity and leaching possibility were studied in cascade ring opening-cyclocondensation sequence of diketene and alcohol with aldehyde in the presence of either of urea or ammonium acetate. Results showed that this catalyst can be highly recyclable for several cycles.

  11. A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Utturkar, Sagar M.; Klingeman, Dawn M.; Hurt, Jr., Richard A.

    This study characterized regions of DNA which remained unassembled by either PacBio and Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted.more » PacBio assemblies had few limitations overall and gaps were explained as cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Furthermore, our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.« less

  12. A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies

    DOE PAGES

    Utturkar, Sagar M.; Klingeman, Dawn M.; Hurt, Jr., Richard A.; ...

    2017-07-18

    This study characterized regions of DNA which remained unassembled by either PacBio and Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted.more » PacBio assemblies had few limitations overall and gaps were explained as cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Furthermore, our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.« less

  13. A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies.

    PubMed

    Utturkar, Sagar M; Klingeman, Dawn M; Hurt, Richard A; Brown, Steven D

    2017-01-01

    This study characterized regions of DNA which remained unassembled by either PacBio and Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.

  14. A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies

    PubMed Central

    Utturkar, Sagar M.; Klingeman, Dawn M.; Hurt, Richard A.; Brown, Steven D.

    2017-01-01

    This study characterized regions of DNA which remained unassembled by either PacBio and Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences. PMID:28769883

  15. The complete DNA sequence of lymphocystis disease virus.

    PubMed

    Tidona, C A; Darai, G

    1997-04-14

    Lymphocystis disease virus (LCDV) is the causative agent of lymphocystis disease, which has been reported to occur in over 100 different fish species worldwide. LCDV is a member of the family Iridoviridae and the type species of the genus Lymphocystivirus. The virions contain a single linear double-stranded DNA molecule, which is circularly permuted, terminally redundant, and heavily methylated at cytosines in CpG sequences. The complete nucleotide sequence of LCDV-1 (flounder isolate) was determined by automated cycle sequencing and primer walking. The genome of LCDV-1 is 102.653 bp in length and contains 195 open reading frames with coding capacities ranging from 40 to 1199 amino acids. Computer-assisted analyses of the deduced amino acid sequences led to the identification of several putative gene products with significant homologies to entries in protein data banks, such as the two major subunits of the viral DNA-dependent RNA polymerase, DNA polymerase, several protein kinases, two subunits of the ribonucleoside diphosphate reductase, DNA methyltransferase, the viral major capsid protein, insulin-like growth factor, and tumor necrosis factor receptor homolog.

  16. Transcriptional Response to Lactic Acid Stress in the Hybrid Yeast Zygosaccharomyces parabailii.

    PubMed

    Ortiz-Merino, Raúl A; Kuanyshev, Nurzhan; Byrne, Kevin P; Varela, Javier A; Morrissey, John P; Porro, Danilo; Wolfe, Kenneth H; Branduardi, Paola

    2018-03-01

    Lactic acid has a wide range of applications starting from its undissociated form, and its production using cell factories requires stress-tolerant microbial hosts. The interspecies hybrid yeast Zygosaccharomyces parabailii has great potential to be exploited as a novel host for lactic acid production, due to high organic acid tolerance at low pH and a fermentative metabolism with a high growth rate. Here we used mRNA sequencing (RNA-seq) to analyze Z. parabailii 's transcriptional response to lactic acid added exogenously, and we explore the biological mechanisms involved in tolerance. Z. parabailii contains two homeologous copies of most genes. Under lactic acid stress, the two genes in each homeolog pair tend to diverge in expression to a significantly greater extent than under control conditions, indicating that stress tolerance is facilitated by interactions between the two gene sets in the hybrid. Lactic acid induces downregulation of genes related to cell wall and plasma membrane functions, possibly altering the rate of diffusion of lactic acid into cells. Genes related to iron transport and redox processes were upregulated, suggesting an important role for respiratory functions and oxidative stress defense. We found differences in the expression profiles of genes putatively regulated by Haa1 and Aft1/Aft2, previously described as lactic acid responsive in Saccharomyces cerevisiae Furthermore, formate dehydrogenase ( FDH ) genes form a lactic acid-responsive gene family that has been specifically amplified in Z. parabailii in comparison to other closely related species. Our study provides a useful starting point for the engineering of Z. parabailii as a host for lactic acid production. IMPORTANCE Hybrid yeasts are important in biotechnology because of their tolerance to harsh industrial conditions. The molecular mechanisms of tolerance can be studied by analyzing differential gene expression under conditions of interest and relating gene expression patterns

  17. Molecular Recognition and Structural Influences on Function in Bio-nanosystems of Nucleic Acids and Proteins

    NASA Astrophysics Data System (ADS)

    Sethaphong, Latsavongsakda

    This work examines smart material properties of rational self-assembly and molecular recognition found in nano-biosystems. Exploiting the sequence and structural information encoded within nucleic acids and proteins will permit programmed synthesis of nanomaterials and help create molecular machines that may carry out new roles involving chemical catalysis and bioenergy. Responsive to different ionic environments thru self-reorgnization, nucleic acids (NA) are nature's signature smart material; organisms such as viruses and bacteria use features of NAs to react to their environment and orchestrate their lifecycle. Furthermore, nucleic acid systems (both RNA and DNA) are currently exploited as scaffolds; recent applications have been showcased to build bioelectronics and biotemplated nanostructures via directed assembly of multidimensional nanoelectronic devices 1. Since the most stable and rudimentary structure of nucleic acids is the helical duplex, these were modeled in order to examine the influence of the microenvironment, sequence, and cation-dependent perturbations of their canonical forms. Due to their negatively charged phosphate backbone, NA's rely on counterions to overcome the inherent repulsive forces that arise from the assembly of two complementary strands. As a realistic model system, we chose the HIV-TAR helix (PDB ID: 397D) to study specific sequence motifs on cation sequestration. At physiologically relevant concentrations of sodium and potassium ions, we observed sequence based effects where purine stretches were adept in retaining high residency cations. The transitional space between adenine and guanosine nucleotides (ApG step) in a sequence proved the most favorable. This work was the first to directly show these subtle interactions of sequence based cationic sequestration and may be useful for controlling metallization of nucleic acids in conductive nanowires. Extending the study further, we explored the degree to which the structure of NA

  18. Draft Genome Sequences of Pseudomonas fluorescens BS2 and Pusillimonas noertemannii BS8, Soil Bacteria That Cooperate To Degrade the Poly-γ-d-Glutamic Acid Anthrax Capsule.

    PubMed

    Stabler, Richard A; Negus, David; Pain, Arnab; Taylor, Peter W

    2013-01-01

    A mixed culture of Pseudomonas fluorescens BS2 and Pusillimonas noertemannii BS8 degraded poly-γ-d-glutamic acid; when the 2 strains were cultured separately, no hydrolytic activity was apparent. Here we report the draft genome sequences of both soil isolates.

  19. Depletion of Unwanted Nucleic Acid Templates by Selective Cleavage: LNAzymes, Catalytically Active Oligonucleotides Containing Locked Nucleic Acids, Open a New Window for Detecting Rare Microbial Community Members

    PubMed Central

    Dolinšek, Jan; Dorninger, Christiane; Lagkouvardos, Ilias; Wagner, Michael

    2013-01-01

    Many studies of molecular microbial ecology rely on the characterization of microbial communities by PCR amplification, cloning, sequencing, and phylogenetic analysis of genes encoding rRNAs or functional marker enzymes. However, if the established clone libraries are dominated by one or a few sequence types, the cloned diversity is difficult to analyze by random clone sequencing. Here we present a novel approach to deplete unwanted sequence types from complex nucleic acid mixtures prior to cloning and downstream analyses. It employs catalytically active oligonucleotides containing locked nucleic acids (LNAzymes) for the specific cleavage of selected RNA targets. When combined with in vitro transcription and reverse transcriptase PCR, this LNAzyme-based technique can be used with DNA or RNA extracts from microbial communities. The simultaneous application of more than one specific LNAzyme allows the concurrent depletion of different sequence types from the same nucleic acid preparation. This new method was evaluated with defined mixtures of cloned 16S rRNA genes and then used to identify accompanying bacteria in an enrichment culture dominated by the nitrite oxidizer “Candidatus Nitrospira defluvii.” In silico analysis revealed that the majority of publicly deposited rRNA-targeted oligonucleotide probes may be used as specific LNAzymes with no or only minor sequence modifications. This efficient and cost-effective approach will greatly facilitate tasks such as the identification of microbial symbionts in nucleic acid preparations dominated by plastid or mitochondrial rRNA genes from eukaryotic hosts, the detection of contaminants in microbial cultures, and the analysis of rare organisms in microbial communities of highly uneven composition. PMID:23263968

  20. Ray Wu as Fifth Business: Deconstructing collective memory in the history of DNA sequencing.

    PubMed

    Onaga, Lisa A

    2014-06-01

    The concept of 'Fifth Business' is used to analyze a minority standpoint and bring serious attention to the role of scientists who play a galvanizing role in a science but for multiple reasons appear less prominently in more common recounts of any particular development. Biochemist Ray Wu (1928-2008) published a DNA sequencing experiment in March 1970 using DNA polymerase catalysis and specific nucleotide labeling, both of which are foundational to general sequencing methods today. The scant mention of Wu's work from textbooks, research articles, and other accounts of DNA sequencing calls into question how scientific collective memory forms. This alternative history seeks to understand why a key figure in nucleic acid sequence analysis has remained less visibly connected or peripheral to solidifying narratives about the history of DNA sequencing. The study resists predictable dismissals of Wu's work in order to seriously examine the formation of his nucleic acid sequence analysis research program and how he shared his knowledge of sequencing during a period of rapid advancement in the field. An analysis of Wu's work on sequencing the cohesive ends of lambda bacteriophage in the 1960s and 1970s exemplifies how a variety of individuals and groups attempted to develop protocol for sequencing the order of nucleotide base pairs comprising DNA. This historical examination of the sociality of scientific research suggests a way to understand how Wu and others contributed to the very collective memory of DNA sequencing that Wu eventually tried to repair. The study of Wu, who was a Chinese immigrant to the United States, provides a foundation for further critical scholarship on the heterogeneous histories of Asian American bioscientists, the sociality of their scientific works, and how the resulting knowledge produced is preserved, if not evenly, in a scientific field's collective memory. Copyright © 2014 Elsevier Ltd. All rights reserved.

  1. A Unique (3+2) Annulation Reaction between Meldrum's Acid and Nitrones: Mechanistic Insight by ESI-IMS-MS and DFT Studies.

    PubMed

    Lespes, Nicolas; Pair, Etienne; Maganga, Clisy; Bretier, Marie; Tognetti, Vincent; Joubert, Laurent; Levacher, Vincent; Hubert-Roux, Marie; Afonso, Carlos; Loutelier-Bourhis, Corinne; Brière, Jean-François

    2018-03-15

    The fragile intermediates of the domino process leading to an isoxazolidin-5-one, triggered by unique reactivity between Meldrum's acid and an N-benzyl nitrone in the presence of a Brønsted base, were determined thanks to the softness and accuracy of electrospray ionization mass spectrometry coupled to ion mobility spectrometry (ESI-IMS-MS). The combined DFT study shed light on the overall organocatalytic sequence that starts with a stepwise (3+2) annulation reaction that is followed by a decarboxylative protonation sequence encompassing a stereoselective pathway issue. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  2. [Cloning and sequence analysis of full-length cDNA of secoisolariciresinol dehydrogenase of Dysosma versipellis].

    PubMed

    Xu, Li; Ding, Zhi-Shan; Zhou, Yun-Kai; Tao, Xue-Fen

    2009-06-01

    To obtain the full-length cDNA sequence of Secoisolariciresinol Dehydrogenase gene from Dysosma versipellis by RACE PCR,then investigate the character of Secoisolariciresinol Dehydrogenase gene. The full-length cDNA sequence of Secoisolariciresinol Dehydrogenase gene was obtained by 3'-RACE and 5'-RACE from Dysosma versipellis. We first reported the full cDNA sequences of Secoisolariciresinol Dehydrogenase in Dysosma versipellis. The acquired gene was 991bp in full length, including 5' untranslated region of 42bp, 3' untranslated region of 112bp with Poly (A). The open reading frame (ORF) encoding 278 amino acid with molecular weight 29253.3 Daltons and isolectric point 6.328. The gene accession nucleotide sequence number in GeneBank was EU573789. Semi-quantitative RT-PCR analysis revealed that the Secoisolariciresinol Dehydrogenase gene was highly expressed in stem. Alignment of the amino acid sequence of Secoisolariciresinol Dehydrogenase indicated there may be some significant amino acid sequence difference among different species. Obtain the full-length cDNA sequence of Secoisolariciresinol Dehydrogenase gene from Dysosma versipellis.

  3. Safety and efficacy of gadoteric acid in pediatric magnetic resonance imaging: overview of clinical trials and post-marketing studies.

    PubMed

    Balassy, Csilla; Roberts, Donna; Miller, Stephen F

    2015-11-01

    Gadoteric acid is a paramagnetic gadolinium macrocyclic contrast agent approved for use in MRI of cerebral and spinal lesions and for body imaging. To investigate the safety and efficacy of gadoteric acid in children by extensively reviewing clinical and post-marketing observational studies. Data were collected from 3,810 children (ages 3 days to 17 years) investigated in seven clinical trials of central nervous system (CNS) imaging (n = 141) and six post-marketing observational studies of CNS, musculoskeletal and whole-body MR imaging (n = 3,669). Of these, 3,569 children were 2-17 years of age and 241 were younger than 2 years. Gadoteric acid was generally administered at a dose of 0.1 mmol/kg. We evaluated image quality, lesion detection and border delineation, and the safety of gadoteric acid. We also reviewed post-marketing pharmacovigilance experience. Consistent with findings in adults, gadoteric acid was effective in children for improving image quality compared with T1-W unenhanced sequences, providing diagnostic improvement, and often influencing the therapeutic approach, resulting in treatment modifications. In studies assessing neurological tumors, gadoteric acid improved border delineation, internal morphology and contrast enhancement compared to unenhanced MR imaging. Gadoteric acid has a well-established safety profile. Among all studies, a total of 10 children experienced 20 adverse events, 7 of which were thought to be related to gadoteric acid. No serious adverse events were reported in any study. Post-marketing pharmacovigilance experience did not find any specific safety concern. Gadoteric acid was associated with improved lesion detection and delineation and is an effective and well-tolerated contrast agent for use in children.

  4. Purification and sequence of rat oxyntomodulin.

    PubMed Central

    Collie, N L; Walsh, J H; Wong, H C; Shively, J E; Davis, M T; Lee, T D; Reeve, J R

    1994-01-01

    Structural information about rat enteroglucagon, intestinal peptides containing the pancreatic glucagon sequence, has been based previously on cDNA, immunologic, and chromatographic data. Our interests in testing the physiological actions of synthetic enteroglucagon peptides in rats required that we identify precisely the forms present in vivo. From knowledge of the proglucagon gene sequence, we synthesized an enteroglucagon C-terminal octapeptide common to both proposed enteroglucagon forms, glicentin and oxyntomodulin, but sharing no sequence overlap with glucagon. We then developed a radioimmunoassay using antibodies raised against the octapeptide that was specific for enteroglucagon peptides without cross-reacting with glucagon. Rat intestine was extracted, and one presumptive enteroglucagon form was purified by following the enteroglucagon C-terminal octapeptide-like immunoreactivity through several HPLC purification steps. Structural characterization of the material by amino acid composition, microsequence, and mass spectral analyses identified the peptide as rat oxyntomodulin. The 37-residue peptide consists of pancreatic glucagon plus the C-terminal extension, Lys-Arg-Asn-Arg-Asn-Asn-Ile-Ala. This now permits synthesis of an unambiguous duplicate of endogenous rat oxyntomodulin for physiological studies. Images PMID:7937770

  5. Comparative analysis of ribosomal protein L5 sequences from bacteria of the genus Thermus.

    PubMed

    Jahn, O; Hartmann, R K; Boeckh, T; Erdmann, V A

    1991-06-01

    The genes for the ribosomal 5S rRNA binding protein L5 have been cloned from three extremely thermophilic eubacteria, Thermus flavus, Thermus thermophilus HB8 and Thermus aquaticus (Jahn et al, submitted). Genes for protein L5 from the three Thermus strains display 95% G/C in third positions of codons. Amino acid sequences deduced from the DNA sequence were shown to be identical for T flavus and T thermophilus, although the corresponding DNA sequences differed by two T to C transitions in the T thermophilus gene. Protein L5 sequences from T flavus and T thermophilus are 95% homologous to L5 from T aquaticus and 56.5% homologous to the corresponding E coli sequence. The lowest degrees of homology were found between the T flavus/T thermophilus L5 proteins and those of yeast L16 (27.5%), Halobacterium marismortui (34.0%) and Methanococcus vannielii (36.6%). From sequence comparison it becomes clear that thermostability of Thermus L5 proteins is achieved by an increase in hydrophobic interactions and/or by restriction of steric flexibility due to the introduction of amino acids with branched aliphatic side chains such as leucine. Alignment of the nine protein sequences equivalent to Thermus L5 proteins led to identification of a conserved internal segment, rich in acidic amino acids, which shows homology to subsequences of E coli L18 and L25. The occurrence of conserved sequence elements in 5S rRNA binding proteins and ribosomal proteins in general is discussed in terms of evolution and function.

  6. Applications of Single-Cell Sequencing for Multiomics.

    PubMed

    Xu, Yungang; Zhou, Xiaobo

    2018-01-01

    Single-cell sequencing interrogates the sequence or chromatin information from individual cells with advanced next-generation sequencing technologies. It provides a higher resolution of cellular differences and a better understanding of the underlying genetic and epigenetic mechanisms of an individual cell in the context of its survival and adaptation to microenvironment. However, it is more challenging to perform single-cell sequencing and downstream data analysis, owing to the minimal amount of starting materials, sample loss, and contamination. In addition, due to the picogram level of the amount of nucleic acids used, heavy amplification is often needed during sample preparation of single-cell sequencing, resulting in the uneven coverage, noise, and inaccurate quantification of sequencing data. All these unique properties raise challenges in and thus high demands for computational methods that specifically fit single-cell sequencing data. We here comprehensively survey the current strategies and challenges for multiple single-cell sequencing, including single-cell transcriptome, genome, and epigenome, beginning with a brief introduction to multiple sequencing techniques for single cells.

  7. Multi-locus and long amplicon sequencing approach to study microbial diversity at species level using the MinION™ portable nanopore sequencer

    PubMed Central

    Sanz, Yolanda

    2017-01-01

    Abstract The miniaturized and portable DNA sequencer MinION™ has demonstrated great potential in different analyses such as genome-wide sequencing, pathogen outbreak detection and surveillance, human genome variability, and microbial diversity. In this study, we tested the ability of the MinION™ platform to perform long amplicon sequencing in order to design new approaches to study microbial diversity using a multi-locus approach. After compiling a robust database by parsing and extracting the rrn bacterial region from more than 67000 complete or draft bacterial genomes, we demonstrated that the data obtained during sequencing of the long amplicon in the MinION™ device using R9 and R9.4 chemistries were sufficient to study 2 mock microbial communities in a multiplex manner and to almost completely reconstruct the microbial diversity contained in the HM782D and D6305 mock communities. Although nanopore-based sequencing produces reads with lower per-base accuracy compared with other platforms, we presented a novel approach consisting of multi-locus and long amplicon sequencing using the MinION™ MkIb DNA sequencer and R9 and R9.4 chemistries that help to overcome the main disadvantage of this portable sequencing platform. Furthermore, the nanopore sequencing library, constructed with the last releases of pore chemistry (R9.4) and sequencing kit (SQK-LSK108), permitted the retrieval of the higher level of 1D read accuracy sufficient to characterize the microbial species present in each mock community analysed. Improvements in nanopore chemistry, such as minimizing base-calling errors and new library protocols able to produce rapid 1D libraries, will provide more reliable information in the near future. Such data will be useful for more comprehensive and faster specific detection of microbial species and strains in complex ecosystems. PMID:28605506

  8. Analysis of genomic sequences by Chaos Game Representation.

    PubMed

    Almeida, J S; Carriço, J A; Maretzek, A; Noble, P A; Fletcher, M

    2001-05-01

    Chaos Game Representation (CGR) is an iterative mapping technique that processes sequences of units, such as nucleotides in a DNA sequence or amino acids in a protein, in order to find the coordinates for their position in a continuous space. This distribution of positions has two properties: it is unique, and the source sequence can be recovered from the coordinates such that distance between positions measures similarity between the corresponding sequences. The possibility of using the latter property to identify succession schemes have been entirely overlooked in previous studies which raises the possibility that CGR may be upgraded from a mere representation technique to a sequence modeling tool. The distribution of positions in the CGR plane were shown to be a generalization of Markov chain probability tables that accommodates non-integer orders. Therefore, Markov models are particular cases of CGR models rather than the reverse, as currently accepted. In addition, the CGR generalization has both practical (computational efficiency) and fundamental (scale independence) advantages. These results are illustrated by using Escherichia coli K-12 as a test data-set, in particular, the genes thrA, thrB and thrC of the threonine operon.

  9. SeqAPASS: Sequence alignment to predict across-species ...

    EPA Pesticide Factsheets

    Efforts to shift the toxicity testing paradigm from whole organism studies to those focused on the initiation of toxicity and relevant pathways have led to increased utilization of in vitro and in silico methods. Hence the emergence of high through-put screening (HTS) programs, such as U.S. EPA ToxCast, and application of the adverse outcome pathway (AOP) framework for identifying and defining biological key events triggered upon perturbation of molecular initiating events and leading to adverse outcomes occuring at a level of organization relevant for risk assessment [1]. With these recent initiatives to harness the power of “the pathway” in describing and evaluating toxicity comes the need to extrapolate data beyond the model species. Sequence alignment to predict across-species susceptibilty (SeqAPASS) is a web-based tool that allows the user to begin to understand how broadly HTS data or AOP constructs may plausibly be extrapolated across species, while describing the relative intrinsic susceptibiltiy of different taxa to chemicals with known modes of action (e.g., pharmaceuticals and pesticides). The tool rapidly and strategically assesses available molecular target information to describe protein sequence similarity at the primary amino acid sequence, conserved domain, and individual amino acid residue levels. This in silico approach to species extrapolation was designed to automate and streamline the relatively complex and time-consuming process of co

  10. Characterization of the HLA-DRβ1 third hypervariable region amino acid sequence according to charge and parental inheritance in systemic sclerosis.

    PubMed

    Gentil, Coline A; Gammill, Hilary S; Luu, Christine T; Mayes, Maureen D; Furst, Dan E; Nelson, J Lee

    2017-03-07

    Specific HLA class II alleles are associated with systemic sclerosis (SSc) risk, clinical characteristics, and autoantibodies. HLA nomenclature initially developed with antibodies as typing reagents defining DRB1 allele groups. However, alleles from different DRB1 allele groups encode the same third hypervariable region (3rd HVR) sequence, the primary T-cell recognition site, and 3rd HVR charge differences can affect interactions with T cells. We considered 3rd HVR sequences (amino acids 67-74) irrespective of the allele group and analyzed parental inheritance considered according to the 3rd HVR charge, comparing SSc patients with controls. In total, 306 families (121 SSc and 185 controls) were HLA genotyped and parental HLA-haplotype origin was determined. Analysis was conducted according to DRβ1 3rd HVR sequence, charge, and parental inheritance. The distribution of 3rd HVR sequences differed in SSc patients versus controls (p = 0.007), primarily due to an increase of specific DRB1*11 alleles, in accord with previous observations. The 3rd HVR sequences were next analyzed according to charge and parental inheritance. Paternal transmission of DRB1 alleles encoding a +2 charge 3rd HVR was significantly reduced in SSc patients compared with maternal transmission (p = 0.0003, corrected for analysis of four charge categories p = 0.001). To a lesser extent, paternal transmission was increased when charge was 0 (p = 0.021, corrected for multiple comparisons p = 0.084). In contrast, paternal versus maternal inheritance was similar in controls. SSc patients differed from controls when DRB1 alleles were categorized according to 3rd HVR sequences. Skewed parental inheritance was observed in SSc patients but not in controls when the DRβ1 3rd HVR was considered according to charge. These observations suggest that epigenetic modulation of HLA merits investigation in SSc.

  11. Phosphoenolpyruvate carboxykinase of Trypanosoma brucei is targeted to the glycosomes by a C-terminal sequence.

    PubMed

    Sommer, J M; Nguyen, T T; Wang, C C

    1994-08-15

    Import of proteins into the glycosomes of T. brucei resembles the peroxisomal protein import in that C-terminal SKL-like tripeptide sequences can function as targeting signals. Many of the glycosomal proteins do not, however, possess such C-terminal tripeptide signals. Among these, phosphoenolpyruvate carboxykinase (PEPCK (ATP)) was thought to be targeted to the glycosomes by an N-terminal or an internal targeting signal. A limited similarity to the N-terminal targeting signal of rat peroxisomal thiolase exists at the N-terminus of T. brucei PEPCK. However, we found that this peroxisomal targeting signal does not function for glycosomal protein import in T. brucei. Further studies of the PEPCK gene revealed that the C-terminus of the predicted protein does not correspond to the previously deduced protein sequence of 472 amino acids due to a -1 frame shift error in the original DNA sequence. Readjusting the reading frame of the sequence results in a predicted protein of 525 amino acids in length ending in a tripeptide serine-arginine-leucine (SRL), which is a potential targeting signal for import into the glycosomes. A fusion protein of firefly luciferase, without its own C-terminal SKL targeting signal, and T. brucei PEPCK is efficiently imported into the glycosomes when expressed in procyclic trypanosomes. Deletion of the C-terminal SRL tripeptide or the last 29 amino acids of PEPCK reduced the import only by about 50%, while a deletion of the last 47 amino acids completely abolished the import. These results suggest that T. brucei PEPCK may contain a second, internal glycosomal targeting signal upstream of the C-terminal SRL sequence.

  12. Deep Sequencing Reveals the Complete Genome and Evidence for Transcriptional Activity of the First Virus-Like Sequences Identified in Aristotelia chilensis (Maqui Berry)

    PubMed Central

    Villacreses, Javier; Rojas-Herrera, Marcelo; Sánchez, Carolina; Hewstone, Nicole; Undurraga, Soledad F.; Alzate, Juan F.; Manque, Patricio; Maracaja-Coutinho, Vinicius; Polanco, Victor

    2015-01-01

    Here, we report the genome sequence and evidence for transcriptional activity of a virus-like element in the native Chilean berry tree Aristotelia chilensis. We propose to name the endogenous sequence as Aristotelia chilensis Virus 1 (AcV1). High-throughput sequencing of the genome of this tree uncovered an endogenous viral element, with a size of 7122 bp, corresponding to the complete genome of AcV1. Its sequence contains three open reading frames (ORFs): ORFs 1 and 2 shares 66%–73% amino acid similarity with members of the Caulimoviridae virus family, especially the Petunia vein clearing virus (PVCV), Petuvirus genus. ORF1 encodes a movement protein (MP); ORF2 a Reverse Transcriptase (RT) and a Ribonuclease H (RNase H) domain; and ORF3 showed no amino acid sequence similarity with any other known virus proteins. Analogous to other known endogenous pararetrovirus sequences (EPRVs), AcV1 is integrated in the genome of Maqui Berry and showed low viral transcriptional activity, which was detected by deep sequencing technology (DNA and RNA-seq). Phylogenetic analysis of AcV1 and other pararetroviruses revealed a closer resemblance with Petuvirus. Overall, our data suggests that AcV1 could be a new member of Caulimoviridae family, genus Petuvirus, and the first evidence of this kind of virus in a fruit plant. PMID:25855242

  13. Database-independent Protein Sequencing (DiPS) Enables Full-length de Novo Protein and Antibody Sequence Determination.

    PubMed

    Savidor, Alon; Barzilay, Rotem; Elinger, Dalia; Yarden, Yosef; Lindzen, Moshit; Gabashvili, Alexandra; Adiv Tal, Ophir; Levin, Yishai

    2017-06-01

    Traditional "bottom-up" proteomic approaches use proteolytic digestion, LC-MS/MS, and database searching to elucidate peptide identities and their parent proteins. Protein sequences absent from the database cannot be identified, and even if present in the database, complete sequence coverage is rarely achieved even for the most abundant proteins in the sample. Thus, sequencing of unknown proteins such as antibodies or constituents of metaproteomes remains a challenging problem. To date, there is no available method for full-length protein sequencing, independent of a reference database, in high throughput. Here, we present Database-independent Protein Sequencing, a method for unambiguous, rapid, database-independent, full-length protein sequencing. The method is a novel combination of non-enzymatic, semi-random cleavage of the protein, LC-MS/MS analysis, peptide de novo sequencing, extraction of peptide tags, and their assembly into a consensus sequence using an algorithm named "Peptide Tag Assembler." As proof-of-concept, the method was applied to samples of three known proteins representing three size classes and to a previously un-sequenced, clinically relevant monoclonal antibody. Excluding leucine/isoleucine and glutamic acid/deamidated glutamine ambiguities, end-to-end full-length de novo sequencing was achieved with 99-100% accuracy for all benchmarking proteins and the antibody light chain. Accuracy of the sequenced antibody heavy chain, including the entire variable region, was also 100%, but there was a 23-residue gap in the constant region sequence. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.

  14. Comprehensive analysis of the T-cell receptor beta chain gene in rhesus monkey by high throughput sequencing

    PubMed Central

    Li, Zhoufang; Liu, Guangjie; Tong, Yin; Zhang, Meng; Xu, Ying; Qin, Li; Wang, Zhanhui; Chen, Xiaoping; He, Jiankui

    2015-01-01

    Profiling immune repertoires by high throughput sequencing enhances our understanding of immune system complexity and immune-related diseases in humans. Previously, cloning and Sanger sequencing identified limited numbers of T cell receptor (TCR) nucleotide sequences in rhesus monkeys, thus their full immune repertoire is unknown. We applied multiplex PCR and Illumina high throughput sequencing to study the TCRβ of rhesus monkeys. We identified 1.26 million TCRβ sequences corresponding to 643,570 unique TCRβ sequences and 270,557 unique complementarity-determining region 3 (CDR3) gene sequences. Precise measurements of CDR3 length distribution, CDR3 amino acid distribution, length distribution of N nucleotide of junctional region, and TCRV and TCRJ gene usage preferences were performed. A comprehensive profile of rhesus monkey immune repertoire might aid human infectious disease studies using rhesus monkeys. PMID:25961410

  15. Prospects for Fungal Bioremediation of Acidic Radioactive Waste Sites: Characterization and Genome Sequence of Rhodotorula taiwanensis MD1149.

    PubMed

    Tkavc, Rok; Matrosova, Vera Y; Grichenko, Olga E; Gostinčar, Cene; Volpe, Robert P; Klimenkova, Polina; Gaidamakova, Elena K; Zhou, Carol E; Stewart, Benjamin J; Lyman, Mathew G; Malfatti, Stephanie A; Rubinfeld, Bonnee; Courtot, Melanie; Singh, Jatinder; Dalgard, Clifton L; Hamilton, Theron; Frey, Kenneth G; Gunde-Cimerman, Nina; Dugan, Lawrence; Daly, Michael J

    2017-01-01

    Highly concentrated radionuclide waste produced during the Cold War era is stored at US Department of Energy (DOE) production sites. This radioactive waste was often highly acidic and mixed with heavy metals, and has been leaking into the environment since the 1950s. Because of the danger and expense of cleanup of such radioactive sites by physicochemical processes, in situ bioremediation methods are being developed for cleanup of contaminated ground and groundwater. To date, the most developed microbial treatment proposed for high-level radioactive sites employs the radiation-resistant bacterium Deinococcus radiodurans . However, the use of Deinococcus spp. and other bacteria is limited by their sensitivity to low pH. We report the characterization of 27 diverse environmental yeasts for their resistance to ionizing radiation (chronic and acute), heavy metals, pH minima, temperature maxima and optima, and their ability to form biofilms. Remarkably, many yeasts are extremely resistant to ionizing radiation and heavy metals. They also excrete carboxylic acids and are exceptionally tolerant to low pH. A special focus is placed on Rhodotorula taiwanensis MD1149, which was the most resistant to acid and gamma radiation. MD1149 is capable of growing under 66 Gy/h at pH 2.3 and in the presence of high concentrations of mercury and chromium compounds, and forming biofilms under high-level chronic radiation and low pH. We present the whole genome sequence and annotation of R. taiwanensis strain MD1149, with a comparison to other Rhodotorula species. This survey elevates yeasts to the frontier of biology's most radiation-resistant representatives, presenting a strong rationale for a role of fungi in bioremediation of acidic radioactive waste sites.

  16. Prospects for Fungal Bioremediation of Acidic Radioactive Waste Sites: Characterization and Genome Sequence of Rhodotorula taiwanensis MD1149

    PubMed Central

    Tkavc, Rok; Matrosova, Vera Y.; Grichenko, Olga E.; Gostinčar, Cene; Volpe, Robert P.; Klimenkova, Polina; Gaidamakova, Elena K.; Zhou, Carol E.; Stewart, Benjamin J.; Lyman, Mathew G.; Malfatti, Stephanie A.; Rubinfeld, Bonnee; Courtot, Melanie; Singh, Jatinder; Dalgard, Clifton L.; Hamilton, Theron; Frey, Kenneth G.; Gunde-Cimerman, Nina; Dugan, Lawrence; Daly, Michael J.

    2018-01-01

    Highly concentrated radionuclide waste produced during the Cold War era is stored at US Department of Energy (DOE) production sites. This radioactive waste was often highly acidic and mixed with heavy metals, and has been leaking into the environment since the 1950s. Because of the danger and expense of cleanup of such radioactive sites by physicochemical processes, in situ bioremediation methods are being developed for cleanup of contaminated ground and groundwater. To date, the most developed microbial treatment proposed for high-level radioactive sites employs the radiation-resistant bacterium Deinococcus radiodurans. However, the use of Deinococcus spp. and other bacteria is limited by their sensitivity to low pH. We report the characterization of 27 diverse environmental yeasts for their resistance to ionizing radiation (chronic and acute), heavy metals, pH minima, temperature maxima and optima, and their ability to form biofilms. Remarkably, many yeasts are extremely resistant to ionizing radiation and heavy metals. They also excrete carboxylic acids and are exceptionally tolerant to low pH. A special focus is placed on Rhodotorula taiwanensis MD1149, which was the most resistant to acid and gamma radiation. MD1149 is capable of growing under 66 Gy/h at pH 2.3 and in the presence of high concentrations of mercury and chromium compounds, and forming biofilms under high-level chronic radiation and low pH. We present the whole genome sequence and annotation of R. taiwanensis strain MD1149, with a comparison to other Rhodotorula species. This survey elevates yeasts to the frontier of biology's most radiation-resistant representatives, presenting a strong rationale for a role of fungi in bioremediation of acidic radioactive waste sites. PMID:29375494

  17. Complete genome sequence of a novel genotype of squash mosaic virus

    USDA-ARS?s Scientific Manuscript database

    Complete genome sequence of a novel genotype of Squash mosaic virus (SqMV) infecting squash plants in Spain was obtained using deep sequencing of small ribonucleic acids and assembly. The low nucleotide sequence identities, with 87-88% on RNA1 and 84-86% on RNA2 to known SqMV isolates, suggest a new...

  18. cDNA encoding a polypeptide including a hev ein sequence

    DOEpatents

    Raikhel, Natasha V.; Broekaert, Willem F.; Chua, Nam-Hai; Kush, Anil

    2000-07-04

    A cDNA clone (HEV1) encoding hevein was isolated via polymerase chain reaction (PCR) using mixed oligonucleotides corresponding to two regions of hevein as primers and a Hevea brasiliensis latex cDNA library as a template. HEV1 is 1018 nucleotides long and includes an open reading frame of 204 amino acids. The deduced amino acid sequence contains a putative signal sequence of 17 amino acid residues followed by a 187 amino acid polypeptide. The amino-terminal region (43 amino acids) is identical to hevein and shows homology to several chitin-binding proteins and to the amino-termini of wound-induced genes in potato and poplar. The carboxyl-terminal portion of the polypeptide (144 amino acids) is 74-79% homologous to the carboxyl-terminal region of wound-inducible genes of potato. Wounding, as well as application of the plant hormones abscisic acid and ethylene, resulted in accumulation of hevein transcripts in leaves, stems and latex, but not in roots, as shown by using the cDNA as a probe. A fusion protein was produced in E. coli from the protein of the present invention and maltose binding protein produced by the E. coli.

  19. SequenceCEROSENE: a computational method and web server to visualize spatial residue neighborhoods at the sequence level.

    PubMed

    Heinke, Florian; Bittrich, Sebastian; Kaiser, Florian; Labudde, Dirk

    2016-01-01

    To understand the molecular function of biopolymers, studying their structural characteristics is of central importance. Graphics programs are often utilized to conceive these properties, but with the increasing number of available structures in databases or structure models produced by automated modeling frameworks this process requires assistance from tools that allow automated structure visualization. In this paper a web server and its underlying method for generating graphical sequence representations of molecular structures is presented. The method, called SequenceCEROSENE (color encoding of residues obtained by spatial neighborhood embedding), retrieves the sequence of each amino acid or nucleotide chain in a given structure and produces a color coding for each residue based on three-dimensional structure information. From this, color-highlighted sequences are obtained, where residue coloring represent three-dimensional residue locations in the structure. This color encoding thus provides a one-dimensional representation, from which spatial interactions, proximity and relations between residues or entire chains can be deduced quickly and solely from color similarity. Furthermore, additional heteroatoms and chemical compounds bound to the structure, like ligands or coenzymes, are processed and reported as well. To provide free access to SequenceCEROSENE, a web server has been implemented that allows generating color codings for structures deposited in the Protein Data Bank or structure models uploaded by the user. Besides retrieving visualizations in popular graphic formats, underlying raw data can be downloaded as well. In addition, the server provides user interactivity with generated visualizations and the three-dimensional structure in question. Color encoded sequences generated by SequenceCEROSENE can aid to quickly perceive the general characteristics of a structure of interest (or entire sets of complexes), thus supporting the researcher in the initial

  20. DNA sequence of the lymphotropic variant of minute virus of mice, MVM(i), and comparison with the DNA sequence of the fibrotropic prototype strain.

    PubMed

    Astell, C R; Gardiner, E M; Tattersall, P

    1986-02-01

    The sequence of molecular clones of the genome of MVM(i), a lymphotropic variant of minute virus of mice, was determined and compared with that of MVM(p), the fibrotropic prototype strain. At the nucleotide level there are 163 base changes: 129 transitions and 34 transversions. Most nucleotide changes are silent, with only 27 amino acids changes predicted, of which 22 are conservative. Notable differences between the MVM(i) and MVM(p) genomes which may account for the cell specificities of these viruses occur within the 3' nontranslated regions. The differences discussed include the absence of a 65-base-pair direct in MVM(i), the presence of only two polyadenylation sites in MVM(i) compared with four in MVM(p), and sequences that bear a resemblance to enhancer sequences. Also included in this paper is an important correction to the MVM(p) sequence (C.R. Astell, M. Thomson, M. Merchlinsky, and D. C. Ward, Nucleic Acids Res. 11:999-1018, 1983).