Science.gov

Sample records for acid sequence information

  1. Conservation of Shannon's redundancy for proteins. [information theory applied to amino acid sequences]

    NASA Technical Reports Server (NTRS)

    Gatlin, L. L.

    1974-01-01

    Concepts of information theory are applied to examine various proteins in terms of their redundancy in natural originators such as animals and plants. The Monte Carlo method is used to derive information parameters for random protein sequences. Real protein sequence parameters are compared with the standard parameters of protein sequences having a specific length. The tendency of a chain to contain some amino acids more frequently than others and the tendency of a chain to contain certain amino acid pairs more frequently than other pairs are used as randomness measures of individual protein sequences. Non-periodic proteins are generally found to have random Shannon redundancies except in cases of constraints due to short chain length and genetic codes. Redundant characteristics of highly periodic proteins are discussed. A degree of periodicity parameter is derived.
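As an illustration of the kind of measure the abstract describes, the following minimal sketch computes a first-order Shannon redundancy from single-residue frequencies only. It is not the paper's own procedure: the function names, the 20-letter alphabet size, and the formula R = 1 - H / H_max are assumptions chosen for illustration, and the pair-frequency and Monte Carlo parts of the analysis are not reproduced.

```python
import math
from collections import Counter


def shannon_entropy(sequence):
    """Entropy in bits per residue of the single-residue frequency distribution."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def redundancy(sequence, alphabet_size=20):
    """First-order redundancy R = 1 - H / H_max, with H_max = log2(alphabet_size).

    alphabet_size=20 assumes the standard amino acid alphabet (an assumption
    made for this sketch, not a value taken from the abstract).
    """
    return 1.0 - shannon_entropy(sequence) / math.log2(alphabet_size)


# Toy inputs (hypothetical, not real proteins).
print(redundancy("AAAAAAAA"))              # a single repeated residue
print(redundancy("ACDEFGHIKLMNPQRSTVWY"))  # every residue used exactly once
```

A chain that uses every residue equally often gives a redundancy near 0, while a chain built from one residue gives 1; real sequences fall in between.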

  2. Multimodal phylogeny for taxonomy: integrating information from nucleotide and amino acid sequences.

    PubMed

    Bicego, Manuele; Dellaglio, Franco; Felis, Giovanna E

    2007-10-01

    The crucial role played by the analysis of microbial diversity in biotechnology-based innovations has increased the interest in the microbial taxonomy research area. Phylogenetic sequence analyses have contributed significantly to the advances in this field, also in view of the large amount of sequence data collected in recent years. Phylogenetic analyses could be realized on the basis of protein-encoding nucleotide sequences or encoded amino acid molecules: these two mechanisms present different peculiarities, still starting from two alternative representations of the same information. This complementarity could be exploited to achieve a multimodal phylogenetic scheme that is able to integrate gene and protein information in order to realize a single final tree. This aspect has been poorly addressed in the literature. In this paper, we propose to integrate the two phylogenetic analyses using basic schemes derived from the multimodality fusion theory (or multiclassifier systems theory), a well-founded and rigorous branch whose power has already been demonstrated in other pattern recognition contexts. The proposed approach could be applied to distance matrix-based phylogenetic techniques (like neighbor joining), resulting in a smart and fast method. The proposed methodology has been tested in a real case involving sequences of some species of lactic acid bacteria. With this dataset, both nucleotide sequence- and amino acid sequence-based phylogenetic analyses present some drawbacks, which are overcome with the multimodal analysis. PMID:17933011
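One simple way to picture fusing two distance matrices before tree building is the mean rule sketched below. This is an assumption for illustration only: the abstract does not say which fusion schemes were used, and the function names, the max-based normalization, and the toy matrices are all hypothetical.

```python
def normalize(matrix):
    """Scale a distance matrix so that its largest entry is 1."""
    peak = max(max(row) for row in matrix)
    if peak == 0:
        return [list(row) for row in matrix]
    return [[d / peak for d in row] for row in matrix]


def fuse_mean(nucleotide_dist, protein_dist):
    """Average two normalized distance matrices entry by entry (mean rule)."""
    a = normalize(nucleotide_dist)
    b = normalize(protein_dist)
    return [
        [(x + y) / 2 for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


# Toy 3-taxon matrices (hypothetical values).
nucleotide = [[0, 1, 2], [1, 0, 2], [2, 2, 0]]
protein = [[0, 2, 2], [2, 0, 1], [2, 1, 0]]
fused = fuse_mean(nucleotide, protein)
print(fused)
```

The fused matrix could then be handed to any distance-based method such as neighbor joining; normalizing first keeps one modality from dominating merely because its distances are on a larger scale.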

  3. A simple ligation-based method to increase the information density in sequencing reactions used to deconvolute nucleic acid selections

    PubMed Central

    Childs-Disney, Jessica L.; Disney, Matthew D.

    2008-01-01

    Herein, a method is described to increase the information density of sequencing experiments used to deconvolute nucleic acid selections. The method is facile and should be applicable to any selection experiment. A critical feature of this method is the use of biotinylated primers to amplify and encode a BamHI restriction site on both ends of a PCR product. After amplification, the PCR reaction is captured onto streptavidin resin, washed, and digested directly on the resin. Resin-based digestion affords clean product that is devoid of partially digested products and unincorporated PCR primers. The product's complementary ends are annealed and ligated together with T4 DNA ligase. Analysis of ligation products shows formation of concatemers of different length and little detectable monomer. Sequencing results produced data that routinely contained three to four copies of the library. This method allows for more efficient formulation of structure-activity relationships since multiple active sequences are identified from a single clone. PMID:18065718

  4. Predicting Secretory Proteins of Malaria Parasite by Incorporating Sequence Evolution Information into Pseudo Amino Acid Composition via Grey System Model

    PubMed Central

    Lin, Wei-Zhong; Fang, Jian-An; Xiao, Xuan; Chou, Kuo-Chen

    2012-01-01

    The malaria disease has become a cause of poverty and a major hindrance to economic development. The culprit of the disease is the parasite, which secretes an array of proteins within the host erythrocyte to facilitate its own survival. Accordingly, the secretory proteins of malaria parasite have become a logical target for drug design against malaria. Unfortunately, with the increasing resistance to the drugs thus developed, the situation has become more complicated. To cope with the drug resistance problem, one strategy is to timely identify the secreted proteins by malaria parasite, which can serve as potential drug targets. However, it is both expensive and time-consuming to identify the secretory proteins of malaria parasite by experiments alone. To expedite the process for developing effective drugs against malaria, a computational predictor called “iSMP-Grey” was developed that can be used to identify the secretory proteins of malaria parasite based on the protein sequence information alone. During the prediction process a protein sample was formulated with a 60D (dimensional) feature vector formed by incorporating the sequence evolution information into the general form of PseAAC (pseudo amino acid composition) via a grey system model, which is particularly useful for solving complicated problems that lack sufficient information or need to process uncertain information. It was observed by the jackknife test that iSMP-Grey achieved an overall success rate of 94.8%, remarkably higher than those by the existing predictors in this area. As a user-friendly web-server, iSMP-Grey is freely accessible to the public at http://www.jci-bioinfo.cn/iSMP-Grey. Moreover, for the convenience of most experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results without the need to follow the complicated mathematical equations involved in this paper. PMID:23189138

  5. Composition for nucleic acid sequencing

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2008-08-26

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  6. High speed nucleic acid sequencing

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid. Each type of labeled nucleotide comprises an acceptor fluorophore attached to a phosphate portion of the nucleotide such that the fluorophore is removed upon incorporation into a growing strand. Fluorescent signal is emitted via fluorescent resonance energy transfer between the donor fluorophore and the acceptor fluorophore as each nucleotide is incorporated into the growing strand. The sequence is deduced by identifying which base is being incorporated into the growing strand.

  7. Sequence information signal processor

    DOEpatents

    Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.

    1999-01-01

    An electronic circuit is used to compare two sequences, such as genetic sequences, to determine which alignment of the sequences produces the greatest similarity. The circuit includes a linear array of series-connected processors, each of which stores a single element from one of the sequences and compares that element with each successive element in the other sequence. For each comparison, the processor generates a scoring parameter that indicates which segment ending at those two elements produces the greatest degree of similarity between the sequences. The processor uses the scoring parameter to generate a similar scoring parameter for a comparison between the stored element and the next successive element from the other sequence. The processor also delivers the scoring parameter to the next processor in the array for use in generating a similar scoring parameter for another pair of elements. The electronic circuit determines which processor and alignment of the sequences produce the scoring parameter with the highest value.
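The comparison the circuit performs resembles a local-alignment recurrence, which can be sketched in software as below. This is a sequential illustration, not the patented hardware: the scoring values (match, mismatch, gap), the linear gap penalty, and the function name are assumptions, and a processor array would evaluate the cells of each anti-diagonal in parallel rather than row by row.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, (i, j)) for the best local alignment of a and b.

    Cell (i, j) holds the best score of any aligned segment pair ending at
    a[i-1] and b[j-1]; scores are floored at 0 so a segment can start anywhere.
    The scoring parameters are illustrative defaults, not taken from the source.
    """
    best, best_pos = 0, (0, 0)
    prev = [0] * (len(b) + 1)
    for i, x in enumerate(a, start=1):
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            if curr[j] > best:
                best, best_pos = curr[j], (i, j)
        prev = curr
    return best, best_pos


# Toy sequences (hypothetical): the shared segment "ACG" drives the score.
print(local_alignment_score("TTACGTT", "GGACGGG"))
```

Only the previous row is kept in memory, which mirrors the idea that each score depends on a few neighbouring scores passed along the array.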

  8. 77 FR 65537 - Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-29

    ... Amino Acid Sequence Disclosures ACTION: Proposed collection; comment request. SUMMARY: The United States....'' SUPPLEMENTARY INFORMATION: I. Abstract Patent applications that contain nucleotide and/or amino acid sequence disclosures must include a copy of the sequence listing in accordance with the requirements in 37 CFR...

  9. Chip-based sequencing nucleic acids

    DOEpatents

    Beer, Neil Reginald

    2014-08-26

    A system for fast DNA sequencing by amplification of genetic material within microreactors, denaturing, demulsifying, and then sequencing the material, while retaining it in a PCR/sequencing zone by a magnetic field. One embodiment includes sequencing nucleic acids on a microchip that includes a microchannel flow channel in the microchip. The nucleic acids are isolated and hybridized to magnetic nanoparticles or to magnetic polystyrene-coated beads. Microreactor droplets are formed in the microchannel flow channel. The microreactor droplets containing the nucleic acids and the magnetic nanoparticles are retained in a magnetic trap in the microchannel flow channel and sequenced.

  10. Distinguishing Proteins From Arbitrary Amino Acid Sequences

    PubMed Central

    Yau, Stephen S.-T.; Mao, Wei-Guang; Benson, Max; He, Rong Lucy

    2015-01-01

    What kinds of amino acid sequences could possibly be protein sequences? From all existing databases that we can find, known proteins are only a small fraction of all possible combinations of amino acids. Beginning with Sanger's first detailed determination of a protein sequence in 1952, previous studies have focused on describing the structure of existing protein sequences in order to construct the protein universe. No one, however, has developed a criterion for determining whether an arbitrary amino acid sequence can be a protein. Here we show that when the collection of arbitrary amino acid sequences is viewed in an appropriate geometric context, the protein sequences cluster together. This leads to a new computational test, described here, that has proved to be remarkably accurate at determining whether an arbitrary amino acid sequence can be a protein. Even more, if the results of this test indicate that the sequence can be a protein, and it is indeed a protein sequence, then its identity as a protein sequence is uniquely defined. We anticipate our computational test will be useful for those who are attempting to complete the job of discovering all proteins, or constructing the protein universe. PMID:25609314

  11. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-05-30

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  12. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-06-06

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  13. Phenolic acid esterases, coding sequences and methods

    DOEpatents

    Blum, David L.; Kataeva, Irina; Li, Xin-Liang; Ljungdahl, Lars G.

    2002-01-01

    Described herein are four phenolic acid esterases, three of which correspond to domains of previously unknown function within bacterial xylanases, from XynY and XynZ of Clostridium thermocellum and from a xylanase of Ruminococcus. The fourth specifically exemplified xylanase is a protein encoded within the genome of Orpinomyces PC-2. The amino acids of these polypeptides and nucleotide sequences encoding them are provided. Recombinant host cells, expression vectors and methods for the recombinant production of phenolic acid esterases are also provided.

  14. Amino-Acid Sequence of Porcine Pepsin

    PubMed Central

    Tang, J.; Sepulveda, P.; Marciniszyn, J.; Chen, K. C. S.; Huang, W-Y.; Tao, N.; Liu, D.; Lanier, J. P.

    1973-01-01

    As the culmination of several years of experiments, we propose a complete amino-acid sequence for porcine pepsin, an enzyme containing 327 amino-acid residues in a single polypeptide chain. In the sequence determination, the enzyme was treated with cyanogen bromide. Five resulting fragments were purified. The amino-acid sequence of four of the fragments accounted for 290 residues. Because the structure of a 37-residue carboxyl-terminal fragment was already known, it was not studied. The alignment of these fragments was determined from the sequence of methionyl-peptides we had previously reported. We also discovered the locations of active-site aspartyl residues, as well as the pairing of the three disulfide bridges. A minor component of commercial crystalline pepsin was found to contain two extra amino-acid residues, Ala-Leu-, at the amino-terminus of the molecule. This minor component was apparently derived from a different site of cleavage during the activation of porcine pepsinogen. PMID:4587252

  15. Method for identifying and quantifying nucleic acid sequence aberrations

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1998-01-01

    A method for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe.

  16. Method for identifying and quantifying nucleic acid sequence aberrations

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1998-07-21

    A method is disclosed for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe. 11 figs.

  17. On Quantum Algorithm for Multiple Alignment of Amino Acid Sequences

    NASA Astrophysics Data System (ADS)

    Iriyama, Satoshi; Ohya, Masanori

    2009-02-01

    The alignment of genome sequences or amino acid sequences is one of the fundamental operations for the study of life. Usual computational complexity for the multiple alignment of N sequences with common length L by dynamic programming is O(L^N). This alignment is considered as one of the NP problems, so that it is desirable to find a nice algorithm of the multiple alignment. Thus in this paper we propose the quantum algorithm for the multiple alignment based on the works [12,1,2] in which the NP complete problem was shown to be the P problem by means of quantum algorithm and chaos information dynamics.

  18. Methods for analyzing nucleic acid sequences

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid. The method provides a complex comprising a polymerase enzyme, a target nucleic acid molecule, and a primer, wherein the complex is immobilized on a support. A fluorescent label is attached to a terminal phosphate group of the nucleotide or nucleotide analog. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The time duration of the signal from labeled nucleotides or nucleotide analogs that become incorporated is distinguished from freely diffusing labels by a longer retention in the observation volume for the nucleotides or nucleotide analogs that become incorporated than for the freely diffusing labels.

  19. Catalog of PRA dominant accident sequence information

    SciTech Connect

    Cathey, N.G.; Krantz, E.A.; Poloski, J.P.; Shapiro, B.J.

    1985-07-01

    Information concerning the dominant accident sequences from twelve published probabilistic risk assessments (PRA) is cataloged in this report, which is published as a part of the Accident Sequence Evaluation Program (ASEP). The purpose of this report is to provide users of PRA information a single reference document. The cataloged results include plant operation information, core-melt frequency, event tree models, dominant factors affecting core-melt and sequence frequencies, and a description of each dominant accident sequence. The report provides a consistent set of insights on the factors that drive the dominant accident sequences. ASEP has reconstructed the PRA fault tree models at the system or train level of detail and requantified the sequence likelihoods to provide the consistent insights. This work provides the information for the other ASEP activities on accident likelihood assessment for the operating and near-term operating plants.

  20. Prebiotically plausible mechanisms increase compositional diversity of nucleic acid sequences

    PubMed Central

    Derr, Julien; Manapat, Michael L.; Rajamani, Sudha; Leu, Kevin; Xulvi-Brunet, Ramon; Joseph, Isaac; Nowak, Martin A.; Chen, Irene A.

    2012-01-01

    During the origin of life, the biological information of nucleic acid polymers must have increased to encode functional molecules (the RNA world). Ribozymes tend to be compositionally unbiased, as is the vast majority of possible sequence space. However, ribonucleotides vary greatly in synthetic yield, reactivity and degradation rate, and their non-enzymatic polymerization results in compositionally biased sequences. While natural selection could lead to complex sequences, molecules with some activity are required to begin this process. Was the emergence of compositionally diverse sequences a matter of chance, or could prebiotically plausible reactions counter chemical biases to increase the probability of finding a ribozyme? Our in silico simulations using a two-letter alphabet show that template-directed ligation and high concatenation rates counter compositional bias and shift the pool toward longer sequences, permitting greater exploration of sequence space and stable folding. We verified experimentally that unbiased DNA sequences are more efficient templates for ligation, thus increasing the compositional diversity of the pool. Our work suggests that prebiotically plausible chemical mechanisms of nucleic acid polymerization and ligation could predispose toward a diverse pool of longer, potentially structured molecules. Such mechanisms could have set the stage for the appearance of functional activity very early in the emergence of life. PMID:22319215

  1. Nanopores and nucleic acids: prospects for ultrarapid sequencing

    NASA Technical Reports Server (NTRS)

    Deamer, D. W.; Akeson, M.

    2000-01-01

    DNA and RNA molecules can be detected as they are driven through a nanopore by an applied electric field at rates ranging from several hundred microseconds to a few milliseconds per molecule. The nanopore can rapidly discriminate between pyrimidine and purine segments along a single-stranded nucleic acid molecule. Nanopore detection and characterization of single molecules represents a new method for directly reading information encoded in linear polymers. If single-nucleotide resolution can be achieved, it is possible that nucleic acid sequences can be determined at rates exceeding a thousand bases per second.

  2. Detection of nucleic acid sequences by invader-directed cleavage

    DOEpatents

    Brow, Mary Ann D.; Hall, Jeff Steven Grotelueschen; Lyamichev, Victor; Olive, David Michael; Prudent, James Robert

    1999-01-01

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The 5' nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge.

  3. Hybridization and sequencing of nucleic acids using base pair mismatches

    DOEpatents

    Fodor, Stephen P. A.; Lipshutz, Robert J.; Huang, Xiaohua

    2001-01-01

    Devices and techniques for hybridization of nucleic acids and for determining the sequence of nucleic acids. Arrays of nucleic acids are formed by techniques, preferably high resolution, light-directed techniques. Positions of hybridization of a target nucleic acid are determined by, e.g., epifluorescence microscopy. Devices and techniques are proposed to determine the sequence of a target nucleic acid more efficiently and more quickly through such synthesis and detection techniques.

  4. Amino acid sequence of the nonsecretory ribonuclease of human urine.

    PubMed

    Beintema, J J; Hofsteenge, J; Iwama, M; Morita, T; Ohgi, K; Irie, M; Sugiyama, R H; Schieven, G L; Dekker, C A; Glitz, D G

    1988-06-14

    The amino acid sequence of a nonsecretory ribonuclease isolated from human urine was determined except for the identity of the residue at position 7. Sequence information indicates that the ribonucleases of human liver and spleen and an eosinophil-derived neurotoxin are identical or very closely related gene products. The sequence is identical at about 30% of the amino acid positions with those of all of the secreted mammalian ribonucleases for which information is available. Identical residues include active-site residues histidine-12, histidine-119, and lysine-41, other residues known to be important for substrate binding and catalytic activity, and all eight half-cystine residues common to these enzymes. Major differences include a deletion of six residues in the (so-called) S-peptide loop, insertions of two, and nine residues, respectively, in three other external loops of the molecule, and an addition of three residues at the amino terminus. The sequence shows the human nonsecretory ribonuclease to belong to the same ribonuclease superfamily as the mammalian secretory ribonucleases, turtle pancreatic ribonuclease, and human angiogenin. Sequence data suggest that a gene duplication occurred in an ancient vertebrate ancestor; one branch led to the nonsecretory ribonuclease, while the other branch led to a second duplication, with one line leading to the secretory ribonucleases (in mammals) and the second line leading to pancreatic ribonuclease in turtle and an angiogenic factor in mammals (human angiogenin). The nonsecretory ribonuclease has five short carbohydrate chains attached via asparagine residues at the surface of the molecule; these chains may have been shortened by exoglycosidase action.(ABSTRACT TRUNCATED AT 250 WORDS) PMID:3166997

  5. Predicting intrinsic disorder from amino acid sequence.

    PubMed

    Obradovic, Zoran; Peng, Kang; Vucetic, Slobodan; Radivojac, Predrag; Brown, Celeste J; Dunker, A Keith

    2003-01-01

    Blind predictions of intrinsic order and disorder were made on 42 proteins subsequently revealed to contain 9,044 ordered residues, 284 disordered residues in 26 segments of length 30 residues or less, and 281 disordered residues in 2 disordered segments of length greater than 30 residues. The accuracies of the six predictors used in this experiment ranged from 77% to 91% for the ordered regions and from 56% to 78% for the disordered segments. The average of the order and disorder predictions ranged from 73% to 77%. The prediction of disorder in the shorter segments was poor, from 25% to 66% correct, while the prediction of disorder in the longer segments was better, from 75% to 95% correct. Four of the predictors were composed of ensembles of neural networks. This enabled them to deal more efficiently with the large asymmetry in the training data through diversified sampling from the significantly larger ordered set and achieve better accuracy on ordered and long disordered regions. The exclusive use of long disordered regions for predictor training likely contributed to the disparity of the predictions on long versus short disordered regions, while averaging the output values over 61-residue windows to eliminate short predictions of order or disorder probably contributed to the even greater disparity for three of the predictors. This experiment supports the predictability of intrinsic disorder from amino acid sequence. PMID:14579347
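The effect of averaging predictor outputs over a window can be seen in a small sketch. The code below is an assumption for illustration, not the predictors' actual post-processing: the function name, the centered window that is truncated at the chain ends, and the 0.5 decision threshold are all hypothetical; only the 61-residue window length comes from the abstract.

```python
def smooth(scores, window=61):
    """Centered moving average of per-residue scores.

    The window is truncated at the chain ends; `window` must be a positive
    odd number. How the original predictors treated the ends is not stated
    in the abstract, so truncation is an assumption of this sketch.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


# Toy raw outputs (hypothetical): one isolated "disordered" residue.
raw = [0, 0, 1, 0, 0]
averaged = smooth(raw, window=3)
calls = [s >= 0.5 for s in averaged]  # 0.5 threshold is an assumption
print(averaged, calls)
```

After averaging, the isolated high score no longer crosses the threshold, which is consistent with the abstract's remark that window averaging removes short predicted segments and so hurts accuracy on short disordered regions.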

  6. The amino acid sequence of rabbit muscle triose phosphate isomerase.

    PubMed Central

    Corran, P H; Waley, S G

    1975-01-01

    The amino acid sequence of rabbit muscle triose phosphate isomerase was deduced by characterizing peptides that overlap the tryptic peptides. Thiol groups were modified by oxidation, carboxymethylation or aminoethylation. About 50 peptides that provided information about overlaps were isolated; the peptides were mostly characterized by their compositions and N-terminal residues. The peptide chains contain 248 amino acid residues, and no evidence for dissimilarity of the two subunits that comprise the native enzyme was found. The sequence of the rabbit muscle enzyme may be compared with that of the coelacanth enzyme (Kolb et al., 1974): 84% of the residues are in identical positions. Similarly, comparison of the sequence with that inferred for the chicken enzyme (Furth et al., 1974) shows that 87% of the residues are in identical positions. Limited though these comparisons are, they suggest that triose phosphate isomerase has one of the lowest rates of evolutionary change. An extended version of the present paper has been deposited as Supplementary Publication SUP 50040 (42 pages) at the British Library (Lending Division) (formerly the National Lending Library for Science and Technology), Boston Spa, Yorks. LS23 7BQ, U.K., from whom copies can be obtained on the terms given in Biochem. J. (1975) 145, 5. PMID:1171682
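Figures such as "84% of the residues are in identical positions" are percent-identity values over aligned sequences. A minimal sketch of that calculation follows; it assumes the two sequences are already aligned to equal length with no gaps, and the function name and toy strings are hypothetical rather than taken from the paper.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two equal-length sequences agree.

    Assumes the sequences are already aligned position by position; gap
    handling is deliberately left out of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    same = sum(1 for x, y in zip(seq_a, seq_b) if x == y)
    return 100.0 * same / len(seq_a)


# Toy aligned fragments (hypothetical, not the real enzyme sequences).
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
```

For a 248-residue chain, 84% identity corresponds to roughly 208 matching positions.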

  7. The amino acid sequence of chymopapain from Carica papaya.

    PubMed Central

    Watson, D C; Yaguchi, M; Lynn, K R

    1990-01-01

    Chymopapain is a polypeptide of 218 amino acid residues. It has considerable structural similarity with papain and papaya proteinase omega, including conservation of the catalytic site and of the disulphide bonding. Chymopapain is like papaya proteinase omega in carrying four extra residues between papain positions 168 and 169, but differs from both papaya proteinases in the composition of its S2 subsite, as well as in having a second thiol group, Cys-117. Some evidence for the amino acid sequence of chymopapain has been deposited as Supplementary Publication SUP 50153 (12 pages) at the British Library Document Supply Centre, Boston Spa., Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies may be obtained on the terms indicated in Biochem. J. (1990) 265, 5. The information comprises Supplement Tables 1-4, which contain, in order, amino acid compositions of peptides from tryptic, peptic, CNBr and mild acid cleavages, Supplement Fig. 1, showing re-fractionation of selected peaks from Fig. 2 of the main paper. Supplement Fig. 2, showing cation-exchange chromatography of the earliest-eluted peak of Fig. 3 of the main paper, Supplement Fig. 3, showing reverse-phase h.p.l.c. of the later-eluted peak from Fig. 3 of the main paper, and Supplement Fig. 4, showing the separation of peptides after mild acid hydrolysis of CNBr-cleavage fragment CB3. PMID:2106878

  8. The amino acid sequence of chymopapain from Carica papaya.

    PubMed

    Watson, D C; Yaguchi, M; Lynn, K R

    1990-02-15

    Chymopapain is a polypeptide of 218 amino acid residues. It has considerable structural similarity with papain and papaya proteinase omega, including conservation of the catalytic site and of the disulphide bonding. Chymopapain is like papaya proteinase omega in carrying four extra residues between papain positions 168 and 169, but differs from both papaya proteinases in the composition of its S2 subsite, as well as in having a second thiol group, Cys-117. Some evidence for the amino acid sequence of chymopapain has been deposited as Supplementary Publication SUP 50153 (12 pages) at the British Library Document Supply Centre, Boston Spa., Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies may be obtained on the terms indicated in Biochem. J. (1990) 265, 5. The information comprises Supplement Tables 1-4, which contain, in order, amino acid compositions of peptides from tryptic, peptic, CNBr and mild acid cleavages, Supplement Fig. 1, showing re-fractionation of selected peaks from Fig. 2 of the main paper. Supplement Fig. 2, showing cation-exchange chromatography of the earliest-eluted peak of Fig. 3 of the main paper, Supplement Fig. 3, showing reverse-phase h.p.l.c. of the later-eluted peak from Fig. 3 of the main paper, and Supplement Fig. 4, showing the separation of peptides after mild acid hydrolysis of CNBr-cleavage fragment CB3. PMID:2106878

  9. Methods and compositions for efficient nucleic acid sequencing

    DOEpatents

    Drmanac, Radoje

    2002-01-01

    Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.

  10. Methods and compositions for efficient nucleic acid sequencing

    DOEpatents

    Drmanac, Radoje

    2006-07-04

    Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.

  11. Kit for detecting nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    2001-01-01

    A kit is provided for detecting a target nucleic acid sequence in a sample, the kit comprising: a first hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the first hybridization probe including a first complexing agent for forming a binding pair with a second complexing agent; and a second hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the first hybridization probe does not selectively hybridize, the second hybridization probe including a detectable marker; a third hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the third hybridization probe including the same detectable marker as the second hybridization probe; and a fourth hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the third hybridization probe does not selectively hybridize, the fourth hybridization probe including the first complexing agent for forming a binding pair with the second complexing agent; wherein the first and second hybridization probes are capable of simultaneously hybridizing to the target sequence and the third and fourth hybridization probes are capable of simultaneously hybridizing to the target sequence, the detectable marker is not present on the first or fourth hybridization probes and the first, second, third, and fourth hybridization probes each include a competitive nucleic acid sequence which is sufficiently complementary to a third portion of the target sequence that the competitive sequences of the first, second, third, and fourth hybridization probes compete with each other to hybridize to the third portion of the ...

  12. Solid phase sequencing of double-stranded nucleic acids

    DOEpatents

    Fu, Dong-Jing; Cantor, Charles R.; Koster, Hubert; Smith, Cassandra L.

    2002-01-01

    This invention relates to methods for detecting and sequencing of target double-stranded nucleic acid sequences, to nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probes comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Nucleic acids whose sequences can be determined include nucleic acids in biological samples such as patient biopsies and environmental samples. Probes may be fixed to a solid support such as a hybridization chip to facilitate automated determination of molecular weights and identification of the target sequence.

  13. Analysis and Annotation of Nucleic Acid Sequence

    SciTech Connect

    States, David J.

    2004-07-28

    The aims of this project were to develop improved methods for computational genome annotation and to apply these methods to improve the annotation of genomic sequence data with a specific focus on human genome sequencing. The project resulted in a substantial body of published work. Notable contributions of this project were the identification of basecalling and lane tracking as error processes in genome sequencing and contributions to improved methods for these steps in genome sequencing. This technology improved the accuracy and throughput of genome sequence analysis. Probabilistic methods for physical map construction were developed. Improved methods for sequence alignment, alternative splicing analysis, promoter identification and NF kappa B response gene prediction were also developed.

  14. Analysis and Annotation of Nucleic Acid Sequence

    SciTech Connect

    States, David J.

    1998-08-01

    The aims of this project were to develop improved methods for computational genome annotation and to apply these methods to improve the annotation of genomic sequence data with a specific focus on human genome sequencing. The project resulted in a substantial body of published work. Notable contributions of this project were the identification of basecalling and lane tracking as error processes in genome sequencing and contributions to improved methods for these steps in genome sequencing. This technology improved the accuracy and throughput of genome sequence analysis. Probabilistic methods for physical map construction were developed. Improved methods for sequence alignment, alternative splicing analysis, promoter identification and NF kappa B response gene prediction were also developed.

  15. From Artificial Amino Acids to Sequence-Defined Targeted Oligoaminoamides.

    PubMed

    Morys, Stephan; Wagner, Ernst; Lächelt, Ulrich

    2016-01-01

    Artificial oligoamino acids with appropriate protecting groups can be used for the sequential assembly of oligoaminoamides on solid-phase. With the help of these oligoamino acids multifunctional nucleic acid (NA) carriers can be designed and produced in highly defined topologies. Here we describe the synthesis of the artificial oligoamino acid Fmoc-Stp(Boc3)-OH, the subsequent assembly into sequence-defined oligomers and the formulation of tumor-targeted plasmid DNA (pDNA) polyplexes. PMID:27436323

  16. Human retroviruses and AIDS 1996. A compilation and analysis of nucleic acid and amino acid sequences

    SciTech Connect

    Myers, G.; Foley, B.; Korber, B.; Mellors, J.W.; Jeang, K.T.; Wain-Hobson, S.

    1997-04-01

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (1) Nucleic Acid Alignments and Sequences; (2) Amino Acid Alignments; (3) Analysis; (4) Related Sequences; and (5) Database Communications. Information within all the parts is updated throughout the year on the Web site, http://hiv-web.lanl.gov. While this publication could take the form of a review or sequence monograph, it is not so conceived. Instead, the literature from which the database is derived has simply been summarized and some elementary computational analyses have been performed upon the data. Interpretation and commentary have been avoided insofar as possible so that the reader can form his or her own judgments concerning the complex information. In addition to the general descriptions of the parts of the compendium, the user should read the individual introductions for each part.

  17. Predicting protein-protein interactions based only on sequences information.

    PubMed

    Shen, Juwen; Zhang, Jian; Luo, Xiaomin; Zhu, Weiliang; Yu, Kunqian; Chen, Kaixian; Li, Yixue; Jiang, Hualiang

    2007-03-13

    Protein-protein interactions (PPIs) are central to most biological processes. Although efforts have been devoted to the development of methodology for predicting PPIs and protein interaction networks, the application of most existing methods is limited because they need information about protein homology or the interaction marks of the protein partners. In the present work, we propose a method for PPI prediction using only the information of protein sequences. This method was developed based on a learning algorithm-support vector machine combined with a kernel function and a conjoint triad feature for describing amino acids. More than 16,000 diverse PPI pairs were used to construct the universal model. The prediction ability of our approach is better than that of other sequence-based PPI prediction methods because it is able to predict PPI networks. Different types of PPI networks have been effectively mapped with our method, suggesting that, even with only sequence information, this method could be applied to the exploration of networks for any newly discovered protein with unknown biological relativity. In addition, such supplementary experimental information can enhance the prediction ability of the method. PMID:17360525
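
    A conjoint triad feature of the kind named in this abstract can be sketched as follows: each residue is mapped to a class, and every run of three consecutive classes is counted, giving a fixed-length vector regardless of sequence length. The abstract does not list the classes, so the seven-class grouping below is an assumption that should be checked against the paper; the function name is hypothetical, and the sketch returns raw counts with no normalization or classifier attached.

```python
from itertools import product

# Assumed seven-class grouping of the 20 standard amino acids (not given in the abstract).
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: str(i)
               for i, members in enumerate(CLASSES, start=1)
               for aa in members}

# All 7 * 7 * 7 = 343 possible class triads, in a fixed order.
TRIADS = ["".join(t) for t in product("1234567", repeat=3)]


def conjoint_triad_counts(seq):
    """Return a 343-element list of triad counts for one protein sequence."""
    encoded = [AA_TO_CLASS.get(aa) for aa in seq.upper()]
    counts = dict.fromkeys(TRIADS, 0)
    for i in range(len(encoded) - 2):
        window = encoded[i:i + 3]
        if None not in window:  # skip windows containing non-standard residues
            counts["".join(window)] += 1
    return [counts[t] for t in TRIADS]


vector = conjoint_triad_counts("AGVC")  # class string 1-1-1-7: triads "111" and "117"
```

    For a protein pair, the two vectors would typically be concatenated before being passed to a classifier; that step is left out here.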

  18. Information contained in the amino acid sequence of the alpha1(I)-chain of collagen and its consequences upon the formation of the triple helix, of fibrils and crosslinks.

    PubMed

    Fietzek, P P; Kühn, K

    1975-09-30

    The molecule of type I collagen from skin consists of two alpha1(I)-chains and one alpha2-chain. The sequence of the entire alpha1-chain comprising 1052 residues is summarily presented and discussed. Apart from the 279 residues of alpha1(I)-CB8 whose sequence has been established for rat skin collagen, all sequences have been determined for calf skin collagen. In order to facilitate sequence analysis, the alpha1-chain was cleaved into defined fragments by cyanogen bromide or hydroxylamine or limited collagenase digestion. Most of the sequence was established by automated stepwise Edman degradation. The alpha1-chain contains two basically different types of sequences: the triple helical region of 1011 amino acid residues in which every third position is occupied by glycine and the N- and C-terminal regions not displaying this type of regularity. Both of these non-triple helical regions carry oxidizable lysine or hydroxylysine residues as functional sites for the intermolecular crosslink formation. Implications of the amino acid sequence for the stability of the triple helix and the fibril as well as for formation of crosslinks are discussed. Evaluation of the sequence in connection with electron microscopical investigations yielded the parameters of the axial arrangement of the molecules within the fibrils. Axial stagger of the molecules by a distance D = 670 angstrom = 233 amino acid residues results in maximal interaction of polar sequence regions of adjacent molecules and similarly of regions of hydrophobic residues. Ordered aggregation of molecules into fibrils is, therefore, regulated by electrostatic and hydrophobic forces. Possible loci of intermolecular crosslinks between the alpha1-chains of adjacent molecules may be deduced from the dimensions of the axial aggregation of molecules. PMID:171554

  19. Detecting frame shifts by amino acid sequence comparison.

    PubMed

    Claverie, J M

    1993-12-20

    Various amino acid substitution scoring matrices are used in conjunction with local alignments programs to detect regions of similarity and infer potential common ancestry between proteins. The usual scoring schemes derive from the implicit hypothesis that related proteins evolve from a common ancestor by the accumulation of point mutations and that amino acids tend to be progressively substituted by others with similar properties. However, other frequent single mutation events, like nucleotide insertion or deletion and gene inversion, change the translation reading frame and cause previously encoded amino acid sequences to become unrecognizable at once. Here, I derive five new types of scoring matrix, each capable of detecting a specific frame shift (deletion, insertion and inversion in 3 frames) and use them with a regular local alignments program to detect amino acid sequences that may have derived from alternative reading frames of the same nucleotide sequence. Frame shifts are inferred from the sole comparison of the protein sequences. The five scoring matrices were used with the BLASTP program to compare all the protein sequences in the Swissprot database. Surprisingly, the searches revealed hundreds of highly significant frame shift matches, of which many are likely to represent sequencing errors. Others provide some evidence that frame shift mutations might be used in protein evolution as a way to create new amino acid sequences from pre-existing coding regions. PMID:7903399

  20. Segments of amino acid sequence similarity in beta-amylases.

    PubMed

    Friedberg, F; Rhodes, C

    1988-01-01

    In alpha-amylases from animals, plants and bacteria and in beta-amylases from plants and bacteria a number of segments exhibit amino acid sequence similarity specific to the alpha or to the beta type, respectively. In the case of the beta-amylases the similar sequence regions are extensive and they are disrupted only by short interspersed dissimilar regions. Close to the C terminus, however, no such sequence similarity exists. PMID:2464171

  1. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... acids are not intended to be embraced by this definition. Any amino acid sequence that contains post-translationally modified amino acids may be described as the amino acid sequence that is initially translated... sequence of four or more amino acids or an unbranched sequence of ten or more nucleotides....

  2. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... acids are not intended to be embraced by this definition. Any amino acid sequence that contains post-translationally modified amino acids may be described as the amino acid sequence that is initially translated... sequence of four or more amino acids or an unbranched sequence of ten or more nucleotides....

  3. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... acids are not intended to be embraced by this definition. Any amino acid sequence that contains post-translationally modified amino acids may be described as the amino acid sequence that is initially translated... sequence of four or more amino acids or an unbranched sequence of ten or more nucleotides....

  4. The modular structure of informational sequences.

    PubMed

    Schmitt, A O; Ebeling, W; Herzel, H

    1996-01-01

    It is shown that DNA sequences can be decomposed into smaller units much the same as texts can be decomposed into syllables, words, or groups of words. Those smaller units (modules) are extracted from DNA sequences according to statistical criteria. Tests with sequences of known modular structure (two novels and a FORTRAN source code) were performed. The degree to which DNA sequences can be decomposed into modules (modularity) turns out to be a very sensitive measure to distinguish DNA sequences from random sequences. PMID:8924645

  5. A method to find palindromes in nucleic acid sequences.

    PubMed

    Anjana, Ramnath; Shankar, Mani; Vaishnavi, Marthandan Kirti; Sekar, Kanagaraj

    2013-01-01

    Various types of sequences in the human genome are known to play important roles in different aspects of genomic functioning. Among these sequences, palindromic nucleic acid sequences are one such type that has been studied in detail and found to influence a wide variety of genomic characteristics. For a nucleotide sequence to be considered a palindrome, its complementary strand must read the same in the opposite direction. For example, both strands, i.e., the strand going from 5' to 3' and its complementary strand from 3' to 5', must be complementary. A typical nucleotide palindromic sequence would be TATA (5' to 3'), and its complementary sequence from 3' to 5' would be ATAT. Thus, a new method has been developed using dynamic programming to fetch the palindromic nucleic acid sequences. The new method uses less memory and thereby increases the overall speed and efficiency. The proposed method has been tested using the bacterial (3891 KB bases) and human chromosomal sequences (Chr-18: 74366 kb and Chr-Y: 25554 kb), and the computation time for finding the palindromic sequences is in milliseconds. PMID:23515654
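
    The palindrome definition in this abstract (a sequence equal to its own reverse complement, as with TATA) can be checked with a few lines of Python. The sketch below is a brute-force scan for illustration, not the dynamic programming method the authors describe; the function names and the `min_len`/`max_len` parameters are assumptions, and the input is assumed to contain only A, C, G and T.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindrome(seq):
    """True if the sequence reads the same on both strands (5' to 3')."""
    seq = seq.upper()
    return len(seq) > 0 and seq == reverse_complement(seq)


def find_palindromes(seq, min_len=4, max_len=12):
    """Brute-force scan returning (start, substring) for every palindromic window."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq)):
        for length in range(min_len, max_len + 1):
            end = start + length
            if end > len(seq):
                break
            window = seq[start:end]
            if is_palindrome(window):
                hits.append((start, window))
    return hits


hits = find_palindromes("CCGAATTCGG", min_len=4, max_len=10)
```

    Odd-length windows never match, because the middle base would have to be its own complement.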

  6. Accuracy of sequence alignment and fold assessment using reduced amino acid alphabets.

    PubMed

    Melo, Francisco; Marti-Renom, Marc A

    2006-06-01

    Reduced or simplified amino acid alphabets group the 20 naturally occurring amino acids into a smaller number of representative protein residues. To date, several reduced amino acid alphabets have been proposed, which have been derived and optimized by a variety of methods. The resulting reduced amino acid alphabets have been applied to pattern recognition, generation of consensus sequences from multiple alignments, protein folding, and protein structure prediction. In this work, amino acid substitution matrices and statistical potentials were derived based on several reduced amino acid alphabets and their performance assessed in a large benchmark for the tasks of sequence alignment and fold assessment of protein structure models, using as a reference frame the standard alphabet of 20 amino acids. The results showed that a large reduction in the total number of residue types does not necessarily translate into a significant loss of discriminative power for sequence alignment and fold assessment. Therefore, some definitions of a few residue types are able to encode most of the relevant sequence/structure information that is present in the 20 standard amino acids. Based on these results, we suggest that the use of reduced amino acid alphabets may allow increasing the accuracy of current substitution matrices and statistical potentials for the prediction of protein structure of remote homologs. PMID:16506243
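
    Recoding a sequence into a reduced alphabet amounts to a lookup from residue to group label. The four-group scheme below (hydrophobic, aromatic, polar, charged) is a hypothetical grouping chosen only to illustrate the mechanics; it is not one of the alphabets evaluated in the paper, and the function name and the `unknown` placeholder are likewise assumptions.

```python
# Hypothetical four-letter alphabet covering the 20 standard amino acids.
GROUPS = {
    "h": "AVLIMC",   # aliphatic / hydrophobic
    "a": "FWY",      # aromatic
    "p": "STNQGPH",  # polar or small
    "c": "DEKR",     # charged
}
RESIDUE_TO_GROUP = {aa: label
                    for label, members in GROUPS.items()
                    for aa in members}


def reduce_sequence(seq, unknown="x"):
    """Rewrite a protein sequence using one letter per residue group."""
    return "".join(RESIDUE_TO_GROUP.get(aa, unknown) for aa in seq.upper())


reduced = reduce_sequence("MKVLAW")
```

    Substitution matrices or statistical potentials would then be derived over the group labels rather than over the 20 residue types.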

  7. Amino acid sequence repertoire of the bacterial proteome and the occurrence of untranslatable sequences.

    PubMed

    Navon, Sharon Penias; Kornberg, Guy; Chen, Jin; Schwartzman, Tali; Tsai, Albert; Puglisi, Elisabetta Viani; Puglisi, Joseph D; Adir, Noam

    2016-06-28

    Bioinformatic analysis of Escherichia coli proteomes revealed that all possible amino acid triplet sequences occur at their expected frequencies, with four exceptions. Two of the four underrepresented sequences (URSs) were shown to interfere with translation in vivo and in vitro. Enlarging the URS by a single amino acid resulted in increased translational inhibition. Single-molecule methods revealed stalling of translation at the entrance of the peptide exit tunnel of the ribosome, adjacent to ribosomal nucleotides A2062 and U2585. Interaction with these same ribosomal residues is involved in regulation of translation by longer, naturally occurring protein sequences. The E. coli exit tunnel has evidently evolved to minimize interaction with the exit tunnel and maximize the sequence diversity of the proteome, although allowing some interactions for regulatory purposes. Bioinformatic analysis of the human proteome revealed no underrepresented triplet sequences, possibly reflecting an absence of regulation by interaction with the exit tunnel. PMID:27307442
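
    Comparing observed and expected triplet counts can be sketched as below. The abstract does not state the statistical model used, so this example assumes the simplest one: residues are independent, and the expected count of a triplet is the product of its three single-residue frequencies times the total number of triplet positions. The function name is hypothetical.

```python
from collections import Counter


def triplet_enrichment(sequences):
    """Return {triplet: (observed, expected)} under an assumed independence model."""
    residue_counts = Counter()
    triplet_counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        residue_counts.update(seq)
        # Overlapping windows of three consecutive residues.
        triplet_counts.update(seq[i:i + 3] for i in range(len(seq) - 2))

    total_residues = sum(residue_counts.values())
    total_triplets = sum(triplet_counts.values())
    freq = {aa: n / total_residues for aa, n in residue_counts.items()}

    result = {}
    for a in freq:
        for b in freq:
            for c in freq:
                triplet = a + b + c
                expected = freq[a] * freq[b] * freq[c] * total_triplets
                result[triplet] = (triplet_counts.get(triplet, 0), expected)
    return result


table = triplet_enrichment(["AGAG"])
```

    Triplets whose observed count falls far below the expected value would be the candidates for underrepresented sequences; deciding what counts as "far below" needs a significance test that is not shown here.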

  8. The amino-acid sequence of kangaroo pancreatic ribonuclease.

    PubMed

    Gaastra, W; Welling, G W; Beintema, J J

    1978-05-01

    Red kangaroo (Macropus rufus) ribonuclease was isolated from pancreatic tissue by affinity chromatography. The amino acid sequence was determined by automatic sequencing of overlapping large fragments and by analysis of shorter peptides obtained by digestion with a number of proteolytic enzymes. The polypeptide chain consists of 122 amino acid residues. Compared to other ribonucleases, the N-terminal residue and residue 114 are deleted. In other pancreatic ribonucleases position 114 is occupied by a cis proline residue in an external loop at the surface of the molecule. Other remarkable substitutions are the presence of a tyrosine residue at position 123 instead of a serine which forms a hydrogen bond with the pyrimidine ring of a nucleotide substrate, and a number of hydrophobic-hydrophilic interchanges in the sequence 51-55, which forms part of an alpha-helix in bovine ribonuclease and exhibits few substitutions in the placental mammals. Kangaroo ribonuclease contains no carbohydrate, although the enzyme possesses a recognition site for carbohydrate attachment in the sequence Asn-Val-Thr (62-64). The enzyme differs at about 35-40% of the positions from all other mammalian pancreatic ribonucleases sequenced to date, which is in agreement with the early divergence between the marsupials and the placental mammals. From fragmentary data a tentative sequence of red-necked wallaby (Macropus rufogriseus) pancreatic ribonuclease has been derived. Eight differences with the kangaroo sequence were found. PMID:658039

  9. Partial amino acid sequence of fructose-1,6-bisphosphatase from the blue-green algae Synechococcus leopoliensis.

    PubMed

    Marcus, F; Latshaw, S P; Steup, M; Gerbling, K P

    1989-08-01

    Purified fructose-1,6-bisphosphatase from the cyanobacterium Synechococcus leopoliensis was S-carboxymethylated and cleaved with trypsin. The resulting peptides were purified by reversed-phase high performance liquid chromatography and the amino acid sequence of six of the purified peptides was determined by gas-phase microsequencing. The results revealed sequence homology with other fructose-1,6-bisphosphatases. The obtained sequence data provides information required for the design of oligonucleotide hybridization probes to screen existing libraries of cyanobacterial DNA. The determination of the amino acid sequence of cyanobacterial proteins may yield important information with respect to the endosymbiotic theory of evolution. PMID:2550924

  10. Protein location prediction using atomic composition and global features of the amino acid sequence

    SciTech Connect

    Cherian, Betsy Sheena; Nair, Achuthsankar S.

    2010-01-22

    Subcellular location of protein is constructive information in determining its function, screening for drug candidates, vaccine design, annotation of gene products and in selecting relevant proteins for further studies. Computational prediction of subcellular localization deals with predicting the location of a protein from its amino acid sequence. For a computational localization prediction method to be more accurate, it should exploit all possible relevant biological features that contribute to the subcellular localization. In this work, we extracted the biological features from the full length protein sequence to incorporate more biological information. A new biological feature, distribution of atomic composition, is effectively used with multiple physicochemical properties, amino acid composition, three-part amino acid composition, and sequence similarity for predicting the subcellular location of the protein. Support Vector Machines are designed for four modules and prediction is made by a weighted voting system. Our system makes predictions with an accuracy of 100%, 82.47%, and 88.81% for the self-consistency test, jackknife test and independent data test, respectively. Our results provide evidence that the prediction based on the biological features derived from the full length amino acid sequence gives better accuracy than those derived from the N-terminal alone. Considering the features as a distribution within the entire sequence will bring out the underlying property distribution in greater detail to enhance the prediction accuracy.

  11. Amino acid sequence of Salmonella typhimurium branched-chain amino acid aminotransferase.

    PubMed

    Feild, M J; Nguyen, D C; Armstrong, F B

    1989-06-13

    The complete amino acid sequence of the subunit of branched-chain amino acid aminotransferase (transaminase B, EC 2.6.1.42) of Salmonella typhimurium was determined. An Escherichia coli recombinant containing the ilvGEDAY gene cluster of Salmonella was used as the source of the hexameric enzyme. The peptide fragments used for sequencing were generated by treatment with trypsin, Staphylococcus aureus V8 protease, endoproteinase Lys-C, and cyanogen bromide. The enzyme subunit contains 308 residues and has a molecular weight of 33,920. To determine the coenzyme-binding site, the pyridoxal 5-phosphate containing enzyme was treated with tritiated sodium borohydride prior to trypsin digestion. Peptide map comparisons with an apoenzyme tryptic digest and monitoring radioactivity incorporation allowed identification of the pyridoxylated peptide, which was then isolated and sequenced. The coenzyme-binding site is the lysyl residue at position 159. The amino acid sequence of Salmonella transaminase B is 97.4% identical with that of Escherichia coli, differing in only eight amino acid positions. Sequence comparisons of transaminase B to other known aminotransferase sequences revealed limited sequence similarity (24-33%) when conserved amino acid substitutions are allowed and alignments were forced to occur on the coenzyme-binding site. PMID:2669973

  12. Nucleic Acids as Information Molecules.

    ERIC Educational Resources Information Center

    McInerney, Joseph D.

    1996-01-01

    Presents an activity that aims at enabling students to recognize that DNA and RNA are information molecules whose function is to store, copy, and make available the information in biological systems, without feeling overwhelmed by the specialized vocabulary and the minutia of the central dogma. (JRH)

  13. Amino acid sequence of bovine heart coupling factor 6.

    PubMed Central

    Fang, J K; Jacobs, J W; Kanner, B I; Racker, E; Bradshaw, R A

    1984-01-01

    The amino acid sequence of bovine heart mitochondrial coupling factor 6 (F6) has been determined by automated Edman degradation of the whole protein and derived peptides. Preparations based on heat precipitation and ethanol extraction showed allotypic variation at three positions while material further purified by HPLC yielded only one sequence that also differed by a Phe-Thr replacement at residue 62. The mature protein contains 76 amino acids with a calculated molecular weight of 9006 and a pI of approximately 5, in good agreement with experimentally measured values. The charged amino acids are mainly clustered at the termini and in one section in the middle; these three polar segments are separated by two segments relatively rich in nonpolar residues. Chou-Fasman analysis suggests three stretches of alpha-helix coinciding with (or within) the high-charge-density sequences with a single beta-turn at the first polar-nonpolar junction. Comparison of the F6 sequence with those of other proteins did not reveal any homologous structures. PMID:6149548

  14. Sequences Of Amino Acids For Human Serum Albumin

    NASA Technical Reports Server (NTRS)

    Carter, Daniel C.

    1992-01-01

    Sequences of amino acids defined for use in making polypeptides one-third to one-sixth as large as parent human serum albumin molecule. Smaller, chemically stable peptides have diverse applications including service as artificial human serum and as active components of biosensors and chromatographic matrices. In applications involving production of artificial sera from new sequences, little or no concern about viral contaminants. Smaller genetically engineered polypeptides more easily expressed and produced in large quantities, making commercial isolation and production more feasible and profitable.

  15. Amino acid sequence of the Amur tiger prion protein.

    PubMed

    Wu, Changde; Pang, Wanyong; Zhao, Deming

    2006-10-01

    Prion diseases are fatal neurodegenerative disorders in humans and animals associated with conformational conversion of a cellular prion protein (PrP(C)) into the pathologic isoform (PrP(Sc)). Various data indicate that the polymorphisms within the open reading frame (ORF) of PrP are associated with the susceptibility and control the species barrier in prion diseases. In the present study, partial Prnp from 25 Amur tigers (tPrnp) were cloned and screened for polymorphisms. Four single nucleotide polymorphisms (T423C, A501G, C511A, A610G) were found; the C511A and A610G nucleotide substitutions resulted in the amino acid changes Lysine171Glutamine and Alanine204Threonine, respectively. The tPrnp amino acid sequence is similar to house cat (Felis catus) and sheep, but differs significantly from two other cat Prnp sequences that were previously deposited in GenBank. PMID:16780982
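
    The link between the reported nucleotide positions and the affected residues can be checked with simple arithmetic. The sketch below assumes that SNP positions are 1-based and counted from the first base of the ORF, which the abstract does not state explicitly; the function name codon_index is hypothetical.

```python
def codon_index(nt_pos: int) -> tuple[int, int]:
    """Map a 1-based nucleotide position to (codon number, base within codon).

    Assumption: numbering starts at the first base of the open reading frame.
    """
    if nt_pos < 1:
        raise ValueError("nucleotide positions are 1-based")
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


if __name__ == "__main__":
    # SNP labels as given in the abstract: reference base, position, variant base.
    for snp in ("T423C", "A501G", "C511A", "A610G"):
        codon, base = codon_index(int(snp[1:-1]))
        print(f"{snp}: codon {codon}, base {base} of the codon")
```

    Under that numbering assumption, positions 511 and 610 fall on the first base of codons 171 and 204, matching the residue numbers in the abstract, while 423 and 501 fall on third-base positions.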

  16. Genome Sequence of the Lactic Acid Bacterium Lactococcus lactis subsp. lactis TOMSC161, Isolated from a Nonscalded Curd Pressed Cheese

    PubMed Central

    Velly, H.; Abraham, A.-L.; Loux, V.; Delacroix-Buchet, A.; Fonseca, F.; Bouix, M.

    2014-01-01

    Lactococcus lactis is a lactic acid bacterium used in the production of many fermented foods, such as dairy products. Here, we report the genome sequence of L. lactis subsp. lactis TOMSC161, isolated from nonscalded curd pressed cheese. This genome sequence provides information in relation to dairy environment adaptation. PMID:25377704

  17. An information theoretic approach to macromolecular modeling: I. Sequence alignments.

    PubMed

    Aynechi, Tiba; Kuntz, Irwin D

    2005-11-01

    We are interested in applying the principles of information theory to structural biology calculations. In this article, we explore the information content of an important computational procedure: sequence alignment. Using a reference state developed from exhaustive sequences, we measure alignment statistics and evaluate gap penalties based on first-principle considerations and gap distributions. We show that there are different gap penalties for different alphabet sizes and that the gap penalties can depend on the length of the sequences being aligned. In a companion article, we examine the information content of molecular force fields. PMID:16254389

  18. Image encryption using random sequence generated from generalized information domain

    NASA Astrophysics Data System (ADS)

    Xia-Yan, Zhang; Guo-Ji, Zhang; Xuan, Li; Ya-Zhou, Ren; Jie-Hua, Wu

    2016-05-01

    A novel image encryption method based on the random sequence generated from the generalized information domain and permutation–diffusion architecture is proposed. The random sequence is generated by reconstruction from the generalized information file and discrete trajectory extraction from the data stream. The trajectory address sequence is used to generate a P-box to shuffle the plain image while random sequences are treated as keystreams. A new factor called drift factor is employed to accelerate and enhance the performance of the random sequence generator. An initial value is introduced to make the encryption method an approximately one-time pad. Experimental results show that the random sequences pass the NIST statistical test with a high ratio and extensive analysis demonstrates that the new encryption scheme has superior security.

  19. Quantum-Sequencing: Biophysics of quantum tunneling through nucleic acids

    NASA Astrophysics Data System (ADS)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    Tunneling microscopy and spectroscopy have been used extensively in physical surface sciences to study quantum tunneling, to measure the electronic local density of states of nanomaterials, and to characterize adsorbed species. Quantum-Sequencing (Q-Seq) is a new method based on tunneling microscopy for electronic sequencing of single molecules of nucleic acids. A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free single-molecule sequencing method. Here, we present the unique "electronic fingerprints" for all nucleotides on DNA and RNA using Q-Seq, along with their intrinsic biophysical parameters. We have analyzed tunneling spectra for the nucleotides at different pH conditions and analyzed the HOMO, LUMO and energy gap for all of them. In addition we show a number of biophysical parameters to further characterize all nucleobases (electron and hole transition voltage and energy barriers). These results highlight the robustness of Q-Seq as a technique for next-generation sequencing.

  20. Correlation between fibroin amino acid sequence and physical silk properties.

    PubMed

    Fedic, Robert; Zurovec, Michal; Sehnal, Frantisek

    2003-09-12

    The fiber properties of lepidopteran silk depend on the amino acid repeats that interact during H-fibroin polymerization. The aim of our research was to relate repeat composition to insect biology and fiber strength. Representative regions of the H-fibroin genes were sequenced and analyzed in three pyralid species: wax moth (Galleria mellonella), European flour moth (Ephestia kuehniella), and Indian meal moth (Plodia interpunctella). The amino acid repeats are species-specific, evidently a diversification of an ancestral region of 43 residues, and include three types of regularly dispersed motifs: modifications of GSSAASAA sequence, stretches of tripeptides GXZ where X and Z represent bulky residues, and sequences similar to PVIVIEE. No concatenations of GX dipeptide or alanine, which are typical for Bombyx silkworms and Antheraea silk moths, respectively, were found. Despite different repeat structure, the silks of G. mellonella and E. kuehniella exhibit similar tensile strength as the Bombyx and Antheraea silks. We suggest that in these latter two species, variations in the repeat length obstruct repeat alignment, but sufficiently long stretches of iterated residues get superposed to interact. In the pyralid H-fibroins, interactions of the widely separated and diverse motifs depend on the precision of repeat matching; silk is strong in G. mellonella and E. kuehniella, with 2-3 types of long homogeneous repeats, and nearly 10 times weaker in P. interpunctella, with seven types of shorter erratic repeats. The high proportion of large amino acids in the H-fibroin of pyralids has probably evolved in connection with the spinning habit of caterpillars that live in protective silk tubes and spin continuously, enlarging the tubes on one end and partly devouring the other one. The silk serves as a depot of energetically rich and essential amino acids that may be scarce in the diet. PMID:12816957

  1. The "minimum information about an environmental sequence" (MIENS) specification

    SciTech Connect

    Yilmaz, P.; Kottmann, R.; Field, D.; Knight, R.; Cole, J.R.; Amaral-Zettler, L.; Gilbert, J.A.; Karsch-Mizrachi, I.; Johnston, A.; Cochrane, G.; Vaughan, R.; Hunter, C.; Park, J.; Morrison, N.; Rocca-Serra, P.; Sterk, P.; Arumugam, M.; Baumgartner, L.; Birren, B.W.; Blaser, M.J.; Bonazzi, V.; Bork, P.; Buttigieg, P. L.; Chain, P.; Costello, E.K.; Huot-Creasy, H.; Dawyndt, P.; DeSantis, T.; Fierer, N.; Fuhrman, J.; Gallery, R.E.; Gibbs, R.A.; Giglio, M.G.; Gil, I. San; Gonzalez, A.; Gordon, J.I.; Guralnick, R.; Hankeln, W.; Highlander, S.; Hugenholtz, P.; Jansson, J.; Kennedy, J.; Knights, D.; Koren, O.; Kuczynski, J.; Kyrpides, N.; Larsen, R.; Lauber, C.L.; Legg, T.; Ley, R.E.; Lozupone, C.A.; Ludwig, W.; Lyons, D.; Maguire, E.; Methe, B.A.; Meyer, F.; Nakieny, S.; Nelson, K.E.; Nemergut, D.; Neufeld, J.D.; Pace, N.R.; Palanisamy, G.; Peplies, J.; Peterson, J.; Petrosino, J.; Proctor, L.; Raes, J.; Ratnasingham, S.; Ravel, J.; Relman, D.A.; Assunta-Sansone, S.; Schriml, L.; Sodergren, E.; Spor, A.; Stombaugh, J.; Tiedje, J.M.; Ward, D.V.; Weinstock, G.M.; Wendel, D.; White, O.; Wikle, A.; Wortman, J.R.; Glockner, F.O.; Bushman, F.D.; Charlson, E.; Gevers, D.; Kelley, S.T.; Neubold, L.K.; Oliver, A.E.; Pruesse, E.; Quast, C.; Schloss, P.D.; Sinha, R.; Whitely, A.

    2010-10-15

    We present the Genomic Standards Consortium's (GSC) 'Minimum Information about an ENvironmental Sequence' (MIENS) standard for describing marker genes. Adoption of MIENS will enhance our ability to analyze natural genetic diversity across the Tree of Life as it is currently being documented by massive DNA sequencing efforts from myriad ecosystems in our ever-changing biosphere.

  2. Highly Informative Simple Sequence Repeat (SSR) Markers for Fingerprinting Hazelnut

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Simple sequence repeat (SSR) or microsatellite markers have many applications in breeding and genetic studies of plants, including fingerprinting of cultivars and investigations of genetic diversity, and therefore provide information for better management of germplasm collections. They are repeatab...

  3. Characterization and amino acid sequence of a fatty acid-binding protein from human heart.

    PubMed

    Offner, G D; Brecher, P; Sawlivich, W B; Costello, C E; Troxler, R F

    1988-05-15

    The complete amino acid sequence of a fatty acid-binding protein from human heart was determined by automated Edman degradation of CNBr, BNPS-skatole [3'-bromo-3-methyl-2-(2-nitrobenzenesulphenyl)indolenine], hydroxylamine, Staphylococcus aureus V8 proteinase, tryptic and chymotryptic peptides, and by digestion of the protein with carboxypeptidase A. The sequence of the blocked N-terminal tryptic peptide from citraconylated protein was determined by collisionally induced decomposition mass spectrometry. The protein contains 132 amino acid residues, is enriched with respect to threonine and lysine, lacks cysteine, has an acetylated valine residue at the N-terminus, and has an Mr of 14768 and an isoelectric point of 5.25. This protein contains two short internal repeated sequences from residues 48-54 and from residues 114-119 located within regions of predicted beta-structure and decreasing hydrophobicity. These short repeats are contained within two longer repeated regions from residues 48-60 and residues 114-125, which display 62% sequence similarity. These regions could accommodate the charged and uncharged moieties of long-chain fatty acids and may represent fatty acid-binding domains consistent with the finding that human heart fatty acid-binding protein binds 2 mol of oleate or palmitate/mol of protein. Detailed evidence for the amino acid sequences of the peptides has been deposited as Supplementary Publication SUP 50143 (23 pages) at the British Library Lending Division, Boston Spa, Yorkshire LS23 7BQ, U.K., from whom copies may be obtained as indicated in Biochem. J. (1988) 249, 5. PMID:3421901

  4. Molecular cloning and amino acid sequence of human 5-lipoxygenase

    SciTech Connect

    Matsumoto, T.; Funk, C.D.; Radmark, O.; Hoeoeg, J.O.; Joernvall, H.; Samuelsson, B.

    1988-01-01

    5-Lipoxygenase (EC 1.13.11.34), a Ca²⁺- and ATP-requiring enzyme, catalyzes the first two steps in the biosynthesis of the peptidoleukotrienes and the chemotactic factor leukotriene B₄. A cDNA clone corresponding to 5-lipoxygenase was isolated from a human lung lambda gt11 expression library by immunoscreening with a polyclonal antibody. Additional clones from a human placenta lambda gt11 cDNA library were obtained by plaque hybridization with the ³²P-labeled lung cDNA clone. Sequence data obtained from several overlapping clones indicate that the composite DNAs contain the complete coding region for the enzyme. From the deduced primary structure, 5-lipoxygenase is a 673 amino acid protein with a calculated molecular weight of 77,839. Direct analysis of the native protein and its proteolytic fragments confirmed the deduced composition, the amino-terminal amino acid sequence, and the structure of many internal segments. 5-Lipoxygenase has no apparent sequence homology with leukotriene A₄ hydrolase or Ca²⁺-binding proteins. RNA blot analysis indicated substantial amounts of an mRNA species of approximately 2700 nucleotides in leukocytes, lung, and placenta.

  5. Nucleic acid sequence detection using multiplexed oligonucleotide PCR

    DOEpatents

    Nolan, John P.; White, P. Scott

    2006-12-26

    Methods for rapidly detecting single or multiple sequence alleles in a sample nucleic acid are described. Provided are all of the oligonucleotide pairs capable of annealing specifically to a target allele and discriminating among possible sequences thereof, and ligating to each other to form an oligonucleotide complex when a particular sequence feature is present (or, alternatively, absent) in the sample nucleic acid. The design of each oligonucleotide pair permits the subsequent high-level PCR amplification of a specific amplicon when the oligonucleotide complex is formed, but not when the oligonucleotide complex is not formed. The presence or absence of the specific amplicon is used to detect the allele. Detection of the specific amplicon may be achieved using a variety of methods well known in the art, including without limitation, oligonucleotide capture onto DNA chips or microarrays, oligonucleotide capture onto beads or microspheres, electrophoresis, and mass spectrometry. Various labels and address-capture tags may be employed in the amplicon detection step of multiplexed assays, as further described herein.

  6. Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications.

    SciTech Connect

    Yilmaz, P.; Kottmann, R.; Field, D.; Knight, R.; Cole, J. R.; Amaral-Zettler, L.; Gilbert, J. A.

    2011-05-01

    Here we present a standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences - the minimum information about a marker gene sequence (MIMARKS). We also introduce a system for describing the environment from which a biological sample originates. The 'environmental packages' apply to any genome sequence of known origin and can be used in combination with MIMARKS and other GSC checklists. Finally, to establish a unified standard for describing sequence data and to provide a single point of entry for the scientific community to access and learn about GSC checklists, we present the minimum information about any (x) sequence (MIxS). Adoption of MIxS will enhance our ability to analyze natural genetic diversity documented by massive DNA sequencing efforts from myriad ecosystems in our ever-changing biosphere.

  7. Self-sequencing of amino acids and origins of polyfunctional protocells.

    PubMed

    Fox, S W

    1984-01-01

    The primal role of the origins of proteins in molecular evolution is discussed. On the basis of this premise, the significance of the experimentally established self-sequencing of amino acids under simulated geological conditions is explained as due to the fact that the products are highly nonrandom and accordingly contain many kinds of information. When such thermal proteins are aggregated into laboratory protocells, an action that occurs readily, the resultant protocells also contain many kinds of information. Residue-by-residue order, enzymic activities, and lipid quality accordingly occur within each preparation of proteinoid (thermal protein). In this paper are reviewed briefly the phenomenon of self-sequencing of amino acids, its relationship to evolutionary processes, other significance of such self-ordering, and the experimental evidence for original polyfunctional protocells. PMID:6462684

  8. Self-Sequencing of Amino Acids and Origins of Polyfunctional Protocells

    NASA Astrophysics Data System (ADS)

    Fox, Sidney W.

    1984-12-01

    The primal role of the origins of proteins in molecular evolution is discussed. On the basis of this premise, the significance of the experimentally established self-sequencing of amino acids under simulated geological conditions is explained as due to the fact that the products are highly nonrandom and accordingly contain many kinds of information. When such thermal proteins are aggregated into laboratory protocells, an action that occurs readily, the resultant protocells also contain many kinds of information. Residue-by-residue order, enzymic activities, and lipid quality accordingly occur within each preparation of proteinoid (thermal protein). In this paper are reviewed briefly the phenomenon of self-sequencing of amino acids, its relationship to evolutionary processes, other significance of such self-ordering, and the experimental evidence for original polyfunctional protocells.

  9. Sequence databases: integrated information retrieval and data submission.

    PubMed

    Weisemann, J M; Boguski, M S; Ouellette, B F

    2001-05-01

    This unit describes the NCBI's Entrez database browser. Entrez integrates DNA and protein sequence data, three dimensional structures, and taxonomic information with its associated abstracts and citations contained in PubMed (MEDLINE). It is possible to search the Entrez information space using conventional search queries (authors, gene names, map location) as well as by bibliographic associations (articles that are related to one another) and sequence homology. Also described are the procedures for submission of new data, updates, and corrections to the sequence databases. PMID:18428302

  10. Self-sequencing of amino acids and origins of polyfunctional protocells

    NASA Technical Reports Server (NTRS)

    Fox, S. W.

    1984-01-01

    The role of proteins in the origin of living things is discussed. It has been experimentally established that amino acids can sequence themselves under simulated geological conditions with highly nonrandom products which accordingly contain diverse information. Multiple copies of each type of macromolecule are formed, resulting in greater power for any protoenzymic molecule than would accrue from a single copy of each type. Thermal proteins are readily incorporated into laboratory protocells. The experimental evidence for original polyfunctional protocells is discussed.

  11. Blasting and Zipping: Sequence Alignment and Mutual Information

    NASA Astrophysics Data System (ADS)

    Penner, Orion; Grassberger, Peter; Paczuski, Maya

    2009-03-01

    Alignment of biological sequences such as DNA, RNA or proteins is one of the most widely used tools in computational bioscience. While the accomplishments of sequence alignment algorithms are undeniable the fact remains that these algorithms are based upon heuristic scoring schemes. Therefore, these algorithms do not provide model independent and objective measures for how similar two (or more) sequences actually are. Although information theory provides such a similarity measure - the mutual information (MI) - numerous previous attempts to connect sequence alignment and information have not produced realistic estimates for the MI from a given alignment. We report on a simple and flexible approach to get robust estimates of MI from global alignments. The presented results may help establish MI as a reliable tool for evaluating the quality of global alignments, judging the relative merits of different alignment algorithms, and estimating the significance of specific alignments.
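
    For orientation, the quantity discussed above can be illustrated with the naive "plug-in" estimate of mutual information computed from the aligned column pairs of a pairwise alignment. This is a minimal sketch and not the authors' approach, which the abstract does not detail; the plug-in estimate is known to be biased for short sequences. The function name and the gap-skipping convention are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def plugin_mutual_information(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Plug-in estimate (in bits per aligned position) of the mutual information
    between two aligned sequences, ignoring columns that contain a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences carry a residue.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    # Sum p(a,b) * log2( p(a,b) / (p(a) p(b)) ) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (left[a] * right[b]))
        for (a, b), c in joint.items()
    )


if __name__ == "__main__":
    print(plugin_mutual_information("ACGT", "ACGT"))  # identical sequences
    print(plugin_mutual_information("AACC", "ACAC"))  # statistically independent columns
```

    With four equally frequent symbols, a sequence aligned to itself yields 2 bits per position, and the second pair yields 0 bits because every symbol combination occurs equally often.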

  12. Amino acid sequence prerequisites for the formation of cn ions.

    PubMed

    Downard, K M; Biemann, K

    1993-11-01

    Amino acid sequence prerequisites are described for the formation of cn ions observed in high-energy collision-induced decomposition spectra of peptides. It is shown that the formation of cn ions is promoted by the nature of the amino acid C-terminal to the cleavage site. A propensity for cn cleavage preceding threonine, and to a lesser extent tryptophan, lysine, and serine, is demonstrated where fragmentation is directed N-terminally at these residues. In addition, the nature of the residue N-terminal to the cleavage site is shown to have little effect on cn ion formation. A mechanism for cn ion formation is proposed and its applicability to the results observed is discussed. PMID:24227531

  13. Ultrasensitive nucleic acid sequence detection by single-molecule electrophoresis

    SciTech Connect

    Castro, A; Shera, E.B.

    1996-09-01

    This is the final report of a one-year laboratory-directed research and development project at Los Alamos National Laboratory. There has been considerable interest in the development of very sensitive clinical diagnostic techniques over the last few years. Many pathogenic agents are often present in extremely small concentrations in clinical samples, especially at the initial stages of infection, making their detection very difficult. This project sought to develop a new technique for the detection and accurate quantification of specific bacterial and viral nucleic acid sequences in clinical samples. The scheme involved the use of novel hybridization probes for the detection of nucleic acids combined with our recently developed technique of single-molecule electrophoresis. This project is directly relevant to the DOE's Defense Programs strategic directions in the area of biological warfare counter-proliferation.

  14. Property-based sequence representations do not adequately encode local protein folding information.

    PubMed

    Solis, A D; Rackovsky, S

    2007-06-01

    We examine the informatic characteristics of amino acid representations based on physical properties. We demonstrate that sequences rewritten using contracted alphabets based on physical properties do not encode local folding information well. The best four-character alphabet can only encode approximately 57% of the maximum possible amount of structural information. This result suggests that property-based representations that operate on a local length scale are not likely to be useful in homology searches and fold-recognition exercises. PMID:17387739

  15. PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities

    PubMed Central

    2011-01-01

    Background: Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. Results: The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. Conclusions: PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/. PMID:21385349

  16. N-Terminal Amino Acid Sequence Determination of Proteins by N-Terminal Dimethyl Labeling: Pitfalls and Advantages When Compared with Edman Degradation Sequence Analysis.

    PubMed

    Chang, Elizabeth; Pourmal, Sergei; Zhou, Chun; Kumar, Rupesh; Teplova, Marianna; Pavletich, Nikola P; Marians, Kenneth J; Erdjument-Bromage, Hediye

    2016-07-01

    In recent history, alternative approaches to Edman sequencing have been investigated, and to this end, the Association of Biomolecular Resource Facilities (ABRF) Protein Sequencing Research Group (PSRG) initiated studies in 2014 and 2015, looking into bottom-up and top-down N-terminal (Nt) dimethyl derivatization of standard quantities of intact proteins with the aim to determine Nt sequence information. We have expanded this initiative and used low picomole amounts of myoglobin to determine the efficiency of Nt-dimethylation. Application of this approach on protein domains, generated by limited proteolysis of overexpressed proteins, confirms that it is a universal labeling technique and is very sensitive when compared with Edman sequencing. Finally, we compared Edman sequencing and Nt-dimethylation of the same polypeptide fragments; results confirm that there is agreement in the identity of the Nt amino acid sequence between these 2 methods. PMID:27006647

  17. N-Terminal Amino Acid Sequence Determination of Proteins by N-Terminal Dimethyl Labeling: Pitfalls and Advantages When Compared with Edman Degradation Sequence Analysis

    PubMed Central

    Chang, Elizabeth; Pourmal, Sergei; Zhou, Chun; Kumar, Rupesh; Teplova, Marianna; Pavletich, Nikola P.; Marians, Kenneth J.

    2016-01-01

    In recent history, alternative approaches to Edman sequencing have been investigated, and to this end, the Association of Biomolecular Resource Facilities (ABRF) Protein Sequencing Research Group (PSRG) initiated studies in 2014 and 2015, looking into bottom-up and top-down N-terminal (Nt) dimethyl derivatization of standard quantities of intact proteins with the aim to determine Nt sequence information. We have expanded this initiative and used low picomole amounts of myoglobin to determine the efficiency of Nt-dimethylation. Application of this approach on protein domains, generated by limited proteolysis of overexpressed proteins, confirms that it is a universal labeling technique and is very sensitive when compared with Edman sequencing. Finally, we compared Edman sequencing and Nt-dimethylation of the same polypeptide fragments; results confirm that there is agreement in the identity of the Nt amino acid sequence between these 2 methods. PMID:27006647

  18. Use of a structural alphabet to find compatible folds for amino acid sequences

    PubMed Central

    Mahajan, Swapnil; de Brevern, Alexandre G; Sanejouand, Yves-Henri; Srinivasan, Narayanaswamy; Offmann, Bernard

    2015-01-01

    The structural annotation of proteins with no detectable homologs of known 3D structure identified using sequence-search methods is a major challenge today. We propose an original method that computes the conditional probabilities for the amino-acid sequence of a protein to fit to known protein 3D structures using a structural alphabet, known as “Protein Blocks” (PBs). PBs constitute a library of 16 local structural prototypes that approximate every part of protein backbone structures. It is used to encode 3D protein structures into 1D PB sequences and to capture sequence to structure relationships. Our method relies on amino acid occurrence matrices, one for each PB, to score global and local threading of query amino acid sequences to protein folds encoded into PB sequences. It does not use any information from residue contacts or sequence-search methods or explicit incorporation of hydrophobic effect. The performance of the method was assessed with independent test datasets derived from SCOP 1.75A. With a Z-score cutoff that achieved 95% specificity (i.e., less than 5% false positives), global and local threading showed sensitivity of 64.1% and 34.2%, respectively. We further tested its performance on 57 difficult CASP10 targets that had no known homologs in PDB: 38 compatible templates were identified by our approach and 66% of these hits yielded correctly predicted structures. This method scales-up well and offers promising perspectives for structural annotations at genomic level. It has been implemented in the form of a web-server that is freely available at http://www.bo-protscience.fr/forsa. PMID:25297700

  19. Use of a structural alphabet to find compatible folds for amino acid sequences.

    PubMed

    Mahajan, Swapnil; de Brevern, Alexandre G; Sanejouand, Yves-Henri; Srinivasan, Narayanaswamy; Offmann, Bernard

    2015-01-01

    The structural annotation of proteins with no detectable homologs of known 3D structure identified using sequence-search methods is a major challenge today. We propose an original method that computes the conditional probabilities for the amino-acid sequence of a protein to fit to known protein 3D structures using a structural alphabet, known as "Protein Blocks" (PBs). PBs constitute a library of 16 local structural prototypes that approximate every part of protein backbone structures. It is used to encode 3D protein structures into 1D PB sequences and to capture sequence to structure relationships. Our method relies on amino acid occurrence matrices, one for each PB, to score global and local threading of query amino acid sequences to protein folds encoded into PB sequences. It does not use any information from residue contacts or sequence-search methods or explicit incorporation of hydrophobic effect. The performance of the method was assessed with independent test datasets derived from SCOP 1.75A. With a Z-score cutoff that achieved 95% specificity (i.e., less than 5% false positives), global and local threading showed sensitivity of 64.1% and 34.2%, respectively. We further tested its performance on 57 difficult CASP10 targets that had no known homologs in PDB: 38 compatible templates were identified by our approach and 66% of these hits yielded correctly predicted structures. This method scales-up well and offers promising perspectives for structural annotations at genomic level. It has been implemented in the form of a web-server that is freely available at http://www.bo-protscience.fr/forsa. PMID:25297700

  20. Efficient Nucleic Acid Extraction and 16S rRNA Gene Sequencing for Bacterial Community Characterization.

    PubMed

    Anahtar, Melis N; Bowman, Brittany A; Kwon, Douglas S

    2016-01-01

    There is a growing appreciation for the role of microbial communities as critical modulators of human health and disease. High throughput sequencing technologies have allowed for the rapid and efficient characterization of bacterial communities using 16S rRNA gene sequencing from a variety of sources. Although readily available tools for 16S rRNA sequence analysis have standardized computational workflows, sample processing for DNA extraction remains a continued source of variability across studies. Here we describe an efficient, robust, and cost effective method for extracting nucleic acid from swabs. We also delineate downstream methods for 16S rRNA gene sequencing, including generation of sequencing libraries, data quality control, and sequence analysis. The workflow can accommodate multiple samples types, including stool and swabs collected from a variety of anatomical locations and host species. Additionally, recovered DNA and RNA can be separated and used for other applications, including whole genome sequencing or RNA-seq. The method described allows for a common processing approach for multiple sample types and accommodates downstream analysis of genomic, metagenomic and transcriptional information. PMID:27168460

  1. Efficient Nucleic Acid Extraction and 16S rRNA Gene Sequencing for Bacterial Community Characterization

    PubMed Central

    Anahtar, Melis N.; Bowman, Brittany A.; Kwon, Douglas S.

    2016-01-01

    There is a growing appreciation for the role of microbial communities as critical modulators of human health and disease. High-throughput sequencing technologies have allowed for the rapid and efficient characterization of bacterial communities using 16S rRNA gene sequencing from a variety of sources. Although readily available tools for 16S rRNA sequence analysis have standardized computational workflows, sample processing for DNA extraction remains a continued source of variability across studies. Here we describe an efficient, robust, and cost-effective method for extracting nucleic acid from swabs. We also delineate downstream methods for 16S rRNA gene sequencing, including generation of sequencing libraries, data quality control, and sequence analysis. The workflow can accommodate multiple sample types, including stool and swabs collected from a variety of anatomical locations and host species. Additionally, recovered DNA and RNA can be separated and used for other applications, including whole genome sequencing or RNA-seq. The method described allows for a common processing approach for multiple sample types and accommodates downstream analysis of genomic, metagenomic and transcriptional information. PMID:27168460

  2. Nucleotide sequence of the phosphoglycerate kinase gene from the extreme thermophile Thermus thermophilus. Comparison of the deduced amino acid sequence with that of the mesophilic yeast phosphoglycerate kinase.

    PubMed Central

    Bowen, D; Littlechild, J A; Fothergill, J E; Watson, H C; Hall, L

    1988-01-01

    Using oligonucleotide probes derived from amino acid sequencing information, the structural gene for phosphoglycerate kinase from the extreme thermophile, Thermus thermophilus, was cloned in Escherichia coli and its complete nucleotide sequence determined. The gene consists of an open reading frame corresponding to a protein of 390 amino acid residues (calculated Mr 41,791) with an extreme bias for G or C (93.1%) in the codon third base position. Comparison of the deduced amino acid sequence with that of the corresponding mesophilic yeast enzyme indicated a number of significant differences. These are discussed in terms of the unusual codon bias and their possible role in enhanced protein thermal stability. PMID:3052437
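
    The third-position G or C bias reported in this abstract (often called GC3) can be computed directly from a coding sequence. The following is a minimal Python sketch, not the authors' procedure; the function name gc3_fraction and the toy open reading frame are invented for illustration and are not taken from the Thermus thermophilus gene.

```python
def gc3_fraction(cds: str) -> float:
    """Return the fraction of complete codons whose third base is G or C."""
    cds = cds.upper()
    # Ignore any trailing bases that do not form a complete codon.
    usable = len(cds) - len(cds) % 3
    third_bases = cds[2:usable:3]
    if not third_bases:
        raise ValueError("need at least one complete codon")
    return sum(base in "GC" for base in third_bases) / len(third_bases)


if __name__ == "__main__":
    toy_orf = "ATGGCCGAGCTGAAGTAA"  # hypothetical six-codon reading frame
    print(f"GC3 = {gc3_fraction(toy_orf):.3f}")
```

    Applied to a real open reading frame, a value near 0.93 would correspond to the 93.1% bias quoted above.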

  3. Sequence information signal processor for local and global string comparisons

    DOEpatents

    Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.

    1997-01-01

    A sequence information signal processing integrated circuit chip is designed to perform high speed calculation of a dynamic programming algorithm based upon the algorithm defined by Waterman and Smith. The signal processing chip of the present invention is designed to be a building block of a linear systolic array, the performance of which can be increased by connecting additional sequence information signal processing chips to the array. The chip provides a high speed, low cost linear array processor that can locate highly similar global sequences or segments thereof such as contiguous subsequences from two different DNA or protein sequences. The chip is implemented in a preferred embodiment using CMOS VLSI technology to provide the equivalent of about 400,000 transistors or 100,000 gates. Each chip provides 16 processing elements, and is designed to provide 16 bit, two's complement operation for maximum score precision of between -32,768 and +32,767. It is designed to provide a comparison between sequences as long as 4,194,304 elements without external software and between sequences of unlimited numbers of elements with the aid of external software. Each sequence can be assigned different deletion and insertion weight functions. Each processor is provided with a similarity measure device which is independently variable. Thus, each processor can contribute to maximum value score calculation using a different similarity measure.
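
    The kind of dynamic programming recurrence that such a chip accelerates can be sketched in software. The following is a minimal Python sketch of local alignment scoring in the style of Smith and Waterman, using a linear gap penalty; the function name and the match, mismatch and gap values are illustrative assumptions and do not reproduce the weight functions or the systolic architecture described in the patent.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b.

    Only two rows of the scoring matrix are kept in memory, since each
    cell depends on its left, upper and upper-left neighbours.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(
                0,                    # start a new local alignment
                prev[j - 1] + sub,    # align a[i-1] with b[j-1]
                prev[j] + gap,        # gap in b (deletion)
                curr[j - 1] + gap,    # gap in a (insertion)
            )
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("XXACGTXX", "YYACGTYY"))
```

    With these assumed weights a score fits comfortably in a signed 16 bit value for short inputs; a hardware implementation limited to the range -32,768 to +32,767 would need to handle overflow for long, highly similar sequences.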

  4. Experiences with Obtaining Informed Consent for Genomic Sequencing

    PubMed Central

    Bernhardt, Barbara A.; Roche, Myra I.; Perry, Denise L.; Scollon, Sarah R.; Tomlinson, Ashley N.; Skinner, Debra

    2016-01-01

    Despite the increased utilization of genome and exome sequencing, little is known about the actual content and process of informed consent for sequencing. We addressed this by interviewing 29 genetic counselors and research coordinators experienced in obtaining informed consent for sequencing in research and clinical settings. Interviews focused on the process and content of informed consent; patients/participants’ common questions, concerns and misperceptions; and challenges to obtaining informed consent. Content analysis of transcribed interviews revealed that the main challenges to obtaining consent related to the broad scope and uncertainty of results, and patients/participants’ unrealistic expectations about the likely number and utility of results. Interviewees modified their approach to sessions according to contextual issues surrounding the indication for testing, type of patient, and timing of testing. With experience, most interviewees structured sessions to place less emphasis on standard elements in the consent form and technological aspects of sequencing. They instead focused on addressing misperceptions and helping patients/participants develop realistic expectations about the types and implications of possible results, including secondary findings. These findings suggest that informed consent sessions should focus on key issues that may be misunderstood by patients/participants. Future research should address the extent to which various stakeholders agree on key elements of informed consent. PMID:26198374

  5. Reevaluation of the significance of sequence information for speech recognition

    NASA Astrophysics Data System (ADS)

    Sarukkai, Ramesh R.; Ballard, Dana H.

    1994-12-01

    A central difficulty with automatic speech recognition is the temporally inaccurate nature of the speech signal. Despite this, speech has been traditionally modeled as a purely sequential (albeit probabilistic) process. The usefulness of accurate sequence information is reevaluated in this paper, both at the acoustic and lexical levels for the task of speech recognition. At the acoustic level, speech segments are quantized into discrete vectors, and converted into set representations as opposed to accurate sequences. Recognition of the quantized vector sets dramatically improved performance as contrasted with the corresponding vector sequence representations. At the lexical level, our study suggests that accurate sequence information is, again, not crucial. In fact locally discarding phoneme sequence information may be useful for coping with errors (such as insertion, substitution). Based on the idea of phone set indexing, a lexical access algorithm is developed. Thus, this work questions the traditional approach of modeling speech as a purely sequential process, and suggests that discarding local sequential information may be a good idea. As an alternative to a purely sequential representation, a set representation seems to be a viable option.

  6. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... in the sequence. (4) The enumeration of amino acids may start at the first amino acid of the first..., counting backwards starting with the amino acid next to number 1. Otherwise, the enumeration of amino acids... sequence every 5 amino acids. The enumeration method for amino acid sequences that is set forth......

  7. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... in the sequence. (4) The enumeration of amino acids may start at the first amino acid of the first..., counting backwards starting with the amino acid next to number 1. Otherwise, the enumeration of amino acids... sequence every 5 amino acids. The enumeration method for amino acid sequences that is set forth......

  8. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... in the sequence. (4) The enumeration of amino acids may start at the first amino acid of the first..., counting backwards starting with the amino acid next to number 1. Otherwise, the enumeration of amino acids... sequence every 5 amino acids. The enumeration method for amino acid sequences that is set forth......

  9. Predicting protein disorder by analyzing amino acid sequence

    PubMed Central

    Yang, Jack Y; Yang, Mary Qu

    2008-01-01

    Background: Many protein regions and some entire proteins have no definite tertiary structure, presenting instead as dynamic, disordered ensembles under different physicochemical circumstances. These proteins and regions are known as Intrinsically Unstructured Proteins (IUP). IUP have been associated with a wide range of protein functions, along with roles in diseases characterized by protein misfolding and aggregation. Results: Identifying IUP is an important task in structural and functional genomics. We extract useful features from sequences and develop machine learning algorithms for this task. We compare our IUP predictor with PONDRs (mainly neural-network-based predictors), disEMBL (also based on neural networks) and Globplot (based on disorder propensity). Conclusion: We find that augmenting features derived from physicochemical properties of amino acids (such as hydrophobicity, complexity etc.) and using an ensemble method proved beneficial. The IUP predictor is a viable alternative software tool for identifying IUP protein regions and proteins. PMID:18831799

  10. Negative Ion In-Source Decay Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry for Sequencing Acidic Peptides

    NASA Astrophysics Data System (ADS)

    McMillen, Chelsea L.; Wright, Patience M.; Cassady, Carolyn J.

    2016-05-01

    Matrix-assisted laser desorption/ionization (MALDI) in-source decay was studied in the negative ion mode on deprotonated peptides to determine its usefulness for obtaining extensive sequence information for acidic peptides. Eight biological acidic peptides, ranging in size from 11 to 33 residues, were studied by negative ion mode ISD (nISD). The matrices 2,5-dihydroxybenzoic acid, 2-aminobenzoic acid, 2-aminobenzamide, 1,5-diaminonaphthalene, 5-amino-1-naphthol, 3-aminoquinoline, and 9-aminoacridine were used with each peptide. Optimal fragmentation was produced with 1,5-diaminonaphthalene (DAN), and extensive sequence informative fragmentation was observed for every peptide except hirudin(54-65). Cleavage at the N-Cα bond of the peptide backbone, producing c' and z' ions, was dominant for all peptides. Cleavage of the N-Cα bond N-terminal to proline residues was not observed. The formation of c and z ions is also found in electron transfer dissociation (ETD), electron capture dissociation (ECD), and positive ion mode ISD, which are considered to be radical-driven techniques. Oxidized insulin chain A, which has four highly acidic oxidized cysteine residues, had less extensive fragmentation. This peptide also exhibited the only charged localized fragmentation, with more pronounced product ion formation adjacent to the highly acidic residues. In addition, spectra were obtained by positive ion mode ISD for each protonated peptide; more sequence informative fragmentation was observed via nISD for all peptides. Three of the peptides studied had no product ion formation in ISD, but extensive sequence informative fragmentation was found in their nISD spectra. The results of this study indicate that nISD can be used to readily obtain sequence information for acidic peptides.

  11. The Complete Genome Sequence of the Lactic Acid Bacterium Lactococcus lactis ssp. lactis IL1403

    PubMed Central

    Bolotin, Alexander; Wincker, Patrick; Mauger, Stéphane; Jaillon, Olivier; Malarme, Karine; Weissenbach, Jean; Ehrlich, S. Dusko; Sorokin, Alexei

    2001-01-01

    Lactococcus lactis is a nonpathogenic AT-rich gram-positive bacterium closely related to the genus Streptococcus and is the most commonly used cheese starter. It is also the best-characterized lactic acid bacterium. We sequenced the genome of the laboratory strain IL1403, using a novel two-step strategy that comprises diagnostic sequencing of the entire genome and a shotgun polishing step. The genome contains 2,365,589 base pairs and encodes 2310 proteins, including 293 protein-coding genes belonging to six prophages and 43 insertion sequence (IS) elements. Nonrandom distribution of IS elements indicates that the chromosome of the sequenced strain may be a product of recent recombination between two closely related genomes. A complete set of late competence genes is present, indicating the ability of L. lactis to undergo DNA transformation. Genomic sequence revealed new possibilities for fermentation pathways and for aerobic respiration. It also indicated a horizontal transfer of genetic information from Lactococcus to gram-negative enteric bacteria of Salmonella-Escherichia group. [The sequence data described in this paper has been submitted to the GenBank data library under accession no. AE005176.] PMID:11337471

  12. MACSIMS : multiple alignment of complete sequences information management system

    PubMed Central

    Thompson, Julie D; Muller, Arnaud; Waterhouse, Andrew; Procter, Jim; Barton, Geoffrey J; Plewniak, Frédéric; Poch, Olivier

    2006-01-01

    Background: In the post-genomic era, systems-level studies are being performed that seek to explain complex biological systems by integrating diverse resources from fields such as genomics, proteomics or transcriptomics. New information management systems are now needed for the collection, validation and analysis of the vast amount of heterogeneous data available. Multiple alignments of complete sequences provide an ideal environment for the integration of this information in the context of the protein family. Results: MACSIMS is a multiple alignment-based information management program that combines the advantages of both knowledge-based and ab initio sequence analysis methods. Structural and functional information is retrieved automatically from the public databases. In the multiple alignment, homologous regions are identified and the retrieved data is evaluated and propagated from known to unknown sequences with these reliable regions. In a large-scale evaluation, the specificity of the propagated sequence features is estimated to be >99%, i.e. very few false positive predictions are made. MACSIMS is then used to characterise mutations in a test set of 100 proteins that are known to be involved in human genetic diseases. The number of sequence features associated with these proteins was increased by 60%, compared to the features available in the public databases. An XML format output file allows automatic parsing of the MACSIMS results, while a graphical display using the JalView program allows manual analysis. Conclusion: MACSIMS is a new information management system that incorporates detailed analyses of protein families at the structural, functional and evolutionary levels. MACSIMS thus provides a unique environment that facilitates knowledge extraction and the presentation of the most pertinent information to the biologist. A web server and the source code are available at . PMID:16792820

  13. Integrating Information Literacy with a Sequenced English Composition Curriculum

    ERIC Educational Resources Information Center

    Holliday, Wendy; Fagerheim, Britt

    2006-01-01

    This article details the process of implementing a sequenced information literacy program for two core English composition courses at Utah State University. An extensive needs assessment guided the project, leading to a curriculum design process with the goal of building a foundation for deeper critical thinking skills. The curriculum development…

  14. Peptide sequencing by using a combination of partial acid hydrolysis and fast-atom-bombardment mass spectrometry.

    PubMed Central

    De Angelis, F; Botta, M; Ceccarelli, S; Nicoletti, R

    1986-01-01

    To overcome the limit of the intensity of ions carrying sequence information in structural determinations of peptides by fast-atom-bombardment m.s., we have developed a method that consists in taking spectra of the peptide acid hydrolysates at different hydrolysis times. Peaks correspond to the oligomers arising from the peptide partial hydrolysis. The sequence can then be identified from the structurally overlapping fragments. PMID:2428356

  15. Structural gene and complete amino acid sequence of Pseudomonas aeruginosa IFO 3455 elastase.

    PubMed Central

    Fukushima, J; Yamamoto, S; Morihara, K; Atsumi, Y; Takeuchi, H; Kawamoto, S; Okuda, K

    1989-01-01

    The DNA encoding the elastase of Pseudomonas aeruginosa IFO 3455 was cloned, and its complete nucleotide sequence was determined. When the cloned gene was ligated to pUC18, the Escherichia coli expression vector, bacteria carrying the gene exhibited high levels of both elastase activity and elastase antigens. The amino acid sequence, deduced from the nucleotide sequence, revealed that the mature elastase consisted of 301 amino acids with a relative molecular mass of 32,926 daltons. The amino acid composition predicted from the DNA sequence was quite similar to the chemically determined composition of purified elastase reported previously. We also observed nucleotide sequence encoding a signal peptide and "pro" sequence consisting of 197 amino acids upstream from the mature elastase protein gene. The amino acid sequence analysis revealed that both the N-terminal sequence of the purified elastase and the N-terminal side sequences of the C-terminal tryptic peptide as well as the internal lysyl peptide fragment were completely identical to the deduced amino acid sequences. The pattern of identity of amino acid sequences was quite evident in the regions that include structurally and functionally important residues of Bacillus subtilis thermolysin. PMID:2493453

  16. Human Retroviruses and AIDS. A compilation and analysis of nucleic acid and amino acid sequences: I--II; III--V

    SciTech Connect

    Myers, G.; Korber, B.; Wain-Hobson, S.; Smith, R.F.; Pavlakis, G.N.

    1993-12-31

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (I) HIV and SIV Nucleotide Sequences; (II) Amino Acid Sequences; (III) Analyses; (IV) Related Sequences; and (V) Database Communications. Information within all the parts is updated at least twice in each year, which accounts for the modes of binding and pagination in the compendium.

  17. Natural vs. random protein sequences: Discovering combinatorics properties on amino acid words.

    PubMed

    Santoni, Daniele; Felici, Giovanni; Vergni, Davide

    2016-02-21

    Random mutations and natural selection have driven the evolution of the protein amino acid sequences that we observe at present in nature. The question of which is the dominant force in protein evolution still lacks an unambiguous answer. Random mutations tend to randomize protein sequences while, in order to have the correct functionality, one expects that selection mechanisms impose rigid constraints on amino acid sequences. Moreover, one also has to consider that the space of all possible amino acid sequences is so astonishingly large that it could be reasonable to have a well-tuned amino acid sequence indistinguishable from a random one. In order to study the possibility of discriminating between random and natural amino acid sequences, we introduce different measures of association between pairs of amino acids in a sequence, and apply them to a dataset of 1047 natural protein sequences and 10,470 random sequences, carefully generated in order to preserve the relative length and amino acid distribution of the natural proteins. We analyze the multidimensional measures with machine learning techniques and show that, to a reasonable extent, natural protein sequences can be differentiated from random ones. PMID:26656109
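
    One simple way to obtain random sequences that keep the length and amino acid composition of a natural protein is to shuffle its residues. The following is a minimal Python sketch of that idea; the abstract does not say how the authors generated their random set, so the shuffling approach, the function name shuffled_decoys and the example peptide are all assumptions made for illustration.

```python
import random
from collections import Counter


def shuffled_decoys(sequence: str, n: int, seed: int = 0) -> list[str]:
    """Return n random permutations of sequence.

    Each permutation has exactly the same length and residue counts
    as the input, so only the residue order is randomized.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    decoys = []
    for _ in range(n):
        residues = list(sequence)
        rng.shuffle(residues)
        decoys.append("".join(residues))
    return decoys


if __name__ == "__main__":
    natural = "MKTAYIAKQRQISFVK"  # hypothetical peptide, not from the dataset
    for decoy in shuffled_decoys(natural, 3):
        # Composition is preserved for every decoy.
        assert Counter(decoy) == Counter(natural)
        print(decoy)
```

    Generating ten decoys per natural sequence would reproduce the 1047 to 10,470 ratio mentioned in the abstract.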

  18. Transcriptome Sequencing in Response to Salicylic Acid in Salvia miltiorrhiza

    PubMed Central

    Zhang, Xiaoru; Dong, Juane; Liu, Hailong; Wang, Jiao; Qi, Yuexin; Liang, Zongsuo

    2016-01-01

    Salvia miltiorrhiza is a traditional Chinese herbal medicine, whose quality and yield are often affected by diseases and environmental stresses during its growing season. Salicylic acid (SA) plays a significant role in plants responding to biotic and abiotic stresses, but the involved regulatory factors and their signaling mechanisms are largely unknown. In order to identify the genes involved in SA signaling, the RNA sequencing (RNA-seq) strategy was employed to evaluate the transcriptional profiles in S. miltiorrhiza cell cultures. A total of 50,778 unigenes were assembled, in which 5,316 unigenes were differentially expressed among 0-, 2-, and 8-h SA induction. The up-regulated genes were mainly involved in stimulus response and multi-organism process. A core set of candidate novel genes coding SA signaling component proteins was identified. Many transcription factors (e.g., WRKY, bHLH and GRAS) and genes involved in hormone signal transduction were differentially expressed in response to SA induction. Detailed analysis revealed that genes associated with defense signaling, such as antioxidant system genes, cytochrome P450s and ATP-binding cassette transporters, were significantly overexpressed, which can be used as genetic tools to investigate disease resistance. Our transcriptome analysis will help understand SA signaling and its mechanism of defense systems in S. miltiorrhiza. PMID:26808150

  19. Transcriptome Sequencing in Response to Salicylic Acid in Salvia miltiorrhiza.

    PubMed

    Zhang, Xiaoru; Dong, Juane; Liu, Hailong; Wang, Jiao; Qi, Yuexin; Liang, Zongsuo

    2016-01-01

    Salvia miltiorrhiza is a traditional Chinese herbal medicine, whose quality and yield are often affected by diseases and environmental stresses during its growing season. Salicylic acid (SA) plays a significant role in plants responding to biotic and abiotic stresses, but the involved regulatory factors and their signaling mechanisms are largely unknown. In order to identify the genes involved in SA signaling, the RNA sequencing (RNA-seq) strategy was employed to evaluate the transcriptional profiles in S. miltiorrhiza cell cultures. A total of 50,778 unigenes were assembled, in which 5,316 unigenes were differentially expressed among 0-, 2-, and 8-h SA induction. The up-regulated genes were mainly involved in stimulus response and multi-organism process. A core set of candidate novel genes coding SA signaling component proteins was identified. Many transcription factors (e.g., WRKY, bHLH and GRAS) and genes involved in hormone signal transduction were differentially expressed in response to SA induction. Detailed analysis revealed that genes associated with defense signaling, such as antioxidant system genes, cytochrome P450s and ATP-binding cassette transporters, were significantly overexpressed, which can be used as genetic tools to investigate disease resistance. Our transcriptome analysis will help understand SA signaling and its mechanism of defense systems in S. miltiorrhiza. PMID:26808150

  20. Matrix genes of measles virus and canine distemper virus: cloning, nucleotide sequences, and deduced amino acid sequences.

    PubMed Central

    Bellini, W J; Englund, G; Richardson, C D; Rozenblatt, S; Lazzarini, R A

    1986-01-01

    The nucleotide sequences encoding the matrix (M) proteins of measles virus (MV) and canine distemper virus (CDV) were determined from cDNA clones containing these genes in their entirety. In both cases, single open reading frames specifying basic proteins of 335 amino acid residues were predicted from the nucleotide sequences. Both viral messages were composed of approximately 1,450 nucleotides and contained 400 nucleotides of presumptive noncoding sequences at their respective 3' ends. MV and CDV M-protein-coding regions were 67% homologous at the nucleotide level and 76% homologous at the amino acid level. Only chance homology was observed in the 400-nucleotide trailer sequences. Comparisons of the M protein sequences of MV and CDV with the sequence reported for Sendai virus (B. M. Blumberg, K. Rose, M. G. Simona, L. Roux, C. Giorgi, and D. Kolakofsky, J. Virol. 52:656-663; Y. Hidaka, T. Kanda, K. Iwasaki, A. Nomoto, T. Shioda, and H. Shibuta, Nucleic Acids Res. 12:7965-7973) indicated the greatest homology among these M proteins in the carboxyterminal third of the molecule. Secondary-structure analyses of this shared region indicated a structurally conserved, hydrophobic sequence which possibly interacted with the lipid bilayer. PMID:3754588

  1. Detection and isolation of nucleic acid sequences using a bifunctional hybridization probe

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    2000-01-01

    A method for detecting and isolating a target sequence in a sample of nucleic acids is provided using a bifunctional hybridization probe capable of hybridizing to the target sequence that includes a detectable marker and a first complexing agent capable of forming a binding pair with a second complexing agent. A kit is also provided for detecting a target sequence in a sample of nucleic acids using a bifunctional hybridization probe according to this method.

  2. Always Look on Both Sides: Phylogenetic Information Conveyed by Simple Sequence Repeat Allele Sequences

    PubMed Central

    Barthe, Stéphanie; Gugerli, Felix; Barkley, Noelle A.; Maggia, Laurent; Cardi, Céline; Scotti, Ivan

    2012-01-01

    Simple sequence repeat (SSR) markers are widely used tools for inferences about genetic diversity, phylogeography and spatial genetic structure. Their applications assume that variation among alleles is essentially caused by an expansion or contraction of the number of repeats and that, accessorily, mutations in the target sequences follow the stepwise mutation model (SMM). Generally speaking, PCR amplicon sizes are used as direct indicators of the number of SSR repeats composing an allele with the data analysis either ignoring the extent of allele size differences or assuming that there is a direct correlation between differences in amplicon size and evolutionary distance. However, without precisely knowing the kind and distribution of polymorphism within an allele (SSR and the associated flanking region (FR) sequences), it is hard to say what kind of evolutionary message is conveyed by such a synthetic descriptor of polymorphism as DNA amplicon size. In this study, we sequenced several SSR alleles in multiple populations of three divergent tree genera and disentangled the types of polymorphisms contained in each portion of the DNA amplicon containing an SSR. The patterns of diversity provided by amplicon size variation, SSR variation itself, insertions/deletions (indels), and single nucleotide polymorphisms (SNPs) observed in the FRs were compared. Amplicon size variation largely reflected SSR repeat number. The amount of variation was as large in FRs as in the SSR itself. The former contributed significantly to the phylogenetic information and sometimes was the main source of differentiation among individuals and populations contained by FR and SSR regions of SSR markers. The presence of mutations occurring at different rates within a marker’s sequence offers the opportunity to analyse evolutionary events occurring on various timescales, but at the same time calls for caution in the interpretation of SSR marker data when the distribution of within

  3. Partial amino acid sequence of human factor D:homology with serine proteases.

    PubMed Central

    Volanakis, J E; Bhown, A; Bennett, J C; Mole, J E

    1980-01-01

    Human factor D purified to homogeneity by a modified procedure was subjected to NH2-terminal amino acid sequence analysis by using a modified automated Beckman sequencer. We identified 48 of the first 57 NH2-terminal amino acids in a single sequencer run, using microgram quantities of factor D. The deduced amino acid sequence represents approximately 25% of the primary structure of factor D. This extended NH2-terminal amino acid sequence of factor D was compared to that of other trypsin-related serine proteases. By visual inspection, strong homologies (33--50% identity) were observed with all the serine proteases included in the comparison. Interestingly, factor D showed a higher degree of homology to serine proteases of pancreatic origin than to those of serum origin. PMID:6987665

  4. Amino acid sequence of Japanese quail (Coturnix japonica) and northern bobwhite (Colinus virginianus) myoglobin.

    PubMed

    Goodson, John; Beckstead, Robert B; Payne, Jason; Singh, Rakesh K; Mohan, Anand

    2015-08-15

    Myoglobin has an important physiological role in vertebrates, and as the primary sarcoplasmic pigment in meat, influences quality perception and consumer acceptability. In this study, the amino acid sequences of Japanese quail and northern bobwhite myoglobin were deduced by cDNA cloning of the coding sequence from mRNA. Japanese quail myoglobin was isolated from quail cardiac muscles, purified using ammonium sulphate precipitation and gel-filtration, and subjected to multiple enzymatic digestions. Mass spectrometry corroborated the deduced protein amino acid sequence at the protein level. Sequence analysis revealed both species' myoglobin structures consist of 153 amino acids, differing at only three positions. When compared with chicken myoglobin, Japanese quail showed 98% sequence identity, and northern bobwhite 97% sequence identity. The myoglobin in both quail species contained eight histidine residues instead of the nine present in chicken and turkey. PMID:25794748
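
    The identity figures in this abstract follow from simple arithmetic over aligned residues: two 153-residue sequences that differ at three positions share 150/153, or about 98.0%, of their positions. The following is a minimal Python sketch of that calculation for equal-length, ungapped sequences; the function name percent_identity and the placeholder sequences are assumptions for illustration and are not the actual myoglobin sequences.

```python
def percent_identity(a: str, b: str) -> float:
    """Return the percentage of positions at which a and b are identical.

    Both sequences must already be aligned without gaps, i.e. have the
    same non-zero length.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    # Placeholder sequences: 153 residues differing at three positions.
    seq_a = "A" * 153
    seq_b = "A" * 150 + "GGG"
    print(f"{percent_identity(seq_a, seq_b):.2f}%")
```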

  5. Identification of random nucleic acid sequence aberrations using dual capture probes which hybridize to different chromosome regions

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1998-01-01

    A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration.

  6. Identification of random nucleic acid sequence aberrations using dual capture probes which hybridize to different chromosome regions

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1998-03-24

    A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration. 14 figs.

  7. tax and rex Sequences of bovine leukaemia virus from globally diverse isolates: rex amino acid sequence more variable than tax.

    PubMed

    McGirr, K M; Buehring, G C

    2005-02-01

    Bovine leukaemia virus (BLV) is an important agricultural problem with high costs to the dairy industry. Here, we examine the variation of the tax and rex genes of BLV. The tax and rex genes share 420 bases and have overlapping reading frames. The tax gene encodes a protein that functions as a transactivator of the BLV promoter, is required for viral replication, acts on cellular promoters, and is responsible for oncogenesis. The Rex protein facilitates the export of viral mRNAs from the nucleus and regulates transcription. We have sequenced five new isolates of the tax/rex gene. We examined the five new and three previously published tax/rex DNA and predicted amino acid sequences of BLV isolates from cattle in representative regions worldwide. The highest variation among nucleic acid sequences for tax and rex was 7% and 5%, respectively; among predicted amino acid sequences for Tax and Rex, 9% and 11%, respectively. Significantly more nucleotide changes resulted in predicted amino acid changes in the rex gene than in the tax gene (P ≤ 0.0006). This variability is higher than previously reported for any region of the viral genome. This research may also have implications for the development of Tax-based vaccines. PMID:15702995

  8. Swfoldrate: predicting protein folding rates from amino acid sequence with sliding window method.

    PubMed

    Cheng, Xiang; Xiao, Xuan; Wu, Zhi-cheng; Wang, Pu; Lin, Wei-zhong

    2013-01-01

    Protein folding is the process by which a protein progresses from its denatured state to its specific biologically active conformation. Understanding the relationship between sequences and the folding rates of proteins remains an important challenge. Most previous methods of predicting protein folding rate require the tertiary structure of a protein as an input. In this study, long-range and short-range contacts in proteins were used to derive an extended version of the pseudo amino acid composition based on a sliding window method. This method is capable of predicting protein folding rates from the amino acid sequence alone, without the aid of any structural class information. We systematically studied the contributions of individual features to folding rate prediction. The optimal feature selection procedure was adopted by combining forward feature selection with sequential backward selection. The method was demonstrated on a large dataset using the jackknife cross-validation test. The predictor was built on numerous physicochemical and statistical features of proteins using a nonlinear support vector machine (SVM) regression model, and it obtained excellent agreement between predicted and experimentally observed folding rates. The correlation coefficient is 0.9313 and the standard error is 2.2692. The prediction server is freely available at http://www.jci-bioinfo.cn/swfrate/input.jsp. PMID:22933332
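
    The general idea of a sliding window over a sequence can be sketched as follows. This is a minimal illustration, not the Swfoldrate feature set: it averages a single per-residue property (the Kyte-Doolittle hydropathy scale, chosen here only as an assumption for the example) over every window of a fixed width. The function name and window width are hypothetical.

```python
# Kyte-Doolittle hydropathy values, used here purely as an example property.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_means(sequence: str, width: int) -> list[float]:
    """Mean property value for each contiguous window of `width` residues."""
    if width < 1 or width > len(sequence):
        raise ValueError("width must be between 1 and the sequence length")
    values = [HYDROPATHY[residue] for residue in sequence.upper()]
    return [
        sum(values[start:start + width]) / width
        for start in range(len(values) - width + 1)
    ]


# A sequence of length n yields n - width + 1 window features.
features = window_means("AILKDE", 3)
```

    Window-level values like these could then be summarized into a fixed-length feature vector for a regression model; how the published method does this is not shown here.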

  9. The amino acid sequence of protein CM-3 from Dendroaspis polylepis polylepis (black mamba) venom.

    PubMed

    Joubert, F J

    1985-01-01

    Protein CM-3 from Dendroaspis polylepis polylepis venom was purified by gel filtration and ion exchange chromatography. It comprises 65 amino acids including eight half-cystines. The complete amino acid sequence of protein CM-3 has been elucidated. The sequence (residues 1-50) resembles that of the N-terminal sequence of the subunits of a synergistic type protein and residues 51-65 that of the C-terminal sequence of an angusticeps type protein. Mixtures of protein CM-3 and angusticeps type proteins showed no apparent synergistic effect, in that their toxicity in combination was no greater than the sum of their individual toxicities. PMID:4029488

  10. Isolation and amino acid sequences of squirrel monkey (Saimiri sciurea) insulin and glucagon.

    PubMed Central

    Yu, J H; Eng, J; Yalow, R S

    1990-01-01

    It was reported two decades ago that insulin was not detectable in the glucose-stimulated state in Saimiri sciurea, the New World squirrel monkey, by a radioimmunoassay system developed with guinea pig anti-pork insulin antibody and labeled pork insulin. With the same system, reasonable levels were observed in rhesus monkeys and chimpanzees. This suggested that New World monkeys, like the New World hystricomorph rodents such as the guinea pig and the coypu, might have insulins whose sequences differ markedly from those of Old World mammals. In this report we describe the purification and amino acid sequences of squirrel monkey insulin and glucagon. We demonstrate that the substitutions at B29, B27, A2, A4, and A17 of squirrel monkey insulin are identical with those previously found in another New World primate, the owl monkey (Aotus trivirgatus). The immunologic cross-reactivity of this insulin in our immunoassay system is only a few percent of that of human insulin. Squirrel monkey glucagon is identical with the usual glucagon found in Old World mammals, which predicts that the glucagons of other New World monkeys would not differ from the usual Old World mammalian glucagon. It appears that the peptides of the New World monkeys have diverged less from those of the Old World mammals than have those of the New World hystricomorph rodents. The striking improvements in peptide purification and sequencing have the potential for adding new information concerning the evolutionary divergence of species. PMID:2263627

  11. Isolation and amino acid sequences of squirrel monkey (Saimiri sciurea) insulin and glucagon

    SciTech Connect

    Yu, Jinghua; Eng, J.; Yalow, R.S. (City Univ. of New York, NY)

    1990-12-01

    It was reported two decades ago that insulin was not detectable in the glucose-stimulated state in Saimiri sciurea, the New World squirrel monkey, by a radioimmunoassay system developed with guinea pig anti-pork insulin antibody and labeled pork insulin. With the same system, reasonable levels were observed in rhesus monkeys and chimpanzees. This suggested that New World monkeys, like the New World hystricomorph rodents such as the guinea pig and the coypu, might have insulins whose sequences differ markedly from those of Old World mammals. In this report the authors describe the purification and amino acid sequences of squirrel monkey insulin and glucagon. They demonstrate that the substitutions at B29, B27, A2, A4, and A17 of squirrel monkey insulin are identical with those previously found in another New World primate, the owl monkey (Aotus trivirgatus). The immunologic cross-reactivity of this insulin in their immunoassay system is only a few percent of that of human insulin. It appears that the peptides of the New World monkeys have diverged less from those of the Old World mammals than have those of the New World hystricomorph rodents. The striking improvements in peptide purification and sequencing have the potential for adding new information concerning the evolutionary divergence of species.

  12. Binding site discovery from nucleic acid sequences by discriminative learning of hidden Markov models

    PubMed Central

    Maaskola, Jonas; Rajewsky, Nikolaus

    2014-01-01

    We present a discriminative learning method for pattern discovery of binding sites in nucleic acid sequences based on hidden Markov models. Sets of positive and negative example sequences are mined for sequence motifs whose occurrence frequency varies between the sets. The method offers several objective functions, but we concentrate on mutual information of condition and motif occurrence. We perform a systematic comparison of our method and numerous published motif-finding tools. Our method achieves the highest motif discovery performance, while being faster than most published methods. We present case studies of data from various technologies, including ChIP-Seq, RIP-Chip and PAR-CLIP, of embryonic stem cell transcription factors and of RNA-binding proteins, demonstrating practicality and utility of the method. For the alternative splicing factor RBM10, our analysis finds motifs known to be splicing-relevant. The motif discovery method is implemented in the free software package Discrover. It is applicable to genome- and transcriptome-scale data, makes use of available repeat experiments, and can handle more complex data configurations in addition to binary contrasts. PMID:25389269
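
    The objective named above, mutual information of condition and motif occurrence, can be illustrated on a small contingency table. The sketch below is an assumption-laden simplification: it counts whole sequences (rows are conditions such as positive and negative sets, columns are motif present or absent) and is not the Discrover implementation. The function name and the example counts are hypothetical.

```python
from math import log2


def mutual_information(counts: list[list[int]]) -> float:
    """Mutual information in bits between row and column categories.

    counts[i][j] is the number of sequences in condition i with motif state j.
    """
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("contingency table is empty")
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    mi = 0.0
    for i, row in enumerate(counts):
        for j, n in enumerate(row):
            if n == 0:
                continue  # a zero cell contributes nothing to the sum
            joint = n / total
            expected = (row_totals[i] / total) * (col_totals[j] / total)
            mi += joint * log2(joint / expected)
    return mi


# Motif present only in the positive set: condition fully determines occurrence.
perfect = mutual_information([[50, 0], [0, 50]])
# Motif equally common in both sets: occurrence carries no information.
uninformative = mutual_information([[25, 25], [25, 25]])
```

    A motif whose occurrence separates the two sets scores high, while one that is equally frequent in both scores near zero, which is what makes the quantity usable as a discriminative objective.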

  13. The Chinese hamster Alu-equivalent sequence: a conserved highly repetitious, interspersed deoxyribonucleic acid sequence in mammals has a structure suggestive of a transposable element.

    PubMed Central

    Haynes, S R; Toomey, T P; Leinwand, L; Jelinek, W R

    1981-01-01

    A consensus sequence has been determined for a major interspersed deoxyribonucleic acid repeat in the genome of Chinese hamster ovary cells (CHO cells). This sequence is extensively homologous to (i) the human Alu sequence (P. L. Deininger et al., J. Mol. Biol., in press), (ii) the mouse B1 interspersed repetitious sequence (Krayev et al., Nucleic Acids Res. 8:1201-1215, 1980), (iii) an interspersed repetitious sequence from African green monkey deoxyribonucleic acid (Dhruva et al., Proc. Natl. Acad. Sci. U.S.A. 77:4514-4518, 1980) and (iv) the CHO and mouse 4.5S ribonucleic acid (this report; F. Harada and N. Kato, Nucleic Acids Res. 8:1273-1285, 1980). Because the CHO consensus sequence shows significant homology to the human Alu sequence it is termed the CHO Alu-equivalent sequence. A conserved structure surrounding CHO Alu-equivalent family members can be recognized. It is similar to that surrounding the human Alu and the mouse B1 sequences, and is represented as follows: direct repeat-CHO-Alu-A-rich sequence-direct repeat. A composite interspersed repetitious sequence has been identified. Its structure is represented as follows: direct repeat-residue 47 to 107 of CHO-Alu-non-Alu repetitious sequence-A-rich sequence-direct repeat. Because the Alu flanking sequences resemble those that flank known transposable elements, we think it likely that the Alu sequence dispersed throughout the mammalian genome by transposition. PMID:9279371

  14. Acid rain information book. Draft final report

    SciTech Connect

    1980-12-01

    Acid rain is one of the most widely publicized environmental issues of the day. The potential consequences of increasingly widespread acid rain demand that this phenomenon be carefully evaluated. Review of the literature shows a rapidly growing body of knowledge, but also reveals major gaps in understanding that need to be narrowed. This document discusses major aspects of the acid rain phenomenon, points out areas of uncertainty, and summarizes current and projected research by responsible government agencies and other concerned organizations.

  15. Computer Simulation of the Determination of Amino Acid Sequences in Polypeptides

    ERIC Educational Resources Information Center

    Daubert, Stephen D.; Sontum, Stephen F.

    1977-01-01

    Describes a computer program that generates a random string of amino acids and guides the student in determining the correct sequence of a given protein by using experimental analytic data for that protein. (MLH)

  16. Information recovery from low coverage whole-genome bisulfite sequencing

    PubMed Central

    Libertini, Emanuele; Heath, Simon C.; Hamoudi, Rifat A.; Gut, Marta; Ziller, Michael J.; Czyz, Agata; Ruotti, Victor; Stunnenberg, Hendrik G.; Frontini, Mattia; Ouwehand, Willem H.; Meissner, Alexander; Gut, Ivo G.; Beck, Stephan

    2016-01-01

    The cost of whole-genome bisulfite sequencing (WGBS) remains a bottleneck for many studies and it is therefore imperative to extract as much information as possible from a given dataset. This is particularly important because even at the recommended 30X coverage for reference methylomes, up to 50% of high-resolution features such as differentially methylated positions (DMPs) cannot be called with current methods as determined by saturation analysis. To address this limitation, we have developed a tool that dynamically segments WGBS methylomes into blocks of comethylation (COMETs) from which lost information can be recovered in the form of differentially methylated COMETs (DMCs). Using this tool, we demonstrate recovery of ∼30% of the lost DMP information content as DMCs even at very low (5X) coverage. This constitutes twice the amount that can be recovered using an existing method based on differentially methylated regions (DMRs). In addition, we explored the relationship between COMETs and haplotypes in lymphoblastoid cell lines of African and European origin. Using best fit analysis, we show COMETs to be correlated in a population-specific manner, suggesting that this type of dynamic segmentation may be useful for integrated (epi)genome-wide association studies in the future. PMID:27346250

  17. Information recovery from low coverage whole-genome bisulfite sequencing.

    PubMed

    Libertini, Emanuele; Heath, Simon C; Hamoudi, Rifat A; Gut, Marta; Ziller, Michael J; Czyz, Agata; Ruotti, Victor; Stunnenberg, Hendrik G; Frontini, Mattia; Ouwehand, Willem H; Meissner, Alexander; Gut, Ivo G; Beck, Stephan

    2016-01-01

    The cost of whole-genome bisulfite sequencing (WGBS) remains a bottleneck for many studies and it is therefore imperative to extract as much information as possible from a given dataset. This is particularly important because even at the recommended 30X coverage for reference methylomes, up to 50% of high-resolution features such as differentially methylated positions (DMPs) cannot be called with current methods as determined by saturation analysis. To address this limitation, we have developed a tool that dynamically segments WGBS methylomes into blocks of comethylation (COMETs) from which lost information can be recovered in the form of differentially methylated COMETs (DMCs). Using this tool, we demonstrate recovery of ∼30% of the lost DMP information content as DMCs even at very low (5X) coverage. This constitutes twice the amount that can be recovered using an existing method based on differentially methylated regions (DMRs). In addition, we explored the relationship between COMETs and haplotypes in lymphoblastoid cell lines of African and European origin. Using best fit analysis, we show COMETs to be correlated in a population-specific manner, suggesting that this type of dynamic segmentation may be useful for integrated (epi)genome-wide association studies in the future. PMID:27346250

  18. Characterization of mouse cellular deoxyribonucleic acid homologous to Abelson murine leukemia virus-specific sequences.

    PubMed Central

    Dale, B; Ozanne, B

    1981-01-01

    The genome of Abelson murine leukemia virus (A-MuLV) consists of sequences derived from both BALB/c mouse deoxyribonucleic acid and the genome of Moloney murine leukemia virus. Using deoxyribonucleic acid linear intermediates as a source of retroviral deoxyribonucleic acid, we isolated a recombinant plasmid which contained 1.9 kilobases of the 3.5-kilobase mouse-derived sequences found in A-MuLV (A-MuLV-specific sequences). We used this clone, designated pSA-17, as a probe in restriction enzyme and Southern blot analyses to examine the arrangement of homologous sequences in BALB/c deoxyribonucleic acid (endogenous Abelson sequences). The endogenous Abelson sequences within the mouse genome were interrupted by noncoding regions, suggesting that a rearrangement of the cell sequences was required to produce the sequence found in the virus. Endogenous Abelson sequences were arranged similarly in mice that were susceptible to A-MuLV tumors and in mice that were resistant to A-MuLV tumors. An examination of three BALB/c plasmacytomas and a BALB/c early B-cell tumor likewise revealed no alteration in the arrangement of the endogenous Abelson sequences. Homology to pSA-17 was also observed in deoxyribonucleic acids prepared from rat, hamster, chicken, and human cells. An isolate of A-MuLV which encoded a 160,000-dalton transforming protein (P160) contained 700 more base pairs of mouse sequences than the standard A-MuLV isolate, which encoded a 120,000-dalton transforming protein (P120). PMID:9279386

  19. The amino acid sequence of monal pheasant lysozyme and its activity.

    PubMed

    Araki, T; Matsumoto, T; Torikata, T

    1998-10-01

    The amino acid sequence of monal pheasant lysozyme and its activity were analyzed. Carboxymethylated lysozyme was digested with trypsin and the resulting peptides were sequenced. The established amino acid sequence had one amino acid substitution at position 102 (Arg to Gly) compared with Indian peafowl lysozyme and four amino acid substitutions at positions 3 (Phe to Tyr), 15 (His to Leu), 41 (Gln to His), and 121 (Gln to His) compared with chicken lysozyme. Analysis of the time-courses of reaction using N-acetylglucosamine pentamer as a substrate showed a difference of binding free energy change (-0.4 kcal/mol) at subsite A between monal pheasant and Indian peafowl lysozyme. This was assumed to be caused by the amino acid substitution at subsite A with loss of a positive charge at position 102 (Arg102 to Gly). PMID:9836434

  20. Studies on monotreme proteins. VII. Amino acid sequence of myoglobin from the platypus, Ornithorhynchus anatinus.

    PubMed

    Fisher, W K; Thompson, E O

    1976-03-01

    Myoglobin isolated from skeletal muscle of the platypus contains 153 amino acid residues. The complete amino acid sequence has been determined following cleavage with cyanogen bromide and further digestion of the four fragments with trypsin, chymotrypsin, pepsin and thermolysin. Sequences of the purified peptides were determined by the dansyl-Edman procedure. The amino acid sequence showed 25 differences from human myoglobin and 24 from kangaroo myoglobin. Amino acid sequences in myoglobins are more conserved than sequences in the alpha- and beta-globin chains, and platypus myoglobin shows a similar number of variations in sequence to kangaroo myoglobin when compared with myoglobin of other species. The date of divergence of the platypus from other mammals was estimated at 102 +/- 31 million years, based on the number of amino acid differences between species and allowing for mutations during the evolutionary period. This estimate differs widely from the estimate given by similar treatment of the alpha- and beta-chain sequences and a constant rate of mutation of globin chains is not supported. PMID:962722

  1. cDNA-derived amino acid sequences of myoglobins from nine species of whales and dolphins.

    PubMed

    Iwanami, Kentaro; Mita, Hajime; Yamamoto, Yasuhiko; Fujise, Yoshihiro; Yamada, Tadasu; Suzuki, Tomohiko

    2006-10-01

    We determined the myoglobin (Mb) cDNA sequences of nine cetaceans, of which six are the first reports of Mb sequences: sei whale (Balaenoptera borealis), Bryde's whale (Balaenoptera edeni), pygmy sperm whale (Kogia breviceps), Stejneger's beaked whale (Mesoplodon stejnegeri), Longman's beaked whale (Indopacetus pacificus), and melon-headed whale (Peponocephala electra), and three confirm the previously determined chemical amino acid sequences: sperm whale (Physeter macrocephalus), common minke whale (Balaenoptera acutorostrata) and pantropical spotted dolphin (Stenella attenuata). We found two types of Mb in the skeletal muscle of pantropical spotted dolphin: Mb I with the same amino acid sequence as that deposited in the protein database, and Mb II, which differs at two amino acid residues compared with Mb I. Using an alignment of the amino acid or cDNA sequences of cetacean Mb, we constructed a phylogenetic tree by the NJ method. Clustering of cetacean Mb amino acid and cDNA sequences essentially follows the classical taxonomy of cetaceans, suggesting that Mb sequence data is valid for classification of cetaceans at least to the family level. PMID:16962803

  2. Clostridium sticklandii, a specialist in amino acid degradation: revisiting its metabolism through its genome sequence

    PubMed Central

    2010-01-01

    Background Clostridium sticklandii belongs to a cluster of non-pathogenic proteolytic clostridia which utilize amino acids as carbon and energy sources. Isolated by T.C. Stadtman in 1954, it has been generally regarded as a "gold mine" for novel biochemical reactions and is used as a model organism for studying metabolic aspects such as the Stickland reaction, coenzyme-B12- and selenium-dependent reactions of amino acids. With the goal of revisiting its carbon, nitrogen, and energy metabolism, and comparing studies with other clostridia, its genome has been sequenced and analyzed. Results C. sticklandii is one of the best biochemically studied proteolytic clostridial species. Useful additional information has been obtained from the sequencing and annotation of its genome, which is presented in this paper. Besides, experimental procedures reveal that C. sticklandii degrades amino acids in a preferential and sequential way. The organism prefers threonine, arginine, serine, cysteine, proline, and glycine, whereas glutamate, aspartate and alanine are excreted. Energy conservation is primarily obtained by substrate-level phosphorylation in fermentative pathways. The reactions catalyzed by different ferredoxin oxidoreductases and the exergonic NADH-dependent reduction of crotonyl-CoA point to a possible chemiosmotic energy conservation via the Rnf complex. C. sticklandii possesses both the F-type and V-type ATPases. The discovery of an as yet unrecognized selenoprotein in the D-proline reductase operon suggests a more detailed mechanism for NADH-dependent D-proline reduction. A rather unusual metabolic feature is the presence of genes for all the enzymes involved in two different CO2-fixation pathways: C. sticklandii harbours both the glycine synthase/glycine reductase and the Wood-Ljungdahl pathways. This unusual pathway combination has retrospectively been observed in only four other sequenced microorganisms. Conclusions Analysis of the C. sticklandii genome and

  3. Draft Genome Sequences of Two Novel Acidimicrobiaceae Members from an Acid Mine Drainage Biofilm Metagenome.

    PubMed

    Pinto, Ameet J; Sharp, Jonathan O; Yoder, Michael J; Almstrand, Robert

    2016-01-01

    Bacteria belonging to the family Acidimicrobiaceae are frequently encountered in heavy metal-contaminated acidic environments. However, their phylogenetic and metabolic diversity is poorly resolved. We present draft genome sequences of two novel and phylogenetically distinct Acidimicrobiaceae members assembled from an acid mine drainage biofilm metagenome. PMID:26769942

  4. Draft Genome Sequences of Two Novel Acidimicrobiaceae Members from an Acid Mine Drainage Biofilm Metagenome

    PubMed Central

    Pinto, Ameet J.; Sharp, Jonathan O.; Yoder, Michael J.

    2016-01-01

    Bacteria belonging to the family Acidimicrobiaceae are frequently encountered in heavy metal-contaminated acidic environments. However, their phylogenetic and metabolic diversity is poorly resolved. We present draft genome sequences of two novel and phylogenetically distinct Acidimicrobiaceae members assembled from an acid mine drainage biofilm metagenome. PMID:26769942

  5. Two distinct ferredoxins from Rhodobacter capsulatus: complete amino acid sequences and molecular evolution.

    PubMed

    Saeki, K; Suetsugu, Y; Yao, Y; Horio, T; Marrs, B L; Matsubara, H

    1990-09-01

    Two distinct ferredoxins were purified from Rhodobacter capsulatus SB1003. Their complete amino acid sequences were determined by a combination of protease digestion, BrCN cleavage and Edman degradation. Ferredoxins I and II were composed of 64 and 111 amino acids, respectively, with molecular weights of 6,728 and 12,549 excluding iron and sulfur atoms. Both contained two Cys clusters in their amino acid sequences. The first cluster of ferredoxin I and the second cluster of ferredoxin II had a sequence, CxxCxxCxxxCP, in common with the ferredoxins found in Clostridia. The second cluster of ferredoxin I had a sequence, CxxCxxxxxxxxCxxxCM, with extra amino acids between the second and third Cys, which has been reported for other photosynthetic bacterial ferredoxins and putative ferredoxins (nif-gene products) from nitrogen-fixing bacteria, and with a unique occurrence of Met. The first cluster of ferredoxin II had a CxxCxxxxCxxxCP sequence, with two additional amino acids between the second and third Cys, a characteristic feature of Azotobacter-[3Fe-4S] [4Fe-4S]-ferredoxin. Ferredoxin II was also similar to Azotobacter-type ferredoxins with an extended carboxyl (C-) terminal sequence compared to the common Clostridium-type. The evolutionary relationship of the two together with a putative one recently found to be encoded in nifENXQ region in this bacterium [Moreno-Vivian et al. (1989) J. Bacteriol. 171, 2591-2598] is discussed. PMID:2277040

  6. Amino Acid Sequence of Anionic Peroxidase from the Windmill Palm Tree Trachycarpus fortunei

    PubMed Central

    2015-01-01

    Palm peroxidases are extremely stable and have uncommon substrate specificity. This study was designed to fill in the knowledge gap about the structures of a peroxidase from the windmill palm tree Trachycarpus fortunei. The complete amino acid sequence and partial glycosylation were determined by MALDI-top-down sequencing of native windmill palm tree peroxidase (WPTP), MALDI-TOF/TOF MS/MS of WPTP tryptic peptides, and cDNA sequencing. The propeptide of WPTP contained N- and C-terminal signal sequences which contained 21 and 17 amino acid residues, respectively. Mature WPTP was 306 amino acids in length, and its carbohydrate content ranged from 21% to 29%. Comparison to closely related royal palm tree peroxidase revealed structural features that may explain differences in their substrate specificity. The results can be used to guide engineering of WPTP and its novel applications. PMID:25383699

  7. Protein chemotaxonomy. XIII. Amino acid sequence of ferredoxin from Panax ginseng.

    PubMed

    Mino, Yoshiki

    2006-08-01

    The complete amino acid sequence of [2Fe-2S] ferredoxin from Panax ginseng (Araliaceae) has been determined by automated Edman degradation of the entire S-carboxymethylcysteinyl protein and of the peptides obtained by enzymatic digestion. This ferredoxin has a unique amino acid sequence, which includes an insertion of Tyr at the 3rd position from the amino-terminus and a deletion of two amino acid residues at the carboxyl terminus. This ferredoxin had 18 differences in its amino acid sequence compared to that of Petroselinum sativum (Umbelliferae). In contrast, 23-33 differences were observed compared to other dicotyledonous plants. This suggests that Panax ginseng is related taxonomically to umbelliferous plants. PMID:16880642

  8. Complete amino acid sequence and structure characterization of the taste-modifying protein, miraculin.

    PubMed

    Theerasilp, S; Hitotsuya, H; Nakajo, S; Nakaya, K; Nakamura, Y; Kurihara, Y

    1989-04-25

    The taste-modifying protein, miraculin, has the unusual property of modifying sour taste into sweet taste. The complete amino acid sequence of miraculin purified from miracle fruits by a newly developed method (Theerasilp, S., and Kurihara, Y. (1988) J. Biol. Chem. 263, 11536-11539) was determined by an automatic Edman degradation method. Miraculin was a single polypeptide with 191 amino acid residues. The calculated molecular weight based on the amino acid sequence and the carbohydrate content (13.9%) was 24,600. Asn-42 and Asn-186 were linked N-glycosidically to carbohydrate chains. High homology was found between the amino acid sequences of miraculin and soybean trypsin inhibitor. PMID:2708331

  9. Complete cDNA and derived amino acid sequence of human factor V

    SciTech Connect

    Jenny, R.J.; Pittman, D.D.; Toole, J.J.; Kriz, R.W.; Aldape, R.A.; Hewick, R.M.; Kaufman, R.J.; Mann, K.G.

    1987-07-01

    cDNA clones encoding human factor V have been isolated from an oligo(dT)-primed human fetal liver cDNA library prepared with vector Charon 21A. The cDNA sequence of factor V from three overlapping clones includes a 6672-base-pair (bp) coding region, a 90-bp 5' untranslated region, and a 163-bp 3' untranslated region within which is a poly(A) tail. The deduced amino acid sequence consists of 2224 amino acids inclusive of a 28-amino acid leader peptide. Direct comparison with human factor VIII reveals considerable homology between proteins in amino acid sequence and domain structure: a triplicated A domain and duplicated C domain show approx. 40% identity with the corresponding domains in factor VIII. As in factor VIII, the A domains of factor V share approx. 40% amino acid-sequence homology with the three highly conserved domains in ceruloplasmin. The B domain of factor V contains 35 tandem and approx. 9 additional semiconserved repeats of nine amino acids of the form Asp-Leu-Ser-Gln-Thr-Thr/Asn-Leu-Ser-Pro and 2 additional semiconserved repeats of 17 amino acids. Factor V contains 37 potential N-linked glycosylation sites, 25 of which are in the B domain, and a total of 19 cysteine residues.
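
    The lengths quoted above can be cross-checked with simple codon arithmetic, since each amino acid is encoded by three nucleotides. The sketch below only restates the figures given in the abstract; the function and variable names are hypothetical, and the length after leader removal is a plain subtraction rather than a value reported in the record.

```python
def codons_in(coding_bp: int) -> int:
    """Number of codons in a coding region whose length is a multiple of 3."""
    if coding_bp <= 0 or coding_bp % 3 != 0:
        raise ValueError("coding length must be a positive multiple of 3")
    return coding_bp // 3


CODING_BP = 6672        # coding region, from the abstract
UTR5_BP = 90            # 5' untranslated region, from the abstract
UTR3_BP = 163           # 3' untranslated region, from the abstract
LEADER_RESIDUES = 28    # leader peptide, from the abstract

deduced_residues = codons_in(CODING_BP)                     # 6672 / 3
residues_without_leader = deduced_residues - LEADER_RESIDUES
described_cdna_bp = UTR5_BP + CODING_BP + UTR3_BP
```

    The 6672-bp coding region divides into 2224 codons, which matches the 2224 deduced amino acids stated in the abstract.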

  10. N-terminal sequence of amino acids and some properties of an acid-stable alpha-amylase from citric acid-koji (Aspergillus usamii var.).

    PubMed

    Suganuma, T; Tahara, N; Kitahara, K; Nagahama, T; Inuzuka, K

    1996-01-01

    An acid-stable alpha-amylase (AA) was purified from an acidic extract of citric acid-koji (A. usamii var.). The N-terminal sequence of the first 20 amino acids of the enzyme was identical with that of AA from A. niger, but the two enzymes differed in molecular weight. HPLC analysis for identifying the anomers of products indicated that the AA hydrolyzed maltopentaose (G5) at the third glycoside bond predominantly, which differed from Taka-amylase A and the neutral alpha-amylase (NA) from the citric acid-koji. PMID:8824843

  11. Integration of Temporal and Ordinal Information During Serial Interception Sequence Learning

    PubMed Central

    Gobel, Eric W.; Sanchez, Daniel J.; Reber, Paul J.

    2011-01-01

    The expression of expert motor skills typically involves learning to perform a precisely timed sequence of movements (e.g., language production, music performance, athletic skills). Research examining incidental sequence learning has previously relied on a perceptually-cued task that gives participants exposure to repeating motor sequences but does not require timing of responses for accuracy. Using a novel perceptual-motor sequence learning task, learning a precisely timed cued sequence of motor actions is shown to occur without explicit instruction. Participants learned a repeating sequence through practice and showed sequence-specific knowledge via a performance decrement when switched to an unfamiliar sequence. In a second experiment, the integration of representation of action order and timing sequence knowledge was examined. When either action order or timing sequence information was selectively disrupted, performance was reduced to levels similar to completely novel sequences. Unlike prior sequence-learning research that has found timing information to be secondary to learning action sequences, when the task demands require accurate action and timing information, an integrated representation of these types of information is acquired. These results provide the first evidence for incidental learning of fully integrated action and timing sequence information in the absence of an independent representation of action order, and suggest that this integrative mechanism may play a material role in the acquisition of complex motor skills. PMID:21417511

  12. Integration of temporal and ordinal information during serial interception sequence learning.

    PubMed

    Gobel, Eric W; Sanchez, Daniel J; Reber, Paul J

    2011-07-01

    The expression of expert motor skills typically involves learning to perform a precisely timed sequence of movements. Research examining incidental sequence learning has relied on a perceptually cued task that gives participants exposure to repeating motor sequences but does not require timing of responses for accuracy. In the 1st experiment, a novel perceptual-motor sequence learning task was used, and learning a precisely timed cued sequence of motor actions was shown to occur without explicit instruction. Participants learned a repeating sequence through practice and showed sequence-specific knowledge via a performance decrement when switched to an unfamiliar sequence. In the 2nd experiment, the integration of representation of action order and timing sequence knowledge was examined. When either action order or timing sequence information was selectively disrupted, performance was reduced to levels similar to completely novel sequences. Unlike prior sequence-learning research that has found timing information to be secondary to learning action sequences, when the task demands require accurate action and timing information, an integrated representation of these types of information is acquired. These results provide the first evidence for incidental learning of fully integrated action and timing sequence information in the absence of an independent representation of action order and suggest that this integrative mechanism may play a material role in the acquisition of complex motor skills. PMID:21417511

  13. Sequence of cDNA for rat cystathionine gamma-lyase and comparison of deduced amino acid sequence with related Escherichia coli enzymes.

    PubMed Central

    Erickson, P F; Maxwell, I H; Su, L J; Baumann, M; Glode, L M

    1990-01-01

    A cDNA clone for cystathionine gamma-lyase was isolated from a rat cDNA library in lambda gt11 by screening with a monospecific antiserum. The identity of this clone, containing 600 bp proximal to the 3'-end of the gene, was confirmed by positive hybridization selection. Northern-blot hybridization showed the expected higher abundance of the corresponding mRNA in liver than in brain. Two further cDNA clones from a plasmid pcD library were isolated by colony hybridization with the first clone and were found to contain inserts of 1600 and 1850 bp. One of these was confirmed as encoding cystathionine gamma-lyase by hybridization with two independent pools of oligodeoxynucleotides corresponding to partial amino acid sequence information for cystathionine gamma-lyase. The other clone (estimated to represent all but 8% of the 5'-end of the mRNA) was sequenced and its deduced amino acid sequence showed similarity to those of the Escherichia coli enzymes cystathionine beta-lyase and cystathionine gamma-synthase throughout its length, especially to that of the latter. PMID:2201285

  14. Distress vocalization sequences broadcasted by bats carry redundant information.

    PubMed

    Hechavarría, Julio C; Beetz, M Jerome; Macias, Silvio; Kössl, Manfred

    2016-07-01

    Distress vocalizations (also known as alarm or screams) are an important component of the vocal repertoire of a number of animal species, including bats, humans, monkeys and birds, among others. Although the behavioral relevance of distress vocalizations is undeniable, at present, little is known about the rules that govern vocalization production when in alarmful situations. In this article, we show that when distressed, bats of the species Carollia perspicillata produce repetitive vocalization sequences in which consecutive syllables are likely to be similar to one another regarding their physical attributes. The uttered distress syllables are broadband (12-73 kHz) with most of their energy focussing at 23 kHz. Distress syllables are short (~4 ms), their average sound pressure level is close to 70 dB SPL, and they are produced at high repetition rates (every 14 ms). We discuss that, because of their physical attributes, bat distress vocalizations could serve a dual purpose: (1) advertising threatful situations to conspecifics, and (2) informing the threatener that the bats are ready to defend themselves. We also discuss possible advantages of advertising danger/discomfort using repetitive utterances, a calling strategy that appears to be ubiquitous across the animal kingdom. PMID:27277892

  15. Detection and isolation of nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1997-01-01

    A method for detecting a target nucleic acid sequence in a sample is provided using hybridization probes which competitively hybridize to a target nucleic acid. According to the method, a target nucleic acid sequence is hybridized to first and second hybridization probes which are complementary to overlapping portions of the target nucleic acid sequence, the first hybridization probe including a first complexing agent capable of forming a binding pair with a second complexing agent and the second hybridization probe including a detectable marker. The first complexing agent attached to the first hybridization probe is contacted with a second complexing agent, the second complexing agent being attached to a solid support such that when the first and second complexing agents are attached, target nucleic acid sequences hybridized to the first hybridization probe become immobilized on to the solid support. The immobilized target nucleic acids are then separated and detected by detecting the detectable marker attached to the second hybridization probe. A kit for performing the method is also provided.

  16. Detection and isolation of nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1997-04-01

    A method for detecting a target nucleic acid sequence in a sample is provided using hybridization probes which competitively hybridize to a target nucleic acid. According to the method, a target nucleic acid sequence is hybridized to first and second hybridization probes which are complementary to overlapping portions of the target nucleic acid sequence, the first hybridization probe including a first complexing agent capable of forming a binding pair with a second complexing agent and the second hybridization probe including a detectable marker. The first complexing agent attached to the first hybridization probe is contacted with a second complexing agent, the second complexing agent being attached to a solid support such that when the first and second complexing agents are attached, target nucleic acid sequences hybridized to the first hybridization probe become immobilized on to the solid support. The immobilized target nucleic acids are then separated and detected by detecting the detectable marker attached to the second hybridization probe. A kit for performing the method is also provided. 7 figs.

  17. A knowledge engineering approach to recognizing and extracting sequences of nucleic acids from scientific literature.

    PubMed

    García-Remesal, Miguel; Maojo, Victor; Crespo, José

    2010-01-01

    In this paper we present a knowledge engineering approach to automatically recognize and extract genetic sequences from scientific articles. To carry out this task, we use a preliminary recognizer based on a finite state machine to extract all candidate DNA/RNA sequences. The latter are then fed into a knowledge-based system that automatically discards false positives and refines noisy and incorrectly merged sequences. We created the knowledge base by manually analyzing different manuscripts containing genetic sequences. Our approach was evaluated using a test set of 211 full-text articles in PDF format containing 3134 genetic sequences. For such set, we achieved 87.76% precision and 97.70% recall respectively. This method can facilitate different research tasks. These include text mining, information extraction, and information retrieval research dealing with large collections of documents containing genetic sequences. PMID:21096556
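
    The two-stage idea in the abstract above (a permissive recognizer followed by filtering, scored by precision and recall) can be sketched as follows. This is a minimal illustration, not the authors' recognizer or knowledge base: the regular expression, the `min_len` threshold, and the function names `find_candidates` and `precision_recall` are assumptions made for this example.

```python
import re

# Assumption: a candidate is a run of nucleotide letters, optionally broken
# by single spaces (as happens when a PDF wraps a long sequence).
CANDIDATE = re.compile(r"[ACGTUacgtu](?: ?[ACGTUacgtu])+")


def find_candidates(text, min_len=10):
    """Return candidate DNA/RNA strings of at least min_len letters."""
    hits = []
    for match in CANDIDATE.finditer(text):
        seq = match.group().replace(" ", "").upper()
        # Crude false-positive filter (hypothetical): drop short runs such
        # as ordinary words that happen to use only the letters a, c, g, t, u.
        if len(seq) >= min_len:
            hits.append(seq)
    return hits


def precision_recall(predicted, gold):
    """Precision and recall of predicted items against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

    A regular expression is itself a finite-state recognizer, which is why it stands in for the first stage here; a realistic second stage would need far richer rules than a length cutoff.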

  18. A computer program for the estimation of protein and nucleic acid sequence diversity in random point mutagenesis libraries

    PubMed Central

    Volles, Michael J.; Lansbury, Peter T.

    2005-01-01

    A computer program for the generation and analysis of in silico random point mutagenesis libraries is described. The program operates by mutagenizing an input nucleic acid sequence according to mutation parameters specified by the user for each sequence position and type of point mutation. The program can mimic almost any type of random mutagenesis library, including those produced via error-prone PCR (ep-PCR), mutator Escherichia coli strains, chemical mutagenesis, and doped or random oligonucleotide synthesis. The program analyzes the generated nucleic acid sequences and/or the associated protein library to produce several estimates of library diversity (number of unique sequences, point mutations, and single point mutants) and the rate of saturation of these diversities during experimental screening or selection of clones. This information allows one to select the optimal screen size for a given mutagenesis library, necessary to efficiently obtain a certain coverage of the sequence-space. The program also reports the abundance of each specific protein mutation at each sequence position, which is useful as a measure of the level and type of mutation bias in the library. Alternatively, one can use the program to evaluate the relative merits of preexisting libraries, or to examine various hypothetical mutation schemes to determine the optimal method for creating a library that serves the screen/selection of interest. Simulated libraries of at least 10^9 sequences are accessible by the numerical algorithm with currently available personal computers; an analytical algorithm is also available which can rapidly calculate a subset of the numerical statistics in libraries of arbitrarily large size. A multi-type double-strand stochastic model of ep-PCR is developed in an appendix to demonstrate the applicability of the algorithm to amplifying mutagenesis procedures. Estimators of DNA polymerase mutation-type-specific error rates are derived using the model. Analyses of an
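
    The kind of numerical estimate described in this record can be illustrated with a toy simulation. The sketch below is not the published program: it assumes a single uniform substitution rate for every position and every mutation type, whereas the abstract describes user-specified parameters per position and per mutation type. The names `mutate` and `library_diversity` are hypothetical.

```python
import random

BASES = "ACGT"


def mutate(seq, rate, rng):
    """Substitute each base independently with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            # Assumption: the three alternative bases are equally likely.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def library_diversity(seq, rate, size, seed=0):
    """Simulate `size` clones and count unique sequences and single mutants."""
    rng = random.Random(seed)
    library = [mutate(seq, rate, rng) for _ in range(size)]
    unique = set(library)
    # A single point mutant differs from the parent at exactly one position.
    single = {s for s in unique
              if sum(a != b for a, b in zip(s, seq)) == 1}
    return {"clones": size,
            "unique": len(unique),
            "single_point_mutants": len(single)}
```

    Calling `library_diversity` for increasing values of `size` shows how the number of unique sequences saturates, which is the quantity the abstract relates to choosing a screen size.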

  19. Conversion of amino-acid sequence in proteins to classical music: search for auditory patterns

    PubMed Central

    2007-01-01

    We have converted genome-encoded protein sequences into musical notes to reveal auditory patterns without compromising musicality. We derived a reduced range of 13 base notes by pairing similar amino acids and distinguishing them using variations of three-note chords and codon distribution to dictate rhythm. The conversion will help make genomic coding sequences more approachable for the general public, young children, and vision-impaired scientists. PMID:17477882

  20. Characterizing informative sequence descriptors and predicting binding affinities of heterodimeric protein complexes

    PubMed Central

    2015-01-01

    Background: Protein-protein interactions (PPIs) are involved in various biological processes, and the underlying mechanism of the interactions plays a crucial role in therapeutics and protein engineering. Most machine learning approaches have been developed for predicting the binding affinity of protein-protein complexes based on structure and functional information. This work aims to predict the binding affinity of heterodimeric protein complexes from sequences only. Results: This work proposes a support vector machine (SVM) based binding affinity classifier, called SVM-BAC, to classify heterodimeric protein complexes based on the prediction of their binding affinity. SVM-BAC identified 14 of 580 sequence descriptors (physicochemical, energetic and conformational properties of the 20 amino acids) to classify 216 heterodimeric protein complexes into low and high binding affinity. SVM-BAC yielded a training accuracy, sensitivity, specificity, AUC and test accuracy of 85.80%, 0.89, 0.83, 0.86 and 83.33%, respectively, better than existing machine learning algorithms. The 14 features and support vector regression were further used to estimate the binding affinities (Pkd) of 200 heterodimeric protein complexes. Prediction performance in a jackknife test was a correlation coefficient of 0.34 and a mean absolute error of 1.4. We further analyze three informative physicochemical properties according to their contribution to prediction performance. Results reveal that the following properties are effective in predicting the binding affinity of heterodimeric protein complexes: apparent partition energy based on buried molar fractions, relations between chemical structure and biological activity in principal component analysis IV, and normalized frequency of beta turn. Conclusions: The proposed sequence-based prediction method SVM-BAC uses an optimal feature selection method to identify 14 informative features to classify and predict binding affinity of heterodimeric protein

  1. Ab initio detection of fuzzy amino acid tandem repeats in protein sequences

    PubMed Central

    2012-01-01

    Background: Tandem repetitions within protein amino acid sequences often correspond to regular secondary structures and form multi-repeat 3D assemblies of varied size and function. Developing internal repetitions is one of the evolutionary mechanisms that proteins employ to adapt their structure and function under evolutionary pressure. While there is keen interest in understanding such phenomena, detection of repeating structures based only on sequence analysis is considered an arduous task, since structure and function are often preserved even under considerable sequence divergence (fuzzy tandem repeats). Results: In this paper we present PTRStalker, a new algorithm for ab initio detection of fuzzy tandem repeats in protein amino acid sequences. In the reported results we show that by feeding PTRStalker with amino acid sequences from the UniProtKB/Swiss-Prot database we detect novel tandemly repeated structures not captured by other state-of-the-art tools. Experiments with membrane proteins indicate that PTRStalker can detect global symmetries in the primary structure which are then reflected in the tertiary structure. Conclusions: PTRStalker is able to detect fuzzy tandem repeating structures in protein sequences, with performance beyond the current state of the art. Such a tool may be a valuable support to investigating protein structural properties when tertiary X-ray data is not available. PMID:22536906

  2. The amino-acid sequence of leghemoglobin component a from Phaseolus vulgaris (kidney bean).

    PubMed

    Lehtovaara, P; Ellfolk, N

    1975-06-01

    1. Leghemoglobin component a from Phaseolus vulgaris (kidney bean) was digested with trypsin; 15 tryptic peptides and free lysine were purified and the amino acid sequences of the peptides determined. 2. The internal order of the tryptic peptides was determined by the bridge peptides obtained from the thermolytic digest and the dilute acid hydrolyzate of kidney bean leghemoglobin a; 12 thermolytic peptides and two acid hydrolysis peptides were purified and the sequences were partially or completely determined. 3. The complete amino acid sequence of kidney bean leghemoglobin a is compared to that of leghemoglobin a from soybean (Glycine max) and to some animal globins. As regards sequence, the kidney bean globin has 79% identity with the soybean globin and 21% identity with human hemoglobin gamma-chain. Seven of the 14 amino acid residues common to most globins are found in the kidney bean globin. Trp-15 and Tyr-145 are evolutionarily conserved in this globin, which confirms the concept of a common origin of animal and plant globins. PMID:809270
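
    Identity figures such as the 79% and 21% quoted above are computed from aligned sequences. The sketch below shows one common convention (identical residues divided by aligned, non-gap columns); the abstract does not state which convention the authors used, so the choice of denominator and the function name `percent_identity` are assumptions for illustration only.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared
```

    For two toy aligned strings (not real globin sequences), `percent_identity("ACDE-G", "ACDQFG")` compares five ungapped columns, four of which match. Other conventions divide by the alignment length or by the length of the shorter sequence and give different values for the same alignment.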

  3. Informational structure of genetic sequences and nature of gene splicing

    NASA Astrophysics Data System (ADS)

    Trifonov, E. N.

    1991-10-01

    Only about 1/20 of the DNA of higher organisms codes for proteins, by means of the classical triplet code. The rest of the DNA sequences is largely silent, with unclear functions, if any. The triplet code is not the only code (message) carried by the sequences. There are three levels of molecular communication, where the same sequence "talks" to various biomolecules, while having, respectively, three different appearances: DNA, RNA and protein. Since the molecular structures and, hence, sequence-specific preferences of these are substantially different, the original DNA sequence has to carry simultaneously three types of sequence patterns (codes, messages), thus being a composite structure in which one and the same letter (nucleotide) is frequently involved in several overlapping codes of different nature. This multiplicity and overlapping of the codes is a unique feature of Gnomic, the language of genetic sequences. The coexisting codes have to be degenerate in various degrees to allow an optimal and concerted performance of all the encoded functions. There is an obvious conflict between the best possible performance of a given function and the necessity to compromise the quality of a given sequence pattern in favor of other patterns. It appears that the major role of various changes in the sequences on their "ontogenetic" way from DNA to RNA to protein, like RNA editing and splicing, or protein post-translational modifications, is to resolve such conflicts. New data are presented strongly indicating that gene splicing is such a device to resolve the conflict between the code of DNA folding in chromatin and the triplet code for protein synthesis.

  4. Unraveling the sequence information in COI barcode to achieve higher taxon assignment based on Indian freshwater fishes.

    PubMed

    Chakraborty, Mohua; Ghosh, Sankar Kumar

    2015-04-01

    Efficacy of cytochrome c oxidase subunit I (COI) DNA barcode in higher taxon assignment is still under debate in spite of several attempts, using the conventional DNA barcoding methods, to assign higher taxa. Here we try to understand whether nucleotide and amino acid sequence in COI gene carry sufficient information to assign species to their higher taxonomic rank, using 160 species of Indian freshwater fishes. Our results reveal that with increase in the taxonomic rank, sequence conservation decreases for both nucleotides and amino acids. Order level exhibits lowest conservation with 50% of the nucleotides and amino acids being conserved. Among the variable sites, 30-50% were found to carry high information content within an order, while it was 70-80% within a family and 80-99% within a genus. High information content shows sites with almost conserved sequence but varying at one or two locations, which can be due to variations at species or population level. Thus, the potential of COI gene in higher taxon assignment is revealed with validation of ample inherent signals latent in the gene. PMID:24409929

  5. Draft genome sequence of the docosahexaenoic acid producing thraustochytrid Aurantiochytrium sp. T66.

    PubMed

    Liu, Bin; Ertesvåg, Helga; Aasen, Inga Marie; Vadstein, Olav; Brautaset, Trygve; Heggeset, Tonje Marita Bjerkan

    2016-06-01

    Thraustochytrids are unicellular, marine protists, and there is a growing industrial interest in these organisms, particularly because some species, including strains belonging to the genus Aurantiochytrium, accumulate high levels of docosahexaenoic acid (DHA). Here, we report the draft genome sequence of Aurantiochytrium sp. T66 (ATCC PRA-276), with a size of 43 Mbp, and 11,683 predicted protein-coding sequences. The data has been deposited at DDBJ/EMBL/Genbank under the accession LNGJ00000000. The genome sequence will contribute new insight into DHA biosynthesis and regulation, providing a basis for metabolic engineering of thraustochytrids. PMID:27222814

  6. Application of combined mass spectrometry and partial amino acid sequence to the identification of gel-separated proteins.

    PubMed

    Patterson, S D; Thomas, D; Bradshaw, R A

    1996-05-01

    The combined use of peptide mass information with amino acid sequence information derived by chemical sequencing or mass spectrometry (MS)-based approaches provides a powerful means of protein identification. We have used a two-part strategy to identify proteins from nerve growth factor (NGF)-stimulated rat adrenal pheochromocytoma cell line PC-12 cell lysates that associate with the adaptor protein Shc (Shc homologous and collagen protein). Initial experiments with metabolically radiolabeled cell extracts separated by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) revealed a number of proteins that coimmunoprecipitated with anti-Shc antibody compared with control (unstimulated) cell extracts. The experiment was scaled up and cell lysate from NGF-stimulated PC-12 cells was applied to a glutathione-S-transferase (GST)-Shc affinity column, eluted, separated by SDS-PAGE and blotted to Immobilon-CD. The blotted proteins were proteolytically digested in situ, and the masses obtained from the extracted peptides were used in a peptide-mass search program in an attempt to identify the protein. Even if a strong candidate was found using this search, an additional step was performed to confirm the identification. The mixtures were fractionated by reversed-phase high-performance liquid chromatography (RP-HPLC) and subjected to chemical sequencing to obtain (partial) sequence information, or post-source decay (PSD-) matrix-assisted laser-desorption ionization (MALDI)-MS to obtain sequence-specific fragment ions. This data was used in a peptide-sequence tag search to confirm the identity of the proteins. This combined approach allowed identification of four proteins of M(r) 43,000 to 200,000. In one case the identified protein clearly did not correspond to the radiolabeled band, but to a protein contaminant from the column. The advantages and pitfalls of the approach are discussed. PMID:8783013

  7. A classification of glycosyl hydrolases based on amino acid sequence similarities.

    PubMed Central

    Henrissat, B

    1991-01-01

    The amino acid sequences of 301 glycosyl hydrolases and related enzymes have been compared. A total of 291 sequences corresponding to 39 EC entries could be classified into 35 families. Only ten sequences (less than 5% of the sample) could not be assigned to any family. With the sequences available for this analysis, 18 families were found to be monospecific (containing only one EC number) and 17 were found to be polyspecific (containing at least two EC numbers). Implications on the folding characteristics and mechanism of action of these enzymes and on the evolution of carbohydrate metabolism are discussed. With the steady increase in sequence and structural data, it is suggested that the enzyme classification system should perhaps be revised. PMID:1747104

  8. New families in the classification of glycosyl hydrolases based on amino acid sequence similarities.

    PubMed Central

    Henrissat, B; Bairoch, A

    1993-01-01

    301 glycosyl hydrolases and related enzymes corresponding to 39 EC entries of the I.U.B. classification system have been classified into 35 families on the basis of amino-acid-sequence similarities [Henrissat (1991) Biochem. J. 280, 309-316]. Approximately half of the families were found to be monospecific (containing only one EC number), whereas the other half were found to be polyspecific (containing at least two EC numbers). A > 60% increase in sequence data for glycosyl hydrolases (181 additional enzymes or enzyme domains sequences have since become available) allowed us to update the classification not only by the addition of more members to already identified families, but also by the finding of ten new families. On the basis of a comparison of 482 sequences corresponding to 52 EC entries, 45 families, out of which 22 are polyspecific, can now be defined. This classification has been implemented in the SWISS-PROT protein sequence data bank. PMID:8352747

  9. Sequence-specific purification of nucleic acids by PNA-controlled hybrid selection.

    PubMed

    Orum, H; Nielsen, P E; Jørgensen, M; Larsson, C; Stanley, C; Koch, T

    1995-09-01

    Using an oligohistidine peptide nucleic acid (oligohistidine-PNA) chimera, we have developed a rapid hybrid selection method that allows efficient, sequence-specific purification of a target nucleic acid. The method exploits two fundamental features of PNA. First, that PNA binds with high affinity and specificity to its complementary nucleic acid. Second, that amino acids are easily attached to the PNA oligomer during synthesis. We show that a (His)6-PNA chimera exhibits strong binding to chelated Ni2+ ions without compromising its native PNA hybridization properties. We further show that these characteristics allow the (His)6-PNA/DNA complex to be purified by the well-established method of metal ion affinity chromatography using a Ni2+-NTA (nitrilotriacetic acid) resin. Specificity and efficiency are the touchstones of any nucleic acid purification scheme. We show that the specificity of the (His)6-PNA selection approach is such that oligonucleotides differing by only a single nucleotide can be selectively purified. We also show that large RNAs (2224 nucleotides) can be captured with high efficiency by using multiple (His)6-PNA probes. PNA can hybridize to nucleic acids in low-salt concentrations that destabilize native nucleic acid structures. We demonstrate that this property of PNA can be utilized to purify an oligonucleotide in which the target sequence forms part of an intramolecular stem/loop structure. PMID:7495562

  10. In silico comparative analysis of DNA and amino acid sequences for prion protein gene.

    PubMed

    Kim, Y; Lee, J; Lee, C

    2008-01-01

    Genetic variability might contribute to species specificity of prion diseases in various organisms. In this study, structures of the prion protein gene (PRNP) and its amino acids were compared among species for which sequence data were available. Comparisons of PRNP DNA sequences among 12 species including human, chimpanzee, monkey, bovine, ovine, dog, mouse, rat, wallaby, opossum, chicken and zebrafish allowed us to identify candidate regulatory regions in intron 1 and the 3'-untranslated region (UTR) in addition to the coding region. Highly conserved putative binding sites for transcription factors, such as heat shock factor 2 (HSF2) and myocyte enhancer factor 2 (MEF2), were discovered in intron 1. In the 3'-UTR, the functional sequence (ATTAAA) for nucleus-specific polyadenylation was found in all the analysed species. The functional sequence (TTTTTAT) for maturation-specific polyadenylation was observed without mismatches only in ovine, and with one or two nucleotide mismatches in the other species. A comparison of the amino acid sequences in 53 species revealed a large sequence identity. In particular, the octapeptide repeat region was observed in all the species but frog and zebrafish. Functional changes and susceptibility to prion diseases with various isoforms of prion protein could be caused by numeric variability and conformational changes discovered in the repeat sequences. PMID:18397498

  11. Antibody-specific model of amino acid substitution for immunological inferences from alignments of antibody sequences.

    PubMed

    Mirsky, Alexander; Kazandjian, Linda; Anisimova, Maria

    2015-03-01

    Antibodies are glycoproteins produced by the immune system as a dynamically adaptive line of defense against invading pathogens. Very elegant and specific mutational mechanisms allow B lymphocytes to produce a large and diversified repertoire of antibodies, which is modified and enhanced throughout all adulthood. One of these mechanisms is somatic hypermutation, which stochastically mutates nucleotides in the antibody genes, forming new sequences with different properties and, eventually, higher affinity and selectivity to the pathogenic target. As somatic hypermutation involves fast mutation of antibody sequences, this process can be described using a Markov substitution model of molecular evolution. Here, using large sets of antibody sequences from mice and humans, we infer an empirical amino acid substitution model AB, which is specific to antibody sequences. Compared with existing general amino acid models, we show that the AB model provides significantly better description for the somatic evolution of mice and human antibody sequences, as demonstrated on large next generation sequencing (NGS) antibody data. General amino acid models are reflective of conservation at the protein level due to functional constraints, with most frequent amino acids exchanges taking place between residues with the same or similar physicochemical properties. In contrast, within the variable part of antibody sequences we observed an elevated frequency of exchanges between amino acids with distinct physicochemical properties. This is indicative of a sui generis mutational mechanism, specific to antibody somatic hypermutation. We illustrate this property of antibody sequences by a comparative analysis of the network modularity implied by the AB model and general amino acid substitution models. We recommend using the new model for computational studies of antibody sequence maturation, including inference of alignments and phylogenetic trees describing antibody somatic hypermutation in

  12. Amino acid sequence of a vitamin K-dependent Ca2+-binding peptide from bovine prothrombin.

    PubMed

    Howard, J B; Fausch, M D

    1975-08-10

    The amino acid sequence of a 31-residue peptide from bovine prothrombin has been determined. This peptide has been shown to contain the vitamin K-dependent modification required for Ca2+ binding (Nelsestuen, G. L., and Suttie, J. W. (1973) Proc. Natl. Acad. Sci. U. S. A. 70, 3366-3370) and the modified amino acid, gamma-carboxyglutamic acid (Nelsestuen, G. L., Zytkovicz, T., and Howard, J. B. (1974) J. Biol. Chem. 249, 6347-6350). The peptide was shown to correspond to residues 12 to 42 of prothrombin. PMID:807581

  13. Amino acid sequences around the cysteine residues of rabbit muscle triose phosphate isomerase

    PubMed Central

    Miller, Janet C.; Waley, S. G.

    1971-01-01

    1. The nature of the subunits in rabbit muscle triose phosphate isomerase has been investigated. 2. Amino acid analyses show that there are five cysteine residues and two methionine residues/subunit. 3. The amino acid sequences around the cysteine residues have been determined; these account for about 75 residues. 4. Cleavage at the methionine residues with cyanogen bromide gave three fragments. 5. These results show that the subunits correspond to polypeptide chains, containing about 230 amino acid residues. The chains in triose phosphate isomerase seem to be shorter than those of other glycolytic enzymes. PMID:5165707

  14. Complete amino acid sequence of the Mu heavy chain of a human IgM immunoglobulin.

    PubMed

    Putnam, F W; Florent, G; Paul, C; Shinoda, T; Shimizu, A

    1973-10-19

    The amino acid sequence of the mu chain of a human IgM immunoglobulin, including the location of all disulfide bridges and oligosaccharides, has been determined. The homology of the constant regions of immunoglobulin mu, gamma, alpha, and epsilon heavy chains reveals evolutionary relationships and suggests that two genes code for each heavy chain. PMID:4742735

  15. Draft Genome Sequence of the Butyric Acid Producer Clostridium tyrobutyricum Strain CIP I-776 (IFP923)

    PubMed Central

    Clément, Benjamin; Lopes Ferreira, Nicolas

    2016-01-01

    Here, we report the draft genome sequence of Clostridium tyrobutyricum CIP I-776 (IFP923), an efficient producer of butyric acid. The genome consists of a single chromosome of 3.19 Mb and provides useful data concerning the metabolic capacities of the strain. PMID:26941139

  16. Draft Genome Sequence of Perfluorooctane Acid-Degrading Bacterium Pseudomonas parafulva YAB-1

    PubMed Central

    Tang, Chongjian; Peng, Qingjing; Peng, Qingzhong

    2015-01-01

    Pseudomonas parafulva YAB-1, isolated from perfluorinated compound-contaminated soil, has the ability to degrade perfluorooctane acid (PFOA) compound. Here, we report the draft genome sequence and annotation of the PFOA-degrading bacterium P. parafulva YAB-1. The data provide the basis to investigate the molecular mechanism of PFOA metabolism. PMID:26337877

  17. Sequence Learning in Infancy: The Independent Contributions of Conditional Probability and Pair Frequency Information

    ERIC Educational Resources Information Center

    Marcovitch, Stuart; Lewkowicz, David J.

    2009-01-01

    The ability to perceive sequences is fundamental to cognition. Previous studies have shown that infants can learn visual sequences as early as 2 months of age and it has been suggested that this ability is mediated by sensitivity to conditional probability information. Typically, conditional probability information has covaried with frequency…

  18. The amino acid sequence of cytochrome c-555 from the methane-oxidizing bacterium Methylococcus capsulatus.

    PubMed Central

    Ambler, R P; Dalton, H; Meyer, T E; Bartsch, R G; Kamen, M D

    1986-01-01

    The amino acid sequence of the cytochrome c-555 from the obligate methanotroph Methylococcus capsulatus strain Bath (N.C.I.B. 11132) was determined. It is a single polypeptide chain of 96 residues, binding a haem group through the cysteine residues at positions 19 and 22, and the only methionine residue is at position 59. The sequence does not closely resemble that of any other cytochrome c that has yet been characterized. Detailed evidence for the amino acid sequence of the protein has been deposited as Supplementary Publication SUP 50131 (12 pages) at the British Library Lending Division, Boston Spa, West Yorkshire LS23 7BQ, U.K., from whom copies are available on prepayment. PMID:3006666

  19. Allelic polymorphism in arabian camel ribonuclease and the amino acid sequence of bactrian camel ribonuclease.

    PubMed

    Welling, G W; Mulder, H; Beintema, J J

    1976-04-01

    Pancreatic ribonucleases from several species (whitetail deer, roe deer, guinea pig, and arabian camel) exhibit more than one amino acid at particular positions in their amino acid sequences. Since these enzymes were isolated from pooled pancreas, the origin of this heterogeneity is not clear. The pancreatic ribonucleases from 11 individual arabian camels (Camelus dromedarius) have been investigated with respect to the lysine-glutamine heterogeneity at position 103 (Welling et al., 1975). Six ribonucleases showed only one basic band and five showed two bands after polyacrylamide gel electrophoresis, suggesting a gene frequency of about 0.75 for the Lys gene and about 0.25 for the Gln gene. The amino acid sequence of bactrian camel (Camelus bactrianus) ribonuclease isolated from individual pancreatic tissue was determined and compared with that of arabian camel ribonuclease. The only difference was observed at position 103. In the ribonucleases from two unrelated bactrian camels, only glutamine was observed at that position. PMID:962846
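
    The gene-frequency estimate in this abstract can be reproduced by simple allele counting. The sketch below is illustrative only: it assumes (the abstract does not say so explicitly) that the six single-band animals are Lys/Lys homozygotes and the five two-band animals are Lys/Gln heterozygotes. The function name and arguments are hypothetical.

```python
def allele_frequencies(n_hom_a, n_het, n_hom_b=0):
    """Estimate two allele frequencies by counting alleles in diploid individuals.

    n_hom_a: individuals homozygous for allele A (assumed: one-band camels, Lys/Lys)
    n_het:   heterozygous individuals (assumed: two-band camels, Lys/Gln)
    n_hom_b: individuals homozygous for allele B (none observed here)
    """
    total_alleles = 2 * (n_hom_a + n_het + n_hom_b)
    if total_alleles == 0:
        raise ValueError("need at least one individual")
    freq_a = (2 * n_hom_a + n_het) / total_alleles
    freq_b = (2 * n_hom_b + n_het) / total_alleles
    return freq_a, freq_b


# 11 arabian camels: 6 with one band, 5 with two bands (counts from the abstract).
lys, gln = allele_frequencies(n_hom_a=6, n_het=5)
print(f"Lys: {lys:.2f}, Gln: {gln:.2f}")
```

    Under these assumptions the counts give 17/22 and 5/22, which is consistent with the reported values of about 0.75 and about 0.25.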

  20. Enzymatic generation of peptides flanked by basic amino acids to obtain MS/MS spectra with 2× sequence coverage

    PubMed Central

    Ebhardt, H Alexander; Nan, Jie; Chaulk, Steven G; Fahlman, Richard P; Aebersold, Ruedi

    2014-01-01

    RATIONALE Tandem mass (MS/MS) spectra generated by collision-induced dissociation (CID) typically lack redundant peptide sequence information in the form of e.g. b- and y-ion series due to frequent use of sequence-specific endopeptidases cleaving C- or N-terminal to Arg or Lys residues. METHODS Here we introduce arginyl-tRNA protein transferase (ATE, EC 2.3.2.8) for proteomics. ATE recognizes acidic amino acids or oxidized Cys at the N-terminus of a substrate peptide and conjugates an arginine from an aminoacylated tRNAArg onto the N-terminus of the substrate peptide. This enzymatic reaction is carried out under physiological conditions and, in combination with Lys-C/Asp-N double digest, results in arginylated peptides with basic amino acids on both termini. RESULTS We demonstrate that in vitro arginylation of peptides using yeast arginyl tRNA protein transferase 1 (yATE1) is a robust enzymatic reaction, specific to only modifying N-terminal acidic amino acids. Precursors originating from arginylated peptides generally have an increased protonation state compared with their non-arginylated forms. Furthermore, the product ion spectra of arginylated peptides show near complete 2× fragment ladders within the same MS/MS spectrum using commonly available electrospray ionization peptide fragmentation modes. Unexpectedly, arginylated peptides generate complete y- and c-ion series using electron transfer dissociation (ETD) despite having an internal proline residue. CONCLUSIONS We introduce a rapid enzymatic method to generate peptides flanked on either terminus by basic amino acids, resulting in a rich, redundant MS/MS fragment pattern. © 2014 The Authors. Rapid Communications in Mass Spectrometry published by John Wiley & Sons Ltd. PMID:25380496

  1. Software scripts for quality checking of high-throughput nucleic acid sequencers.

    PubMed

    Lazo, G R; Tong, J; Miller, R; Hsia, C; Rausch, C; Kang, Y; Anderson, O D

    2001-06-01

    We have developed a graphical interface to allow the researcher to view and assess the quality of sequencing results using a series of program scripts developed to process data generated by automated sequencers. The scripts are written in the Perl programming language and are executable under the cgi-bin directory of a Web server environment. The scripts direct nucleic acid sequencing trace file data output from automated sequencers to be analyzed by the phred molecular biology program, and the results are displayed as graphical hypertext mark-up language (HTML) pages. The scripts are mainly designed to handle 96-well microtiter dish samples, but the scripts are also able to read data from 384-well microtiter dishes 96 samples at a time. The scripts may be customized for different laboratory environments and computer configurations. Web links to the sources and discussion page are provided. PMID:11414222

  2. Always look on both sides: Phylogenetic information conveyed by simple sequence repeat allele sequences

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Simple sequence repeat (SSR) markers are widely used tools for inferences about genetic diversity, phylogeography and spatial genetic structure. Their applications assume that variation among alleles is essentially caused by an expansion or contraction of the number of repeats and that, accessorily,...

  3. ENTPRISE: An Algorithm for Predicting Human Disease-Associated Amino Acid Substitutions from Sequence Entropy and Predicted Protein Structures

    PubMed Central

    Zhou, Hongyi; Gao, Mu; Skolnick, Jeffrey

    2016-01-01

    The advance of next-generation sequencing technologies has made exome sequencing rapid and relatively inexpensive. A major application of exome sequencing is the identification of genetic variations likely to cause Mendelian diseases. This requires processing large amounts of sequence information and therefore computational approaches that can accurately and efficiently identify the subset of disease-associated variations are needed. The accuracy and high false positive rates of existing computational tools leave much room for improvement. Here, we develop a boosted tree regression machine-learning approach to predict human disease-associated amino acid variations by utilizing a comprehensive combination of protein sequence and structure features. On comparing our method, ENTPRISE, to the state-of-the-art methods SIFT, PolyPhen-2, MUTATIONASSESSOR, MUTATIONTASTER, FATHMM, ENTPRISE exhibits significant improvement. In particular, on a testing dataset consisting of only proteins with balanced disease-associated and neutral variations defined as having the ratio of neutral/disease-associated variations between 0.3 and 3, the Matthews Correlation Coefficient by ENTPRISE is 0.493 as compared to 0.432 by PPH2-HumVar, 0.406 by SIFT, 0.403 by MUTATIONASSESSOR, 0.402 by PPH2-HumDiv, 0.305 by MUTATIONTASTER, and 0.181 by FATHMM. ENTPRISE is then applied to nucleic acid binding proteins in the human proteome. Disease-associated predictions are shown to be highly correlated with the number of protein-protein interactions. Both these predictions and the ENTPRISE server are freely available for academic users as a web service at http://cssb.biology.gatech.edu/entprise/. PMID:26982818
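
    For readers unfamiliar with the metric used in this comparison, the Matthews correlation coefficient can be computed from a binary confusion matrix as sketched below. The counts in the example are made up for illustration and are not taken from the ENTPRISE benchmark; returning 0.0 when the denominator vanishes is a common convention, assumed here.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (an assumption): report 0.0 if any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts: disease-associated = positive class, neutral = negative class.
mcc = matthews_corrcoef(tp=70, tn=60, fp=40, fn=30)
print(f"MCC = {mcc:.3f}")
```

    Unlike raw accuracy, this score uses all four cells of the confusion matrix, which is one reason it is often reported on datasets where the two classes are not equally represented.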

  4. Nucleotide and predicted amino acid sequences of cloned human and mouse preprocathepsin B cDNAs.

    PubMed Central

    Chan, S J; San Segundo, B; McCormick, M B; Steiner, D F

    1986-01-01

    Cathepsin B is a lysosomal thiol proteinase that may have additional extralysosomal functions. To further our investigations on the structure, mode of biosynthesis, and intracellular sorting of this enzyme, we have determined the complete coding sequences for human and mouse preprocathepsin B by using cDNA clones isolated from human hepatoma and kidney phage libraries. The nucleotide sequences predict that the primary structure of preprocathepsin B contains 339 amino acids organized as follows: a 17-residue NH2-terminal prepeptide sequence followed by a 62-residue propeptide region, 254 residues in mature (single chain) cathepsin B, and a 6-residue extension at the COOH terminus. A comparison of procathepsin B sequences from three species (human, mouse, and rat) reveals that the homology between the propeptides is relatively conserved with a minimum of 68% sequence identity. In particular, two conserved sequences in the propeptide that may be functionally significant include a potential glycosylation site and the presence of a single cysteine at position 59. Comparative analysis of the three sequences also suggests that processing of procathepsin B is a multistep process, during which enzymatically active intermediate forms may be generated. The availability of the cDNA clones will facilitate the identification of possible active or inactive intermediate processive forms as well as studies on the transcriptional regulation of the cathepsin B gene. PMID:3463996

  5. How Can Psychological Science Inform Research About Genetic Counseling for Clinical Genomic Sequencing?

    PubMed Central

    Rini, Christine; Bernhardt, Barbara A.; Roberts, J. Scott; Christensen, Kurt D.; Evans, James P.; Brothers, Kyle B.; Roche, Myra I.; Berg, Jonathan S.; Henderson, Gail E.

    2016-01-01

    Next generation genomic sequencing technologies (including whole genome or whole exome sequencing) are being increasingly applied to clinical care. Yet, the breadth and complexity of sequencing information raise questions about how best to communicate and return sequencing information to patients and families in ways that facilitate comprehension and optimal health decisions. Obtaining answers to such questions will require multidisciplinary research. In this paper, we focus on how psychological science research can address questions related to clinical genomic sequencing by explaining emotional, cognitive, and behavioral processes in response to different types of genomic sequencing information (e.g., diagnostic results and incidental findings). We highlight examples of psychological science that can be applied to genetic counseling research to inform the following questions: (1) What factors influence patients' and providers' informational needs for developing an accurate understanding of what genomic sequencing results do and do not mean?; (2) How and by whom should genomic sequencing results be communicated to patients and their family members?; and (3) How do patients and their families respond to uncertainties related to genomic information? PMID:25488723

  6. Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Patel, Kamlesh D [Ken]; SNL,

    2013-01-25

    Kamlesh (Ken) Patel from Sandia National Laboratories (Livermore, California) presents "Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  7. The amino acid sequence of ribonuclease U2 from Ustilago sphaerogena.

    PubMed Central

    Sato, S; Uchida, T

    1975-01-01

    1. RNAase (ribonuclease) U2, a purine-specific RNAase, was reduced, aminoethylated and hydrolysed with trypsin, chymotrypsin and thermolysin. On the basis of the analyses of the resulting peptides, the complete amino acid sequence of RNAase U2 was determined. 2. When the sequence was compared with the amino acid sequence of RNAase T1 (EC 3.1.4.8), the following regions were found to be similar in the two enzymes: Tyr-Pro-His-Gln-Tyr (38-42) in RNAase U2 and Tyr-Pro-His-Lys-Tyr (38-42) in RNAase T1, Glu-Phe-Pro-Leu-Val (61-65) in RNAase U2 and Glu-Trp-Pro-Ile-Leu (58-62) in RNAase T1, Asp-Arg-Val-Ile-Tyr-Gln (83-88) in RNAase U2 and Asp-Arg-Val-Phe-Asn (76-81) in RNAase T1 and Val-Thr-His-Thr-Gly-Ala (98-103) in RNAase U2 and Ile-Thr-His-Thr-Gly-Ala (90-95) in RNAase T1. All of the amino acid residues, histidine-40, glutamate-58, arginine-77 and histidine-92, which were found to play a crucial role in the biological activity of RNAase T1, were included in the regions cited here. 3. Detailed evidence for the amino acid sequence of the proteins has been deposited as Supplementary Publication SUP 50041 (33 pages) at the British Library (Lending Division) (formerly the National Lending Library for Science and Technology), Boston Spa, Yorks. LS23 7BQ, U.K., from whom copies can be obtained on the terms indicated in Biochem. J. (1975), 145, 5. PMID:1156364

  8. Amino acid alphabet reduction preserves fold information contained in contact interactions in proteins.

    PubMed

    Solis, Armando D

    2015-12-01

    To reduce complexity, understand generalized rules of protein folding, and facilitate de novo protein design, the 20-letter amino acid alphabet is commonly reduced to a smaller alphabet by clustering amino acids based on some measure of similarity. In this work, we seek the optimal alphabet that preserves as much of the structural information found in long-range (contact) interactions among amino acids in natively-folded proteins. We employ the Information Maximization Device, based on information theory, to partition the amino acids into well-defined clusters. Numbering from 2 to 19 groups, these optimal clusters of amino acids, while generated automatically, embody well-known properties of amino acids such as hydrophobicity/polarity, charge, size, and aromaticity, and are demonstrated to maintain the discriminative power of long-range interactions with minimal loss of mutual information. Our measurements suggest that reduced alphabets (of less than 10) are able to capture virtually all of the information residing in native contacts and may be sufficient for fold recognition, as demonstrated by extensive threading tests. In an expansive survey of the literature, we observe that alphabets derived from various approaches-including those derived from physicochemical intuition, local structure considerations, and sequence alignments of remote homologs-fare consistently well in preserving contact interaction information, highlighting a convergence in the various factors thought to be relevant to the folding code. Moreover, we find that alphabets commonly used in experimental protein design are nearly optimal and are largely coherent with observations that have arisen in this work. PMID:26407535
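
    The idea of recoding a protein sequence with a reduced alphabet can be sketched in a few lines. The four-group partition below (hydrophobic, polar, positive, negative) is a hypothetical grouping chosen only for illustration; it is not one of the optimal clusterings derived in the paper, and the one-character group symbols are arbitrary.

```python
# Hypothetical 4-letter reduction of the 20-letter amino acid alphabet.
# This grouping is an illustrative assumption, not the paper's result.
GROUPS = {
    "h": "AVLIMFWC",  # hydrophobic
    "p": "STNQYGPH",  # polar or small
    "+": "KR",        # positively charged
    "-": "DE",        # negatively charged
}

# Invert the grouping into a residue -> group-symbol lookup table.
RESIDUE_TO_GROUP = {aa: symbol for symbol, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Recode a one-letter amino acid sequence using the reduced alphabet."""
    try:
        return "".join(RESIDUE_TO_GROUP[aa] for aa in seq.upper())
    except KeyError as err:
        raise ValueError(f"unknown residue: {err.args[0]}") from None


# Made-up example sequence.
reduced = reduce_sequence("MKTAYDE")
print(reduced)
```

    A finer or coarser partition only changes the `GROUPS` table; the recoding step stays the same, which makes it easy to compare alphabets of different sizes.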

  9. Deduced amino acid sequence of human pulmonary surfactant proteolipid: SPL(pVal)

    SciTech Connect

    Whitsett, J.A.; Glasser, S.W.; Korfhagen, T.R.; Weaver, T.E.; Clark, J.; Pilot-Matias, T.; Meuth, J.; Fox, J.L.

    1987-05-01

    A hydrophobic, proteolipid-like protein of Mr 6500 was isolated from ether/ethanol extracts of human, canine and bovine pulmonary surfactant. Amino acid composition of the protein demonstrated a remarkable abundance of hydrophobic residues, particularly valine and leucine. The N-terminal amino acid sequence of the human protein was determined: N-Leu-Ile-Pro-Cys-Cys-Pro-Val-Asn-Leu-Lys-Arg-Leu-Leu-Ile-Val4... An oligonucleotide probe was used to screen an adult human lung cDNA library and resulted in detection of cDNA clones with predicted amino acid sequence with close identity to the N-terminal amino acid sequence of the human peptide. SPL(pVal) was found within the reading frame of a larger peptide. SPL(pVal) results from proteolytic processing of a larger preprotein. Northern blot analysis detected a single 1.0 kilobase SPL(pVal) RNA which was less abundant in fetal than in adult lung. Mixtures of purified canine and bovine SPL(pVal) and synthetic phospholipids display properties of rapid adsorption and surface tension lowering activity characteristic of surfactant. Human SPL(pVal) is a pulmonary surfactant proteolipid which may therefore be useful in combination with phospholipids and/or other surfactant proteins for the treatment of surfactant deficiency such as hyaline membrane disease in newborn infants.

  10. Complete nucleic acid sequence of Penaeus stylirostris densovirus (PstDNV) from India.

    PubMed

    Rai, Praveen; Safeena, Muhammed P; Karunasagar, Iddya; Karunasagar, Indrani

    2011-06-01

    Infectious hypodermal and hematopoietic necrosis virus (IHHNV) of shrimp has recently been classified as Penaeus stylirostris densovirus (PstDNV). The complete nucleic acid sequence of PstDNV from India was obtained by cloning and sequencing of different DNA fragments of the virus. The genome organisation of PstDNV revealed that there were three major coding domains: a left ORF (NS1) of 2001 bp, a mid ORF (NS2) of 1092 bp and a right ORF (VP) of 990 bp. The complete genome and amino acid sequences of three proteins viz., NS1, NS2 and VP were compared with the genomes of the virus reported from Hawaii, China and Mexico and with partial sequence available from isolates from different regions. The phylogenetic analysis of shrimp, insect and vertebrate parvovirus sequences showed that the Indian PstDNV isolate is phylogenetically more closely related to one of the three isolates from Taiwan (AY355307), and two isolates (AY362547 and AY102034) from Thailand. PMID:21402111

  11. Human liver type pyruvate kinase: complete amino acid sequence and the expression in mammalian cells.

    PubMed Central

    Tani, K; Fujii, H; Nagata, S; Miwa, S

    1988-01-01

    Pyruvate kinase (PK) has four isozymes (L, R, M1, M2) that are encoded by two different genes. Among these isozymes, abnormalities of liver (L)-type PK are considered to be associated with hereditary nonspherocytic hemolytic anemia in humans. We isolated and determined the full-length sequence of human L-type PK cDNA. The cDNA contains 1629 base pairs encoding 543 amino acids, 68 base pairs of 5'-noncoding sequence, and 734 base pairs of 3'-noncoding sequence. The similarity between human and rat L-type PK was 86.9% at the nucleotide sequence level and 92.4% at the amino acid sequence level. The full-length L-type PK cDNA was placed under the promoter of simian virus 40 and introduced into monkey COS cells. Human L-type PK activity was detected in the extract of COS cells by the classical PK electrophoresis method. PMID:3126495

  12. Human liver type pyruvate kinase: Complete amino acid sequence and the expression in mammalian cells

    SciTech Connect

    Tani, Kenzaburo; Nagata, Shigekazu; Fujii, Hisaichi; Miwa, Shiro

    1988-03-01

    Pyruvate kinase (PK) has four isozymes (L, R, M1, M2) that are encoded by two different genes. Among these isozymes, abnormalities of liver (L)-type PK are considered to be associated with hereditary nonspherocytic hemolytic anemia in humans. The authors isolated and determined the full-length sequence of human L-type PK cDNA. The cDNA contains 1,629 base pairs encoding 543 amino acids, 68 base pairs of 5'-noncoding sequence, and 734 base pairs of 3'-noncoding sequence. The similarity between human and rat L-type PK was 86.9% at the nucleotide sequence level and 92.4% at the amino acid sequence level. The full-length L-type PK cDNA was placed under the promoter of simian virus 40 and introduced into monkey COS cells. Human L-type PK activity was detected in the extract of COS cells by the classical PK electrophoresis method.

  13. Molecular cytogenetics by polymerase catalyzed amplification or in situ labelling of specific nucleic acid sequences

    SciTech Connect

    Bolund, L.; Brandt, C.; Hindkjaer, J.; Koch, J.; Koelvraa, S.; Pedersen, S.

    1993-01-01

    The Polymerase Chain Reaction (PCR) can be performed on isolated cells or chromosomes and the product can be analyzed by DNA technology or by FISH to test metaphases. The authors have had good experience analyzing aberrant chromosomes by FACS sorting, PCR with degenerate primers and painting of test metaphases with the PCR product. They also utilize polymerases for PRimed IN Situ labelling (PRINS) of specific nucleic acid sequences. In PRINS, oligonucleotides are hybridized to their target sequences and labeled nucleotides are incorporated at the site of hybridization with the oligonucleotide as primer. PRINS may eventually allow the study of individual genes, gene expression and even somatic mutations (in mRNA) in single cells.

  14. DNA Cloning of Plasmodium falciparum Circumsporozoite Gene: Amino Acid Sequence of Repetitive Epitope

    NASA Astrophysics Data System (ADS)

    Enea, Vincenzo; Ellis, Joan; Zavala, Fidel; Arnot, David E.; Asavanich, Achara; Masuda, Aoi; Quakyi, Isabella; Nussenzweig, Ruth S.

    1984-08-01

    A clone of complementary DNA encoding the circumsporozoite (CS) protein of the human malaria parasite Plasmodium falciparum has been isolated by screening an Escherichia coli complementary DNA library with a monoclonal antibody to the CS protein. The DNA sequence of the complementary DNA insert encodes a four-amino acid sequence: proline-asparagine-alanine-asparagine, tandemly repeated 23 times. The CS β-lactamase fusion protein specifically binds monoclonal antibodies to the CS protein and inhibits the binding of these antibodies to native Plasmodium falciparum CS protein. These findings provide a basis for the development of a vaccine against Plasmodium falciparum malaria.

  15. Method for high-volume sequencing of nucleic acids: random and directed priming with libraries of oligonucleotides

    DOEpatents

    Studier, F.W.

    1995-04-18

    Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient. 2 figs.

  16. Method for high-volume sequencing of nucleic acids: random and directed priming with libraries of oligonucleotides

    DOEpatents

    Studier, F. William

    1995-04-18

    Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient.

  17. Partial amino acid sequence of apolipoprotein(a) shows that it is homologous to plasminogen

    SciTech Connect

    Eaton, D.L.; Fless, G.M.; Kohr, W.J.; McLean, J.W.; Xu, Q.T.; Miller, C.G.; Lawn, R.M.; Scanu, A.M.

    1987-05-01

    Apolipoprotein(a) (apo(a)) is a glycoprotein with Mr approx. 280,000 that is disulfide linked to apolipoprotein B in lipoprotein(a) particles. Elevated plasma levels of lipoprotein(a) are correlated with atherosclerosis. Partial amino acid sequence of apo(a) shows that it has striking homology to plasminogen. Plasminogen is a plasma serine protease zymogen that consists of five homologous and tandemly repeated domains called kringles and a trypsin-like protease domain. The amino-terminal sequence obtained for apo(a) is homologous to the beginning of kringle 4 but not the amino terminus of plasminogen. Apo(a) was subjected to limited proteolysis by trypsin or V8 protease, and fragments generated were isolated and sequenced. Sequences obtained from several of these fragments are highly (77-100%) homologous to plasminogen residues 391-421, which reside within kringle 4. Analysis of these internal apo(a) sequences revealed that apo(a) may contain at least two kringle 4-like domains. A sequence obtained from another tryptic fragment also shows homology to the end of kringle 4 and the beginning of kringle 5. Sequence data obtained from the two tryptic fragments shows homology with the protease domain of plasminogen. One of these sequences is homologous to the sequences surrounding the activation site of plasminogen. Plasminogen is activated by the cleavage of a specific arginine residue by urokinase and tissue plasminogen activator; however, the corresponding site in apo(a) is a serine that would not be cleaved by tissue plasminogen activator or urokinase. Using a plasmin-specific assay, no proteolytic activity could be demonstrated for lipoprotein(a) particles. These results suggest that apo(a) contains kringle-like domains and an inactive protease domain.

  18. FASTR: A novel data format for concomitant representation of RNA sequence and secondary structure information.

    PubMed

    Bose, Tungadri; Dutta, Anirban; Mh, Mohammed; Gandhi, Hemang; Mande, Sharmila S

    2015-09-01

    Given the importance of RNA secondary structures in defining their biological role, it would be convenient for researchers seeking RNA data if both sequence and structural information pertaining to RNA molecules are made available together. Current nucleotide data repositories archive only RNA sequence data. Furthermore, storage formats which can frugally represent RNA sequence as well as structure data in a single file, are currently unavailable. This article proposes a novel storage format, 'FASTR', for concomitant representation of RNA sequence and structure. The storage efficiency of the proposed FASTR format has been evaluated using RNA data from various microorganisms. Results indicate that the size of FASTR formatted files (containing both RNA sequence as well as structure information) are equivalent to that of FASTA-format files, which contain only RNA sequence information. RNA secondary structure is typically represented using a combination of a string of nucleotide characters along with the corresponding dot-bracket notation indicating structural attributes. 'FASTR' - the novel storage format proposed in the present study enables a frugal representation of both RNA sequence and structural information in the form of a single string. In spite of having a relatively smaller storage footprint, the resultant 'fastr' string(s) retain all sequence as well as secondary structural information that could be stored using a dot-bracket notation. An implementation of the 'FASTR' methodology is available for download at http://metagenomics.atc.tcs.com/compression/fastr. PMID:26333403
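
    The conventional two-string representation mentioned in this abstract (a nucleotide string plus a dot-bracket string of the same length) can be parsed into base pairs with a simple stack. The sketch below illustrates only standard dot-bracket notation; it does not reproduce the FASTR encoding, which the abstract does not specify. The example hairpin and the function name are made up for illustration.

```python
def parse_dot_bracket(sequence, structure):
    """Return sorted (i, j) index pairs of paired bases from a dot-bracket string.

    '(' opens a pair, ')' closes the most recently opened pair, and '.' marks
    an unpaired base. Indices are 0-based positions in the sequence.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif symbol != ".":
            raise ValueError(f"unsupported symbol {symbol!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


# Hypothetical 9-nt hairpin: three G-C pairs closing an AAA loop.
seq = "GGGAAACCC"
struct = "(((...)))"
pairs = parse_dot_bracket(seq, struct)
for i, j in pairs:
    print(f"{seq[i]}{i + 1} pairs with {seq[j]}{j + 1}")
```

    Storing the two strings separately roughly doubles the size of a sequence-only record, which is the overhead a single-string format would aim to avoid.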

  19. PROMALS3D: multiple protein sequence alignment enhanced with evolutionary and 3-dimensional structural information

    PubMed Central

    Pei, Jimin; Grishin, Nick V.

    2015-01-01

    SUMMARY Multiple sequence alignment (MSA) is an essential tool with many applications in bioinformatics and computational biology. Accurate MSA construction for divergent proteins remains a difficult computational task. The constantly increasing protein sequences and structures in public databases could be used to improve alignment quality. PROMALS3D is a tool for protein MSA construction enhanced with additional evolutionary and structural information from database searches. PROMALS3D automatically identifies homologs from sequence and structure databases for input proteins, derives structure-based constraints from alignments of 3-dimensional structures, and combines them with sequence-based constraints of profile-profile alignments in a consistency-based framework to construct high-quality multiple sequence alignments. PROMALS3D output is a consensus alignment enriched with sequence and structural information about input proteins and their homologs. PROMALS3D web server and package are available at http://prodata.swmed.edu/PROMALS3D. PMID:24170408

  20. PROMALS3D: multiple protein sequence alignment enhanced with evolutionary and three-dimensional structural information.

    PubMed

    Pei, Jimin; Grishin, Nick V

    2014-01-01

    Multiple sequence alignment (MSA) is an essential tool with many applications in bioinformatics and computational biology. Accurate MSA construction for divergent proteins remains a difficult computational task. The constantly increasing protein sequences and structures in public databases could be used to improve alignment quality. PROMALS3D is a tool for protein MSA construction enhanced with additional evolutionary and structural information from database searches. PROMALS3D automatically identifies homologs from sequence and structure databases for input proteins, derives structure-based constraints from alignments of three-dimensional structures, and combines them with sequence-based constraints of profile-profile alignments in a consistency-based framework to construct high-quality multiple sequence alignments. PROMALS3D output is a consensus alignment enriched with sequence and structural information about input proteins and their homologs. PROMALS3D Web server and package are available at http://prodata.swmed.edu/PROMALS3D. PMID:24170408

  1. On human disease-causing amino acid variants: statistical study of sequence and structural patterns

    PubMed Central

    Alexov, Emil

    2015-01-01

    Statistical analysis was carried out on a large set of naturally occurring human amino acid variations and it was demonstrated that there is a preference for some amino acid substitutions to be associated with diseases. At the amino acid sequence level, it was shown that the disease-causing variants frequently involve drastic changes of amino acid physico-chemical properties of proteins such as charge, hydrophobicity and geometry. Structural analysis of variants involved in diseases and of variants frequently observed in the human population showed similar trends: disease-causing variants tend to cause more changes of hydrogen bond network and salt bridges as compared with harmless amino acid mutations. Analysis of thermodynamics data reported in the literature, both experimental and computational, indicated that disease-causing variants tend to destabilize proteins and their interactions, which prompted us to investigate the effects of amino acid mutations on large databases of experimentally measured energy changes in unrelated proteins. Although the experimental datasets were neither linked to diseases nor exclusive to human proteins, the observed trends were the same: amino acid mutations tend to destabilize proteins and their interactions. Having in mind that structural and thermodynamics properties are interrelated, it is pointed out that any large change of any of them is anticipated to cause a disease. PMID:25689729

  2. Improved peptide elution time prediction for reversed-phase liquid chromatography-MS by incorporating peptide sequence information

    SciTech Connect

    Petritis, Konstantinos; Kangas, Lars J.; Yan, Bo; Monroe, Matthew E.; Strittmatter, Eric F.; Qian, Weijun; Adkins, Joshua N.; Moore, Ronald J.; Xu, Ying; Lipton, Mary S.; Camp, David G.; Smith, Richard D.

    2006-07-15

    We describe an improved artificial neural network (ANN)-based method for predicting peptide retention times in reversed phase liquid chromatography. In addition to the peptide amino acid composition, this study investigated several other peptide descriptors to improve the predictive capability, such as peptide length, sequence, hydrophobicity and hydrophobic moment, and nearest neighbor amino acid, as well as peptide predicted structural configurations (i.e., helix, sheet, coil). An ANN architecture that consisted of 1052 input nodes, 24 hidden nodes, and 1 output node was used to fully consider the amino acid residue sequence in each peptide. The network was trained using approximately 345,000 non-redundant peptides identified from a total of 12,059 LC-MS/MS analyses of more than 20 different organisms, and the predictive capability of the model was tested using 1303 confidently identified peptides that were not included in the training set. The model demonstrated an average elution time precision of approximately 1.5% and was able to distinguish among isomeric peptides based upon the inclusion of peptide sequence information. The prediction power represents a significant improvement over our earlier report (Petritis et al., Anal. Chem. 2003, 75, 1039-1048) and other previously reported models.
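    The layer sizes quoted above (1052 input, 24 hidden, 1 output) can be made concrete with a minimal sketch. Everything beyond those three numbers is an assumption made for illustration: full connectivity, one bias per hidden and output node, sigmoid activations, and random placeholder weights. The abstract does not describe the feature encoding or the training procedure, so neither is modeled here.

```python
import math
import random

N_INPUT, N_HIDDEN, N_OUTPUT = 1052, 24, 1  # layer sizes quoted in the abstract


def parameter_count(n_in, n_hidden, n_out):
    """Weights plus biases of a fully connected one-hidden-layer network (assumed layout)."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out


def init_network(seed=0):
    """Random placeholder weights; a trained model would learn these from data."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.05, 0.05) for _ in range(N_INPUT)] for _ in range(N_HIDDEN)]
    b1 = [0.0] * N_HIDDEN
    w2 = [rng.uniform(-0.05, 0.05) for _ in range(N_HIDDEN)]
    b2 = 0.0
    return w1, b1, w2, b2


def forward(x, network):
    """Map one 1052-element feature vector to a single score in (0, 1)."""
    w1, b1, w2, b2 = network
    # Hidden layer: weighted sum of the inputs plus bias, passed through a sigmoid (assumed).
    hidden = [
        1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, x)) + b)))
        for row, b in zip(w1, b1)
    ]
    # Output node: weighted sum of the hidden activations plus bias, again through a sigmoid.
    z = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-z))
```

    Under these assumptions the network has 25,297 adjustable parameters, which gives a rough sense of model size relative to the roughly 345,000 training peptides.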

  3. Sequence of morphological transitions in two-dimensional pattern growth from aqueous ascorbic Acid solutions.

    PubMed

    Paranjpe, A S

    2002-08-12

    A sequence of morphological transitions in two-dimensional dehydration patterns of aqueous solutions of ascorbic acid is observed with humidity as a control parameter. Change in morphology occurs due to humidity induced variation in the concentration of the metastable supersaturated solution phase formed after initial solvent evaporation. As percent humidity is varied from 40 to 80, patterns change from compact circular --> radial --> density modulated radial (a new morphology) --> density modulated circular --> density modulated dendritic (a new morphology) --> dense branching. PMID:12190528

  4. Snake venom. The amino acid sequence of protein A from Dendroaspis polylepis polylepis (black mamba) venom.

    PubMed

    Joubert, F J; Strydom, D J

    1980-12-01

    Protein A from Dendroaspis polylepis polylepis venom comprises 81 amino acids, including ten half-cystine residues. The complete primary structures of protein A and its variant A' were elucidated. The sequences of proteins A and A', which differ in a single position, show no homology with various neurotoxins and non-neurotoxic proteins and represent a new type of elapid venom protein. PMID:7461607

  5. Ehapp2: Estimate haplotype frequencies from pooled sequencing data with prior database information.

    PubMed

    Cao, Chang-Chang; Sun, Xiao

    2016-08-01

    To reduce the cost of large-scale re-sequencing, multiple individuals are pooled together and sequenced, an approach called pooled sequencing. Pooled sequencing could provide a cost-effective alternative to sequencing individuals separately. To facilitate the application of pooled sequencing in haplotype-based disease association analysis, the critical procedure is to accurately estimate haplotype frequencies from pooled samples. Here we present Ehapp2 for estimating haplotype frequencies from pooled sequencing data by utilizing a database which provides prior information of known haplotypes. We first translate the problem of estimating the frequency of each haplotype into finding a sparse solution for a system of linear equations, where the NNREG algorithm is employed to achieve the solution. Simulation experiments reveal that Ehapp2 is robust to sequencing errors and able to estimate the frequencies of haplotypes with less than 3% average relative difference for pooled sequencing of a mixture of real Drosophila haplotypes with 50× total coverage even when the sequencing error rate is as high as 0.05. Owing to the strategy that proportions for local haplotypes spanning multiple SNPs are accurately calculated first, Ehapp2 retains excellent estimation for recombinant haplotypes resulting from chromosomal crossover. Comparisons with present methods reveal that Ehapp2 is state-of-the-art for many sequencing study designs and more suitable for current massive parallel sequencing. PMID:27216711
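    The idea of recovering haplotype frequencies from a system of linear equations can be illustrated with a toy example. The sketch below is not the NNREG algorithm named in the abstract. It swaps in plain projected gradient descent for non-negative least squares, and the matrix, the frequencies, the step size, and the function name are all hypothetical.

```python
def estimate_haplotype_frequencies(A, b, step=0.3, iterations=2000):
    """Projected gradient descent for min ||A f - b||^2 subject to f >= 0.

    A[i][j] is the allele (0 or 1) carried by known haplotype j at SNP i;
    b[i] is the observed frequency of allele 1 at SNP i in the pool.
    The fixed step suits this toy matrix; in general it must stay below
    2 / (largest eigenvalue of A^T A) for the iteration to converge.
    """
    n_snps, n_haps = len(A), len(A[0])
    f = [1.0 / n_haps] * n_haps  # start from uniform frequencies
    for _ in range(iterations):
        residual = [sum(A[i][j] * f[j] for j in range(n_haps)) - b[i] for i in range(n_snps)]
        gradient = [sum(A[i][j] * residual[i] for i in range(n_snps)) for j in range(n_haps)]
        # Gradient step followed by projection onto the non-negative orthant.
        f = [max(0.0, f[j] - step * gradient[j]) for j in range(n_haps)]
    return f


# Toy pool: three known haplotypes (columns) observed at three SNPs (rows).
A = [
    [0, 0, 1],
    [0, 1, 1],
    [1, 0, 0],
]
true_f = [0.5, 0.3, 0.2]  # hypothetical frequencies used to simulate the pool
b = [sum(A[i][j] * true_f[j] for j in range(3)) for i in range(3)]
estimate = estimate_haplotype_frequencies(A, b)
```

    In this noise-free toy case the system has a unique solution, so the estimate converges to the simulated frequencies. Real pooled data add sequencing error and many more candidate haplotypes than SNPs, which is where a sparsity-seeking solver matters.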

  6. Identifying the Critical Time Period for Information Extraction when Recognizing Sequences of Play

    ERIC Educational Resources Information Center

    North, Jamie S.; Williams, A. Mark

    2008-01-01

    The authors attempted to determine the critical time period for information extraction when recognizing play sequences in soccer. Although efforts have been made to identify the perceptual information underpinning such decisions, no researchers have attempted to determine "when" this information may be extracted from the display. The authors…

  7. Characterization of the microbial acid mine drainage microbial community using culturing and direct sequencing techniques.

    PubMed

    Auld, Ryan R; Myre, Maxine; Mykytczuk, Nadia C S; Leduc, Leo G; Merritt, Thomas J S

    2013-05-01

    We characterized the bacterial community from an AMD tailings pond using both classical culturing and modern direct sequencing techniques and compared the two methods. Acid mine drainage (AMD) is produced by the environmental and microbial oxidation of minerals dissolved from mining waste. Surprisingly, we know little about the microbial communities associated with AMD, despite the fundamental ecological roles of these organisms and large-scale economic impact of these waste sites. AMD microbial communities have classically been characterized by laboratory culturing-based techniques and more recently by direct sequencing of marker gene sequences, primarily the 16S rRNA gene. In our comparison of the techniques, we find that their results are complementary, overall indicating very similar community structure with similar dominant species, but with each method identifying some species that were missed by the other. We were able to culture the majority of species that our direct sequencing results indicated were present, primarily species within the Acidithiobacillus and Acidiphilium genera, although estimates of relative species abundance were only obtained from direct sequencing. Interestingly, our culture-based methods recovered four species that had been overlooked from our sequencing results because of the rarity of the marker gene sequences, likely members of the rare biosphere. Further, direct sequencing indicated that a single genus, completely missed in our culture-based study, Legionella, was a dominant member of the microbial community. Our results suggest that while either method does a reasonable job of identifying the dominant members of the AMD microbial community, together the methods combine to give a more complete picture of the true diversity of this environment. PMID:23485423

  8. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... approved by the Director of the Federal Register in accordance with 5 U.S.C. 552(a) and 1 CFR part 51... base or modified or unusual amino acid may be presented in a given sequence as the corresponding unmodified base or amino acid if the modified base or modified or unusual amino acid is one of those...

  9. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... approved by the Director of the Federal Register in accordance with 5 U.S.C. 552(a) and 1 CFR part 51... base or modified or unusual amino acid may be presented in a given sequence as the corresponding unmodified base or amino acid if the modified base or modified or unusual amino acid is one of those...

  10. MSA-PAD: DNA multiple sequence alignment framework based on PFAM accessed domain information.

    PubMed

    Balech, Bachir; Vicario, Saverio; Donvito, Giacinto; Monaco, Alfonso; Notarangelo, Pasquale; Pesole, Graziano

    2015-08-01

    Here we present the MSA-PAD application, a DNA multiple sequence alignment framework that uses PFAM protein domain information to align DNA sequences encoding either single or multiple protein domains. MSA-PAD has two alignment options: gene and genome mode. PMID:25819080

  11. Nanopore Analysis of Nucleic Acids: Single-Molecule Studies of Molecular Dynamics, Structure, and Base Sequence

    NASA Astrophysics Data System (ADS)

    Olasagasti, Felix; Deamer, David W.

    Nucleic acids are linear polynucleotides in which each base is covalently linked to a pentose sugar and a phosphate group carrying a negative charge. If a pore having roughly the cross-sectional diameter of a single-stranded nucleic acid is embedded in a thin membrane and a voltage of 100 mV or more is applied, individual nucleic acids in solution can be captured by the electrical field in the pore and translocated through by single-molecule electrophoresis. The dimensions of the pore cannot accommodate anything larger than a single strand, so each base in the molecule passes through the pore in strict linear sequence. The nucleic acid strand occupies a large fraction of the pore's volume during translocation and therefore produces a transient blockade of the ionic current created by the applied voltage. If it could be demonstrated that each nucleotide in the polymer produced a characteristic modulation of the ionic current during its passage through the nanopore, the sequence of current modulations would reflect the sequence of bases in the polymer. According to this basic concept, nanopores are analogous to a Coulter counter that detects nanoscopic molecules rather than microscopic [1,2]. However, the advantage of nanopores is that individual macromolecules can be characterized because different chemical and physical properties affect their passage through the pore. Because macromolecules can be captured in the pore as well as translocated, the nanopore can be used to detect individual functional complexes that form between a nucleic acid and an enzyme. No other technique has this capability.

  12. Information transfer from peptide nucleic acids to RNA by template-directed syntheses

    NASA Technical Reports Server (NTRS)

    Schmidt, J. G.; Nielsen, P. E.; Orgel, L. E.; Bada, J. L. (Principal Investigator)

    1997-01-01

    Peptide nucleic acids (PNAs) are uncharged analogs of DNA and RNA in which the ribose-phosphate backbone is substituted by a backbone held together by amide bonds. PNAs are interesting as models of alternative genetic systems because they form potentially informational base paired helical structures. A PNA C10 oligomer has been shown to act as template for efficient formation of oligoguanylates from activated guanosine ribonucleotides. In a previous paper we used heterosequences of DNA as templates in sequence-dependent polymerization of PNA dimers. In this paper we show that information can be transferred from PNA to RNA. We describe the reactions of activated mononucleotides on heterosequences of PNA. Adenylic, cytidylic and guanylic acids were incorporated into the products opposite their complement on PNA, although less efficiently than on DNA templates.

  13. Complete amino acid sequence of a histidine-rich proteolytic fragment of human ceruloplasmin.

    PubMed

    Kingston, I B; Kingston, B L; Putnam, F W

    1979-04-01

    The complete amino acid sequence has been determined for a fragment of human ceruloplasmin [ferroxidase; iron(II):oxygen oxidoreductase, EC 1.16.3.1]. The fragment (designated Cp F5) contains 159 amino acid residues and has a molecular weight of 18,650; it lacks carbohydrate, is rich in histidine, and contains one free cysteine that may be part of a copper-binding site. This fragment is present in most commercial preparations of ceruloplasmin, probably owing to proteolytic degradation, but can also be obtained by limited cleavage of single-chain ceruloplasmin with plasmin. Cp F5 probably is an intact domain attached to the COOH-terminal end of single-chain ceruloplasmin via a labile interdomain peptide bond. A model of the secondary structure predicted by empirical methods suggests that almost one-third of the amino acid residues are distributed in alpha helices, about a third in beta-sheet structure, and the remainder in beta turns and unidentified structures. Computer analysis of the amino acid sequence has not demonstrated a statistically significant relationship between this ceruloplasmin fragment and any other protein, but there is some evidence for an internal duplication. PMID:287005

  14. The amino acid sequence of Lady Amherst's pheasant (Chrysolophus amherstiae) and golden pheasant (Chrysolophus pictus) egg-white lysozymes.

    PubMed

    Araki, T; Kuramoto, M; Torikata, T

    1990-09-01

    The amino acids of Lady Amherst's pheasant and golden pheasant egg-white lysozymes have been sequenced. The carboxymethylated lysozymes were digested with trypsin followed by sequencing of the tryptic peptides. Lady Amherst's pheasant lysozyme proved to consist of 129 amino acid residues, and a relative molecular mass of 14,423 Da was calculated. This lysozyme had 6 amino acid substitutions when compared with hen egg-white lysozyme: Phe3 to Tyr, His15 to Leu, Gln41 to His, Asn77 to His, Gln121 to Asn, and a newly found substitution of Ile124 to Thr. The amino acid sequence of golden pheasant lysozyme was identical to that of Lady Amherst's pheasant lysozyme. The phylogenetic tree constructed by the comparison of amino acid sequences of phasianoid bird lysozymes revealed a minimum genetic distance between these pheasants and the turkey-peafowl group. PMID:1368578

  15. Complete Genome Sequence of a thermotolerant sporogenic lactic acid bacterium, Bacillus coagulans strain 36D1

    PubMed Central

    Rhee, Mun Su; Moritz, Brélan E.; Xie, Gary; Glavina del Rio, T.; Dalin, E.; Tice, H.; Bruce, D.; Goodwin, L.; Chertkov, O.; Brettin, T.; Han, C.; Detter, C.; Pitluck, S.; Land, Miriam L.; Patel, Milind; Ou, Mark; Harbrucker, Roberta; Ingram, Lonnie O.; Shanmugam, K. T.

    2011-01-01

    Bacillus coagulans is a ubiquitous soil bacterium that grows at 50-55 °C and pH 5.0 and ferments various sugars that constitute plant biomass to L (+)-lactic acid. The ability of this sporogenic lactic acid bacterium to grow at 50-55 °C and pH 5.0 makes this organism an attractive microbial biocatalyst for production of optically pure lactic acid at industrial scale not only from glucose derived from cellulose but also from xylose, a major constituent of hemicellulose. This bacterium is also considered as a potential probiotic. Complete genome sequence of a representative strain, B. coagulans strain 36D1, is presented and discussed. PMID:22675583

  16. Complete amino acid sequence of globin chains and biological activity of fragmented crocodile hemoglobin (Crocodylus siamensis).

    PubMed

    Srihongthong, Saowaluck; Pakdeesuwan, Anawat; Daduang, Sakda; Araki, Tomohiro; Dhiravisit, Apisak; Thammasirirak, Sompong

    2012-08-01

    Hemoglobin, α-chain, β-chain and fragmented hemoglobin of Crocodylus siamensis demonstrated both antibacterial and antioxidant activities. Antibacterial and antioxidant properties of the hemoglobin did not depend on the heme structure but could result from the compositions of amino acid residues and structures present in their primary structure. Furthermore, thirteen purified active peptides were obtained by RP-HPLC analyses, corresponding to fragments in the α-globin chain and the β-globin chain which are mostly located at the N-terminal and C-terminal parts. These active peptides operate on the bacterial cell membrane. The globin chains of Crocodylus siamensis showed similar amino acids to the sequences of Crocodylus niloticus. The novel amino acid substitutions of α-chain and β-chain are not associated with the heme binding site or the bicarbonate ion binding site, but could be important through their interactions with membranes of bacteria. PMID:22648692

  17. [Partial sequence homology of FtsZ in phylogenetics analysis of lactic acid bacteria].

    PubMed

    Zhang, Bin; Dong, Xiu-zhu

    2005-10-01

    FtsZ is a structurally conserved protein, which is universal among the prokaryotes. It plays a key role in prokaryote cell division. A partial fragment of the ftsZ gene about 800 bp in length was amplified and sequenced and a partial FtsZ protein phylogenetic tree for the lactic acid bacteria was constructed. By comparing the FtsZ phylogenetic tree with the 16S rDNA tree, it was shown that the two trees were similar in topology. Both trees revealed that Pediococcus spp. were closely related with L. casei group of Lactobacillus spp., but less related with other lactic acid cocci such as Enterococcus and Streptococcus. The results also showed that the discriminative power of FtsZ was higher than that of 16S rDNA for either inter-species or inter-genus and could be a very useful tool in species identification of lactic acid bacteria. PMID:16342751

  18. Comparative characterization of random-sequence proteins consisting of 5, 12, and 20 kinds of amino acids.

    PubMed

    Tanaka, Junko; Doi, Nobuhide; Takashima, Hideaki; Yanagawa, Hiroshi

    2010-04-01

    Screening of functional proteins from a random-sequence library has been used to evolve novel proteins in the field of evolutionary protein engineering. However, random-sequence proteins consisting of the 20 natural amino acids tend to aggregate, and the occurrence rate of functional proteins in a random-sequence library is low. From the viewpoint of the origin of life, it has been proposed that primordial proteins consisted of a limited set of amino acids that could have been abundantly formed early during chemical evolution. We have previously found that members of a random-sequence protein library constructed with five primitive amino acids show high solubility (Doi et al., Protein Eng Des Sel 2005;18:279-284). Although such a library is expected to be appropriate for finding functional proteins, the functionality may be limited, because they have no positively charged amino acid. Here, we constructed three libraries of 120-amino acid, random-sequence proteins using alphabets of 5, 12, and 20 amino acids by preselection using mRNA display (to eliminate sequences containing stop codons and frameshifts) and characterized and compared the structural properties of random-sequence proteins arbitrarily chosen from these libraries. We found that random-sequence proteins constructed with the 12-member alphabet (including five primitive amino acids and positively charged amino acids) have higher solubility than those constructed with the 20-member alphabet, though other biophysical properties are very similar in the two libraries. Thus, a library of moderate complexity constructed from 12 amino acids may be a more appropriate resource for functional screening than one constructed from 20 amino acids. PMID:20162614

  19. Genome sequence of the acid-tolerant Burkholderia sp. strain WSM2232 from Karijini National Park, Australia

    PubMed Central

    Walker, Robert; Watkin, Elizabeth; Tian, Rui; Bräu, Lambert; O’Hara, Graham; Goodwin, Lynne; Han, James; Reddy, Tatiparthi; Huntemann, Marcel; Pati, Amrita; Woyke, Tanja; Mavromatis, Konstantinos; Markowitz, Victor; Ivanova, Natalia; Kyrpides, Nikos; Reeve, Wayne

    2013-01-01

    Burkholderia sp. strain WSM2232 is an aerobic, motile, Gram-negative, non-spore-forming acid-tolerant rod that was trapped in 2001 from acidic soil collected from Karijini National Park (Australia) using Gastrolobium capitatum as a host. WSM2232 was effective in nitrogen fixation with G. capitatum but subsequently lost symbiotic competence during long-term storage. Here we describe the features of Burkholderia sp. strain WSM2232, together with genome sequence information and its annotation. The 7,208,311 bp standard-draft genome is arranged into 72 scaffolds of 72 contigs containing 6,322 protein-coding genes and 61 RNA-only encoding genes. The loss of symbiotic capability can now be attributed to the loss of nodulation and nitrogen fixation genes from the genome. This rhizobial genome is one of 100 sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) project. PMID:25197442

  20. Protein sequence analysis by incorporating modified chaos game and physicochemical properties into Chou's general pseudo amino acid composition.

    PubMed

    Xu, Chunrui; Sun, Dandan; Liu, Shenghui; Zhang, Yusen

    2016-10-01

    In this contribution we introduced a novel graphical method to compare protein sequences. By mapping a protein sequence into 3D space based on codons and physicochemical properties of 20 amino acids, we are able to get a unique P-vector from the 3D curve. This approach is consistent with wobble theory of amino acids. We compute the distance between sequences by their P-vectors to measure similarities/dissimilarities among protein sequences. Finally, we use our method to analyze four datasets and get better results compared with previous approaches. PMID:27375218

  1. Extracting meaningful information from video sequences for intelligent searches.

    SciTech Connect

    Muguira, Maritza Rosa; Russ, Trina Denise

    2005-02-01

    Video and image data are knowledge-rich sources of information, but their utility for current and future systems is limited without autonomous methods for understanding and characterizing their content. Semantic-based video understanding may benefit systems dedicated to the detection of insiders, alarm patterns, unauthorized activities in material monitoring applications, etc. A direct benefit of this technology is not only intelligent alarm analysis, but the ability to browse and perform query-based searches for useful and interesting information after video data has been acquired and stored. These searches can provide a tremendous benefit for use in intelligence agency, government, military, and DOE site investigations. This report provides an initial investigation into the algorithms and methods needed to characterize and understand video content. Such algorithms include background modeling, detecting dynamic image regions, grouping dynamic pixels into coherent objects, and robust tracking strategies. With solid approaches for addressing these problems, analysis can be performed seeking to recognize distinctive objects and their motions leading to semantic-based video searches.

  2. Information measure for long-range correlated sequences: the case of the 24 human chromosomes.

    PubMed

    Carbone, A

    2013-01-01

    A new approach to estimate the Shannon entropy of a long-range correlated sequence is proposed. The entropy is written as the sum of two terms corresponding respectively to power-law (ordered) and exponentially (disordered) distributed blocks (clusters). The approach is illustrated on the 24 human chromosome sequences by taking the nucleotide composition as the relevant information to be encoded/decoded. Interestingly, the nucleotide composition of the ordered clusters is found, on the average, comparable to the one of the whole analyzed sequence, while that of the disordered clusters fluctuates. From the information theory standpoint, this means that the power-law correlated clusters carry the same information of the whole analysed sequence. Furthermore, the fluctuations of the nucleotide composition of the disordered clusters are linked to relevant biological properties, such as segmental duplications and gene density. PMID:24056670
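    As background for the entropy discussed above, the plain Shannon entropy of a nucleotide composition can be computed in a few lines. This minimal sketch covers only the standard composition entropy, not the cluster-based decomposition the abstract proposes, and the function name is an illustrative choice.

```python
import math
from collections import Counter


def shannon_entropy(sequence):
    """Shannon entropy, in bits per symbol, of the symbol composition of a sequence."""
    counts = Counter(sequence)
    total = len(sequence)
    # H = -sum(p * log2(p)) over the observed symbols, with p the relative frequency.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

    A sequence with equal proportions of A, C, G and T reaches the maximum of 2 bits per nucleotide, while a sequence made of a single repeated nucleotide has zero entropy.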

  3. Bacteria obtained from a sequencing batch reactor that are capable of growth on dehydroabietic acid.

    PubMed Central

    Mohn, W W

    1995-01-01

    Eleven isolates capable of growth on the resin acid dehydroabietic acid (DhA) were obtained from a sequencing batch reactor designed to treat a high-strength process stream from a paper mill. The isolates belonged to two groups, represented by strains DhA-33 and DhA-35, which were characterized. In the bioreactor, bacteria like DhA-35 were more abundant than those like DhA-33. The population in the bioreactor of organisms capable of growth on DhA was estimated to be 1.1 × 10^6 propagules per ml, based on a most-probable-number determination. Analysis of small-subunit rRNA partial sequences indicated that DhA-33 was most closely related to Sphingomonas yanoikuyae (Sab = 0.875) and that DhA-35 was most closely related to Zoogloea ramigera (Sab = 0.849). Both isolates additionally grew on other abietanes, i.e., abietic and palustric acids, but not on the pimaranes, pimaric and isopimaric acids. For DhA-33 and DhA-35 with DhA as the sole organic substrate, doubling times were 2.7 and 2.2 h, respectively, and growth yields were 0.30 and 0.25 g of protein per g of DhA, respectively. Glucose as a cosubstrate stimulated growth of DhA-33 on DhA and stimulated DhA degradation by the culture. Pyruvate as a cosubstrate did not stimulate growth of DhA-35 on DhA and reduced the specific rate of DhA degradation of the culture. DhA induced DhA and abietic acid degradation activities in both strains, and these activities were heat labile. Cell suspensions of both strains consumed DhA at a rate of 6 µmol mg of protein^-1 h^-1. (ABSTRACT TRUNCATED AT 250 WORDS) PMID:7793937
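    The doubling times reported above can be converted to specific growth rates with the standard relation mu = ln(2) / t_d. The sketch below assumes simple exponential growth, which the abstract does not state explicitly, and the function name is illustrative.

```python
import math


def specific_growth_rate(doubling_time_h):
    """Specific growth rate (per hour) for exponential growth with the given doubling time."""
    return math.log(2) / doubling_time_h


# Doubling times quoted in the abstract for growth on DhA as sole organic substrate.
rate_dha33 = specific_growth_rate(2.7)  # strain DhA-33
rate_dha35 = specific_growth_rate(2.2)  # strain DhA-35
```

    Under that assumption the rates come to roughly 0.26 per hour for DhA-33 and 0.32 per hour for DhA-35.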

  4. Nucleic and amino acid sequences relating to a novel transketolase, and methods for the expression thereof

    DOEpatents

    Croteau, Rodney Bruce; Wildung, Mark Raymond; Lange, Bernd Markus; McCaskill, David G.

    2001-01-01

    cDNAs encoding 1-deoxyxylulose-5-phosphate synthase from peppermint (Mentha piperita) have been isolated and sequenced, and the corresponding amino acid sequences have been determined. Accordingly, isolated DNA sequences (SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7) are provided which code for the expression of 1-deoxyxylulose-5-phosphate synthase from plants. In another aspect the present invention provides for isolated, recombinant DXPS proteins, such as the proteins having the sequences set forth in SEQ ID NO:4, SEQ ID NO:6 and SEQ ID NO:8. In other aspects, replicable recombinant cloning vehicles are provided which code for plant 1-deoxyxylulose-5-phosphate synthases, or for a base sequence sufficiently complementary to at least a portion of 1-deoxyxylulose-5-phosphate synthase DNA or RNA to enable hybridization therewith. In yet other aspects, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding a plant 1-deoxyxylulose-5-phosphate synthase. Thus, systems and methods are provided for the recombinant expression of the aforementioned recombinant 1-deoxyxylulose-5-phosphate synthase that may be used to facilitate its production, isolation and purification in significant amounts. Recombinant 1-deoxyxylulose-5-phosphate synthase may be used to obtain expression or enhanced expression of 1-deoxyxylulose-5-phosphate synthase in plants in order to enhance the production of 1-deoxyxylulose-5-phosphate, or its derivatives such as isopentenyl diphosphate (IPP), or may be otherwise employed for the regulation or expression of 1-deoxyxylulose-5-phosphate synthase, or the production of its products.

  5. Novel method for PIK3CA mutation analysis: locked nucleic acid--PCR sequencing.

    PubMed

    Ang, Daphne; O'Gara, Rebecca; Schilling, Amy; Beadling, Carol; Warrick, Andrea; Troxell, Megan L; Corless, Christopher L

    2013-05-01

    Somatic mutations in PIK3CA are commonly seen in invasive breast cancer and several other carcinomas, occurring in three hotspots: codons 542 and 545 of exon 9 and in codon 1047 of exon 20. We designed a locked nucleic acid (LNA)-PCR sequencing assay to detect low levels of mutant PIK3CA DNA with attention to avoiding amplification of a pseudogene on chromosome 22 that has >95% homology to exon 9 of PIK3CA. We tested 60 FFPE breast DNA samples with known PIK3CA mutation status (48 cases had one or more PIK3CA mutations, and 12 were wild type) as identified by PCR-mass spectrometry. PIK3CA exons 9 and 20 were amplified in the presence or absence of LNA-oligonucleotides designed to bind to the wild-type sequences for codons 542, 545, and 1047, and partially suppress their amplification. LNA-PCR sequencing confirmed all 51 PIK3CA mutations; however, the mutation detection rate by standard Sanger sequencing was only 69% (35 of 51). Of the 12 PIK3CA wild-type cases, LNA-PCR sequencing detected three additional H1047R mutations in "normal" breast tissue and one E545K in usual ductal hyperplasia. Histopathological review of these three normal breast specimens showed columnar cell change in two (both with known H1047R mutations) and apocrine metaplasia in one. The novel LNA-PCR shows higher sensitivity than standard Sanger sequencing and did not amplify the known pseudogene. PMID:23541593

  6. Genome Sequence Analysis of the Naphthenic Acid Degrading and Metal Resistant Bacterium Cupriavidus gilardii CR3.

    PubMed

    Wang, Xiaoyu; Chen, Meili; Xiao, Jingfa; Hao, Lirui; Crowley, David E; Zhang, Zhewen; Yu, Jun; Huang, Ning; Huo, Mingxin; Wu, Jiayan

    2015-01-01

    Cupriavidus sp. are generally heavy metal tolerant bacteria with the ability to degrade a variety of aromatic hydrocarbon compounds, although the degradation pathways and substrate versatilities remain largely unknown. Here we studied the bacterium Cupriavidus gilardii strain CR3, which was isolated from a natural asphalt deposit, and which was shown to utilize naphthenic acids as a sole carbon source. Genome sequencing of C. gilardii CR3 was carried out to elucidate possible mechanisms for the naphthenic acid biodegradation. The genome of C. gilardii CR3 was composed of two circular chromosomes chr1 and chr2 of respectively 3,539,530 bp and 2,039,213 bp in size. The genome for strain CR3 encoded 4,502 putative protein-coding genes, 59 tRNA genes, and many other non-coding genes. Many genes were associated with xenobiotic biodegradation and metal resistance functions. Pathway prediction for degradation of cyclohexanecarboxylic acid, a representative naphthenic acid, suggested that naphthenic acid undergoes initial ring-cleavage, after which the ring fission products can be degraded via several plausible degradation pathways including a mechanism similar to that used for fatty acid oxidation. The final metabolic products of these pathways are unstable or volatile compounds that were not toxic to CR3. Strain CR3 was also shown to have tolerance to at least 10 heavy metals, which was mainly achieved by self-detoxification through ion efflux, metal-complexation and metal-reduction, and a powerful DNA self-repair mechanism. Our genomic analysis suggests that CR3 is well adapted to survive the harsh environment in natural asphalts containing naphthenic acids and high concentrations of heavy metals. PMID:26301592

  7. Genome Sequence Analysis of the Naphthenic Acid Degrading and Metal Resistant Bacterium Cupriavidus gilardii CR3

    PubMed Central

    Xiao, Jingfa; Hao, Lirui; Crowley, David E.; Zhang, Zhewen; Yu, Jun; Huang, Ning; Huo, Mingxin; Wu, Jiayan

    2015-01-01

    Cupriavidus sp. are generally heavy metal tolerant bacteria with the ability to degrade a variety of aromatic hydrocarbon compounds, although the degradation pathways and substrate versatilities remain largely unknown. Here we studied the bacterium Cupriavidus gilardii strain CR3, which was isolated from a natural asphalt deposit, and which was shown to utilize naphthenic acids as a sole carbon source. Genome sequencing of C. gilardii CR3 was carried out to elucidate possible mechanisms for the naphthenic acid biodegradation. The genome of C. gilardii CR3 was composed of two circular chromosomes chr1 and chr2 of respectively 3,539,530 bp and 2,039,213 bp in size. The genome for strain CR3 encoded 4,502 putative protein-coding genes, 59 tRNA genes, and many other non-coding genes. Many genes were associated with xenobiotic biodegradation and metal resistance functions. Pathway prediction for degradation of cyclohexanecarboxylic acid, a representative naphthenic acid, suggested that naphthenic acid undergoes initial ring-cleavage, after which the ring fission products can be degraded via several plausible degradation pathways including a mechanism similar to that used for fatty acid oxidation. The final metabolic products of these pathways are unstable or volatile compounds that were not toxic to CR3. Strain CR3 was also shown to have tolerance to at least 10 heavy metals, which was mainly achieved by self-detoxification through ion efflux, metal-complexation and metal-reduction, and a powerful DNA self-repair mechanism. Our genomic analysis suggests that CR3 is well adapted to survive the harsh environment in natural asphalts containing naphthenic acids and high concentrations of heavy metals. PMID:26301592

  8. Bile acid sulfotransferase I from rat liver sulfates bile acids and 3-hydroxy steroids: purification, N-terminal amino acid sequence, and kinetic properties.

    PubMed

    Barnes, S; Buchina, E S; King, R J; McBurnett, T; Taylor, K B

    1989-04-01

    A bile acid:3'phosphoadenosine-5'phosphosulfate:sulfotransferase (BAST I) from adult female rat liver cytosol has been purified 157-fold by a two-step isolation procedure. The N-terminal amino acid sequence of the 30,000 subunit has been determined for the first 35 residues. The Vmax of purified BAST I is 18.7 nmol/min per mg protein with N-(3-hydroxy-5 beta-cholanoyl)glycine (glycolithocholic acid) as substrate, comparable to that of the corresponding purified human BAST (Chen, L-J., and I. H. Segel, 1985. Arch. Biochem. Biophys. 241: 371-379). BAST I activity has a broad pH optimum from 5.5-7.5. Although maximum activity occurs with 5 mM MgCl2, Mg2+ is not essential for BAST I activity. The greatest sulfotransferase activity and the highest substrate affinity are observed with bile acids or steroids that have a steroid nucleus containing a 3 beta-hydroxy group and a 5-6 double bond or a trans A-B ring junction. These substrates have normal hyperbolic initial velocity curves with substrate inhibition occurring above 5 microM. Of the saturated 5 beta-bile acids, those with a single 3-hydroxy group are the most active. The addition of a second hydroxy group at the 6- or 7-position eliminates more than 99% of the activity. In contrast, 3 alpha,12 alpha-dihydroxy-5 beta-cholan-24-oic acid (deoxycholic acid) is an excellent substrate. The initial velocity curves for glycolithocholic and deoxycholic acid conjugates are sigmoidal rather than hyperbolic, suggestive of an allosteric effect. Maximum activity is observed at 80 microM for glycolithocholic acid. All substrates, bile acids and steroids, are inhibited by the 5 beta-bile acid, 3-keto-5 beta-cholanoic acid. The data suggest that BAST I is the same protein as hydrosteroid sulfotransferase 2 (Marcus, C. J., et al. 1980. Anal. Biochem. 107: 296-304). PMID:2754334

  9. Incorporating substrate sequence motifs and spatial amino acid composition to identify kinase-specific phosphorylation sites on protein three-dimensional structures

    PubMed Central

    2013-01-01

    Background: Protein phosphorylation catalyzed by kinases plays crucial regulatory roles in cellular processes. Given the high-throughput mass spectrometry-based experiments, there is a strong motivation to annotate the catalytic kinases for in vivo phosphorylation sites. Thus, a variety of computational methods have been developed for performing a large-scale prediction of kinase-specific phosphorylation sites. However, most of the proposed methods rely solely on the local amino acid sequences surrounding the phosphorylation sites. An increasing number of three-dimensional structures makes it possible to physically investigate the structural environment of phosphorylation sites. Results: In this work, all of the experimental phosphorylation sites are mapped to the protein entries of Protein Data Bank by sequence identity. This resulted in a total of 4508 phosphorylation sites with protein three-dimensional (3D) structures. To identify phosphorylation sites on protein 3D structures, this work combines support vector machines (SVMs) with the information of linear motifs and spatial amino acid composition, which is determined for each kinase group by calculating the relative frequencies of 20 amino acid types within a specific radial distance from the central phosphorylated amino acid residue. After the cross-validation evaluation, most of the kinase-specific models trained with the consideration of structural information outperform the models considering only the sequence information. Furthermore, an independent testing set, which is not included in the training set, has demonstrated that the proposed method could provide a comparable performance to other popular tools. Conclusion: The proposed method is shown to be capable of predicting kinase-specific phosphorylation sites on 3D structures and has been implemented as a web server which is freely accessible at http://csb.cse.yzu.edu.tw/PhosK3D/. Due to the difficulty of identifying the kinase-specific phosphorylation
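
    The spatial amino acid composition described in this record (relative frequencies of the 20 amino acid types within a radial distance of the phosphorylated residue) can be sketched as follows. This is only an illustration, not the authors' implementation: the function name, the input layout (one representative coordinate per residue) and the radius used in the example are assumptions.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residue types

def spatial_composition(residues, center_index, radius):
    """Relative frequency of each amino acid type within `radius` of a central residue.

    residues: list of (one_letter_code, (x, y, z)) tuples, one per residue
              (hypothetical layout; e.g. one representative atom per residue).
    """
    _, center_xyz = residues[center_index]
    counts = Counter()
    for i, (aa, xyz) in enumerate(residues):
        # Skip the central residue itself and any non-standard residue codes.
        if i == center_index or aa not in AMINO_ACIDS:
            continue
        if math.dist(center_xyz, xyz) <= radius:
            counts[aa] += 1
    total = sum(counts.values())
    return {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}

# Toy example with made-up coordinates: a serine with three neighbours in range.
toy = [("S", (0, 0, 0)), ("K", (3, 0, 0)), ("K", (0, 4, 0)),
       ("D", (0, 0, 5)), ("L", (20, 0, 0))]
composition = spatial_composition(toy, center_index=0, radius=10.0)
```

    The resulting 20-value vector is the kind of feature that could be passed, together with linear-motif features, to a classifier such as an SVM.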

  10. Sequence-defined bioactive macrocycles via an acid-catalysed cascade reaction

    NASA Astrophysics Data System (ADS)

    Porel, Mintu; Thornlow, Dana N.; Phan, Ngoc N.; Alabi, Christopher A.

    2016-06-01

    Synthetic macrocycles derived from sequence-defined oligomers are a unique structural class whose ring size, sequence and structure can be tuned via precise organization of the primary sequence. Similar to peptides and other peptidomimetics, these well-defined synthetic macromolecules become pharmacologically relevant when bioactive side chains are incorporated into their primary sequence. In this article, we report the synthesis of oligothioetheramide (oligoTEA) macrocycles via a one-pot acid-catalysed cascade reaction. The versatility of the cyclization chemistry and modularity of the assembly process was demonstrated via the synthesis of >20 diverse oligoTEA macrocycles. Structural characterization via NMR spectroscopy revealed the presence of conformational isomers, which enabled the determination of local chain dynamics within the macromolecular structure. Finally, we demonstrate the biological activity of oligoTEA macrocycles designed to mimic facially amphiphilic antimicrobial peptides. The preliminary results indicate that macrocyclic oligoTEAs with just two-to-three cationic charge centres can elicit potent antibacterial activity against Gram-positive and Gram-negative bacteria.

  11. Unconventional amino acid sequence of the sun anemone (Stoichactis helianthus) polypeptide neurotoxin

    SciTech Connect

    Kem, W.; Dunn, B.; Parten, B.; Pennington, M.; Price, D.

    1986-05-01

    A 5000 dalton polypeptide neurotoxin (Sh-NI) purified by G50 Sephadex, P-cellulose, and SP-Sephadex chromatography was homogeneous by isoelectric focusing. Sh-NI was highly toxic to crayfish (LD50 0.6 µg/kg) but without effect upon mice at 15,000 µg/kg (i.p. injection). The reduced, 3H-carboxymethylated toxin and its fragments were subjected to automatic Edman degradation and the resulting PTH-amino acids were identified by HPLC, back hydrolysis, and scintillation counting. Peptides resulting from proteolytic (clostripain, staphylococcal protease) and chemical (tryptophan) cleavage were sequenced. The sequence is: AACKCDDEGPDIRTAPLTGTVDLGSCNAGWEKCASYYTIIADCCRKKK. This sequence differs considerably from the homologous Anemonia and Anthopleura toxins; many of the identical residues (6 half-cystines, G9, P10, R13, G19, G29, W30) are probably critical for folding rather than receptor recognition. However, the Sh-NI sequence closely resembles Radioanthus macrodactylus neurotoxin III and R. paumotensis II. The authors propose that Sh-NI and related Radioanthus toxins act upon a different site on the sodium channel.
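
    The residue positions quoted in this record can be checked directly against the printed sequence. The short sketch below does only that; the helper name is hypothetical and positions are taken to be 1-based, as is conventional for protein sequences.

```python
# Sh-NI sequence exactly as given in the record above.
SH_NI = "AACKCDDEGPDIRTAPLTGTVDLGSCNAGWEKCASYYTIIADCCRKKK"

def residue_at(seq, position):
    """Return the residue at a 1-based position."""
    return seq[position - 1]

length = len(SH_NI)
half_cystines = SH_NI.count("C")

# Residues the abstract lists as identical to the homologous toxins.
conserved = {9: "G", 10: "P", 13: "R", 19: "G", 29: "G", 30: "W"}
matches = {pos: residue_at(SH_NI, pos) == aa for pos, aa in conserved.items()}

print(length, half_cystines, all(matches.values()))
```

    For the sequence as printed, this gives 48 residues and six cysteines, and all six listed positions match, which is consistent with the abstract.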

  12. Repeat sequence chromosome specific nucleic acid probes and methods of preparing and using

    DOEpatents

    Weier, H.U.G.; Gray, J.W.

    1995-06-27

    A primer directed DNA amplification method to isolate efficiently chromosome-specific repeated DNA wherein degenerate oligonucleotide primers are used is disclosed. The probes produced are a heterogeneous mixture that can be used with blocking DNA as a chromosome-specific staining reagent, and/or the elements of the mixture can be screened for high specificity, size and/or high degree of repetition among other parameters. The degenerate primers are sets of primers that vary in sequence but are substantially complementary to highly repeated nucleic acid sequences, preferably clustered within the template DNA, for example, pericentromeric alpha satellite repeat sequences. The template DNA is preferably chromosome-specific. Exemplary primers and probes are disclosed. The probes of this invention can be used to determine the number of chromosomes of a specific type in metaphase spreads, in germ line and/or somatic cell interphase nuclei, micronuclei and/or in tissue sections. Also provided is a method to select arbitrarily repeat sequence probes that can be screened for chromosome-specificity. 18 figs.

  13. Repeat sequence chromosome specific nucleic acid probes and methods of preparing and using

    DOEpatents

    Weier, Heinz-Ulrich G.; Gray, Joe W.

    1995-01-01

    A primer directed DNA amplification method to isolate efficiently chromosome-specific repeated DNA wherein degenerate oligonucleotide primers are used is disclosed. The probes produced are a heterogeneous mixture that can be used with blocking DNA as a chromosome-specific staining reagent, and/or the elements of the mixture can be screened for high specificity, size and/or high degree of repetition among other parameters. The degenerate primers are sets of primers that vary in sequence but are substantially complementary to highly repeated nucleic acid sequences, preferably clustered within the template DNA, for example, pericentromeric alpha satellite repeat sequences. The template DNA is preferably chromosome-specific. Exemplary primers and probes are disclosed. The probes of this invention can be used to determine the number of chromosomes of a specific type in metaphase spreads, in germ line and/or somatic cell interphase nuclei, micronuclei and/or in tissue sections. Also provided is a method to select arbitrarily repeat sequence probes that can be screened for chromosome-specificity.

  14. Detection of Nucleic Acids with Graphene Nanopores: Ab Initio Characterization of a Novel Sequencing Device

    NASA Astrophysics Data System (ADS)

    Nelson, Tammie; Zhang, Bo; Prezhdo, Oleg

    2010-03-01

    We report an ab initio study of the interaction of two nucleobases, cytosine and adenine, with a novel graphene nanopore device for detecting the base sequence of a single-stranded nucleic acid (ssDNA or RNA). The nucleobases were inserted into a pore in a graphene nanoribbon, and the electrical current and conductance spectra were calculated as functions of voltage applied across the nanoribbon. The conductance spectra and charge densities were analyzed in the presence of each nucleobase in the graphene nanopore. The results indicate that, due to significant differences in the conductance spectra, the proposed device has adequate sensitivity to discriminate between different nucleotides. Moreover, we show that the conductance spectrum of a nucleotide is not affected by its orientation inside the graphene nanopore. The proposed technique may be extremely useful for real applications in developing ultrafast, low cost DNA sequencing methods.

  15. Species specific amino acid sequence-protein local structure relationships: An analysis in the light of a structural alphabet.

    PubMed

    de Brevern, Alexandre G; Joseph, Agnel Praveen

    2011-05-01

    Protein structure analysis and prediction methods are based on non-redundant data extracted from the available protein structures, regardless of the species from which the protein originates. Hence, these datasets represent the global knowledge on protein folds, which constitutes a generic distribution of amino acid sequence-protein structure (AAS-PS) relationships. In this study, we try to elucidate whether the AAS-PS relationship could possess specificities depending on the species. For this purpose, we have chosen three different species: Saccharomyces cerevisiae, Plasmodium falciparum and Arabidopsis thaliana. We analyzed the AAS-PS behaviors of the proteins from these three species and compared them to the "expected" distribution of a classical non-redundant databank. With the classical secondary structure description, only slight differences in amino acid preferences could be observed. With a more precise description of local protein structures (Protein Blocks), significant changes could be highlighted. S. cerevisiae's AAS-PS relationship is close to the general distribution, while striking differences are observed in the case of A. thaliana. P. falciparum is the most distant one. This study presents some interesting viewpoints on the AAS-PS relationship. Certain species exhibit unique preferences for amino acids to be associated with protein local structural elements. Thus, AAS-PS relationships are species dependent. These results can give useful insights for improving prediction methodologies which take the species-specific information into account. PMID:21333657

  16. Three subsets of sequence complexity and their relevance to biopolymeric information.

    PubMed

    Abel, David L; Trevors, Jack T

    2005-01-01

    Genetic algorithms instruct sophisticated biological organization. Three qualitative kinds of sequence complexity exist: random (RSC), ordered (OSC), and functional (FSC). FSC alone provides algorithmic instruction. Random and Ordered Sequence Complexities lie at opposite ends of the same bi-directional sequence complexity vector. Randomness in sequence space is defined by a lack of Kolmogorov algorithmic compressibility. A sequence is compressible because it contains redundant order and patterns. Law-like cause-and-effect determinism produces highly compressible order. Such forced ordering precludes both information retention and freedom of selection so critical to algorithmic programming and control. Functional Sequence Complexity requires this added programming dimension of uncoerced selection at successive decision nodes in the string. Shannon information theory measures the relative degrees of RSC and OSC. Shannon information theory cannot measure FSC. FSC is invariably associated with all forms of complex biofunction, including biochemical pathways, cycles, positive and negative feedback regulation, and homeostatic metabolism. The algorithmic programming of FSC, not merely its aperiodicity, accounts for biological organization. No empirical evidence exists of either RSC or OSC ever having produced a single instance of sophisticated biological organization. Organization invariably manifests FSC rather than successive random events (RSC) or low-informational self-ordering phenomena (OSC). PMID:16095527
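
    The two quantities this record relies on, Shannon information and algorithmic compressibility, can be illustrated with a minimal sketch. The assumptions are important here: entropy is computed from single-symbol frequencies only, and zlib compression is used as a rough, computable stand-in for Kolmogorov compressibility (which is not computable in general). The test sequences are made up for illustration.

```python
import math
import random
import zlib
from collections import Counter

def shannon_entropy(seq):
    """Shannon entropy in bits per symbol, from single-symbol frequencies."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def compression_ratio(seq):
    """Compressed size / original size using zlib (a proxy, not Kolmogorov complexity)."""
    raw = seq.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)

# A highly ordered sequence and a pseudo-random one over the same alphabet.
ordered = "ACGT" * 250
rng = random.Random(0)  # fixed seed so the example is repeatable
randomish = "".join(rng.choice("ACGT") for _ in range(1000))

for name, seq in (("ordered", ordered), ("random", randomish)):
    print(name, round(shannon_entropy(seq), 3), round(compression_ratio(seq), 3))
```

    Note that the single-symbol entropy of the ordered repeat is the same as that of a uniformly random sequence (2 bits per symbol), because it reflects composition only; the compression ratio is what separates the two in this sketch.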

  17. Morphological transformation of calcite crystal growth by prismatic "acidic" polypeptide sequences.

    SciTech Connect

    Kim, I; Giocondi, J L; Orme, C A; Collino, J; Evans, J S

    2007-02-13

    Many of the interesting mechanical and materials properties of the mollusk shell are thought to stem from the prismatic calcite crystal assemblies within this composite structure. It is now evident that proteins play a major role in the formation of these assemblies. Recently, a superfamily of 7 conserved prismatic layer-specific mollusk shell proteins, Asprich, were sequenced, and the 42 AA C-terminal sequence region of this protein superfamily was found to introduce surface voids or porosities on calcite crystals in vitro. Using AFM imaging techniques, we further investigate the effect that this 42 AA domain (Fragment-2) and its constituent subdomains, DEAD-17 and Acidic-2, have on the morphology and growth kinetics of calcite dislocation hillocks. We find that Fragment-2 adsorbs on terrace surfaces and pins acute steps, accelerates then decelerates the growth of obtuse steps, forms clusters and voids on terrace surfaces, and transforms calcite hillock morphology from a rhombohedral form to a rounded one. These results mirror yet are distinct from some of the earlier findings obtained for nacreous polypeptides. The subdomains Acidic-2 and DEAD-17 were found to accelerate then decelerate obtuse steps and induce oval rather than rounded hillock morphologies. Unlike DEAD-17, Acidic-2 does form clusters on terrace surfaces and exhibits stronger obtuse velocity inhibition effects than either DEAD-17 or Fragment-2. Interestingly, a 1:1 mixture of both subdomains induces an irregular polygonal morphology to hillocks, and exhibits the highest degree of acute step pinning and obtuse step velocity inhibition. This suggests that there is some interplay between subdomains within an intra (Fragment-2) or intermolecular (1:1 mixture) context, and sequence interplay phenomena may be employed by biomineralization proteins to exert net effects on crystal growth and morphology.

  18. JRC GMO-Amplicons: a collection of nucleic acid sequences related to genetically modified organisms.

    PubMed

    Petrillo, Mauro; Angers-Loustau, Alexandre; Henriksson, Peter; Bonfini, Laura; Patak, Alex; Kreysa, Joachim

    2015-01-01

    The DNA target sequence is the key element in designing detection methods for genetically modified organisms (GMOs). Unfortunately this information is frequently lacking, especially for unauthorized GMOs. In addition, patent sequences are generally poorly annotated, buried in complex and extensive documentation and hard to link to the corresponding GM event. Here, we present the JRC GMO-Amplicons, a database of amplicons collected by screening public nucleotide sequence databanks by in silico determination of PCR amplification with reference methods for GMO analysis. The European Union Reference Laboratory for Genetically Modified Food and Feed (EU-RL GMFF) provides these methods in the GMOMETHODS database to support enforcement of EU legislation and GM food/feed control. The JRC GMO-Amplicons database is composed of more than 240 000 amplicons, which can be easily accessed and screened through a web interface. To our knowledge, this is the first attempt at pooling and collecting publicly available sequences related to GMOs in food and feed. The JRC GMO-Amplicons supports control laboratories in the design and assessment of GMO methods, providing inter-alia in silico prediction of primers specificity and GM targets coverage. The new tool can assist the laboratories in the analysis of complex issues, such as the detection and identification of unauthorized GMOs. Notably, the JRC GMO-Amplicons database allows the retrieval and characterization of GMO-related sequences included in patents documentation. Finally, it can help annotating poorly described GM sequences and identifying new relevant GMO-related sequences in public databases. The JRC GMO-Amplicons is freely accessible through a web-based portal that is hosted on the EU-RL GMFF website. Database URL: http://gmo-crl.jrc.ec.europa.eu/jrcgmoamplicons/. PMID:26424080
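
    The idea of "in silico determination of PCR amplification" mentioned in this record can be sketched in a few lines. This is a simplified illustration and not the JRC pipeline: it assumes exact primer matches, searches only the given strand of the template, and uses made-up sequences and a hypothetical `max_len` cut-off.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def in_silico_pcr(template, forward, reverse, max_len=1000):
    """Return amplicons found on `template` for an exact-match primer pair.

    The forward primer is searched as written; the reverse primer is searched
    as its reverse complement downstream of the forward primer site.
    """
    template = template.upper()
    fwd = forward.upper()
    rev_site = reverse_complement(reverse.upper())
    amplicons = []
    start = template.find(fwd)
    while start != -1:
        end = template.find(rev_site, start + len(fwd))
        if end != -1:
            amplicon = template[start:end + len(rev_site)]
            if len(amplicon) <= max_len:
                amplicons.append(amplicon)
        start = template.find(fwd, start + 1)
    return amplicons

# Made-up template and primers, for illustration only.
template = "TTTTACGTACGAGGGGCCCCTTGACCATAAAA"
products = in_silico_pcr(template, forward="ACGTACGA", reverse="ATGGTCAA")
```

    Real screening tools typically tolerate primer mismatches and check both strands; those refinements are left out to keep the sketch short.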

  19. JRC GMO-Amplicons: a collection of nucleic acid sequences related to genetically modified organisms

    PubMed Central

    Petrillo, Mauro; Angers-Loustau, Alexandre; Henriksson, Peter; Bonfini, Laura; Patak, Alex; Kreysa, Joachim

    2015-01-01

    The DNA target sequence is the key element in designing detection methods for genetically modified organisms (GMOs). Unfortunately this information is frequently lacking, especially for unauthorized GMOs. In addition, patent sequences are generally poorly annotated, buried in complex and extensive documentation and hard to link to the corresponding GM event. Here, we present the JRC GMO-Amplicons, a database of amplicons collected by screening public nucleotide sequence databanks by in silico determination of PCR amplification with reference methods for GMO analysis. The European Union Reference Laboratory for Genetically Modified Food and Feed (EU-RL GMFF) provides these methods in the GMOMETHODS database to support enforcement of EU legislation and GM food/feed control. The JRC GMO-Amplicons database is composed of more than 240 000 amplicons, which can be easily accessed and screened through a web interface. To our knowledge, this is the first attempt at pooling and collecting publicly available sequences related to GMOs in food and feed. The JRC GMO-Amplicons supports control laboratories in the design and assessment of GMO methods, providing inter-alia in silico prediction of primers specificity and GM targets coverage. The new tool can assist the laboratories in the analysis of complex issues, such as the detection and identification of unauthorized GMOs. Notably, the JRC GMO-Amplicons database allows the retrieval and characterization of GMO-related sequences included in patents documentation. Finally, it can help annotating poorly described GM sequences and identifying new relevant GMO-related sequences in public databases. The JRC GMO-Amplicons is freely accessible through a web-based portal that is hosted on the EU-RL GMFF website. Database URL: http://gmo-crl.jrc.ec.europa.eu/jrcgmoamplicons/ PMID:26424080

  20. Fast computational methods for predicting protein structure from primary amino acid sequence

    DOEpatents

    Agarwal, Pratul Kumar

    2011-07-19

    The present invention provides a method utilizing primary amino acid sequence of a protein, energy minimization, molecular dynamics and protein vibrational modes to predict three-dimensional structure of a protein. The present invention also determines possible intermediates in the protein folding pathway. The present invention has important applications to the design of novel drugs as well as protein engineering. The present invention predicts the three-dimensional structure of a protein independent of size of the protein, overcoming a significant limitation in the prior art.

  1. Amino-terminal amino acid sequence of the major structural polypeptides of avian retroviruses: sequence homology between reticuloendotheliosis virus p30 and p30s of mammalian retroviruses.

    PubMed Central

    Hunter, E; Bhown, A S; Bennett, J C

    1978-01-01

    The major structural polypeptides, p30 of reticuloendotheliosis virus (REV) (strain T) and p27 of avian sarcoma virus B77, have been compared with regard to amino acid composition, NH2-terminal amino acid sequence, and immunological crossreactions. The amino acid composition of the two polypeptides is distinct, and a comparison of the first 30 NH2-terminal amino acids of REV p30 with that for the first 25 of B77 p27 yields only three homologous residues. In competition radioimmunoassays the polypeptides show no crossreactivity. A comparison of the amino acid composition and NH2-terminal amino acid sequence of REV p30 with those reported for several mammalian retrovirus p30s shows remarkable similarities. Both REV and mammalian p30s contain a large number of polar residues in their amino acid composition and show approximately 40% homology in the first 30 NH2-terminal amino acids. No crossreactivity could be observed, however, in competition radioimmunoassays between Rauscher murine leukemia virus p30 and that of REV. The observations reported here suggest a close evolutionary relationship between REV and the mammalian retroviruses. PMID:208072

  2. Purification and amino acid sequence of aminopeptidase P from pig kidney.

    PubMed

    Vergas Romero, C; Neudorfer, I; Mann, K; Schäfer, W

    1995-04-01

    Aminopeptidase P from kidney cortex was purified in high yield (recovery greater than or equal to 20%) by a series of column chromatographic steps after solubilization of the membrane-bound glycoprotein with n-butanol. A coupled enzymic assay, using Gly-Pro-Pro-NH-Nap as substrate and dipeptidyl-peptidase IV as auxiliary enzyme, was used to monitor the purification. The purification procedure yielded two forms of aminopeptidase P differing in their carbohydrate composition (glycoforms). Both enzyme preparations were homogeneous as assessed by SDS/PAGE silver staining, and isoelectric focusing. Both forms possessed the same substrate specificity, catalysed the same reaction, and consisted of identical protein chains. The amino acid sequence determined by Edman degradation and mass spectrometry consisted of 623 amino acids. Six N-glycosylation sites, all contained in the N-terminal half of the protein, were characterized. PMID:7744038

  3. Draft Genome Sequence of Cupriavidus sp. Strain SK-3, a 4-Chlorobiphenyl- and 4-Clorobenzoic Acid-Degrading Bacterium

    PubMed Central

    Vilo, Claudia; Benedik, Michael J.; Ilori, Matthew

    2014-01-01

    We report the draft genome sequence of Cupriavidus sp. strain SK-3, which can use 4-chlorobiphenyl and 4-chlorobenzoic acid as the sole carbon source for growth. The draft genome sequence allowed the study of the polychlorinated biphenyl degradation mechanism and the recharacterization of the strain SK-3 as a Cupriavidus species. PMID:24994805

  4. Draft Genome Sequence of Bacillus subtilis subsp. natto Strain CGMCC 2108, a High Producer of Poly-γ-Glutamic Acid

    PubMed Central

    Tan, Siyuan; Su, Anping; Zhang, Chen; Ren, Yuanyuan

    2016-01-01

    Here, we report the 4.1-Mb draft genome sequence of Bacillus subtilis subsp. natto strain CGMCC 2108, a high producer of poly-γ-glutamic acid (γ-PGA). This sequence will provide further help for the biosynthesis of γ-PGA and will greatly facilitate research efforts in metabolic engineering of B. subtilis subsp. natto strain CGMCC 2108. PMID:27231363

  5. New monoclonal antibodies to the Ebola virus glycoprotein: Identification and analysis of the amino acid sequence of the variable domains.

    PubMed

    Panina, A A; Aliev, T K; Shemchukova, O B; Dement'yeva, I G; Varlamov, N E; Pozdnyakova, L P; Bokov, M N; Dolgikh, D A; Sveshnikov, P G; Kirpichnikov, M P

    2016-03-01

    We determined the nucleotide and amino acid sequences of variable domains of three new monoclonal antibodies to the glycoprotein of Ebola virus capsid. The framework and hypervariable regions of immunoglobulin heavy and light chains were identified. The primary structures were confirmed using mass spectrometry analysis. Immunoglobulin database search showed the uniqueness of the sequences obtained. PMID:27193713

  6. Draft Genome Sequence of Bacillus subtilis subsp. natto Strain CGMCC 2108, a High Producer of Poly-γ-Glutamic Acid.

    PubMed

    Tan, Siyuan; Meng, Yonghong; Su, Anping; Zhang, Chen; Ren, Yuanyuan

    2016-01-01

    Here, we report the 4.1-Mb draft genome sequence of Bacillus subtilis subsp. natto strain CGMCC 2108, a high producer of poly-γ-glutamic acid (γ-PGA). This sequence will provide further help for the biosynthesis of γ-PGA and will greatly facilitate research efforts in metabolic engineering of B. subtilis subsp. natto strain CGMCC 2108. PMID:27231363

  7. "Not Tied Up Neatly with a Bow": Professionals' Challenging Cases in Informed Consent for Genomic Sequencing.

    PubMed

    Tomlinson, Ashley N; Skinner, Debra; Perry, Denise L; Scollon, Sarah R; Roche, Myra I; Bernhardt, Barbara A

    2016-02-01

    As the use of genomic technology has expanded in research and clinical settings, issues surrounding informed consent for genome and exome sequencing have surfaced. Despite the importance of informed consent, little is known about the specific challenges that professionals encounter when consenting patients or research participants for genomic sequencing. We interviewed 29 genetic counselors and research coordinators with considerable experience obtaining informed consent for genomic sequencing to understand their experiences and perspectives. As part of this interview, 24 interviewees discussed an informed consent case they found particularly memorable or challenging. We analyzed these case examples to determine the primary issue or challenge represented by each case. Challenges fell into two domains: participant understanding, and facilitating decisions about testing or research participation. Challenges related to participant understanding included varying levels of general and genomic literacy, difficulty managing participant expectations, and contextual factors that impeded participant understanding. Challenges related to facilitating decision-making included complicated family dynamics such as disagreement or coercion, situations in which it was unclear whether sequencing research would be a good use of participant time or resources, and situations in which the professional experienced disagreement or discomfort with participant decisions. The issues highlighted in these case examples are instructive in preparing genetics professionals to obtain informed consent for genomic sequencing. PMID:25911622

  8. ANTICALIgN: visualizing, editing and analyzing combined nucleotide and amino acid sequence alignments for combinatorial protein engineering.

    PubMed

    Jarasch, Alexander; Kopp, Melanie; Eggenstein, Evelyn; Richter, Antonia; Gebauer, Michaela; Skerra, Arne

    2016-07-01

    ANTICALIgN is an interactive software developed to simultaneously visualize, analyze and modify alignments of DNA and/or protein sequences that arise during combinatorial protein engineering, design and selection. ANTICALIgN combines powerful functions known from currently available sequence analysis tools with unique features for protein engineering, in particular the possibility to display and manipulate nucleotide sequences and their translated amino acid sequences at the same time. ANTICALIgN offers both template-based multiple sequence alignment (MSA), using the unmutated protein as reference, and conventional global alignment, to compare sequences that share an evolutionary relationship. The application of similarity-based clustering algorithms facilitates the identification of duplicates or of conserved sequence features among a set of selected clones. Imported nucleotide sequences from DNA sequence analysis are automatically translated into the corresponding amino acid sequences and displayed, offering numerous options for selecting reading frames, highlighting of sequence features and graphical layout of the MSA. The MSA complexity can be reduced by hiding the conserved nucleotide and/or amino acid residues, thus putting emphasis on the relevant mutated positions. ANTICALIgN is also able to handle suppressed stop codons or even to incorporate non-natural amino acids into a coding sequence. We demonstrate crucial functions of ANTICALIgN in an example of Anticalins selected from a lipocalin random library against the fibronectin extradomain B (ED-B), an established marker of tumor vasculature. Apart from engineered protein scaffolds, ANTICALIgN provides a powerful tool in the area of antibody engineering and for directed enzyme evolution. PMID:27261456
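
    The automatic translation of nucleotide sequences with a selectable reading frame and optional handling of suppressed stop codons, as described in this record, can be sketched as below. This is not the software's own code: the function name and the `suppressed` parameter are hypothetical, the standard genetic code is assumed, and displaying a suppressed amber codon (TAG) as glutamine is just one illustrative choice.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna, frame=0, suppressed=None):
    """Translate `dna` in reading frame 0, 1 or 2.

    suppressed: optional dict mapping a stop codon to the residue letter that
                should be shown instead (hypothetical parameter for this sketch).
    Codons containing unknown bases are shown as "X"; stops are shown as "*".
    """
    dna = dna.upper()
    suppressed = suppressed or {}
    protein = []
    for pos in range(frame, len(dna) - 2, 3):
        codon = dna[pos:pos + 3]
        protein.append(suppressed.get(codon, CODON_TABLE.get(codon, "X")))
    return "".join(protein)

# Made-up coding sequence containing an in-frame amber codon (TAG).
example = "ATGGCCTAGAAA"
plain = translate(example)                                # stop shown as "*"
read_through = translate(example, suppressed={"TAG": "Q"})
shifted = translate(example, frame=1)
```

    Keeping the frame and the suppression map as explicit arguments makes it easy to display the same clone in several frames side by side, which is the kind of view the abstract describes.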

  9. Formation Sequences of Iron Minerals in the Acidic Alteration Products and Variation of Hydrothermal Fluid Conditions

    NASA Astrophysics Data System (ADS)

    Isobe, H.; Yoshizawa, M.

    2008-12-01

    Iron minerals play an important role in environmental issues not only on the Earth but also on other terrestrial planets. Iron mineral species related to alteration products of primary minerals with surface or subsurface fluids are characterized by temperature, acidity and redox conditions of the fluids. Various iron-bearing alteration products can be seen around fumaroles in geothermal/volcanic areas. In this study, zonal structures of iron minerals in alteration products of the geothermal area are observed to elucidate temporal and spatial variation of hydrothermal fluids. Alteration of the pyroxene-amphibole andesite of Garan-dake volcano, Oita, Japan occurs by the acidic hydrothermal fluid to form cristobalite, leaching out elements other than Si. Hand specimens with unaltered or weakly altered core and cristobalite crust show various sequences of layers. XRD analysis revealed that the alteration degree is represented by abundance of cristobalite. Intermediately altered layers are characterized by the occurrence of minerals including alunite, pyrite, kaolinite, goethite and hematite. A specimen with reddish brown core surrounded by cristobalite-rich white crust has brown colored layers at the boundary of the core and the crust. The reddish core is characterized by the occurrence of crystalline hematite, as shown by XRD. Another hand specimen has a light gray core, which represents reduced conditions, and a white cristobalite crust with light brown and reddish brown layers of ferric iron minerals between the core and the crust. On the other hand, hornblende crystals, a typical ferrous iron-bearing mineral of the host rock, are well preserved in some samples with strongly decolorized cristobalite-rich groundmass. Hydrothermal alteration experiments on iron-rich basaltic material show that iron mineral species depend on acidity and temperature of the fluid. Oxidation states of the iron-bearing mineral species are strongly influenced by the acidity and redox conditions.
Variations of alteration

  10. Multiple Amino Acid Sequence Alignment Nitrogenase Component 1: Insights into Phylogenetics and Structure-Function Relationships

    PubMed Central

    Howard, James B.; Kechris, Katerina J.; Rees, Douglas C.; Glazer, Alexander N.

    2013-01-01

    Amino acid residues critical for a protein's structure-function are retained by natural selection and these residues are identified by the level of variance in co-aligned homologous protein sequences. The relevant residues in the nitrogen fixation Component 1 α- and β-subunits were identified by the alignment of 95 protein sequences. Proteins were included from species encompassing multiple microbial phyla and diverse ecological niches as well as the nitrogen fixation genotypes, anf, nif, and vnf, which encode proteins associated with cofactors differing at one metal site. After adjusting for differences in sequence length, insertions, and deletions, the remaining >85% of the sequence co-aligned the subunits from the three genotypes. Six Groups, designated Anf, Vnf, and Nif I-IV, were assigned based upon genetic origin, sequence adjustments, and conserved residues. Both subunits subdivided into the same groups. Invariant and single variant residues were identified and were defined as “core” for nitrogenase function. Three species in Group Nif-III, Candidatus Desulforudis audaxviator, Desulfotomaculum kuznetsovii, and Thermodesulfatator indicus, were found to have a seleno-cysteine that replaces one cysteinyl ligand of the 8Fe:7S, P-cluster. Subsets of invariant residues, limited to individual groups, were identified; these unique residues help identify the gene of origin (anf, nif, or vnf) yet should not be considered diagnostic of the metal content of associated cofactors. Fourteen of the 19 residues that compose the cofactor pocket are invariant or single variant; the other five residues are highly variable but do not correlate with the putative metal content of the cofactor. The variable residues are clustered on one side of the cofactor, away from other functional centers in the three dimensional structure. Many of the invariant and single variant residues were not previously recognized as potentially critical and their identification provides the bases

  11. Relation between mRNA expression and sequence information in Desulfovibrio vulgaris: Combinatorial contributions of upstream regulatory motifs and coding sequence features to variations in mRNA abundance

    SciTech Connect

    Wu, Gang; Nie, Lei; Zhang, Weiwen

    2006-05-26

    The context-dependent expression of genes is the core of biological activities, and significant attention has been given to the identification of various factors contributing to gene expression at the genomic scale. However, so far this type of analysis has focused either on the relation between mRNA expression and non-coding sequence features such as upstream regulatory motifs or on the correlation between mRNA abundance and non-random features in coding sequences (e.g. codon usage and amino acid usage). In this study multiple regression analyses of the mRNA abundance and all sequence information in Desulfovibrio vulgaris were performed, with the goal of investigating how much coding and non-coding sequence features contribute to the variations in mRNA expression, and in what manner they act together...

  12. Nucleic Acid Database: a Repository of Three-Dimensional Information about Nucleic Acids

    DOE Data Explorer

    Berman, H. M.; Olson, W. K.; Beveridge, D. L.; Westbrook, J.; Gelbin, A.; Demeny, T.; Hsieh, S. H.; Srinivasan, A. R.; Schneider, B.

    The Nucleic Acid Database (NDB) provides 3-D structural information about nucleic acids.  It is a relational database designed to facilitate the easy search for nucleic acid structures using any of the stored primary or derived structural features. Reports can then be created describing any properties of the selected structures and structures may be viewed in several different formats, including the mmCIF format, the NDB Atlas format, the NDB coordinate format, or the PDB coordinate format. Browsing structure images created directly from coordinates in the repository can also be done. More than 7000 structures have been released as of May 2014. This website also includes a number of specialized tools and interfaces. The NDB Project is funded by the National Institutes of Health and has been funded by the National Science Foundation and the Department of Energy in the past.

  13. Prediction of DNA-binding residues in proteins from amino acid sequences using a random forest model with a hybrid feature

    PubMed Central

    Wu, Jiansheng; Liu, Hongde; Duan, Xueye; Ding, Yan; Wu, Hongtao; Bai, Yunfei; Sun, Xiao

    2009-01-01

    Motivation: In this work, we aim to develop a computational approach for predicting DNA-binding sites in proteins from amino acid sequences. To avoid overfitting with this method, all available DNA-binding proteins from the Protein Data Bank (PDB) are used to construct the models. The random forest (RF) algorithm is used because it is fast and has robust performance for different parameter values. A novel hybrid feature is presented which incorporates evolutionary information of the amino acid sequence, secondary structure (SS) information and orthogonal binary vector (OBV) information which reflects the characteristics of 20 kinds of amino acids for two physical–chemical properties (dipoles and volumes of the side chains). The numbers of binding and non-binding residues in proteins are highly unbalanced, so a novel scheme is proposed to deal with the problem of imbalanced datasets by downsizing the majority class. Results: The results show that the RF model achieves 91.41% overall accuracy with Matthew's correlation coefficient of 0.70 and an area under the receiver operating characteristic curve (AUC) of 0.913. To our knowledge, the RF method using the hybrid feature is currently the computationally optimal approach for predicting DNA-binding sites in proteins from amino acid sequences without using three-dimensional (3D) structural information. We have demonstrated that the prediction results are useful for understanding protein–DNA interactions. Availability: DBindR web server implementation is freely available at http://www.cbi.seu.edu.cn/DBindR/DBindR.htm. Contact: xsun@seu.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19008251
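
    Two ideas in this abstract, balancing the classes by downsizing the majority class and scoring with the Matthews correlation coefficient, can be sketched in plain Python. The abstract does not give the details of the authors' scheme, so the random undersampling below is an assumption, and the names `downsample_majority` and `matthews_corrcoef` are hypothetical helpers, not part of DBindR.

```python
import math
import random


def downsample_majority(pairs, seed=0):
    """Return a class-balanced subset of (features, label) pairs.

    Labels are assumed to be 0 (non-binding) or 1 (binding). The larger
    class is randomly reduced to the size of the smaller one.
    """
    rng = random.Random(seed)
    by_label = {0: [], 1: []}
    for features, label in pairs:
        by_label[label].append((features, label))
    n = min(len(by_label[0]), len(by_label[1]))
    balanced = rng.sample(by_label[0], n) + rng.sample(by_label[1], n)
    rng.shuffle(balanced)
    return balanced


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the coefficient is 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy data: 2 binding residues versus 8 non-binding residues.
toy = [((i,), 1) for i in range(2)] + [((i,), 0) for i in range(8)]
balanced = downsample_majority(toy)
print(len(balanced), sum(label for _, label in balanced))
print(matthews_corrcoef(tp=5, tn=5, fp=0, fn=0))
```

    Unlike overall accuracy, the Matthews coefficient stays informative when one class dominates, which is presumably why the abstract reports both.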

  14. Draft Genome Sequences of Gluconobacter cerinus CECT 9110 and Gluconobacter japonicus CECT 8443, Acetic Acid Bacteria Isolated from Grape Must

    PubMed Central

    Sainz, Florencia

    2016-01-01

    We report here the draft genome sequences of Gluconobacter cerinus strain CECT9110 and Gluconobacter japonicus CECT8443, acetic acid bacteria isolated from grape must. Gluconobacter species are well known for their ability to oxidize sugar alcohols into the corresponding acids. Our objective was to select strains to oxidize effectively d-glucose. PMID:27365351

  15. Redundant sensory information does not enhance sequence learning in the serial reaction time task.

    PubMed

    Abrahamse, Elger L; van der Lubbe, Rob H J; Verwey, Willem B; Szumska, Izabela; Jaśkowski, Piotr

    2012-01-01

    In daily life we encounter multiple sources of sensory information at any given moment. Unknown is whether such sensory redundancy in some way affects implicit learning of a sequence of events. In the current paper we explored this issue in a serial reaction time task. Our results indicate that redundant sensory information does not enhance sequence learning when all sensory information is presented at the same location (responding to the position and/or color of the stimuli; Experiment 1), even when the distinct sensory sources provide more or less similar baseline response latencies (responding to the shape and/or color of the stimuli; Experiment 2). These findings support the claim that sequence learning does not (necessarily) benefit from sensory redundancy. Moreover, transfer was observed between various sets of stimuli, indicating that learning was predominantly response-based. PMID:22679466

  16. Redundant sensory information does not enhance sequence learning in the serial reaction time task

    PubMed Central

    Abrahamse, Elger L.; van der Lubbe, Rob H. J.; Verwey, Willem B.; Szumska, Izabela; Jaśkowski, Piotr

    2012-01-01

    In daily life we encounter multiple sources of sensory information at any given moment. Unknown is whether such sensory redundancy in some way affects implicit learning of a sequence of events. In the current paper we explored this issue in a serial reaction time task. Our results indicate that redundant sensory information does not enhance sequence learning when all sensory information is presented at the same location (responding to the position and/or color of the stimuli; Experiment 1), even when the distinct sensory sources provide more or less similar baseline response latencies (responding to the shape and/or color of the stimuli; Experiment 2). These findings support the claim that sequence learning does not (necessarily) benefit from sensory redundancy. Moreover, transfer was observed between various sets of stimuli, indicating that learning was predominantly response-based. PMID:22679466

  17. From amino acid sequence to bioactivity: The biomedical potential of antitumor peptides.

    PubMed

    Blanco-Míguez, Aitor; Gutiérrez-Jácome, Alberto; Pérez-Pérez, Martín; Pérez-Rodríguez, Gael; Catalán-García, Sandra; Fdez-Riverola, Florentino; Lourenço, Anália; Sánchez, Borja

    2016-06-01

    Chemoprevention is the use of natural and/or synthetic substances to block, reverse, or retard the process of carcinogenesis. In this field, the use of antitumor peptides is of interest as (i) these molecules are small in size, (ii) they show good cell diffusion and permeability, (iii) they affect one or more specific molecular pathways involved in carcinogenesis, and (iv) they are not usually genotoxic. We have checked the Web of Science Database (23/11/2015) in order to collect papers reporting on bioactive peptides (1691 registers), which were further filtered by searching terms such as "antiproliferative," "antitumoral," or "apoptosis" among others. Works reporting the amino acid sequence of an antiproliferative peptide were kept (60 registers), and this was complemented with the peptides included in CancerPPD, an extensive resource for antiproliferative peptides and proteins. Peptides were grouped according to one of the following mechanisms of action: inhibition of cell migration, inhibition of tumor angiogenesis, antioxidative mechanisms, inhibition of gene transcription/cell proliferation, induction of apoptosis, disorganization of tubulin structure, cytotoxicity, or unknown mechanisms. The main mechanisms of action of those antiproliferative peptides with known amino acid sequences are presented and finally, their potential clinical usefulness and future challenges for their application are discussed. PMID:27010507

  18. The amino acid sequences and activities of synergistic hemolysins from Staphylococcus cohnii.

    PubMed

    Mak, Pawel; Maszewska, Agnieszka; Rozalska, Malgorzata

    2008-10-01

    Staphylococcus cohnii ssp. cohnii and S. cohnii ssp. urealyticus are coagulase-negative staphylococci considered for a long time as unable to cause infections. This situation changed recently and pathogenic strains of these bacteria were isolated from hospital environments, patients and medical staff. Most of the isolated strains were resistant to many antibiotics. The present work describes isolation and characterization of several synergistic peptide hemolysins produced by these bacteria and acting as virulence factors responsible for hemolytic and cytotoxic activities. Amino acid sequences of respective hemolysins from S. cohnii ssp. cohnii (named as H1C, H2C and H3C) and S. cohnii ssp. urealyticus (H1U, H2U and H3U) were identical. Peptides H1 and H3 possessed significant amino acid homology to three synergistic hemolysins secreted by Staphylococcus lugdunensis and to a putative antibacterial peptide produced by Staphylococcus saprophyticus ssp. saprophyticus. On the other hand, hemolysin H2 had a unique sequence. All isolated peptides lysed red cells from different mammalian species and exerted a cytotoxic effect on human fibroblasts. PMID:18752624

  19. Applications of statistical physics and information theory to the analysis of DNA sequences

    NASA Astrophysics Data System (ADS)

    Grosse, Ivo

    2000-10-01

    DNA carries the genetic information of most living organisms, and the goal of genome projects is to uncover that genetic information. One basic task in the analysis of DNA sequences is the recognition of protein coding genes. Powerful computer programs for gene recognition have been developed, but most of them are based on statistical patterns that vary from species to species. In this thesis I address the question of whether there exist universal statistical patterns that are different in coding and noncoding DNA of all living species, regardless of their phylogenetic origin. In search of such species-independent patterns I study the mutual information function of genomic DNA sequences, and find that it shows persistent period-three oscillations. To understand the biological origin of the observed period-three oscillations, I compare the mutual information function of genomic DNA sequences to the mutual information function of stochastic model sequences. I find that the pseudo-exon model is able to reproduce the mutual information function of genomic DNA sequences. Moreover, I find that a generalization of the pseudo-exon model can connect the existence and the functional form of long-range correlations to the presence and the length distributions of coding and noncoding regions. Based on these theoretical studies I am able to find an information-theoretical quantity, the average mutual information (AMI), whose probability distributions are significantly different in coding and noncoding DNA, while they are almost identical in all studied species. These findings show that there exist universal statistical patterns that are different in coding and noncoding DNA of all studied species, and they suggest that the AMI may be used to identify genes in different living species, irrespective of their taxonomic origin.
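
    The mutual information function mentioned in this abstract can be written as I(k) = sum over a, b of p_ab(k) log2[p_ab(k) / (p_a p_b)], where p_ab(k) is the probability of finding symbols a and b separated by k positions. The sketch below is a plain empirical estimate under that textbook definition; the function name `mutual_information` is an assumption, and the thesis may use corrections (for finite sample size, for example) that are not shown here.

```python
import math
from collections import Counter


def mutual_information(seq, k):
    """Empirical mutual information (bits) between symbols k positions apart."""
    pairs = [(seq[i], seq[i + k]) for i in range(len(seq) - k)]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)    # marginal of the first symbol
    right = Counter(b for _, b in pairs)   # marginal of the second symbol
    total = 0.0
    for (a, b), count in joint.items():
        # p_ab / (p_a * p_b) simplifies to count * n / (left[a] * right[b]).
        total += (count / n) * math.log2(count * n / (left[a] * right[b]))
    return total


# A strictly periodic toy sequence: knowing one symbol fixes the symbol
# three positions later, so I(3) reaches log2(3) bits.
toy = "ACG" * 50
print(mutual_information(toy, 3))
print(mutual_information("A" * 100, 3))
```

    Evaluating the function for k = 1, 2, 3, ... on a real coding region and plotting the values is the usual way to look for the period-three oscillation the abstract describes.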

  20. 40 CFR 72.31 - Information requirements for Acid Rain permit applications.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 40 Protection of Environment 17 2013-07-01 2013-07-01 false Information requirements for Acid Rain... (CONTINUED) AIR PROGRAMS (CONTINUED) PERMITS REGULATION Acid Rain Permit Applications § 72.31 Information requirements for Acid Rain permit applications. A complete Acid Rain permit application shall include...

  1. 40 CFR 72.31 - Information requirements for Acid Rain permit applications.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 16 2011-07-01 2011-07-01 false Information requirements for Acid Rain... (CONTINUED) AIR PROGRAMS (CONTINUED) PERMITS REGULATION Acid Rain Permit Applications § 72.31 Information requirements for Acid Rain permit applications. A complete Acid Rain permit application shall include...

  2. 40 CFR 72.31 - Information requirements for Acid Rain permit applications.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 40 Protection of Environment 17 2012-07-01 2012-07-01 false Information requirements for Acid Rain... (CONTINUED) AIR PROGRAMS (CONTINUED) PERMITS REGULATION Acid Rain Permit Applications § 72.31 Information requirements for Acid Rain permit applications. A complete Acid Rain permit application shall include...

  3. 40 CFR 72.31 - Information requirements for Acid Rain permit applications.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 40 Protection of Environment 17 2014-07-01 2014-07-01 false Information requirements for Acid Rain... (CONTINUED) AIR PROGRAMS (CONTINUED) PERMITS REGULATION Acid Rain Permit Applications § 72.31 Information requirements for Acid Rain permit applications. A complete Acid Rain permit application shall include...

  4. Templated synthesis of peptide nucleic acids via sequence-selective base-filling reactions.

    PubMed

    Heemstra, Jennifer M; Liu, David R

    2009-08-19

    The templated synthesis of nucleic acids has previously been achieved through the backbone ligation of preformed nucleotide monomers or oligomers. In contrast, here we demonstrate templated nucleic acid synthesis using a base-filling approach in which individual bases are added to abasic sites of a peptide nucleic acid (PNA). Because nucleobase substrates in this approach are not self-reactive, a base-filling approach may reduce the formation of nontemplated reaction products. Using either reductive amination or amine acylation chemistries, we observed efficient and selective addition of each of the four nucleobases to an abasic site in the middle of the PNA strand. We also describe the addition of single nucleobases to the end of a PNA strand through base filling, as well as the tandem addition of two bases to the middle of the PNA strand. These findings represent an experimental foundation for nonenzymatic information transfer through base filling. PMID:19722647

  5. Templated Synthesis of Peptide Nucleic Acids via Sequence-Selective Base-Filling Reactions

    PubMed Central

    2009-01-01

    The templated synthesis of nucleic acids has previously been achieved through the backbone ligation of preformed nucleotide monomers or oligomers. In contrast, here we demonstrate templated nucleic acid synthesis using a base-filling approach in which individual bases are added to abasic sites of a peptide nucleic acid (PNA). Because nucleobase substrates in this approach are not self-reactive, a base-filling approach may reduce the formation of nontemplated reaction products. Using either reductive amination or amine acylation chemistries, we observed efficient and selective addition of each of the four nucleobases to an abasic site in the middle of the PNA strand. We also describe the addition of single nucleobases to the end of a PNA strand through base filling, as well as the tandem addition of two bases to the middle of the PNA strand. These findings represent an experimental foundation for nonenzymatic information transfer through base filling. PMID:19722647

  6. Complete amino acid sequence of the myoglobin from the Pacific spotted dolphin, Stenella attenuata graffmani.

    PubMed

    Jones, B N; Wang, C C; Dwulet, F E; Lehman, L D; Meuth, J L; Bogardt, R A; Gurd, F R

    1979-04-25

    The complete amino acid sequence of the major component myoglobin from the Pacific spotted dolphin, Stenella attenuata graffmani, was determined by the automated Edman degradation of several large peptides obtained by specific cleavage of the protein. The acetimidated apomyoglobin was selectively cleaved at its two methionyl residues with cyanogen bromide and at its three arginyl residues by trypsin. By subjecting four of these peptides and the apomyoglobin to automated Edman degradation, over 80% of the primary structure of the protein was obtained. The remainder of the covalent structure was determined by the sequence analysis of peptides that resulted from further digestion of the central cyanogen bromide fragment. This fragment was cleaved at its glutamyl residues with staphylococcal protease and its lysyl residues with trypsin. The action of trypsin was restricted to the lysyl residues by chemical modification of the single arginyl residue of the fragment with 1,2-cyclohexanedione. The primary structure of this myoglobin proved to be identical with that from the Atlantic bottlenosed dolphin and Pacific common dolphin but differs from the myoglobins of the killer whale and pilot whale at two positions. The above sequence identities and differences reflect the close taxonomic relationship of these five species of Cetacea. PMID:454657

  7. Nucleotide and derived amino acid sequences of the major porin of Comamonas acidovorans and comparison of porin primary structures.

    PubMed Central

    Gerbl-Rieger, S; Peters, J; Kellermann, J; Lottspeich, F; Baumeister, W

    1991-01-01

    The DNA sequence of the gene which codes for the major outer membrane porin (Omp32) of Comamonas acidovorans has been determined. The structural gene encodes a precursor consisting of 351 amino acid residues with a signal peptide of 19 amino acid residues. Comparisons with amino acid sequences of outer membrane proteins and porins from several other members of the class Proteobacteria and of the Chlamydia trachomatis porin and the Neurospora crassa mitochondrial porin revealed a motif of eight regions of local homology. The results of this analysis are discussed with regard to common structural features of porins. PMID:1848840

  8. Sequence-specific nucleic acid mobility using a reversible block copolymer gel matrix and DNA amphiphiles (lipid-DNA) in capillary and microfluidic electrophoretic separations.

    PubMed

    Wagler, Patrick; Minero, Gabriel Antonio S; Tangen, Uwe; de Vries, Jan Willem; Prusty, Deepak; Kwak, Minseok; Herrmann, Andreas; McCaskill, John S

    2015-10-01

    Reversible noncovalent but sequence-dependent attachment of DNA to gels is shown to allow programmable mobility processing of DNA populations. The covalent attachment of DNA oligomers to polyacrylamide gels using acrydite-modified oligonucleotides has enabled sequence-specific mobility assays for DNA in gel electrophoresis: sequences binding to the immobilized DNA are delayed in their migration. Such a system has been used for example to construct complex DNA filters facilitating DNA computations. However, these gels are formed irreversibly and the choice of immobilized sequences is made once off during fabrication. In this work, we demonstrate the reversible self-assembly of gels combined with amphiphilic DNA molecules, which exhibit hydrophobic hydrocarbon chains attached to the nucleobase. This amphiphilic DNA, which we term lipid-DNA, is synthesized in advance and is blended into a block copolymer gel to induce sequence-dependent DNA retention during electrophoresis. Furthermore, we demonstrate and characterize the programmable mobility shift of matching DNA in such reversible gels both in thin films and microchannels using microelectrode arrays. Such sequence selective separation may be employed to select nucleic acid sequences of similar length from a mixture via local electronics, a basic functionality that can be employed in novel electronic chemical cell designs and other DNA information-processing systems. PMID:26095642

  9. Complete genome sequence of probiotic Bacillus coagulans HM-08: A potential lactic acid producer.

    PubMed

    Yao, Guoqiang; Gao, Pengfei; Zhang, Wenyi

    2016-06-20

    Bacillus coagulans HM-08 is a commercialized probiotic strain in China. Its genome contains a 3.62Mb circular chromosome with an average GC content of 46.3%. In silico analysis revealed the presence of one xyl operon as well as several other genes that are correlated to xylose utilization. The genetic information provided here may help to expand its future biotechnology potential in lactic acid production. PMID:27130497

  10. Amino acid sequence analysis and characterization of a ribonuclease from starfish Asterias amurensis.

    PubMed

    Motoyoshi, Naomi; Kobayashi, Hiroko; Itagaki, Tadashi; Inokuchi, Norio

    2016-09-01

    The aim of this study was to phylogenetically characterize the location of the RNase T2 enzyme in the starfish (Asterias amurensis). We isolated an RNase T2 ribonuclease (RNase Aa) from the ovaries of starfish and determined its amino acid sequence by protein chemistry and cloning cDNA encoding RNase Aa. The isolated protein had 231 amino acid residues, a predicted molecular mass of 25,906 Da, and an optimal pH of 5.0. RNase Aa preferentially released guanylic acid from the RNA. The catalytic sites of the RNase T2 family are conserved in RNase Aa; furthermore, the distribution of the cysteine residues in RNase Aa is similar to that in other animal and plant T2 RNases. RNase Aa is cleaved at two points: 21 residues from the N-terminus and 29 residues from the C-terminus; however, both fragments may remain attached to the protein via disulfide bridges, leading to the maintenance of its conformation, as suggested by circular dichroism spectrum analysis. The phylogenetic analysis revealed that starfish RNase Aa is evolutionarily an intermediate between protozoan and oyster RNases. PMID:26920046

  11. Information avoidance tendencies, threat management resources, and interest in genetic sequencing feedback

    PubMed Central

    Taber, Jennifer M.; Klein, William M.P.; Ferrer, Rebecca A.; Lewis, Katie L.; Harris, Peter R.; Shepperd, James A.; Biesecker, Leslie G.

    2015-01-01

    Background Information avoidance is a defensive strategy that undermines receipt of potentially beneficial but threatening health information and may especially occur when threat management resources are unavailable. Purpose We examined whether individual differences in information avoidance predicted intentions to receive genetic sequencing results for preventable and unpreventable (i.e., more threatening) disease and, secondarily, whether threat management resources of self-affirmation or optimism mitigated any effects. Methods Participants (N=493) in an NIH study (ClinSeq®) piloting the use of genome sequencing reported intentions to receive (optional) sequencing results and completed individual difference measures of information avoidance, self-affirmation, and optimism. Results Information avoidance tendencies corresponded with lower intentions to learn results, particularly for unpreventable diseases. The association was weaker among individuals higher in self-affirmation or optimism, but only for results regarding preventable diseases. Conclusions Information avoidance tendencies may influence decisions to receive threatening health information; threat management resources hold promise for mitigating this association. PMID:25582989

  12. Diagnostic yield of targeted next generation sequencing in various cancer types: an information-theoretic approach.

    PubMed

    Hagemann, Ian S; O'Neill, Patrick K; Erill, Ivan; Pfeifer, John D

    2015-09-01

    The information-theoretic concept of Shannon entropy can be used to quantify the information provided by a diagnostic test. We hypothesized that in tumor types with stereotyped mutational profiles, the results of NGS testing would yield lower average information than in tumors with more diverse mutations. To test this hypothesis, we estimated the entropy of NGS testing in various cancer types, using results obtained from clinical sequencing. A set of 238 tumors were subjected to clinical targeted NGS across all exons of 27 genes. There were 120 actionable variants in 109 cases, occurring in the genes KRAS, EGFR, PTEN, PIK3CA, KIT, BRAF, NRAS, IDH1, and JAK2. Sequencing results for each tumor were modeled as a dichotomized genotype (actionable mutation detected or not detected) for each of the 27 genes. Based upon the entropy of these genotypes, sequencing was most informative for colorectal cancer (3.235 bits of information/case) followed by high grade glioma (2.938 bits), lung cancer (2.197 bits), pancreatic cancer (1.339 bits), and sarcoma/STTs (1.289 bits). In the most informative cancer types, the information content of NGS was similar to surgical pathology examination (modeled at approximately 2-3 bits). Entropy provides a novel measure of utility for laboratory testing in general and for NGS in particular. This metric is, however, purely analytical and does not capture the relative clinical significance of the identified variants, which may also differ across tumor types. PMID:26227479
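
    The entropy calculation described in this abstract can be illustrated with a short Python sketch. It treats each tumor's dichotomized genotype as one outcome and computes the Shannon entropy of the observed outcome frequencies. Whether the authors computed the entropy jointly over all genes in exactly this way is an assumption, and the two-gene toy data below are invented purely for illustration.

```python
import math
from collections import Counter


def shannon_entropy(observations):
    """Shannon entropy (bits) of the empirical distribution of observations."""
    counts = Counter(observations)
    n = sum(counts.values())
    # Sum of p * log2(1/p) over the observed outcomes.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


# Hypothetical genotypes: one tuple per tumor, one 0/1 flag per gene
# (1 = actionable mutation detected). Two genes are used for brevity.
genotypes = [(1, 0), (0, 1), (0, 0), (0, 0)]
print(shannon_entropy(genotypes))  # 1.5 bits per case

# A test that always returns the same result carries no information.
print(shannon_entropy([(0, 0)] * 10))
```

    A tumor type with a stereotyped mutational profile concentrates its cases in a few genotypes and therefore scores low, while a diverse profile spreads the cases out and scores high.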

  13. Peptide sequence information derived by pronase digestion and ammonium sulfate in-source decay matrix-assisted laser desorption/ionization time-of-flight mass spectrometry.

    PubMed

    Marzilli, L A; Golden, T R; Cotter, R J; Woods, A S

    2000-11-01

    We present the use of Pronase digestion and in-source decay in the presence of ammonium sulfate as complementary techniques to confirm the amino acid sequence of a peptide. Pronase, a commercial preparation from Streptomyces griseus, is a combination of proteolytic enzymes. It produces carboxypeptidase and aminopeptidase ladders using a single Pronase digestion and represents an inexpensive, nonspecific, and fast supplement to traditional sequencing enzymes. However, N-terminal peptidase activity appears dependent on the terminal amino acid residue. We also introduce the use of saturated ammonium sulfate as an "on-slide" sample additive to promote in-source fragmentation of peptides. Use of saturated ammonium sulfate resulted in a simple way to increase peptide backbone fragmentation and essentially produced either a cn or yn ion series. Together these techniques provide useful supplements to existing methods for peptide sequence information. PMID:11073263

  14. Fast multiple alignment of ungapped DNA sequences using information theory and a relaxation method.

    PubMed

    Schneider, Thomas D; Mastronarde, David N

    1996-12-01

    An information theory based multiple alignment ("Malign") method was used to align the DNA binding sequences of the OxyR and Fis proteins, whose sequence conservation is so spread out that it is difficult to identify the sites. In the algorithm described here, the information content of the sequences is used as a unique global criterion for the quality of the alignment. The algorithm uses look-up tables to avoid recalculating computationally expensive functions such as the logarithm. Because there are no arbitrary constants and because the results are reported in absolute units (bits), the best alignment can be chosen without ambiguity. Starting from randomly selected alignments, a hill-climbing algorithm can track through the immense space of s(n) combinations where s is the number of sequences and n is the number of positions possible for each sequence. Instead of producing a single alignment, the algorithm is fast enough that one can afford to use many start points and to classify the solutions. Good convergence is indicated by the presence of a single well-populated solution class having higher information content than other classes. The existence of several distinct classes for the Fis protein indicates that those binding sites have self-similar features. PMID:19953199
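
    The global criterion in this abstract, the information content of an alignment measured in bits, can be sketched as follows. For each column the code computes 2 - H, where H is the Shannon entropy of the observed bases and 2 bits is the maximum for a four-letter alphabet, and sums over columns. This is a simplified illustration, not the Malign program itself: it assumes equiprobable background bases and omits any small-sample correction. The precomputed `term` list is meant to echo the look-up-table idea in the abstract, and the function name `information_content` is hypothetical.

```python
import math
from collections import Counter


def information_content(aligned):
    """Total information content (bits) of equal-length, ungapped DNA sequences."""
    s = len(aligned)
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("all sequences must have the same length")
    # Look-up table: term[c] = -(c/s) * log2(c/s) for every possible count c,
    # so the logarithm is evaluated only s times in total.
    term = [0.0] + [-(c / s) * math.log2(c / s) for c in range(1, s + 1)]
    total = 0.0
    for column in zip(*aligned):
        entropy = sum(term[c] for c in Counter(column).values())
        total += 2.0 - entropy
    return total


print(information_content(["ACGT", "ACGT", "ACGT", "ACGT"]))  # fully conserved
print(information_content(["A", "C", "G", "T"]))              # no conservation
```

    A hill-climbing search of the kind described would shift one sequence at a time and keep the shift whenever this score increases.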

  15. The Protein Information Resource (PIR) and the PIR-International Protein Sequence Database.

    PubMed Central

    George, D G; Dodson, R J; Garavelli, J S; Haft, D H; Hunt, L T; Marzec, C R; Orcutt, B C; Sidman, K E; Srinivasarao, G Y; Yeh, L S; Arminski, L M; Ledley, R S; Tsugita, A; Barker, W C

    1997-01-01

    From its origin, the PIR has aspired to support research in computational biology and genomics through the compilation of a comprehensive, quality controlled and well-organized protein sequence information resource. The resource originated with the pioneering work of the late Margaret O. Dayhoff in the early 1960s. Since 1988, the Protein Sequence Database has been maintained collaboratively by PIR-International, an association of macromolecular sequence data collection centers dedicated to fostering international cooperation as an essential element in the development of scientific databases. The work of the resource is widely distributed and is available on the World Wide Web, via FTP, E-mail server, CD-ROM and magnetic media. It is widely redistributed and incorporated into many other protein sequence data compilations including SWISS-PROT and the Entrez system of the NCBI. PMID:9016497

  16. Sequence learning in 4-month-old infants: do infants represent ordinal information?

    PubMed

    Lewkowicz, David J; Berent, Iris

    2009-01-01

    This study investigated how 4-month-old infants represent sequences: Do they track the statistical relations among specific sequence elements (e.g., AB, BC) or do they encode abstract ordinal positions (i.e., B is second)? Infants were habituated to sequences of 4 moving and sounding elements, in which 3 of the elements varied in their ordinal position while the position of 1 target element remained invariant (e.g., ABCD, CBDA), and then were tested for the detection of changes in the target's position. Infants detected an ordinal change only when it disrupted the statistical co-occurrence of elements but not when statistical information was controlled. It is concluded that 4-month-olds learn the order of sequence elements by tracking their statistical associations but not their invariant ordinal position. PMID:19930353

  17. Predominant information quality scheme for the essential amino acids: an information-theoretical analysis.

    PubMed

    Esquivel, Rodolfo O; Molina-Espíritu, Moyocoyani; López-Rosa, Sheila; Soriano-Correa, Catalina; Barrientos-Salcedo, Carolina; Kohout, Miroslav; Dehesa, Jesús S

    2015-08-24

    In this work we undertake a pioneer information-theoretical analysis of 18 selected amino acids extracted from a natural protein, bacteriorhodopsin (1C3W). The conformational structures of each amino acid are analyzed by use of various quantum chemistry methodologies at high levels of theory: HF, M062X and CISD(Full). The Shannon entropy, Fisher information and disequilibrium are determined to grasp the spatial spreading features of delocalizability, order and uniformity of the optimized structures. These three entropic measures uniquely characterize all amino acids through a predominant information-theoretic quality scheme (PIQS), which gathers all chemical families by means of three major spreading features: delocalization, narrowness and uniformity. This scheme recognizes four major chemical families: aliphatic (delocalized), aromatic (delocalized), electro-attractive (narrowed) and tiny (uniform). All chemical families recognized by the existing energy-based classifications are embraced by this entropic scheme. Finally, novel chemical patterns are shown in the information planes associated with the PIQS entropic measures. PMID:26175003

  18. Evolution of alpha-lactalbumins. The complete amino acid sequence of the alpha-lactalbumin from a marsupial (Macropus rufogriseus) and corrections to regions of sequence in bovine and goat alpha-lactalbumins.

    PubMed

    Shewale, J G; Sinha, S K; Brew, K

    1984-04-25

    alpha-Lactalbumin was purified from a whey protein fraction of the milk of the red-necked wallaby (Macropus rufogriseus). The complete amino acid sequence was determined from the results of automatic sequenator analyses of the intact protein, the three cyanogen bromide fragments, and of peptides generated from the larger, COOH-terminal CNBr fragment by digestion with trypsin or staphylococcal protease. This is the first sequence to be determined of an alpha-lactalbumin from a marsupial and differs from known eutherian alpha-lactalbumins in size and locations of deletions in alignments with the homologous type c lysozymes, as well as in having amino acid substitutions at 8 sites that are invariant in known eutherian proteins. Some corrections are also reported for two regions of sequence in both bovine and goat alpha-lactalbumins. The new and previously published information on alpha-lactalbumin sequences is analyzed in relation to the evolutionary history of the alpha-lactalbumin line as well as the relationship of structure to function in these proteins. PMID:6715332

  19. Relating sequence encoded information to form and function of intrinsically disordered proteins

    PubMed Central

    Das, Rahul K.; Ruff, Kiersten M.; Pappu, Rohit V.

    2015-01-01

    Intrinsically disordered proteins (IDPs) showcase the importance of conformational plasticity and heterogeneity in protein function. We summarize recent advances that connect information encoded in IDP sequences to their conformational properties and functions. We focus on insights obtained through a combination of atomistic simulations and biophysical measurements that are synthesized into a coherent framework using polymer physics theories. PMID:25863585

  20. Full Genome Virus Detection in Fecal Samples Using Sensitive Nucleic Acid Preparation, Deep Sequencing, and a Novel Iterative Sequence Classification Algorithm

    PubMed Central

    Cotten, Matthew; Oude Munnink, Bas; Canuti, Marta; Deijs, Martin; Watson, Simon J.; Kellam, Paul; van der Hoek, Lia

    2014-01-01

    We have developed a full genome virus detection process that combines sensitive nucleic acid preparation optimised for virus identification in fecal material with Illumina MiSeq sequencing and a novel post-sequencing virus identification algorithm. Enriched viral nucleic acid was converted to double-stranded DNA and subjected to Illumina MiSeq sequencing. The resulting short reads were processed with a novel iterative Python algorithm SLIM for the identification of sequences with homology to known viruses. De novo assembly was then used to generate full viral genomes. The sensitivity of this process was demonstrated with a set of fecal samples from HIV-1 infected patients. A quantitative assessment of the mammalian, plant, and bacterial virus content of this compartment was generated and the deep sequencing data were sufficient to assemble 12 complete viral genomes from 6 virus families. The method detected high levels of enteropathic viruses that are normally controlled in healthy adults, but may be involved in the pathogenesis of HIV-1 infection and will provide a powerful tool for virus detection and for analyzing changes in the fecal virome associated with HIV-1 progression and pathogenesis. PMID:24695106

  1. Characterization of nucleic acids by tandem mass spectrometry - The second decade (2004-2013): From DNA to RNA and modified sequences.

    PubMed

    Schürch, Stefan

    2016-07-01

    Nucleic acids play key roles in the storage and processing of genetic information, as well as in the regulation of cellular processes. Consequently, they represent attractive targets for drugs against gene-related diseases. On the other hand, synthetic oligonucleotide analogues have found application as chemotherapeutic agents targeting cellular DNA and RNA. The development of effective nucleic acid-based chemotherapeutic strategies requires adequate analytical techniques capable of providing detailed information about the nucleotide sequences, the presence of structural modifications, the formation of higher-order structures, as well as the interaction of nucleic acids with other cellular components and chemotherapeutic agents. Due to the impressive technical and methodological developments of the past years, tandem mass spectrometry has evolved to one of the most powerful tools supporting research related to nucleic acids. This review covers the literature of the past decade devoted to the tandem mass spectrometric investigation of nucleic acids, with the main focus on the fundamental mechanistic aspects governing the gas-phase dissociation of DNA, RNA, modified oligonucleotide analogues, and their adducts with metal ions. Additionally, recent findings on the elucidation of nucleic acid higher-order structures by tandem mass spectrometry are reviewed. © 2014 Wiley Periodicals, Inc., Mass Spec Rev 35:483-523, 2016. PMID:25288464

  2. Sequencing Genetics Information: Integrating Data into Information Literacy for Undergraduate Biology Students

    ERIC Educational Resources Information Center

    MacMillan, Don

    2010-01-01

    This case study describes an information literacy lab for an undergraduate biology course that leads students through a range of resources to discover aspects of genetic information. The lab provides over 560 students per semester with the opportunity for hands-on exploration of resources in steps that simulate the pathways of higher-level…

  3. Evolutionary connections of biological kingdoms based on protein and nucleic acid sequence evidence

    NASA Technical Reports Server (NTRS)

    Dayhoff, M. O.

    1983-01-01

    Prokaryotic and eukaryotic evolutionary trees are developed from protein and nucleic-acid sequences by the methods of numerical taxonomy. Trees are presented for bacterial ferredoxins, 5S ribosomal RNA, c-type cytochromes, cytochromes c2 and c', and 5.8S ribosomal RNA; the implications for early evolution are discussed; and a composite tree showing the branching of the anaerobes, aerobes, archaebacteria, and eukaryotes is shown. Single lines are found for all oxygen-evolving photosynthetic forms and for the salt-loving and high-temperature forms of archaebacteria. It is argued that the eukaryote mitochondria, chloroplasts, and cytoplasmic host material are descended from free-living prokaryotes that formed symbiotic associations, with more than one symbiotic event involved in the evolution of each organelle.

  4. The amino acid alphabet and the architecture of the protein sequence-structure map. I. Binary alphabets.

    PubMed

    Ferrada, Evandro

    2014-12-01

    The correspondence between protein sequences and structures, or sequence-structure map, relates to fundamental aspects of structural, evolutionary and synthetic biology. The specifics of the mapping, such as the fraction of accessible sequences and structures, or the sequences' ability to fold fast, are dictated by the type of interactions between the monomers that compose the sequences. The set of possible interactions between monomers is encapsulated by the potential energy function. In this study, I explore the impact of the relative forces of the potential on the architecture of the sequence-structure map. My observations rely on simple exact models of proteins and random samples of the space of potential energy functions of binary alphabets. I adopt a graph perspective and study the distribution of viable sequences and the structures they produce, as networks of sequences connected by point mutations. I observe that the relative proportion of attractive, neutral and repulsive forces defines types of potentials, that induce sequence-structure maps of vastly different architectures. I characterize the properties underlying these differences and relate them to the structure of the potential. Among these properties are the expected number and relative distribution of sequences associated to specific structures and the diversity of structures as a function of sequence divergence. I study the types of binary potentials observed in natural amino acids and show that there is a strong bias towards only some types of potentials, a bias that seems to characterize the folding code of natural proteins. I discuss implications of these observations for the architecture of the sequence-structure map of natural proteins, the construction of random libraries of peptides, and the early evolution of the natural amino acid alphabet. PMID:25473967

  5. The Amino Acid Alphabet and the Architecture of the Protein Sequence-Structure Map. I. Binary Alphabets

    PubMed Central

    Ferrada, Evandro

    2014-01-01

    The correspondence between protein sequences and structures, or sequence-structure map, relates to fundamental aspects of structural, evolutionary and synthetic biology. The specifics of the mapping, such as the fraction of accessible sequences and structures, or the sequences' ability to fold fast, are dictated by the type of interactions between the monomers that compose the sequences. The set of possible interactions between monomers is encapsulated by the potential energy function. In this study, I explore the impact of the relative forces of the potential on the architecture of the sequence-structure map. My observations rely on simple exact models of proteins and random samples of the space of potential energy functions of binary alphabets. I adopt a graph perspective and study the distribution of viable sequences and the structures they produce, as networks of sequences connected by point mutations. I observe that the relative proportion of attractive, neutral and repulsive forces defines types of potentials, that induce sequence-structure maps of vastly different architectures. I characterize the properties underlying these differences and relate them to the structure of the potential. Among these properties are the expected number and relative distribution of sequences associated to specific structures and the diversity of structures as a function of sequence divergence. I study the types of binary potentials observed in natural amino acids and show that there is a strong bias towards only some types of potentials, a bias that seems to characterize the folding code of natural proteins. I discuss implications of these observations for the architecture of the sequence-structure map of natural proteins, the construction of random libraries of peptides, and the early evolution of the natural amino acid alphabet. PMID:25473967

  6. Trypsin inhibitors from ridged gourd (Luffa acutangula Linn.) seeds: purification, properties, and amino acid sequences.

    PubMed

    Haldar, U C; Saha, S K; Beavis, R C; Sinha, N K

    1996-02-01

    Two trypsin inhibitors, LA-1 and LA-2, have been isolated from ridged gourd (Luffa acutangula Linn.) seeds and purified to homogeneity by gel filtration followed by ion-exchange chromatography. The isoelectric point is at pH 4.55 for LA-1 and at pH 5.85 for LA-2. The Stokes radius of each inhibitor is 11.4 Å. The fluorescence emission spectrum of each inhibitor is similar to that of free tyrosine. The bimolecular rate constant of acrylamide quenching is 1.0 × 10^9 M^-1 sec^-1 for LA-1 and 0.8 × 10^9 M^-1 sec^-1 for LA-2 and that of K2HPO4 quenching is 1.6 × 10^11 M^-1 sec^-1 for LA-1 and 1.2 × 10^11 M^-1 sec^-1 for LA-2. Analysis of the circular dichroic spectra yields 40% alpha-helix and 60% beta-turn for LA-1 and 45% alpha-helix and 55% beta-turn for LA-2. Inhibitors LA-1 and LA-2 consist of 28 and 29 amino acid residues, respectively. They lack threonine, alanine, valine, and tryptophan. Both inhibitors strongly inhibit trypsin by forming enzyme-inhibitor complexes at a molar ratio of unity. A chemical modification study suggests the involvement of arginine of LA-1 and lysine of LA-2 in their reactive sites. The inhibitors are very similar in their amino acid sequences, and show sequence homology with other squash family inhibitors. PMID:8924202

  7. Microfluidic platform for isolating nucleic acid targets using sequence specific hybridization

    PubMed Central

    Wang, Jingjing; Morabito, Kenneth; Tang, Jay X.; Tripathi, Anubhav

    2013-01-01

    The separation of target nucleic acid sequences from biological samples has emerged as a significant process in today's diagnostics and detection strategies. In addition to the possible clinical applications, the fundamental understanding of target and sequence specific hybridization on surface modified magnetic beads is of high value. In this paper, we describe a novel microfluidic platform that utilizes a mobile magnetic field in static microfluidic channels, where single stranded DNA (ssDNA) molecules are isolated via nucleic acid hybridization. We first established efficient isolation of biotinylated capture probe (BP) using streptavidin-coated magnetic beads. Subsequently, we investigated the hybridization of target ssDNA with BP bound to beads and explained these hybridization kinetics using a dual-species kinetic model. The number of hybridized target ssDNA molecules was determined to be about 6.5 times less than that of BP on the bead surface, due to steric hindrance effects. The hybridization of target ssDNA with non-complementary BP bound to beads was also examined, and non-specific hybridization was found to be insignificant. Finally, we demonstrated highly efficient capture and isolation of target ssDNA in the presence of non-target ssDNA, where as little as 1% target ssDNA can be detected in a mixture. The microfluidic method described in this paper is broadly applicable, especially to point-of-care biological diagnostic platforms that require binding and separation of known target biomolecules, such as RNA, ssDNA, or protein. PMID:24404041

  8. Information performances and illative sequences: Sequential organization of explanations of chemical phase equilibrium

    NASA Astrophysics Data System (ADS)

    Brown, Nathaniel James Swanton

    While there is consensus that conceptual change is surprisingly difficult, many competing theories of conceptual change co-exist in the literature. This dissertation argues that this discord is partly the result of an inadequate account of the unwritten rules of human social interaction that underlie the field's preferred methodology, semi-structured interviewing. To better understand the contributions of interaction during explanations, I analyze eight undergraduate general chemistry students as they attempt to explain to various people, for various reasons, why phenomena involving chemical phase equilibrium occur. Using the methods of interaction analysis, I characterize the unwritten, but systematic, rules that these participants follow as they explain. The result is a description of the contributions of interaction to explaining. Each step in each explanation is a jointly performed expression of a subject-predicate relation, an interactive accomplishment I call an information performance (in-form, for short). Unlike clauses, in-forms need not have a coherent grammatical structure. Unlike speaker turns, in-forms have the clear function of expressing information. Unlike both clauses and speaker turns, in-forms are a co-construction, jointly performed by both the primary speaker and the other interlocutor. The other interlocutor strongly affects the form and content of each explanation by giving or withholding feedback at the end of each in-form, moments I call feedback-relevant places. While in-forms are the bricks out of which the explanation is constructed, they are secured by a series of inferential links I call an illative sequence. Illative sequences are forward-searching, starting with a remembered fact or observation and following a chain of inferences in the hope it leads to the target phenomenon. The participants treat an explanation as a success if the illative sequence generates an in-form that describes the phenomenon. If the illative sequence does

  9. Characterization of N-glycosylation and amino acid sequence features of immunoglobulins from swine.

    PubMed

    Lopez, Paul G; Girard, Lauren; Buist, Marjorie; de Oliveira, Andrey Giovanni Gomes; Bodnar, Edward; Salama, Apolline; Soulillou, Jean-Paul; Perreault, Hélène

    2016-02-01

    The primary goal of this study was to develop a method to study the N-glycosylation of IgG from swine in order to detect epitopes containing N-glycolylneuraminic acid (Neu5Gc) and/or terminal galactose residues linked in α1-3 susceptible to cause xenograft-related problems. Samples of immunoglobulin were isolated from porcine serum using protein-A affinity chromatography. The eluate was then separated on electrophoretic gel, and bands corresponding to the N-glycosylated heavy chains were cut off the gel and subjected to tryptic digestion. Peptides and glycopeptides were separated by reversed phase liquid chromatography and fractions were collected for matrix-assisted laser desorption/ionization time-of-flight mass spectrometric (MALDI-TOF-MS) analysis. Overall no α1-3 galactose was detected, as demonstrated by complete susceptibility of terminal galactose residues to β-galactosidase digestion. Neu5Gc was detected on singly sialylated structures. Two major N-glycopeptides were found, EEQFNSTYR and EAQFNSTYR as determined by tandem MS (MS/MS), as previously reported by Butler et al. (Immunogenetics, 61, 2009, 209-230), who found 11 subclasses for porcine IgG. Out of the 11, ten include the sequence corresponding to EEQFNSTYR, and only one codes for EAQFNSTYR. In this study, glycosylation patterns associated with both chains were slightly different, in that EEQFNSTYR had a higher content of galactose. The last step of this study consisted of peptide-mapping the 11 reported porcine IgG sequences. Although there was considerable overlap, at least one unique tryptic peptide was found per IgG sequence. The workflow presented in this manuscript constitutes the first study to use MALDI-TOF-MS in the investigation of porcine IgG structural features. PMID:26586247
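
    The peptide-mapping step described above can be sketched in silico. This is a simplified illustration, not the authors' workflow: the cleavage rule (after K or R unless the next residue is P), the function names, and the two short toy sequences are assumptions; the toy sequences are made up around the two reported glycopeptides and are not real porcine IgG sequences.

```python
import re


def tryptic_peptides(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def unique_peptides(named_sequences):
    """For each named sequence, return the tryptic peptides found in no other."""
    digests = {name: set(tryptic_peptides(seq))
               for name, seq in named_sequences.items()}
    unique = {}
    for name, peptides in digests.items():
        others = set().union(
            *(p for other, p in digests.items() if other != name))
        unique[name] = peptides - others
    return unique


# Toy fragments (hypothetical), differing at one residue of the glycopeptide.
toy = {
    "toy_subclass_1": "TKPREEQFNSTYRVVSK",
    "toy_subclass_2": "TKPREAQFNSTYRVVSK",
}
result = unique_peptides(toy)
```

    With full-length sequences as input, a non-empty set for every entry would correspond to the abstract's observation that at least one unique tryptic peptide exists per IgG sequence.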

  10. ESPript/ENDscript: extracting and rendering sequence and 3D information from atomic structures of proteins

    PubMed Central

    Gouet, Patrice; Robert, Xavier; Courcelle, Emmanuel

    2003-01-01

    The Fortran program ESPript was created in 1993, to display on a PostScript figure multiple sequence alignments adorned with secondary structure elements. A web server was made available in 1999 and ESPript has been linked to three major web tools: ProDom which identifies protein domains, PredictProtein which predicts secondary structure elements and NPS@ which runs sequence alignment programs. A web server named ENDscript was created in 2002 to facilitate the generation of ESPript figures containing a large amount of information. ENDscript uses programs such as BLAST, Clustal and PHYLODENDRON to work on protein sequences, and programs such as DSSP, CNS and MOLSCRIPT to work on protein coordinates. It enables the creation, from a single Protein Data Bank identifier, of a multiple sequence alignment figure adorned with secondary structure elements of each sequence of known 3D structure. Similar 3D structures are superimposed in turn with the program PROFIT and a final figure is drawn with BOBSCRIPT, which shows sequence and structure conservation along the Cα trace of the query. ESPript and ENDscript are available at http://genopole.toulouse.inra.fr/ESPript. PMID:12824317

  11. Lactic acid production from potato peel waste by anaerobic sequencing batch fermentation using undefined mixed culture.

    PubMed

    Liang, Shaobo; McDonald, Armando G; Coats, Erik R

    2015-11-01

    Lactic acid (LA) is a necessary industrial feedstock for producing the bioplastic, polylactic acid (PLA), which is currently produced by pure culture fermentation of food carbohydrates. This work presents an alternative to produce LA from potato peel waste (PPW) by anaerobic fermentation in a sequencing batch reactor (SBR) inoculated with undefined mixed culture from a municipal wastewater treatment plant. A statistical design of experiments approach was employed using a set of 0.8 L SBRs fed gelatinized PPW at a solids content range from 30 to 50 g L^-1 and a solids retention time (SRT) of 2-4 days for yield and productivity optimization. The maximum LA production yield of 0.25 g g^-1 PPW and highest productivity of 125 mg g^-1 d^-1 were achieved. A scale-up SBR trial using neat gelatinized PPW (at 80 g L^-1 solids content) at the 3 L scale was employed and the highest LA yield of 0.14 g g^-1 PPW and a productivity of 138 mg g^-1 d^-1 were achieved with a 1 d SRT. PMID:25708409

  12. Bacterial community compositions in sediment polluted by perfluoroalkyl acids (PFAAs) using Illumina high-throughput sequencing.

    PubMed

    Sun, Yajun; Wang, Tieyu; Peng, Xiawei; Wang, Pei; Lu, Yonglong

    2016-06-01

    The characterization of bacterial community compositions and the change in perfluoroalkyl acids (PFAAs) along a natural river distribution system were explored in the present study. Illumina high-throughput sequencing was used to explore bacterial community diversity and structure in sediment polluted by PFAAs from the Xiaoqing River, the area with concentrated fluorochemical facilities in China. The concentration of PFAAs was in the range of 8.44-465.60 ng/g dry weight (dw) in sediment. Perfluorooctanoic acid (PFOA) was the dominant PFAA in all samples, which accounted for 94.2 % of total PFAAs. High-level PFOA could lead to an obvious increase in relative abundance of Proteobacteria, ε-Proteobacteria, Thiobacillus, and Sulfurimonas and the decrease in relative abundance of other bacteria. Redundancy analysis revealed that PFOA played an important role in the formation of bacterial community, and PFOA at higher concentration could reduce the diversity of bacterial community. When the concentration of PFOA was below 100 ng/g dw in sediment, no significant effect on microbial community structure was observed. Thiobacillus and Sulfurimonas were positively correlated with the concentration of PFOA, suggesting that both genera were resistant to PFOA contamination. PMID:26780047

  13. Mass spectrometric detection of the amino acid sequence polymorphism of the hepatitis C virus antigen.

    PubMed

    Kaysheva, A L; Ivanov, Yu D; Frantsuzov, P A; Krohin, N V; Pavlova, T I; Uchaikin, V F; Konev, V A; Kovalev, O B; Ziborov, V S; Archakov, A I

    2016-03-01

    A method for detection and identification of the hepatitis C virus antigen (HCVcoreAg) in human serum with consideration for possible amino acid substitutions is proposed. The method is based on a combination of biospecific capturing and concentrating of the target protein on the surface of the chip for atomic force microscope (AFM chip) with subsequent protein identification by tandem mass spectrometric (MS/MS) analysis. Biospecific AFM-capturing of viral particles containing HCVcoreAg from serum samples was performed by use of AFM chips with monoclonal antibodies (anti-HCVcore) covalently immobilized on the surface. Biospecific complexes were registered and counted by AFM. Further MS/MS analysis allowed reliable identification of the HCVcoreAg in the complexes formed on the AFM chip surface. Analysis of MS/MS spectra, taking into account the possible polymorphisms in the amino acid sequence of the HCVcoreAg, enabled us to increase the number of identified peptides. PMID:26773170

  14. DIALIGN at GOBICS—multiple sequence alignment using various sources of external information

    PubMed Central

    Al Ait, Layal; Yamak, Zaher; Morgenstern, Burkhard

    2013-01-01

    DIALIGN is an established tool for multiple sequence alignment that is particularly useful to detect local homologies in sequences with low overall similarity. In recent years, various versions of the program have been developed, some of which are fully automated, whereas others are able to accept user-specified external information. In this article, we review some versions of the program that are available through ‘Göttingen Bioinformatics Compute Server’. In addition to previously described implementations, we present a new release of DIALIGN called ‘DIALIGN-PFAM’, which uses hits to the PFAM database for improved protein alignment. Our software is available through http://dialign.gobics.de/. PMID:23620293

  15. Canine preprorelaxin: nucleic acid sequence and localization within the canine placenta.

    PubMed

    Klonisch, T; Hombach-Klonisch, S; Froehlich, C; Kauffold, J; Steger, K; Steinetz, B G; Fischer, B

    1999-03-01

    Employing uteroplacental tissue at Day 35 of gestation, we determined the nucleic acid sequence of canine preprorelaxin using reverse transcription- and rapid amplification of cDNA ends-polymerase chain reaction. Canine preprorelaxin cDNA consisted of 534 base pairs encoding a protein of 177 amino acids with a signal peptide of 25 amino acids (aa), a B domain of 35 aa, a C domain of 93 aa, and an A domain of 24 aa. The putative receptor binding region in the N-terminal part of the canine relaxin B domain GRDYVR contained two substitutions from the classical motif (E→D and L→Y). Canine preprorelaxin shared highest homology with porcine and equine preprorelaxin. Northern analysis revealed a 1-kilobase transcript present in total RNA of canine uteroplacental tissue but not of kidney tissue. Uteroplacental tissue from two bitches each at Days 30 and 35 of gestation was studied by in situ hybridization to localize relaxin mRNA. Immunohistochemistry for relaxin, cytokeratin, vimentin, and von Willebrand factor was performed on uteroplacental tissue at Day 30 of gestation. The basal cell layer at the core of the chorionic villi was devoid of relaxin mRNA and immunoreactive relaxin or vimentin but was immunopositive for cytokeratin and identified as cytotrophoblast cells. The cell layer surrounding the chorionic villi displayed specific hybridization signals for relaxin mRNA and immunoreactivity for relaxin and cytokeratin but not for vimentin, and was identified as syncytiotrophoblast. Those areas of the chorioallantoic tissue with most intense relaxin immunoreactivity were highly vascularized as demonstrated by immunoreactive von Willebrand factor expressed on vascular endothelium. The uterine glands and nonplacental uterine areas of the canine zonary girdle placenta were devoid of relaxin mRNA and relaxin. We conclude that the syncytiotrophoblast is the source of relaxin in the canine placenta. PMID:10026098
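
    The lengths quoted in this abstract are mutually consistent, which a short arithmetic check makes explicit. The sketch assumes, and the abstract does not state, that the 534 base pairs refer to the coding region including one stop codon; the variable names are illustrative only.

```python
# Domain lengths in amino acids, as given in the abstract.
domains = {
    "signal peptide": 25,
    "B domain": 35,
    "C domain": 93,
    "A domain": 24,
}

total_aa = sum(domains.values())  # length of the full preprorelaxin protein

# Assumption: three bases per codon plus a single three-base stop codon.
coding_bp = 3 * total_aa + 3

print(total_aa, coding_bp)
```

    Under that assumption the four domains add up to the stated 177 amino acids, and 177 codons plus a stop codon give the stated 534 base pairs.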

  16. Purification and partial amino acid sequence of the chloroplast cytochrome b-559.

    PubMed

    Widger, W R; Cramer, W A; Hermodson, M; Meyer, D; Gullifor, M

    1984-03-25

    The hydrophobic cytochrome b-559, purified from unstacked, ethanol-washed spinach thylakoid membranes, using extraction with 2% Triton X-100 in 4 M urea and three chromatographic steps in the presence of protease inhibitors, has a dominant band on sodium dodecyl sulfate-urea gels corresponding to Mr = 10,000. The yield of this preparation is 30-50% (5-10 mg) starting with 600 mg of chlorophyll. The heme content yields a calculated molecular weight of no more than 17,500/heme, and perhaps somewhat smaller after correction for impurities. The Mr = 10,000 band is stained by the tetramethylbenzidine-H2O2 heme reagent on lithium dodecyl sulfate gels run at 0 °C. The Mr = 10,000 protein, further separated by high performance liquid chromatography, contains a unique NH2 terminus that is not blocked, and the amino acid sequence for the first 27 residues is NH2-Ser-Gly-Ser-Thr-Gly-Glu-Arg-Ser-Phe-Ala-Asp-Ile-Ile-Thr-Ser-Ile-Arg-Tyr-Trp-Val-Ile-X-Ser-Ile-Thr-Ile-Pro...COOH. Approximately 55% of the amino acids are hydrophobic, based on amino acid analysis of the Mr = 10,000 peptide, which also indicated the presence of at least one histidine. Only one cytochrome b-559 component could be identified, whose yield indicated that it arises from a single b-559 protein in chloroplasts corresponding to the in situ high potential cytochrome of the chloroplast photosystem II. PMID:6706983

  17. Sequence-Specific Electrical Purification of Nucleic Acids with Nanoporous Gold Electrodes.

    PubMed

    Daggumati, Pallavi; Appelt, Sandra; Matharu, Zimple; Marco, Maria L; Seker, Erkin

    2016-06-22

    Nucleic-acid-based biosensors have enabled rapid and sensitive detection of pathogenic targets; however, these devices often require purified nucleic acids for analysis since the constituents of complex biological fluids adversely affect sensor performance. This purification step is typically performed outside the device, thereby increasing sample-to-answer time and introducing contaminants. We report a novel approach using a multifunctional matrix, nanoporous gold (np-Au), which enables both detection of specific target sequences in a complex biological sample and their subsequent purification. The np-Au electrodes modified with 26-mer DNA probes (via thiol-gold chemistry) enabled sensitive detection and capture of complementary DNA targets in the presence of complex media (fetal bovine serum) and other interfering DNA fragments in the range of 50-1500 base pairs. Upon capture, the noncomplementary DNA fragments and serum constituents of varying sizes were washed away. Finally, the surface-bound DNA-DNA hybrids were released by electrochemically cleaving the thiol-gold linkage, and the hybrids were iontophoretically eluted from the nanoporous matrix. The optical and electrophoretic characterization of the analytes before and after the detection-purification process revealed that low target DNA concentrations (80 pg/μL) can be successfully detected in complex biological fluids and subsequently released to yield pure hybrids free of polydisperse digested DNA fragments and serum biomolecules. Taken together, this multifunctional platform is expected to enable seamless integration of detection and purification of nucleic acid biomarkers of pathogens and diseases in miniaturized diagnostic devices. PMID:27244455

  18. Implicit Sequence Learning in Dyslexia: A Within-Sequence Comparison of First- and Higher-Order Information

    ERIC Educational Resources Information Center

    Du, Wenchong; Kelly, Steve W.

    2013-01-01

    The present study examines implicit sequence learning in adult dyslexics with a focus on comparing sequence transitions with different statistical complexities. Learning of a 12-item deterministic sequence was assessed in 12 dyslexic and 12 non-dyslexic university students. Both groups showed equivalent standard reaction time increments when the…

  19. What Matters in Implicit Task Sequence Learning: Perceptual Stimulus Features, Task Sets, or Correlated Streams of Information?

    ERIC Educational Resources Information Center

    Weiermann, Brigitte; Cock, Josephine; Meier, Beat

    2010-01-01

    Implicit task sequence learning may be attributed to learning the order of perceptual stimulus features associated with the task sequence, learning a series of automatic task set activations, or learning an integrated sequence that derives from 2 correlated streams of information. In the present study, our purpose was to distinguish among these 3…

  20. 40 CFR 72.31 - Information requirements for Acid Rain permit applications.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 16 2010-07-01 Information requirements for Acid Rain permit applications. Section 72.31 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR PROGRAMS (CONTINUED) PERMITS REGULATION Acid Rain Permit Applications § 72.31 Information requirements for Acid Rain...

  1. Homology analyses of the protein sequences of fatty acid synthases from chicken liver, rat mammary gland, and yeast

    SciTech Connect

    Chang, Soo-Ik; Hammes, G.G.

    1989-11-01

    Homology analyses of the protein sequences of chicken liver and rat mammary gland fatty acid synthases were carried out. The amino acid sequences of the chicken and rat enzymes are 67% identical. If conservative substitutions are allowed, 78% of the amino acids are matched. A region of low homology exists between the functional domains, in particular around amino acid residues 1059-1264 of the chicken enzyme. Homologies between the active sites of chicken and rat and of chicken and yeast enzymes have been analyzed by an alignment method. A high degree of homology exists between the active sites of the chicken and rat enzymes. However, the chicken and yeast enzymes show a lower degree of homology. The NADPH-binding dinucleotide folds of the β-ketoacyl reductase and the enoyl reductase sites were identified by comparison with a known consensus sequence for the NADP- and FAD-binding dinucleotide folds. The active sites of all of the enzymes are primarily in hydrophobic regions of the protein. This study suggests that the genes for the functional domains of fatty acid synthase were originally separated, and these genes were connected to each other by using different connecting nucleotide sequences in different species. An alternative explanation for the differences in rat and chicken is a common ancestry and mutations in the joining regions during evolution.
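
    The two figures quoted in this record (percent identity, and percent matched once conservative substitutions are allowed) can be illustrated with a minimal sketch. It assumes the two sequences are already aligned to equal length with "-" for gaps; the residue groups below are a hypothetical choice for illustration and are not the grouping used by the authors.

```python
# Hypothetical conservative-substitution groups (an assumption, not from the record).
CONSERVATIVE_GROUPS = ["ILVM", "FWY", "KRH", "DE", "ST", "NQ", "AG"]
GROUP_OF = {aa: k for k, group in enumerate(CONSERVATIVE_GROUPS) for aa in group}


def percent_match(a, b, allow_conservative=False):
    """Percentage of aligned, ungapped positions that match.

    With allow_conservative=True, two different residues from the same
    group in CONSERVATIVE_GROUPS also count as a match.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore any column where either sequence has a gap.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped aligned positions")
    hits = 0
    for x, y in pairs:
        same_group = x in GROUP_OF and GROUP_OF.get(x) == GROUP_OF.get(y)
        if x == y or (allow_conservative and same_group):
            hits += 1
    return 100.0 * hits / len(pairs)


# Toy aligned peptides, made up for the example.
seq_a = "MKVLDE"
seq_b = "MRILDQ"
identity = percent_match(seq_a, seq_b)
similarity = percent_match(seq_a, seq_b, allow_conservative=True)
```

    On the toy pair, three of six positions are identical and two more (K/R, V/I) fall in the same assumed group, so the second figure is higher than the first, as in the abstract.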

  2. Identification of shark species in seafood products by forensically informative nucleotide sequencing (FINS).

    PubMed

    Blanco, M; Pérez-Martín, R I; Sotelo, C G

    2008-11-12

    The identification of commercial shark species is a relevant issue to ensure the correct labeling of seafood products, to maintain consumer confidence in seafood, and to enhance the knowledge of the species and volumes that are at present being captured, thus improving the management of shark fisheries. The polymerase chain reaction was employed to obtain a 423 bp amplicon from the mitochondrial cytochrome b gene. The sequences from this fragment, belonging to 63 authentic individuals of 23 species, were analyzed using a genetic distance method. Nine different samples of commercial fresh, frozen, and convenience food were obtained in local and international markets to validate the methodology. These samples were analyzed, and sequences were employed for species identification, showing that forensically informative nucleotide sequencing (FINS) is a suitable technique for identification of processed seafood containing shark as an ingredient. The results also showed that incorrect labeling practices may occur regarding shark products, probably because of incorrect labeling at the production point. PMID:18831561

  3. cDNA-derived amino acid sequence of rat mitochondrial 3-oxoacyl-CoA thiolase with no transient presequence: structural relationship with peroxisomal isozyme.

    PubMed Central

    Arakawa, H; Takiguchi, M; Amaya, Y; Nagata, S; Hayashi, H; Mori, M

    1987-01-01

    The sorting of homologous proteins between two separate intracellular organelles is a major unsolved problem. 3-Oxoacyl-CoA thiolase is localized in mitochondria and peroxisomes, and provides a good system for studying the problem. Unlike most mitochondrial matrix proteins, mitochondrial 3-oxoacyl-CoA thiolase in rats is synthesized with no transient presequence and possesses information for mitochondrial targeting and import in the mature protein. Two overlapping cDNA clones contained an open reading frame encoding a polypeptide of 397 amino acid residues (predicted Mr = 41,868), a 5' untranslated sequence of 164 bp, a 3' untranslated sequence of 264 bp and a poly(A) tract. The amino acid sequence of the mitochondrial thiolase is 37% identical with that of the mature portion of rat peroxisomal 3-oxoacyl-CoA thiolase precursor. These results suggest that the two thiolases have a common origin and acquired information for targeting to their respective organelles during evolution. Two portions in the mitochondrial thiolase that may serve as a mitochondrial targeting signal are presented. PMID:3038520

  4. Towards rationally redesigning bacterial signaling systems using information encoded in abundant sequence data

    NASA Astrophysics Data System (ADS)

    Cheng, Ryan; Morcos, Faruck; Levine, Herbert; Onuchic, Jose

    2014-03-01

    An important challenge in biology is to distinguish the subset of residues that allow bacterial two-component signaling (TCS) proteins to preferentially interact with their correct TCS partner such that they can bind and transfer signal. Detailed knowledge of this information would allow one to search sequence-space for mutations that can systematically tune the signal transmission between TCS partners as well as re-encode a TCS protein to preferentially transfer signals to a non-partner. Motivated by the notion that this detailed information is found in sequence data, we explore the mutual sequence co-evolution between signaling partners to infer how mutations can positively or negatively alter their interaction. Using Direct Coupling Analysis (DCA) for determining evolutionarily conserved interprotein interactions, we apply a DCA-based metric to quantify mutational changes in the interaction between TCS proteins and demonstrate that it accurately correlates with experimental mutagenesis studies probing the mutational change in the in vitro phosphotransfer. Our methodology serves as a potential framework for the rational design of TCS systems as well as a framework for the system-level study of protein-protein interactions in sequence-rich systems. This research has been supported by the NSF INSPIRE award MCB-1241332 and by the CTBP sponsored by the NSF (Grant PHY-1308264).

  5. Combining Structure and Sequence Information Allows Automated Prediction of Substrate Specificities within Enzyme Families

    PubMed Central

    Röttig, Marc; Rausch, Christian; Kohlbacher, Oliver

    2010-01-01

    An important aspect of the functional annotation of enzymes is not only the type of reaction catalysed by an enzyme, but also the substrate specificity, which can vary widely within the same family. In many cases, prediction of family membership and even substrate specificity is possible from enzyme sequence alone, using a nearest neighbour classification rule. However, the combination of structural information and sequence information can improve the interpretability and accuracy of predictive models. The method presented here, Active Site Classification (ASC), automatically extracts the residues lining the active site from one representative three-dimensional structure and the corresponding residues from sequences of other members of the family. From a set of representatives with known substrate specificity, a Support Vector Machine (SVM) can then learn a model of substrate specificity. Applied to a sequence of unknown specificity, the SVM can then predict the most likely substrate. The models can also be analysed to reveal the underlying structural reasons determining substrate specificities and thus yield valuable insights into mechanisms of enzyme specificity. We illustrate the high prediction accuracy achieved on two benchmark data sets and the structural insights gained from ASC by a detailed analysis of the family of decarboxylating dehydrogenases. The ASC web service is available at http://asc.informatik.uni-tuebingen.de/. PMID:20072606

  6. Sequence exploration reveals information bias among molecular markers used in phylogenetic reconstruction for Colletotrichum species.

    PubMed

    Rampersad, Sephra N; Hosein, Fazeeda N; Carrington, Christine Vf

    2014-01-01

    The Colletotrichum gloeosporioides species complex is among the most destructive fungal plant pathogens in the world; however, identification of isolates of quarantine importance to the intra-specific level is confounded by a number of factors that affect phylogenetic reconstruction. Information bias and quality parameters were investigated to determine whether nucleotide sequence alignments and phylogenetic trees accurately reflect the genetic diversity and phylogenetic relatedness of individuals. Sequence exploration of GAPDH, ACT, TUB2 and ITS markers indicated that the query sequences had different patterns of nucleotide substitution but were without evidence of base substitution saturation. Regions of high entropy were much more dispersed in the ACT and GAPDH marker alignments than for the ITS and TUB2 markers. A discernible bimodal gap in the genetic distance frequency histograms was produced for the ACT and GAPDH markers, which indicated successful separation of intra- and inter-specific sequences in the data set. Overall, analyses indicated clear differences in the ability of these markers to phylogenetically separate individuals to the intra-specific level, which coincided with information bias. PMID:25392785

  7. Complete amino acid sequence of the medium-chain S-acyl fatty acid synthetase thio ester hydrolase from rat mammary gland

    SciTech Connect

    Randhawa, Z.I.; Smith, S.

    1987-03-10

    The complete amino acid sequence of the medium-chain S-acyl fatty acid synthetase thio ester hydrolase (thioesterase II) from rat mammary gland is presented. Most of the sequence was derived by analysis of [14C]-labelled peptide fragments produced by cleavage at methionyl, glutamyl, lysyl, arginyl, and tryptophanyl residues. A small section of the sequence was deduced from a previously analyzed cDNA clone. The protein consists of 260 residues and has a blocked amino-terminal methionine and calculated Mr of 29,212. The carboxy-terminal sequence, verified by Edman degradation of the carboxy-terminal cyanogen bromide fragment and carboxypeptidase Y digestion of the intact thioesterase II, terminates with a serine residue and lacks three additional residues predicted by the cDNA sequence. The native enzyme contains three cysteine residues but no disulfide bridges. The active site serine residue is located at position 101. The rat mammary gland thioesterase II exhibits approximately 40% homology with a thioesterase from mallard uropygial gland, the sequence of which was recently determined by cDNA analysis. Thus the two enzymes may share similar structural features and a common evolutionary origin. The location of the active site in these thioesterases differs from that of other serine active site esterases; indeed, the enzymes do not exhibit any significant homology with other serine esterases, suggesting that they may constitute a separate new family of serine active site enzymes.

  8. The complete amino acid sequence of the A-chain of human plasma alpha 2HS-glycoprotein.

    PubMed

    Yoshioka, Y; Gejyo, F; Marti, T; Rickli, E E; Bürgi, W; Offner, G D; Troxler, R F; Schmid, K

    1986-02-01

    Normal human plasma alpha 2HS-glycoprotein has earlier been shown to be comprised of two polypeptide chains. Recently, the amino acid and carbohydrate sequences of the short chain were elucidated (Gejyo, F., Chang, J.-L., Bürgi, W., Schmid, K., Offner, G. D., Troxler, R.F., van Halbeck, H., Dorland, L., Gerwig, G. J., and Vliegenthart, J.F.G. (1983) J. Biol. Chem. 258, 4966-4971). In the present study, the amino acid sequence of the long chain of this protein, designated A-chain, was determined and found to consist of 282 amino acid residues. Twenty-four amino acid doublets were found; the most abundant of these are Pro-Pro and Ala-Ala which each occur five times. Of particular interest is the presence of three Gly-X-Pro and one Gly-Pro-X sequences that are characteristic of the repeating sequences of collagens. Chou-Fasman evaluation of the secondary structure suggested that the A-chain contains 29% alpha-helix, 24% beta-pleated sheet, and 26% reverse turns and, thus, approximately 80% of the polypeptide chain may display ordered structure. Four glycosylation sites were identified. The two N-glycosidic oligosaccharides were found in the center region (residues 138 and 158), whereas the two O-glycosidic heterosaccharides, both linked to threonine (residues 238 and 252), occur within the carboxyl-terminal region. The N-glycans are linked to Asn residues in beta-turns, while the O-glycans are located in short random segments. Comparison of the sequence of the amino- and carboxyl-terminal 30 residues with protein sequences in a data bank demonstrated that the A-chain is not significantly related to any known proteins. However, the proline-rich carboxyl-terminal region of the A-chain displays some sequence similarity to collagens and the collagen-like domains of complement subcomponent C1q. PMID:3944104

  9. Phylum-Level Conservation of Regulatory Information in Nematodes despite Extensive Non-coding Sequence Divergence

    PubMed Central

    Gordon, Kacy L.; Arthur, Robert K.; Ruvinsky, Ilya

    2015-01-01

    Gene regulatory information guides development and shapes the course of evolution. To test conservation of gene regulation within the phylum Nematoda, we compared the functions of putative cis-regulatory sequences of four sets of orthologs (unc-47, unc-25, mec-3 and elt-2) from distantly-related nematode species. These species, Caenorhabditis elegans, its congeneric C. briggsae, and three parasitic species Meloidogyne hapla, Brugia malayi, and Trichinella spiralis, represent four of the five major clades in the phylum Nematoda. Despite the great phylogenetic distances sampled and the extensive sequence divergence of nematode genomes, all but one of the regulatory elements we tested are able to drive at least a subset of the expected gene expression patterns. We show that functionally conserved cis-regulatory elements have no more extended sequence similarity to their C. elegans orthologs than would be expected by chance, but they do harbor motifs that are important for proper expression of the C. elegans genes. These motifs are too short to be distinguished from the background level of sequence similarity, and while identical in sequence they are not conserved in orientation or position. Functional tests reveal that some of these motifs contribute to proper expression. Our results suggest that conserved regulatory circuitry can persist despite considerable turnover within cis elements. PMID:26020930

  10. Analysis of the functional domains of biosynthetic threonine deaminase by comparison of the amino acid sequences of three wild-type alleles to the amino acid sequence of biodegradative threonine deaminase.

    PubMed

    Taillon, B E; Little, R; Lawther, R P

    1988-03-31

    The nucleotide sequence of the gene, ilvA, for biosynthetic threonine deaminase (Tda) from Salmonella typhimurium was determined. The deduced amino acid sequence was compared with the deduced amino acid sequences of the biosynthetic Tda from Escherichia coli K-12 (ilvA) and Saccharomyces cerevisiae (ILV1) and the biodegradative Tda from E. coli K-12 (tdc). The comparison indicated the presence of two types of blocks of homologous amino acids. The first type of homology is in the N-terminal portion of all four isozymes of Tda and probably indicates amino acids involved in catalysis. The second type of homology is found in the C-terminal portion of the three biosynthetic isozymes and presumably is involved in either (i) the binding or interaction of the allosteric effector isoleucine with the enzyme, or (ii) subunit interactions. The sites of amino acid changes of two E. coli K-12 ilvA alleles with altered response to isoleucine are consistent with the conclusion that the C-terminal portion of biosynthetic Tda is involved in allosteric regulation. PMID:3290055

  11. Multidimensional mutual information methods for the analysis of covariation in multiple sequence alignments

    PubMed Central

    2014-01-01

    Background: Several methods are available for the detection of covarying positions from a multiple sequence alignment (MSA). If the MSA contains a large number of sequences, information about the proximities between residues derived from covariation maps can be sufficient to predict a protein fold. However, in many cases the structure is already known, and information on the covarying positions can be valuable to understand the protein mechanism and dynamic properties. Results: In this study we have sought to determine whether a multivariate (multidimensional) extension of traditional mutual information (MI) can be an additional tool to study covariation. The performance of two multidimensional MI (mdMI) methods, designed to remove the effect of ternary/quaternary interdependencies, was tested with a set of 9 MSAs each containing <400 sequences, and was shown to be comparable to that of the newest methods based on maximum entropy/pseudolikelihood statistical models of protein sequences. However, while all the methods tested detected a similar number of covarying pairs among the residues separated by < 8 Å in the reference X-ray structures, there was on average less than 65% overlap between the top scoring pairs detected by methods that are based on different principles. Conclusions: Given the large variety of structure and evolutionary history of different proteins it is possible that a single best method to detect covariation in all proteins does not exist, and that for each protein family the best information can be derived by merging/comparing results obtained with different methods. This approach may be particularly valuable in those cases in which the size of the MSA is small or the quality of the alignment is low, leading to significant differences in the pairs detected by different methods. PMID:24886131
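
    For orientation, the traditional pairwise mutual information that this record extends can be sketched as follows. This is only the two-column baseline, not the multidimensional (mdMI) methods of the paper; the function names and the toy alignment are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def column(msa, i):
    """Residues found at alignment position i, one per sequence."""
    return [seq[i] for seq in msa]


def mutual_information(msa, i, j):
    """Mutual information, in bits, between alignment columns i and j.

    Frequencies are plain counts over the sequences; a gap character is
    treated as just another symbol, and no sequence weighting or
    pseudocounts are applied.
    """
    n = len(msa)
    col_i, col_j = column(msa, i), column(msa, j)
    count_i, count_j = Counter(col_i), Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    return sum(
        (c / n) * log2((c / n) / ((count_i[a] / n) * (count_j[b] / n)))
        for (a, b), c in count_ij.items()
    )


# Toy alignment: columns 0 and 1 covary perfectly, column 2 is invariant.
toy_msa = ["AKL", "AKL", "GEL", "GEL"]
mi_covarying = mutual_information(toy_msa, 0, 1)
mi_invariant = mutual_information(toy_msa, 0, 2)
```

    In the toy alignment, the perfectly covarying pair of columns carries one bit of mutual information, while pairing any column with the invariant one carries none.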

  12. The developmental transcriptome landscape of bovine skeletal muscle defined by Ribo-Zero ribonucleic acid sequencing.

    PubMed

    Sun, X; Li, M; Sun, Y; Cai, H; Li, R; Wei, X; Lan, X; Huang, Y; Lei, C; Chen, H

    2015-12-01

    Ribonucleic acid sequencing (RNA-Seq) libraries are normally prepared with oligo(dT) selection of poly(A)+ mRNA, but it depends on intact total RNA samples. Recent studies have described Ribo-Zero technology, a novel method that can capture both poly(A)+ and poly(A)- transcripts from intact or fragmented RNA samples. We report here the first application of Ribo-Zero RNA-Seq for the analysis of the bovine embryonic, neonatal, and adult skeletal muscle whole transcriptome at an unprecedented depth. Overall, 19,893 genes were found to be expressed, with a high correlation of expression levels between the calf and the adult. Hundreds of genes were found to be highly expressed in the embryo and decreased at least 10-fold after birth, indicating their potential roles in embryonic muscle development. In addition, we present for the first time the analysis of global transcript isoform discovery in bovine skeletal muscle and identified 36,694 transcript isoforms. Transcriptomic data were also analyzed to unravel sequence variations; 185,036 putative SNP and 12,428 putative short insertions-deletions (InDel) were detected. Specifically, many stop-gain, stop-loss, and frameshift mutations were identified that probably change the relative protein production and sequentially affect the gene function. Notably, the numbers of stage-specific transcripts, alternative splicing events, SNP, and InDel were greater in the embryo than in the calf and the adult, suggesting that gene expression is most active in the embryo. The resulting view of the transcriptome at a single-base resolution greatly enhances the comprehensive transcript catalog and uncovers the global trends in gene expression during bovine skeletal muscle development. PMID:26641174

  13. Method for the detection of specific nucleic acid sequences by polymerase nucleotide incorporation

    DOEpatents

    Castro, Alonso

    2004-06-01

    A method for rapid and efficient detection of a target DNA or RNA sequence is provided. A primer having a 3'-hydroxyl group at one end and having a sequence of nucleotides sufficiently homologous with an identifying sequence of nucleotides in the target DNA is selected. The primer is hybridized to the identifying sequence of nucleotides on the DNA or RNA sequence and a reporter molecule is synthesized on the target sequence by progressively binding complementary nucleotides to the primer, where the complementary nucleotides include nucleotides labeled with a fluorophore. Fluorescence emitted by fluorophores on single reporter molecules is detected to identify the target DNA or RNA sequence.

  14. Creation of a data base for sequences of ribosomal nucleic acids and detection of conserved restriction endonucleases sites through computerized processing.

    PubMed Central

    Patarca, R; Dorta, B; Ramirez, J L

    1982-01-01

    As part of a project pertaining to the organization of ribosomal genes in Kinetoplastidae, we have created a data base for published sequences of ribosomal nucleic acids, with information in Spanish. As a first step in their processing, we have written a computer program which introduces the new feature of determining the length of the fragments produced after single or multiple digestion with any of the known restriction enzymes. With this information we have detected conserved Sau3A sites: (i) at the 5' end of the 5.8S rRNA and at the 3' end of the small subunit rRNA, both included in similar larger sequences; (ii) in the 5.8S rRNA of vertebrates (a second one), which is not present in lower eukaryotes, showing a clear evolutionary divergence; and (iii) at the 5' terminus of the small subunit rRNA, included in a larger conserved sequence. The possible biological importance of these sequences is discussed. PMID:6278402
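
    The fragment-length calculation described in this record can be sketched in a few lines. This is not the authors' program; it assumes a linear DNA sequence, scans one strand only (adequate for palindromic sites), and uses a small hand-written enzyme table. The toy sequence is made up.

```python
import re

# Recognition site and cut offset within the site (top strand).
# Sau3AI cuts immediately before GATC; EcoRI cuts between G and AATTC.
ENZYMES = {
    "Sau3AI": ("GATC", 0),
    "EcoRI": ("GAATTC", 1),
}


def cut_positions(seq, site, offset):
    """Cut coordinates for one enzyme; the lookahead also finds overlapping sites."""
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq)]


def fragment_lengths(seq, enzyme_names):
    """Fragment lengths after single or multiple digestion of a linear molecule."""
    seq = seq.upper()
    cuts = set()
    for name in enzyme_names:
        site, offset = ENZYMES[name]
        cuts.update(cut_positions(seq, site, offset))
    # A cut exactly at either end of the molecule produces no new fragment.
    bounds = [0] + sorted(c for c in cuts if 0 < c < len(seq)) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


toy_sequence = "AAGATCTTTGAATTCGGGATCCA"
single_digest = fragment_lengths(toy_sequence, ["Sau3AI"])
double_digest = fragment_lengths(toy_sequence, ["Sau3AI", "EcoRI"])
```

    Comparing the fragment lists from aligned homologous sequences is one way a conserved site, such as the Sau3A sites reported here, would show up as a cut at the same relative position.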

  15. Characterization and cDNA sequence of Bothriechis schlegelii l-amino acid oxidase with antibacterial activity.

    PubMed

    Vargas Muñoz, Leidy Johana; Estrada-Gomez, Sebastian; Núñez, Vitelbina; Sanz, Libia; Calvete, Juan J

    2014-08-01

    Snake venoms are complex mixtures of proteins including l-amino acid oxidase (lAAO). A lAAO (named BslAAO) with a mass of 56 kDa and a theoretical pI of 5.79 was purified from Bothriechis schlegelii venom through size-exclusion, ion exchange and affinity chromatography. The entire protein sequence of 498 amino acids was determined from cDNA using reverse-transcribed mRNA isolated from venom gland. The enzyme showed dose-dependent inhibition of bacterial growth. BslAAO showed an inhibitory effect against S. aureus with a MIC of 4 μg/mL and a MBC of 8 μg/mL. Against Acinetobacter baumannii, it showed a MIC of 2 μg/mL and a MBC of 4 μg/mL. No effect was observed in Escherichia coli. This antibacterial activity was inhibited by catalase, indicating that antimicrobial activity was due to H2O2 production. BslAAO did not show any cytotoxic activity toward mouse myoblast cell line C2C12 or peripheral blood mononuclear cells. The enzyme oxidized l-Leu, with a Km of 16.37 μM and a Vmax of 0.39 μM/min. Snake venom lAAOs are potential frameworks for various therapeutic molecules, since these enzymes exhibit low MICs and MBCs and appear to be harmless to human cells because microorganisms are generally several-fold more sensitive to reactive oxygen species than human tissues. PMID:24875315

  16. Genome Sequence of a Candidate World Health Organization Reference Strain of Zika Virus for Nucleic Acid Testing

    PubMed Central

    Trösemeier, Jan-Hendrik; Musso, Didier; Blümel, Johannes; Thézé, Julien; Pybus, Oliver G.

    2016-01-01

    We report here the sequence of a candidate reference strain of Zika virus (ZIKV) developed on behalf of the World Health Organization (WHO). The ZIKV reference strain is intended for use in nucleic acid amplification (NAT)-based assays for the detection and quantification of ZIKV RNA. PMID:27587826

  17. Genome Sequence of Schizochytrium sp. CCTCC M209059, an Effective Producer of Docosahexaenoic Acid-Rich Lipids

    PubMed Central

    Ji, Xiao-Jun; Mo, Kai-Qiang; Ren, Lu-Jing; Li, Gan-Lu; Huang, Jian-Zhong

    2015-01-01

    Schizochytrium is an effective species for producing omega-3 docosahexaenoic acid (DHA). Here, we report a genome sequence of Schizochytrium sp. CCTCC M209059, which has a genome size of 39.09 Mb. It will provide the genomic basis for further insights into the metabolic and regulatory mechanisms underlying the DHA formation. PMID:26251485

  18. Evolutionary Distance of Amino Acid Sequence Orthologs across Macaque Subspecies: Identifying Candidate Genes for SIV Resistance in Chinese Rhesus Macaques

    PubMed Central

    Ross, Cody T.; Roodgar, Morteza; Smith, David Glenn

    2015-01-01

    We use the Reciprocal Smallest Distance (RSD) algorithm to identify amino acid sequence orthologs in the Chinese and Indian rhesus macaque draft sequences and estimate the evolutionary distance between such orthologs. We then use GOanna to map gene function annotations and human gene identifiers to the rhesus macaque amino acid sequences. We conclude methodologically by cross-tabulating a list of amino acid orthologs with large divergence scores with a list of genes known to be involved in SIV or HIV pathogenesis. We find that many of the amino acid sequences with large evolutionary divergence scores, as calculated by the RSD algorithm, have been shown to be related to HIV pathogenesis in previous laboratory studies. Four of the strongest candidate genes for SIVmac resistance in Chinese rhesus macaques identified in this study are CDK9, CXCL12, TRIM21, and TRIM32. Additionally, ANKRD30A, CTSZ, GORASP2, GTF2H1, IL13RA1, MUC16, NMDAR1, Notch1, NT5M, PDCD5, RAD50, and TM9SF2 were identified as possible candidates, among others. We failed to find many laboratory experiments contrasting the effects of Indian and Chinese orthologs at these sites on SIVmac pathogenesis, but future comparative studies might hold fertile ground for research into the biological mechanisms underlying innate resistance to SIVmac in Chinese rhesus macaques. PMID:25884674

  19. Evolutionary distance of amino acid sequence orthologs across macaque subspecies: identifying candidate genes for SIV resistance in Chinese rhesus macaques.

    PubMed

    Ross, Cody T; Roodgar, Morteza; Smith, David Glenn

    2015-01-01

    We use the Reciprocal Smallest Distance (RSD) algorithm to identify amino acid sequence orthologs in the Chinese and Indian rhesus macaque draft sequences and estimate the evolutionary distance between such orthologs. We then use GOanna to map gene function annotations and human gene identifiers to the rhesus macaque amino acid sequences. We conclude methodologically by cross-tabulating a list of amino acid orthologs with large divergence scores with a list of genes known to be involved in SIV or HIV pathogenesis. We find that many of the amino acid sequences with large evolutionary divergence scores, as calculated by the RSD algorithm, have been shown to be related to HIV pathogenesis in previous laboratory studies. Four of the strongest candidate genes for SIVmac resistance in Chinese rhesus macaques identified in this study are CDK9, CXCL12, TRIM21, and TRIM32. Additionally, ANKRD30A, CTSZ, GORASP2, GTF2H1, IL13RA1, MUC16, NMDAR1, Notch1, NT5M, PDCD5, RAD50, and TM9SF2 were identified as possible candidates, among others. We failed to find many laboratory experiments contrasting the effects of Indian and Chinese orthologs at these sites on SIVmac pathogenesis, but future comparative studies might hold fertile ground for research into the biological mechanisms underlying innate resistance to SIVmac in Chinese rhesus macaques. PMID:25884674

  20. Draft Genome Sequence of Lactobacillus delbrueckii subsp. bulgaricus CFL1, a Lactic Acid Bacterium Isolated from French Handcrafted Fermented Milk.

    PubMed

    Meneghel, Julie; Dugat-Bony, Eric; Irlinger, Françoise; Loux, Valentin; Vidal, Marie; Passot, Stéphanie; Béal, Catherine; Layec, Séverine; Fonseca, Fernanda

    2016-01-01

    Lactobacillus delbrueckii subsp. bulgaricus (L. bulgaricus) is a lactic acid bacterium widely used for the production of yogurt and cheeses. Here, we report the genome sequence of L. bulgaricus CFL1 to improve our knowledge on its stress-induced damages following production and end-use processes. PMID:26941141

  1. Draft Genome Sequence of Cutaneotrichosporon curvatus DSM 101032 (Formerly Cryptococcus curvatus), an Oleaginous Yeast Producing Polyunsaturated Fatty Acids.

    PubMed

    Hofmeyer, Thomas; Hackenschmidt, Silke; Nadler, Florian; Thürmer, Andrea; Daniel, Rolf; Kabisch, Johannes

    2016-01-01

    Cutaneotrichosporon curvatus DSM 101032 is an oleaginous yeast that can be isolated from various habitats and is capable of producing substantial amounts of polyunsaturated fatty acids. Here, we present the first draft genome sequence of any C. curvatus species. PMID:27174275

  2. Complete genome sequence of Lactobacillus plantarum ZS2058, a probiotic strain with high conjugated linoleic acid production ability.

    PubMed

    Yang, Bo; Chen, Haiqin; Tian, Fengwei; Zhao, Jianxin; Gu, Zhennan; Zhang, Hao; Chen, Yong Q; Chen, Wei

    2015-11-20

    Lactobacillus plantarum ZS2058 was isolated from sauerkraut and found to synthesize the beneficial metabolite conjugated linoleic acid. The genome contains a 3,197,363-bp chromosome and three plasmids. The sequence will facilitate identification and characterization of the genetic determinants for its putative biological benefits. PMID:26439428

  3. Draft Genome Sequence of Burkholderia stabilis LA20W, a Trehalose Producer That Uses Levulinic Acid as a Substrate

    PubMed Central

    Sato, Yuya; Koike, Hideaki; Kondo, Susumu; Hori, Tomoyuki; Kanno, Manabu; Kimura, Nobutada; Morita, Tomotake; Kirimura, Kohtaro

    2016-01-01

    Burkholderia stabilis LA20W produces trehalose using levulinic acid (LA) as a substrate. Here, we report the 7.97-Mb draft genome sequence of B. stabilis LA20W, which will be useful in investigations of the enzymes involved in LA metabolism and the mechanism of LA-induced trehalose production. PMID:27491978

  4. Draft Genome Sequence of Acetobacter tropicalis Type Strain NBRC16470, a Producer of Optically Pure d-Glyceric Acid.

    PubMed

    Koike, Hideaki; Sato, Shun; Morita, Tomotake; Fukuoka, Tokuma; Habe, Hiroshi

    2014-01-01

    Here we report the 3.7-Mb draft genome sequence of Acetobacter tropicalis NBRC16470(T), which can produce optically pure d-glyceric acid (d-GA; 99% enantiomeric excess) from raw glycerol feedstock derived from biodiesel fuel production processes. PMID:25523780

  5. Genome Sequence of a Candidate World Health Organization Reference Strain of Zika Virus for Nucleic Acid Testing.

    PubMed

    Trösemeier, Jan-Hendrik; Musso, Didier; Blümel, Johannes; Thézé, Julien; Pybus, Oliver G; Baylis, Sally A

    2016-01-01

    We report here the sequence of a candidate reference strain of Zika virus (ZIKV) developed on behalf of the World Health Organization (WHO). The ZIKV reference strain is intended for use in nucleic acid amplification (NAT)-based assays for the detection and quantification of ZIKV RNA. PMID:27587826

  6. Draft Genome Sequence of Burkholderia stabilis LA20W, a Trehalose Producer That Uses Levulinic Acid as a Substrate.

    PubMed

    Sato, Yuya; Koike, Hideaki; Kondo, Susumu; Hori, Tomoyuki; Kanno, Manabu; Kimura, Nobutada; Morita, Tomotake; Kirimura, Kohtaro; Habe, Hiroshi

    2016-01-01

    Burkholderia stabilis LA20W produces trehalose using levulinic acid (LA) as a substrate. Here, we report the 7.97-Mb draft genome sequence of B. stabilis LA20W, which will be useful in investigations of the enzymes involved in LA metabolism and the mechanism of LA-induced trehalose production. PMID:27491978

  7. Draft Genome Sequence of Cutaneotrichosporon curvatus DSM 101032 (Formerly Cryptococcus curvatus), an Oleaginous Yeast Producing Polyunsaturated Fatty Acids

    PubMed Central

    Hofmeyer, Thomas; Hackenschmidt, Silke; Nadler, Florian; Thürmer, Andrea; Daniel, Rolf

    2016-01-01

    Cutaneotrichosporon curvatus DSM 101032 is an oleaginous yeast that can be isolated from various habitats and is capable of producing substantial amounts of polyunsaturated fatty acids. Here, we present the first draft genome sequence of any C. curvatus species. PMID:27174275

  8. Ultra high-throughput nucleic acid sequencing as a tool for virus discovery in the turkey gut.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Recently, the use of the next generation of nucleic acid sequencing technology (i.e., 454 pyrosequencing, as developed by Roche/454 Life Sciences) has allowed an in-depth look at the uncultivated microorganisms present in complex environmental samples, including samples with agricultural importance....

  9. Draft Genome Sequence of Lactobacillus delbrueckii subsp. bulgaricus CFL1, a Lactic Acid Bacterium Isolated from French Handcrafted Fermented Milk

    PubMed Central

    Meneghel, Julie; Irlinger, Françoise; Loux, Valentin; Vidal, Marie; Passot, Stéphanie; Béal, Catherine; Layec, Séverine

    2016-01-01

    Lactobacillus delbrueckii subsp. bulgaricus (L. bulgaricus) is a lactic acid bacterium widely used for the production of yogurt and cheeses. Here, we report the genome sequence of L. bulgaricus CFL1 to improve our knowledge on its stress-induced damages following production and end-use processes. PMID:26941141

  10. Clinical decision support for whole genome sequence information leveraging a service-oriented architecture: a prototype.

    PubMed

    Welch, Brandon M; Rodriguez-Loya, Salvador; Eilbeck, Karen; Kawamoto, Kensaku

    2014-01-01

    Whole genome sequence (WGS) information could soon be routinely available to clinicians to support the personalized care of their patients. At such time, clinical decision support (CDS) integrated into the clinical workflow will likely be necessary to support genome-guided clinical care. Nevertheless, developing CDS capabilities for WGS information presents many unique challenges that need to be overcome for such approaches to be effective. In this manuscript, we describe the development of a prototype CDS system that is capable of providing genome-guided CDS at the point of care and within the clinical workflow. To demonstrate the functionality of this prototype, we implemented a clinical scenario of a hypothetical patient at high risk for Lynch Syndrome based on his genomic information. We demonstrate that this system can effectively use service-oriented architecture principles and standards-based components to deliver point of care CDS for WGS information in real-time. PMID:25954430

  11. Sequence-Specific Recognition of MicroRNAs and Other Short Nucleic Acids with Solid-State Nanopores.

    PubMed

    Zahid, Osama K; Wang, Fanny; Ruzicka, Jan A; Taylor, Ethan W; Hall, Adam R

    2016-03-01

    The detection and quantification of short nucleic acid sequences has many potential applications in studying biological processes, monitoring disease initiation and progression, and evaluating environmental systems, but is challenging by nature. We present here an assay based on the solid-state nanopore platform for the identification of specific sequences in solution. We demonstrate that hybridization of a target nucleic acid with a synthetic probe molecule enables discrimination between duplex and single-stranded molecules with high efficacy. Our approach requires limited preparation of samples and yields an unambiguous translocation event rate enhancement that can be used to determine the presence and abundance of a single sequence within a background of nontarget oligonucleotides. PMID:26824296

  12. Analysis of the code relating sequence to conformation in globular proteins. Development of a stereochemical alphabet on the basis of intra-residue information

    PubMed Central

    Robson, Barry; Pain, Roger H.

    1974-01-01

    1. The relation of primary sequence to all residue backbone conformations was explored to test out starting conformations for protein folding. 2. Information theory was used to obtain measures of information which quantitate the role of each residue in determining its own conformation; i.e. intra-residue information. 3. The information measures are plotted as a function of ϕ, ψ peptide-backbone angles and ϕ, ψ contour maps obtained for each of the 20 amino acids. These show characteristic differences between residues. 4. To find practical ways of relating sequence to ϕ, ψ angles, several types of stereochemical alphabet were investigated. The value of these was tested by using them to predict the ϕ, ψ angles of nine different proteins. 5. A difference plot was constructed to show regions of the sequence that require little or no information extra to the intra-residue information in order to predict a correct conformation. These regions are suggested to be candidates for nucleating sites in the protein. PMID:4463966

  13. Sequencing around 5-Hydroxyconiferyl Alcohol-Derived Units in Caffeic Acid O-Methyltransferase-Deficient Poplar Lignins

    PubMed Central

    Lu, Fachuang; Marita, Jane M.; Lapierre, Catherine; Jouanin, Lise; Morreel, Kris; Boerjan, Wout; Ralph, John

    2010-01-01

    Caffeic acid O-methyltransferase (COMT) is a bifunctional enzyme that methylates the 5- and 3-hydroxyl positions on the aromatic ring of monolignol precursors, with a preference for 5-hydroxyconiferaldehyde, on the way to producing sinapyl alcohol. Lignins in COMT-deficient plants contain benzodioxane substructures due to the incorporation of 5-hydroxyconiferyl alcohol (5-OH-CA), as a monomer, into the lignin polymer. The derivatization followed by reductive cleavage method can be used to detect and determine benzodioxane structures because of their total survival under this degradation method. Moreover, partial sequencing information for 5-OH-CA incorporation into lignin can be derived from detection or isolation and structural analysis of the resulting benzodioxane products. Results from a modified derivatization followed by reductive cleavage analysis of COMT-deficient lignins provide evidence that 5-OH-CA cross couples (at its β-position) with syringyl and guaiacyl units (at their O-4-positions) in the growing lignin polymer and then either coniferyl or sinapyl alcohol, or another 5-hydroxyconiferyl monomer, adds to the resulting 5-hydroxyguaiacyl terminus, producing the benzodioxane. This new terminus may also become etherified by coupling with further monolignols, incorporating the 5-OH-CA integrally into the lignin structure. PMID:20427467

  14. Sequencing around 5-hydroxyconiferyl alcohol-derived units in caffeic acid O-methyltransferase-deficient poplar lignins.

    PubMed

    Lu, Fachuang; Marita, Jane M; Lapierre, Catherine; Jouanin, Lise; Morreel, Kris; Boerjan, Wout; Ralph, John

    2010-06-01

    Caffeic acid O-methyltransferase (COMT) is a bifunctional enzyme that methylates the 5- and 3-hydroxyl positions on the aromatic ring of monolignol precursors, with a preference for 5-hydroxyconiferaldehyde, on the way to producing sinapyl alcohol. Lignins in COMT-deficient plants contain benzodioxane substructures due to the incorporation of 5-hydroxyconiferyl alcohol (5-OH-CA), as a monomer, into the lignin polymer. The derivatization followed by reductive cleavage method can be used to detect and determine benzodioxane structures because of their total survival under this degradation method. Moreover, partial sequencing information for 5-OH-CA incorporation into lignin can be derived from detection or isolation and structural analysis of the resulting benzodioxane products. Results from a modified derivatization followed by reductive cleavage analysis of COMT-deficient lignins provide evidence that 5-OH-CA cross couples (at its beta-position) with syringyl and guaiacyl units (at their O-4-positions) in the growing lignin polymer and then either coniferyl or sinapyl alcohol, or another 5-hydroxyconiferyl monomer, adds to the resulting 5-hydroxyguaiacyl terminus, producing the benzodioxane. This new terminus may also become etherified by coupling with further monolignols, incorporating the 5-OH-CA integrally into the lignin structure. PMID:20427467

  15. Amino acid sequence of rabbit kidney neutral endopeptidase 24.11 (enkephalinase) deduced from a complementary DNA.

    PubMed Central

    Devault, A; Lazure, C; Nault, C; Le Moual, H; Seidah, N G; Chrétien, M; Kahn, P; Powell, J; Mallet, J; Beaumont, A

    1987-01-01

    Neutral endopeptidase (EC 3.4.24.11) is a major constituent of kidney brush border membranes. It is also present in the brain where it has been shown to be involved in the inactivation of opioid peptides, methionine- and leucine-enkephalins. For this reason this enzyme is often called 'enkephalinase'. In order to characterize the primary structure of the enzyme, oligonucleotide probes were designed from partial amino acid sequences and used to isolate clones from kidney cDNA libraries. Sequencing of the cDNA inserts revealed the complete primary structure of the enzyme. Neutral endopeptidase consists of 750 amino acids. It contains a short N-terminal cytoplasmic domain (27 amino acids), a single membrane-spanning segment (23 amino acids) and an extracellular domain that comprises most of the protein mass. The comparison of the primary structure of neutral endopeptidase with that of thermolysin, a bacterial Zn-metallopeptidase, indicates that most of the amino acid residues involved in Zn coordination and catalytic activity in thermolysin are found within highly homologous sequences in neutral endopeptidase. PMID:2440677

  16. On the Concept of Cis-regulatory Information: From Sequence Motifs to Logic Functions

    NASA Astrophysics Data System (ADS)

    Tarpine, Ryan; Istrail, Sorin

    The regulatory genome is about the “system level organization of the core genomic regulatory apparatus, and how this is the locus of causality underlying the twin phenomena of animal development and animal evolution” (E.H. Davidson. The Regulatory Genome: Gene Regulatory Networks in Development and Evolution, Academic Press, 2006). Information processing in the regulatory genome is done through regulatory states, defined as sets of transcription factors (sequence-specific DNA binding proteins which determine gene expression) that are expressed and active at the same time. The core information processing machinery consists of modular DNA sequence elements, called cis-modules, that interact with transcription factors. The cis-modules “read” the information contained in the regulatory state of the cell through transcription factor binding, “process” it, and directly or indirectly communicate with the basal transcription apparatus to determine gene expression. This endowment of each gene with the information-receiving capacity through their cis-regulatory modules is essential for the response to every possible regulatory state to which it might be exposed during all phases of the life cycle and in all cell types. We present here a set of challenges addressed by our CYRENE research project aimed at studying the cis-regulatory code of the regulatory genome. The CYRENE Project is devoted to (1) the construction of a database, the cis-Lexicon, containing comprehensive information across species about experimentally validated cis-regulatory modules; and (2) the software development of a next-generation genome browser, the cis-Browser, specialized for the regulatory genome. The presentation is anchored on three main computational challenges: the Gene Naming Problem, the Consensus Sequence Bottleneck Problem, and the Logic Function Inference Problem.

  17. Classification of ncRNAs using position and size information in deep sequencing data

    PubMed Central

    Erhard, Florian; Zimmer, Ralf

    2010-01-01

    Motivation: Small non-coding RNAs (ncRNAs) play important roles in various cellular functions in all clades of life. With next-generation sequencing techniques, it has become possible to study ncRNAs in a high-throughput manner and by using specialized algorithms ncRNA classes such as miRNAs can be detected in deep sequencing data. Typically, such methods are targeted to a certain class of ncRNA. Many methods rely on RNA secondary structure prediction, which is not always accurate and not all ncRNA classes are characterized by a common secondary structure. Unbiased classification methods for ncRNAs could be important to improve accuracy and to detect new ncRNA classes in sequencing data. Results: Here, we present a scoring system called ALPS (alignment of pattern matrices score) that only uses primary information from a deep sequencing experiment, i.e. the relative positions and lengths of reads, to classify ncRNAs. ALPS makes no further assumptions, e.g. about common structural properties in the ncRNA class and is nevertheless able to identify ncRNA classes with high accuracy. Since ALPS is not designed to recognize a certain class of ncRNA, it can be used to detect novel ncRNA classes, as long as these unknown ncRNAs have a characteristic pattern of deep sequencing read lengths and positions. We evaluate our scoring system on publicly available deep sequencing data and show that it is able to classify known ncRNAs with high sensitivity and specificity. Availability: Calculated pattern matrices of the datasets hESC and EB are available at the project web site http://www.bio.ifi.lmu.de/ALPS. An implementation of the described method is available upon request from the authors. Contact: florian.erhard@bio.ifi.lmu.de PMID:20823303

  18. Human parainfluenza type 3 virus hemagglutinin-neuraminidase glycoprotein: nucleotide sequence of mRNA and limited amino acid sequence of the purified protein.

    PubMed Central

    Elango, N; Coligan, J E; Jambou, R C; Venkatesan, S

    1986-01-01

    The nucleotide sequence of mRNA for the hemagglutinin-neuraminidase (HN) protein of human parainfluenza type 3 virus obtained from the corresponding cDNA clone had a single long open reading frame encoding a putative protein of 64,254 daltons consisting of 572 amino acids. The deduced protein sequence was confirmed by limited N-terminal amino acid microsequencing of CNBr cleavage fragments of native HN that was purified by immunoprecipitation. The HN protein is moderately hydrophobic and has four potential sites (Asn-X-Ser/Thr) of N-glycosylation in the C-terminal half of the molecule. It is devoid of both the N-terminal signal sequence and the C-terminal membrane anchorage domain characteristic of the hemagglutinin of influenza virus and the fusion (F0) protein of the paramyxoviruses. Instead, it has a single prominent hydrophobic region capable of membrane insertion beginning at 32 residues from the N terminus. This N-terminal membrane insertion is similar to that of influenza virus neuraminidase and the recently reported structures of HN proteins of Sendai virus and simian virus 5. PMID:3003381

  19. Sequence dependent N-terminal rearrangement and degradation of peptide nucleic acid (PNA) in aqueous solution

    NASA Technical Reports Server (NTRS)

    Eriksson, M.; Christensen, L.; Schmidt, J.; Haaima, G.; Orgel, L.; Nielsen, P. E.

    1998-01-01

    The stability of the PNA (peptide nucleic acid) thymine monomer {N-[2-(thymin-1-ylacetyl)]-N-(2-aminoaminoethyl)glycine} and those of various PNA oligomers (5-8-mers) have been measured at room temperature (20 °C) as a function of pH. The thymine monomer undergoes N-acyl transfer rearrangement with a half-life of 34 days at pH 11 as analyzed by 1H NMR; and two reactions, the N-acyl transfer and a sequential degradation, are found by HPLC analysis to occur at measurable rates for the oligomers at pH 9 or above. Dependent on the amino-terminal sequence, half-lives of 350 h to 163 days were found at pH 9. At pH 12 the half-lives ranged from 1.5 h to 21 days. The results are discussed in terms of PNA as a gene therapeutic drug as well as a possible prebiotic genetic material.
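    To put the reported half-lives in concrete terms, the short sketch below converts a half-life into the fraction of material remaining after a given time. It assumes simple first-order decay, which the abstract does not state explicitly; the function name is hypothetical.

```python
def fraction_remaining(elapsed_hours: float, half_life_hours: float) -> float:
    """Fraction of starting material left, assuming first-order decay."""
    if half_life_hours <= 0:
        raise ValueError("half_life_hours must be positive")
    return 0.5 ** (elapsed_hours / half_life_hours)


# Shortest half-life reported at pH 12 (1.5 h), observed for 3 h.
fast = fraction_remaining(3.0, 1.5)

# Monomer half-life reported at pH 11 (34 days), observed for 34 days.
slow = fraction_remaining(34 * 24, 34 * 24)
```

    Under this assumption an oligomer with the 1.5 h half-life would be three-quarters degraded after 3 h, whereas the monomer at pH 11 would need 34 days to lose half of its starting amount.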

  20. Structural analysis of complementary DNA and amino acid sequences of human and rat androgen receptors

    SciTech Connect

    Chang, C.; Kokontis, J.; Liao, S.

    1988-10-01

    Structural analysis of cDNAs for human and rat androgen receptors (ARs) indicates that the amino-terminal regions of ARs are rich in oligo- and poly(amino acid) motifs as in some homeotic genes. The human AR has a long stretch of repeated glycines, whereas rat AR has a long stretch of glutamines. There is a considerable sequence similarity among ARs and the receptors for glucocorticoids, progestins, and mineralocorticoids within the steroid-binding domains. The cysteine-rich DNA-binding domains are well conserved. Translation of mRNA transcribed from AR cDNAs yielded 94- and 76-kDa proteins and smaller forms that bind to DNA and have high affinity toward androgens. These rat or human ARs were recognized by human autoantibodies to natural ARs. Molecular hybridization studies, using AR cDNAs as probes, indicated that the ventral prostate and other male accessory organs are rich in AR mRNA and that the production of AR mRNA in the target organs may be autoregulated by androgens.

  1. Rapid and Sensitive Isothermal Detection of Nucleic-acid Sequence by Multiple Cross Displacement Amplification

    PubMed Central

    Wang, Yi; Wang, Yan; Ma, Ai-Jing; Li, Dong-Xun; Luo, Li-Juan; Liu, Dong-Xin; Jin, Dong; Liu, Kai; Ye, Chang-Yun

    2015-01-01

    We have devised a novel amplification strategy based on isothermal strand-displacement polymerization reaction, which was termed multiple cross displacement amplification (MCDA). The approach employed a set of ten specially designed primers spanning ten distinct regions of target sequence and proceeded at a constant temperature (61–65 °C). At the assay temperature, the double-stranded DNAs were in a dynamic reaction environment of primer-template hybrid, thus the high concentration of primers annealed to the template strands without a denaturing step to initiate the synthesis. For the subsequent isothermal amplification step, a series of primer binding and extension events yielded several single-stranded DNAs and single-stranded single stem-loop DNA structures. Then, these DNA products enabled the strand-displacement reaction to enter into the exponential amplification. Three mainstream methods, including colorimetric indicators, agarose gel electrophoresis and real-time turbidity, were selected for monitoring the MCDA reaction. Moreover, the practical application of the MCDA assay was successfully evaluated by detecting the target pathogen nucleic acid in pork samples, which offered advantages of quick results, modest equipment requirements, ease of operation, and high specificity and sensitivity. Here we expounded the basic MCDA mechanism and also provided details on an alternative (Single-MCDA assay, S-MCDA) to the MCDA technique. PMID:26154567

  2. Snake venoms. The amino acid sequences of two proteinase inhibitor homologues from Dendroaspis angusticeps venom.

    PubMed

    Joubert, F J; Taljaard, N

    1980-05-01

    Toxins C13S1C3 and C13S2C3 from D. angusticeps venom were purified by gel filtration and ion exchange chromatography. Whereas C13S1C3 contains 57 amino acids, C13S2C3 contains 59, but each includes six half-cystine residues. The complete primary structures of the low toxicity proteins have been elucidated. The sequences and the invariant residues of toxins C13S1C3 and C13S2C3 from D. angusticeps venom resemble, respectively, those of the proteinase inhibitor homologues K and I from D. polylepis polylepis venom and they are also homologous to the active proteinase inhibitors from various sources. In C13S1C3 and K the active site lysyl residue of active bovine pancreatic proteinase inhibitor is conserved but the site residue alanine is replaced by lysine. In C13S2C3 and I the active site residue is replaced by tyrosine. PMID:7429422

  3. Interpretation of personal genome sequencing data in terms of disease ranks based on mutual information

    PubMed Central

    2015-01-01

    Background: The rapid advances in genome sequencing technologies have resulted in an unprecedented number of genome variations being discovered in humans. However, there has been very limited coverage of interpretation of the personal genome sequencing data in terms of diseases. Methods: In this paper we present the first computational analysis scheme for interpreting personal genome data by simultaneously considering the functional impact of damaging variants and curated disease-gene association data. This method is based on mutual information as a measure of the relative closeness between the personal genome and diseases. We hypothesize that a higher mutual information score implies that the personal genome is more susceptible to a particular disease than other diseases. Results: The method was applied to the sequencing data of 50 acute myeloid leukemia (AML) patients in The Cancer Genome Atlas. The utility of associations between a disease and the personal genome was explored using data of healthy (control) people obtained from the 1000 Genomes Project. The ranks of the disease terms in the AML patient group were compared with those in the healthy control group using "Leukemia, Myeloid, Acute" (C04.557.337.539.550) as the corresponding MeSH disease term. The mutual information rank of the disease term was substantially higher in the AML patient group than in the healthy control group, which demonstrates that the proposed methodology can be successfully applied to infer associations between the personal genome and diseases. Conclusions: Overall, the area under the receiver operating characteristics curve was significantly larger for the AML patient data than for the healthy controls. This methodology could contribute to consequential discoveries and explanations for mining personal genome sequencing data in terms of diseases, and have versatility with respect to genomic-based knowledge such as drug-gene and environmental-factor-gene interactions. PMID:26045178

  4. Power Spectrum and Mutual Information Analyses of DNA Base (Nucleotide) Sequences

    NASA Astrophysics Data System (ADS)

    Isohata, Yasuhiko; Hayashi, Masaki

    2003-03-01

    On the basis of the power spectrum analyses for the base (nucleotide) sequences of various genes, we have studied long-range correlations in total base sequences which are expressed as $1/f^{\alpha}$, behaviour of the exponent $\alpha$ for the accumulated base sequences as well as periodicities at short range. In particular from the analysis of content rate distributions of $\alpha$ we have obtained the average value $\bar{\alpha} = 0.40 \pm 0.01$ and $\bar{\alpha} = 0.20 \pm 0.01$ for the human genes and S. cerevisiae genes, respectively. We have also performed the analyses using the mutual information function. We show that there exists a clear difference between the content rate distributions of correlation lengths for the sample human genes and the S. cerevisiae genes. We are led to a conjecture that the elongation of the correlation length in the base sequences of genes from the early eukaryote (S. cerevisiae) to the late eukaryote (human) should be the definite reflection of the evolutionary process.
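    The mutual information function mentioned in the abstract above is commonly defined as the mutual information between two bases separated by k positions. The sketch below is one minimal way to estimate it from a single sequence; it is not taken from the paper, the function name is hypothetical, and the marginal frequencies are estimated from the same set of base pairs used for the joint frequencies.

```python
import math
from collections import Counter


def mutual_information(seq: str, k: int) -> float:
    """Mutual information (in bits) between bases k positions apart."""
    if k < 1 or k >= len(seq):
        raise ValueError("k must satisfy 1 <= k < len(seq)")
    pairs = [(seq[i], seq[i + k]) for i in range(len(seq) - k)]
    n = len(pairs)
    joint = Counter(pairs)                 # counts of (base, base at +k)
    left = Counter(a for a, _ in pairs)    # counts of the first base
    right = Counter(b for _, b in pairs)   # counts of the second base
    total = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b) simplifies to count * n / (left[a] * right[b])
        total += p_ab * math.log2(count * n / (left[a] * right[b]))
    return total


periodic = "ACGT" * 50
mi_in_phase = mutual_information(periodic, 4)   # spacing equals the period
mi_constant = mutual_information("A" * 20, 3)   # no variation, no information
```

    For the strictly periodic toy sequence, a base fully determines the base four positions later, so the estimate reaches the 2-bit maximum for a four-letter alphabet. In real gene sequences the value falls off with k, and the rate of that fall-off is what a correlation length summarizes.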

  5. Nucleotide and predicted amino acid sequence of a cDNA clone encoding part of human transketolase.

    PubMed

    Abedinia, M; Layfield, R; Jones, S M; Nixon, P F; Mattick, J S

    1992-03-31

    Transketolase is a key enzyme in the pentose-phosphate pathway which has been implicated in the latent human genetic disease, Wernicke-Korsakoff syndrome. Here we report the cloning and partial characterisation of the coding sequences encoding human transketolase from a human brain cDNA library. The library was screened with oligonucleotide probes based on the amino acid sequence of proteolytic fragments of the purified protein. Northern blots showed that the transketolase mRNA is approximately 2.2 kb, close to the minimum expected, of which approximately 60% was represented in the largest cDNA clone. Sequence analysis of the transketolase coding sequences reveals a number of homologies with related enzymes from other species. PMID:1567394

  6. 5S ribosomal ribonucleic acid sequences in Bacteroides and Fusobacterium: evolutionary relationships within these genera and among eubacteria in general

    NASA Technical Reports Server (NTRS)

    Van den Eynde, H.; De Baere, R.; Shah, H. N.; Gharbia, S. E.; Fox, G. E.; Michalik, J.; Van de Peer, Y.; De Wachter, R.

    1989-01-01

    The 5S ribosomal ribonucleic acid (rRNA) sequences were determined for Bacteroides fragilis, Bacteroides thetaiotaomicron, Bacteroides capillosus, Bacteroides veroralis, Porphyromonas gingivalis, Anaerorhabdus furcosus, Fusobacterium nucleatum, Fusobacterium mortiferum, and Fusobacterium varium. A dendrogram constructed by a clustering algorithm from these sequences, which were aligned with all other hitherto known eubacterial 5S rRNA sequences, showed differences as well as similarities with respect to results derived from 16S rRNA analyses. In the 5S rRNA dendrogram, Bacteroides clustered together with Cytophaga and Fusobacterium, as in 16S rRNA analyses. Intraphylum relationships deduced from 5S rRNAs suggested that Bacteroides is specifically related to Cytophaga rather than to Fusobacterium, as was suggested by 16S rRNA analyses. Previous taxonomic considerations concerning the genus Bacteroides, based on biochemical and physiological data, were confirmed by the 5S rRNA sequence analysis.

  7. National Acid Precipitation Assessment Program: Acidic deposition: An inventory of non-Federal research, monitoring, and assessment information

    SciTech Connect

    Herrick, C.N.

    1990-01-01

    The Acid Precipitation Act of 1980 (Title VII of the Energy Security Act of 1980, P.L. 96-294) established the Interagency Task Force on Acid Precipitation to develop and implement the National Acid Precipitation Assessment Program (NAPAP). The information included in the document was provided to NAPAP's Task Group Leaders and State-of-Science and State-of-Technology authors in July 1989. The early release was intended to assure that the authors would be aware of the information at an early phase in the assessment production process.

  8. A Proposed Clinical Decision Support Architecture Capable of Supporting Whole Genome Sequence Information

    PubMed Central

    Welch, Brandon M.; Rodriguez Loya, Salvador; Eilbeck, Karen; Kawamoto, Kensaku

    2014-01-01

    Whole genome sequence (WGS) information may soon be widely available to help clinicians personalize the care and treatment of patients. However, considerable barriers exist, which may hinder the effective utilization of WGS information in a routine clinical care setting. Clinical decision support (CDS) offers a potential solution to overcome such barriers and to facilitate the effective use of WGS information in the clinic. However, genomic information is complex and will require significant considerations when developing CDS capabilities. As such, this manuscript lays out a conceptual framework for a CDS architecture designed to deliver WGS-guided CDS within the clinical workflow. To handle the complexity and breadth of WGS information, the proposed CDS framework leverages service-oriented capabilities and orchestrates the interaction of several independently-managed components. These independently-managed components include the genome variant knowledge base, the genome database, the CDS knowledge base, a CDS controller and the electronic health record (EHR). A key design feature is that genome data can be stored separately from the EHR. This paper describes in detail: (1) each component of the architecture; (2) the interaction of the components; and (3) how the architecture attempts to overcome the challenges associated with WGS information. We believe that service-oriented CDS capabilities will be essential to using WGS information for personalized medicine. PMID:25411644

  9. A proposed clinical decision support architecture capable of supporting whole genome sequence information.

    PubMed

    Welch, Brandon M; Loya, Salvador Rodriguez; Eilbeck, Karen; Kawamoto, Kensaku

    2014-04-01

    Whole genome sequence (WGS) information may soon be widely available to help clinicians personalize the care and treatment of patients. However, considerable barriers exist, which may hinder the effective utilization of WGS information in a routine clinical care setting. Clinical decision support (CDS) offers a potential solution to overcome such barriers and to facilitate the effective use of WGS information in the clinic. However, genomic information is complex and will require significant considerations when developing CDS capabilities. As such, this manuscript lays out a conceptual framework for a CDS architecture designed to deliver WGS-guided CDS within the clinical workflow. To handle the complexity and breadth of WGS information, the proposed CDS framework leverages service-oriented capabilities and orchestrates the interaction of several independently-managed components. These independently-managed components include the genome variant knowledge base, the genome database, the CDS knowledge base, a CDS controller and the electronic health record (EHR). A key design feature is that genome data can be stored separately from the EHR. This paper describes in detail: (1) each component of the architecture; (2) the interaction of the components; and (3) how the architecture attempts to overcome the challenges associated with WGS information. We believe that service-oriented CDS capabilities will be essential to using WGS information for personalized medicine. PMID:25411644

  10. Sample Prep, Workflow Automation and Nucleic Acid Fractionation for Next Generation Sequencing

    SciTech Connect

    Roskey, Mark

    2010-06-03

    Mark Roskey of Caliper LifeSciences discusses how the company's technologies fit into the next generation sequencing workflow on June 3, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM

  11. Evolution of vertebrate IgM: complete amino acid sequence of the constant region of Ambystoma mexicanum mu chain deduced from cDNA sequence.

    PubMed

    Fellah, J S; Wiles, M V; Charlemagne, J; Schwager, J

    1992-10-01

    cDNA clones coding for the constant region of the Mexican axolotl (Ambystoma mexicanum) mu heavy immunoglobulin chain were selected from total spleen RNA, using a cDNA polymerase chain reaction technique. The specific 5'-end primer was an oligonucleotide homologous to the JH segment of Xenopus laevis mu chain. One of the clones, JHA/3, corresponded to the complete constant region of the axolotl mu chain, consisting of a 1362-nucleotide sequence coding for a polypeptide of 454 amino acids, followed in the 3' direction by a 179-nucleotide untranslated region and a polyA+ tail. The axolotl C mu is divided into four typical domains (C mu 1-C mu 4) and can be aligned with the Xenopus C mu with an overall identity of 56% at the nucleotide level. Percent identities were particularly high between C mu 1 (59%) and C mu 4 (71%). The C-terminal 20-amino acid segment which constitutes the secretory part of the mu chain is strongly homologous to the equivalent sequences of chondrichthyans and of other tetrapods, including a conserved N-linked oligosaccharide, the penultimate cysteine and the C-terminal lysine. The four C mu domains of 13 vertebrate species ranging from chondrichthyans to mammals were aligned and compared at the amino acid level. The significant number of mu-specific residues which are conserved in each of the four C mu domains argues for a continuous line of evolution of the vertebrate mu chain. This notion was confirmed by the ability to reconstitute a consistent vertebrate evolution tree based on the phylogenetic parsimony analysis of the C mu 4 sequences. PMID:1382992
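
    The percent identities quoted in this abstract can be illustrated with a minimal Python sketch. It assumes two sequences that are already aligned to equal length with '-' as the gap character, and it counts only columns where neither sequence has a gap. That convention, the function name, and the toy sequences are assumptions for illustration; the abstract does not state how the authors computed identity.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over the ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences carry a residue (assumed convention).
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Toy example (hypothetical sequences, not taken from the paper).
print(percent_identity("ACDE", "ACDF"))
```

    Other conventions divide by the full alignment length or by the shorter sequence length, so reported identities are comparable only when the denominator is the same.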

  12. Using the concept of Chou's pseudo amino acid composition to predict protein solubility: an approach with entropies in information theory.

    PubMed

    Xiaohui, Niu; Nana, Li; Jingbo, Xia; Dingyan, Chen; Yuehua, Peng; Yang, Xiao; Weiquan, Wei; Dongming, Wang; Zengzhen, Wang

    2013-09-01

    Protein solubility plays a major role in proteomics. Predicting the propensity of a protein to be soluble or to form inclusion bodies is a fundamental problem that has not been satisfactorily resolved. To predict protein solubility, almost 10,000 protein sequences were downloaded from NCBI. Highly homologous sequences were then removed with CD-HIT, leaving 5692 sequences. Amino acid and dipeptide compositions are commonly extracted from protein sequences to predict solubility. In this study, entropy from information theory was introduced as an additional predictive factor in the model. Experiments involving nine different feature vector combinations, drawn from these three kinds of factors, were conducted with support vector machines (SVMs) as the prediction engine. Each combination was evaluated by a re-substitution test and a 10-fold cross-validation test. According to the evaluation results, the accuracies and Matthews correlation coefficient (MCC) values were boosted by the introduction of the entropy. The best combination was the one with amino acid compositions, dipeptide compositions, and their entropies. Its accuracy reached 90.34% with an MCC of 0.7494 in the re-substitution test, and 88.12% with an MCC of 0.7945 in 10-fold cross-validation. In conclusion, the introduction of the entropy significantly improved the performance of the predictive method. PMID:23524162
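
    As a rough illustration of the quantities named in this abstract, the following Python sketch computes the Shannon entropy of a sequence's residue composition and the Matthews correlation coefficient from confusion-matrix counts. The function names and toy inputs are assumptions for illustration; the abstract does not specify how the authors derived their entropy features.

```python
import math
from collections import Counter


def composition_entropy(sequence: str) -> float:
    """Shannon entropy, in bits, of the residue frequencies in a sequence."""
    counts = Counter(sequence)
    total = len(sequence)
    # H = -sum(p * log2(p)) over the residues that actually occur.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By a common convention, return 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy inputs (hypothetical, not data from the study).
print(composition_entropy("ACDE"))   # four equally frequent residues
print(matthews_cc(5, 5, 0, 0))       # perfect binary classification
```

    The same entropy function could be applied to dipeptide frequencies by counting overlapping residue pairs instead of single residues.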

  13. A novel predictor for protein structural class based on integrated information of the secondary structure sequence.

    PubMed

    Zhang, Lichao; Zhao, Xiqiang; Kong, Liang; Liu, Shuxia

    2014-08-01

    The structural class has become one of the most important features for characterizing the overall folding type of a protein and played important roles in many aspects of protein research. At present, it is still a challenging problem to accurately predict protein structural class for low-similarity sequences. In this study, an 18-dimensional integrated feature vector is proposed by fusing the information about content and position of the predicted secondary structure elements. The consistently high accuracies of jackknife and 10-fold cross-validation tests on different low-similarity benchmark datasets show that the proposed method is reliable and stable. Comparison of our results with other methods demonstrates that our method is an effective computational tool for protein structural class prediction, especially for low-similarity sequences. PMID:24859536

  14. Slider—maximum use of probability information for alignment of short sequence reads and SNP detection

    PubMed Central

    Malhis, Nawar; Butterfield, Yaron S. N.; Ester, Martin; Jones, Steven J. M.

    2009-01-01

    Motivation: A plethora of alignment tools have been created that are designed to best fit different types of alignment conditions. While some of these are made for aligning Illumina Sequence Analyzer reads, none of these are fully utilizing its probability (prb) output. In this article, we will introduce a new alignment approach (Slider) that reduces the alignment problem space by utilizing each read base's probabilities given in the prb files. Results: Compared with other aligners, Slider has higher alignment accuracy and efficiency. In addition, given that Slider matches bases with probabilities other than the most probable, it significantly reduces the percentage of base mismatches. The result is that its SNP predictions are more accurate than other SNP prediction approaches used today that start from the most probable sequence, including those using base quality. Contact: nmalhis@bcgsc.ca Supplementary information and availability: http://www.bcgsc.ca/platform/bioinfo/software/slider PMID:18974170

  15. Low levels of haptoglobin and putative amino acid sequence in Taiwanese Lanyu miniature pigs.

    PubMed

    Yueh, Sunny C H; Wang, Yao Horng; Lin, Kuan Yu; Tseng, Chi Feng; Chu, Hsien Pin; Chen, Kuen Jaw; Wang, Shih Sheng; Lai, I Hsiang; Mao, Simon J T

    2008-04-01

    Porcine haptoglobin (Hp) is an acute phase protein. Its plasma level increases significantly during inflammation and infection. One of the main functions of Hp is to bind free hemoglobin (Hb) and inhibit its oxidative activity. In the present report, we studied the Hp phenotype of Taiwanese Lanyu miniature pigs (TLY minipigs; n=43) and found their Hp structure to be a homodimer (beta-alpha-alpha-beta) similar to human Hp 1-1. Interestingly, Western blot and high performance liquid chromatographic (HPLC) analysis showed that 25% of the TLY minipigs possessed low or no plasma Hp level (<0.05 mg/ml). The Hp cDNA of these TLY minipigs was then cloned, and the translated amino acid sequence was analyzed. No sequences were found to be deficient; they showed a 99.7% identity with domestic pigs (NP_999165). The mean overall Hp level of the TLY minipigs (0.21 +/- 0.25 mg/ml; n=43) determined by enzyme-linked immunosorbent assay (ELISA) was markedly lower than that of domestic pigs (0.78 +/- 0.45 mg/ml; p<0.001), while 25% of the TLY minipigs had an Hp level that was extremely low (<0.05 mg/ml). In addition, the initial recovery rate (first 40 min) in the circulation of infused fluorescein isothiocyanate (FITC)-Hb was significantly higher in the TLY minipigs with extremely low Hp levels than those with high levels. This data suggests that the low concentration of Hp-Hb complex is responsible for the higher recovery rate of Hb in the circulation. TLY minipigs have been used as an experimental model for cardiovascular diseases; whether they can be used as a model for inflammatory diseases, with Hp as a marker, remains a topic of interest. However, since the Hp level varies significantly among individual TLY minipigs, it is necessary to prescreen the Hp levels of the animals to minimize variation in the experimental baseline. The present study may provide a reference value for future use of the TLY minipig as an animal model for inflammation-associated diseases. PMID:18460833

  16. Sequence Comparison and Phylogeny of Nucleotide Sequence of Coat Protein and Nucleic Acid Binding Protein of a Distinct Isolate of Shallot virus X from India.

    PubMed

    Majumder, S; Baranwal, V K

    2011-06-01

    Shallot virus X (ShVX), the type species of the genus Allexivirus in the family Alphaflexiviridae, has been associated with shallot plants in India and other shallot-growing countries such as Russia, Germany, the Netherlands, and New Zealand. The coat protein (CP) and nucleic acid binding protein (NB) regions of the virus were obtained by reverse transcriptase polymerase chain reaction from scale leaves of shallot bulbs. The partial cDNA contained two open reading frames encoding proteins with molecular weights of 28.66 and 14.18 kDa, belonging to the Flexi_CP super-family and the viral NB super-family, respectively. The percent identity and phylogenetic analysis of the amino acid sequences of the CP and NB regions of the virus associated with shallot indicated that it was a distinct isolate of ShVX. PMID:23637504

  17. MetaPathways: a modular pipeline for constructing pathway/genome databases from environmental sequence information

    PubMed Central

    2013-01-01

    Background: A central challenge to understanding the ecological and biogeochemical roles of microorganisms in natural and human engineered ecosystems is the reconstruction of metabolic interaction networks from environmental sequence information. The dominant paradigm in metabolic reconstruction is to assign functional annotations using BLAST. Functional annotations are then projected onto symbolic representations of metabolism in the form of KEGG pathways or SEED subsystems. Results: Here we present MetaPathways, an open source pipeline for pathway inference that uses the PathoLogic algorithm to map functional annotations onto the MetaCyc collection of reactions and pathways, and construct environmental Pathway/Genome Databases (ePGDBs) compatible with the editing and navigation features of Pathway Tools. The pipeline accepts assembled or unassembled nucleotide sequences, performs quality assessment and control, predicts and annotates noncoding genes and open reading frames, and produces inputs to PathoLogic. In addition to constructing ePGDBs, MetaPathways uses MLTreeMap to build phylogenetic trees for selected taxonomic anchor and functional gene markers, converts General Feature Format (GFF) files into concatenated GenBank files for ePGDB construction based on third-party annotations, and generates useful file formats including Sequin files for direct GenBank submission and gene feature tables summarizing annotations, MLTreeMap trees, and ePGDB pathway coverage summaries for statistical comparisons. Conclusions: MetaPathways provides users with a modular annotation and analysis pipeline for predicting metabolic interaction networks from environmental sequence information using an alternative to KEGG pathways and SEED subsystems mapping. It is extensible to genomic and transcriptomic datasets from a wide range of sequencing platforms, and generates useful data products for microbial community structure and function analysis. The MetaPathways software package

  18. Amino acid sequence of mouse nidogen, a multidomain basement membrane protein with binding activity for laminin, collagen IV and cells.

    PubMed Central

    Mann, K; Deutzmann, R; Aumailley, M; Timpl, R; Raimondi, L; Yamada, Y; Pan, T C; Conway, D; Chu, M L

    1989-01-01

    The whole amino acid sequence of nidogen was deduced from cDNA clones isolated from expression libraries and confirmed to approximately 50% by Edman degradation of peptides. The protein consists of some 1217 amino acid residues and a 28-residue signal peptide. The data support a previously proposed dumb-bell model of nidogen by demonstrating a large N-terminal globular domain (641 residues), five EGF-like repeats constituting the rod-like domain (248 residues) and a smaller C-terminal globule (328 residues). Two more EGF-like repeats interrupt the N-terminal and terminate the C-terminal sequences. Weak sequence homologies (25%) were detected between some regions of nidogen, the LDL receptor, thyroglobulin and the EGF precursor. Nidogen contains two consensus sequences for tyrosine sulfation and for asparagine beta-hydroxylation, two N-linked carbohydrate acceptor sites and, within one of the EGF-like repeats an Arg-Gly-Asp sequence. The latter was shown to be functional in cell attachment to nidogen. Binding sites for laminin and collagen IV are present on the C-terminal globule but not yet precisely localized. PMID:2496973

  19. Jack bean α-mannosidase: amino acid sequencing and N-glycosylation analysis of a valuable glycomics tool.

    PubMed

    Gnanesh Kumar, B S; Pohlentz, Gottfried; Schulte, Mona; Mormann, Michael; Siva Kumar, Nadimpalli

    2014-03-01

    Jack bean (Canavalia ensiformis) seeds contain several biologically important proteins among which α-mannosidase (EC 3.2.1.24) has been purified, its biochemical properties studied and widely used in glycan analysis. In the present study, we have used the purified enzyme and derived its amino acid sequence covering both the known subunits (molecular mass of ∼66,000 and ∼44,000 Da) hitherto not known in its entirety. Peptide de novo sequencing and structural elucidation of N-glycopeptides obtained either directly from proteolytic digestion or after zwitterionic hydrophilic interaction liquid chromatography solid phase extraction-based separation were performed by use of nanoelectrospray ionization quadrupole time-of-flight mass spectrometry and low-energy collision-induced dissociation experiments. De novo sequencing provided new insights into the disulfide linkage organization, intersection of subunits and complete N-glycan structures along with site specificities. The primary sequence suggests that the enzyme belongs to glycosyl hydrolase family 38 and the N-glycan sequence analysis revealed high-mannose oligosaccharides, which were found to be heterogeneous with varying number of hexoses viz, Man8-9GlcNAc2 and Glc1Man9GlcNAc2 in an evolutionarily conserved N-glycosylation site. This site with two proximal cysteines is present in all the acidic α-mannosidases reported so far in eukaryotes. Further, a truncated paucimannose type was identified to be lacking terminal two mannose, Man1(Xyl)GlcNAc2 (Fuc). PMID:24295789

  20. Complete Genome Sequence of Enterococcus mundtii QU 25, an Efficient l-(+)-Lactic Acid-Producing Bacterium

    PubMed Central

    Shiwa, Yuh; Yanase, Hiroaki; Hirose, Yuu; Satomi, Shohei; Araya-Kojima, Tomoko; Watanabe, Satoru; Zendo, Takeshi; Chibazakura, Taku; Shimizu-Kadota, Mariko; Yoshikawa, Hirofumi; Sonomoto, Kenji

    2014-01-01

    Enterococcus mundtii QU 25, a non-dairy bacterial strain of ovine faecal origin, can ferment both cellobiose and xylose to produce l-lactic acid. The use of this strain is highly desirable for economical l-lactate production from renewable biomass substrates. Genome sequence determination is necessary for the genetic improvement of this strain. We report the complete genome sequence of strain QU 25, primarily determined using Pacific Biosciences sequencing technology. The E. mundtii QU 25 genome comprises a 3 022 186-bp single circular chromosome (GC content, 38.6%) and five circular plasmids: pQY182, pQY082, pQY039, pQY024, and pQY003. In all, 2900 protein-coding sequences, 63 tRNA genes, and 6 rRNA operons were predicted in the QU 25 chromosome. Plasmid pQY024 harbours genes for mundticin production. We found that strain QU 25 produces a bacteriocin, suggesting that mundticin-encoded genes on plasmid pQY024 were functional. For lactic acid fermentation, two gene clusters were identified—one involved in the initial metabolism of xylose and uptake of pentose and the second containing genes for the pentose phosphate pathway and uptake of related sugars. This is the first complete genome sequence of an E. mundtii strain. The data provide insights into lactate production in this bacterium and its evolution among enterococci. PMID:24568933

  1. Gastropod arginine kinases from Cellana grata and Aplysia kurodai. Isolation and cDNA-derived amino acid sequences.

    PubMed

    Suzuki, T; Inoue, N; Higashi, T; Mizobuchi, R; Sugimura, N; Yokouchi, K; Furukohri, T

    2000-12-01

    Arginine kinase (AK) was isolated from the radular muscle of the gastropod molluscs Cellana grata (subclass Prosobranchia) and Aplysia kurodai (subclass Opisthobranchia) by ammonium sulfate fractionation, Sephadex G-75 gel filtration and DEAE-ion exchange chromatography. The denatured relative molecular mass values were estimated to be 40 kDa by sodium dodecyl sulfate-polyacrylamide gel electrophoresis. The isolated enzyme from Aplysia gave a Km value of 0.6 mM for arginine and a Vmax value of 13 µmol Pi min^-1 mg protein^-1 for the forward reaction. These values are comparable to those of other molluscan AKs. The cDNAs encoding Cellana and Aplysia AKs were amplified by polymerase chain reaction, and the nucleotide sequences of 1,608 and 1,239 bp, respectively, were determined. The open reading frame for Cellana AK is 1044 nucleotides in length and encodes a protein with 347 amino acid residues, and that for A. kurodai is 1077 nucleotides and 354 residues. The cDNA-derived amino acid sequences were validated by chemical sequencing of internal lysyl endopeptidase peptides. The amino acid sequences of Cellana and Aplysia AKs showed the highest percent identity (66-73%) with those of the abalone Nordotis and turbanshell Battilus belonging to the same class Gastropoda. These AK sequences still have a strong homology (63-71%) with that of the chiton Liolophura (class Polyplacophora), which is believed to be one of the most primitive molluscs. On the other hand, these AK sequences are less homologous (55-57%) with that of the clam Pseudocardium (class Bivalvia), suggesting that the biological position of the class Polyplacophora should be reconsidered. PMID:11281267

  2. Studies on the high-sulphur proteins of reduced Merino wool. Amino acid sequence of protein SCMKB-IIIB4

    PubMed Central

    Swart, L. S.; Haylett, T.

    1971-01-01

    The complete amino acid sequence of protein SCMKB-IIIB4 is presented. It is closely related to the sequence of protein SCMKB-IIIB3 (Haylett, Swart & Parris, 1971) differing in only four positions. The peptic and thermolysin peptides of protein SCMKB-IIIB4 were analysed by the dansyl–Edman method (Gray, 1967) and by tritium-labelling of C-terminal residues (Matsuo, Fujimoto & Tatsuno, 1966). This protein is the third member of a group of high-sulphur wool proteins with molecular weight of about 11400. It consists of 98 residues and has acetylalanine and carboxymethylcysteine as N- and C-terminal residues respectively. PMID:4942536

  3. The Canterbury Tales: Lessons from the Canterbury Earthquake Sequence to Inform Better Public Communication Models

    NASA Astrophysics Data System (ADS)

    McBride, S.; Tilley, E. N.; Johnston, D. M.; Becker, J.; Orchiston, C.

    2015-12-01

    This research evaluates the public education earthquake information prior to the Canterbury Earthquake sequence (2010-present), and examines communication learnings to create recommendations for improvement in implementation for these types of campaigns in future. The research comes from a practitioner perspective of someone who worked on these campaigns in Canterbury prior to the Earthquake Sequence and who also was the Public Information Manager Second in Command during the earthquake response in February 2011. Documents, specifically those addressing seismic risk, that were created prior to the earthquake sequence, were analyzed, using a "best practice matrix" created by the researcher, for how closely these aligned to best practice academic research. Readability tests and word counts are also employed to assist with triangulation of the data as was practitioner involvement. This research also outlines the lessons learned by practitioners and explores their experiences in regards to creating these materials and how they perceive these now, given all that has happened since the inception of the booklets. The findings from the research showed these documents lacked many of the attributes of best practice. The overly long, jargon filled text had little positive outcome expectancy messages. This probably would have failed to persuade anyone that earthquakes were a real threat in Canterbury. Paradoxically, it is likely these booklets may have created fatalism in publics who read the booklets. While the overall intention was positive, for scientists to explain earthquakes, tsunami, landslides and other risks to encourage the public to prepare for these events, the implementation could be greatly improved. This final component of the research highlights points of improvement for implementation for more successful campaigns in future. 
The importance of preparedness and science information campaigns can be not only in preparing the population but also into development of

  4. DNA Sequence and Expression Variation of Hop (Humulus lupulus) Valerophenone Synthase (VPS), a Key Gene in Bitter Acid Biosynthesis

    PubMed Central

    Castro, Consuelo B.; Whittock, Lucy D.; Whittock, Simon P.; Leggett, Grey; Koutoulis, Anthony

    2008-01-01

    Background: The hop plant (Humulus lupulus) is a source of many secondary metabolites, with bitter acids essential in the beer brewing industry and others having potential applications for human health. This study investigated variation in DNA sequence and gene expression of valerophenone synthase (VPS), a key gene in the bitter acid biosynthesis pathway of hop. Methods: Sequence variation was studied in 12 varieties, and expression was analysed in four of the 12 varieties in a series across the development of the hop cone. Results: Nine single nucleotide polymorphisms (SNPs) were detected in VPS, seven of which were synonymous. The two non-synonymous polymorphisms did not appear to be related to typical bitter acid profiles of the varieties studied. However, real-time quantitative reverse-transcription polymerase chain reaction (qRT-PCR) analysis of VPS expression during hop cone development showed a clear link with the bitter acid content. The highest levels of VPS expression were observed in two triploid varieties, ‘Symphony’ and ‘Ember’, which typically have high bitter acid levels. Conclusions: In all hop varieties studied, VPS expression was lowest in the leaves and an increase in expression was consistently observed during the early stages of cone development. PMID:18519445

  5. The amino acid sequence of protein SCMK-B2A from the high-sulphur fraction of wool keratin

    PubMed Central

    Elleman, T. C.

    1972-01-01

    1. The amino acid sequence of protein SCMK-B2A, a reduced and S-carboxymethylated protein from the high-sulphur fraction of wool, has been determined. 2. This protein of 171 amino acid residues displays both a high degree of internal homology and extensive external homology with other members of the SCMK-B2 group of proteins. 3. Evidence is presented which suggests that the SCMK-B2 group of proteins are produced by separate non-allelic genes. PMID:4679226

  6. Whole genome sequencing of elite rice cultivars as a comprehensive information resource for marker assisted selection.

    PubMed

    Duitama, Jorge; Silva, Alexander; Sanabria, Yamid; Cruz, Daniel Felipe; Quintero, Constanza; Ballen, Carolina; Lorieux, Mathias; Scheffler, Brian; Farmer, Andrew; Torres, Edgar; Oard, James; Tohme, Joe

    2015-01-01

    Current advances in sequencing technologies and bioinformatics revealed the genomic background of rice, a staple food for the poor people, and provided the basis to develop large genomic variation databases for thousands of cultivars. Proper analysis of this massive resource is expected to give novel insights into the structure, function, and evolution of the rice genome, and to aid the development of rice varieties through marker assisted selection or genomic selection. In this work we present sequencing and bioinformatics analyses of 104 rice varieties belonging to the major subspecies of Oryza sativa. We identified repetitive elements and recurrent copy number variation covering about 200 Mbp of the rice genome. Genotyping of over 18 million polymorphic locations within O. sativa allowed us to reconstruct the individual haplotype patterns shaping the genomic background of elite varieties used by farmers throughout the Americas. Based on a reconstruction of the alleles for the gene GBSSI, we could identify novel genetic markers for selection of varieties with high amylose content. We expect that both the analysis methods and the genomic information described here would be of great use for the rice research community and for other groups carrying on similar sequencing efforts in other crops. PMID:25923345

  7. Whole Genome Sequencing of Elite Rice Cultivars as a Comprehensive Information Resource for Marker Assisted Selection

    PubMed Central

    Duitama, Jorge; Silva, Alexander; Sanabria, Yamid; Cruz, Daniel Felipe; Quintero, Constanza; Ballen, Carolina; Lorieux, Mathias; Scheffler, Brian; Farmer, Andrew; Torres, Edgar; Oard, James; Tohme, Joe

    2015-01-01

    Current advances in sequencing technologies and bioinformatics revealed the genomic background of rice, a staple food for the poor people, and provided the basis to develop large genomic variation databases for thousands of cultivars. Proper analysis of this massive resource is expected to give novel insights into the structure, function, and evolution of the rice genome, and to aid the development of rice varieties through marker assisted selection or genomic selection. In this work we present sequencing and bioinformatics analyses of 104 rice varieties belonging to the major subspecies of Oryza sativa. We identified repetitive elements and recurrent copy number variation covering about 200 Mbp of the rice genome. Genotyping of over 18 million polymorphic locations within O. sativa allowed us to reconstruct the individual haplotype patterns shaping the genomic background of elite varieties used by farmers throughout the Americas. Based on a reconstruction of the alleles for the gene GBSSI, we could identify novel genetic markers for selection of varieties with high amylose content. We expect that both the analysis methods and the genomic information described here would be of great use for the rice research community and for other groups carrying on similar sequencing efforts in other crops. PMID:25923345

  8. High-affinity homologous peptide nucleic acid probes for targeting a quadruplex-forming sequence from a MYC promoter element.

    PubMed

    Roy, Subhadeep; Tanious, Farial A; Wilson, W David; Ly, Danith H; Armitage, Bruce A

    2007-09-18

    Guanine-rich DNA and RNA sequences are known to fold into secondary structures known as G-quadruplexes. Recent biochemical evidence along with the discovery of an increasing number of sequences in functionally important regions of the genome capable of forming G-quadruplexes strongly indicates important biological roles for these structures. Thus, molecular probes that can selectively target quadruplex-forming sequences (QFSs) are envisioned as tools to delineate biological functions of quadruplexes as well as potential therapeutic agents. Guanine-rich peptide nucleic acids have been previously shown to hybridize to homologous DNA or RNA sequences forming PNA-DNA (or RNA) quadruplexes. For this paper we studied the hybridization of an eight-mer G-rich PNA to a quadruplex-forming sequence derived from the promoter region of the MYC proto-oncogene. UV melting analysis, fluorescence assays, and surface plasmon resonance experiments reveal that this PNA binds to the MYC QFS in a 2:1 stoichiometry and with an average binding constant Ka = (2.0 ± 0.2) × 10^8 M^-1 or Kd = 5.0 nM. In addition, experiments carried out with short DNA targets revealed a dependence of the affinity on the sequence of bases in the loop region of the DNA. A structural model for the hybrid quadruplex is proposed, and implications for gene targeting by G-rich PNAs are discussed. PMID:17718513
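
    The two binding constants quoted in this abstract are reciprocals of one another (Kd = 1/Ka). The short Python sketch below reproduces that conversion and propagates the stated uncertainty to first order. The function name is hypothetical, and the error propagation is an assumption for illustration, not something reported in the abstract.

```python
def kd_nanomolar(ka_per_molar: float, ka_error: float = 0.0) -> tuple[float, float]:
    """Convert an association constant (M^-1) to a dissociation constant (nM).

    Returns (Kd, Kd uncertainty); the uncertainty uses first-order propagation,
    so the relative error of Kd equals the relative error of Ka.
    """
    kd = 1e9 / ka_per_molar            # 1/Ka in M, scaled to nM
    kd_error = kd * (ka_error / ka_per_molar)
    return kd, kd_error


# Values from the abstract: Ka = (2.0 +/- 0.2) x 10^8 M^-1
kd, kd_err = kd_nanomolar(2.0e8, 0.2e8)
print(f"Kd = {kd:.1f} +/- {kd_err:.1f} nM")
```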

  9. Ferredoxin:NADP oxidoreductase of Cyanophora paradoxa: purification, partial characterization, and N-terminal amino acid sequence.

    PubMed

    Gebhart, U B; Maier, T L; Stevanović, S; Bayer, M G; Schenk, H E

    1992-06-01

    The ferredoxin:NADP+ oxidoreductase of the protist Cyanophora paradoxa, a descendant of a former symbiotic consortium and an important model organism in view of the Endosymbiosis Theory, is the first enzyme purified from a formerly original endocytobiont (cyanelle) that is found to be encoded in the nucleus of the host. This cyanoplast enzyme was isolated by FPLC (19% yield) and characterized with respect to the UV-vis spectrum, pH optimum (pH 9), molecular mass of 34 kDa, and an N-terminal amino acid sequence (24 residues). The enzyme shows, as known from other organisms, molecular heterogeneity. The N-terminus of a further ferredoxin:NADP+ oxidoreductase polypeptide represents a shorter sequence missing the first four amino acids of the mature enzyme. PMID:1392619

  10. Purification, characterization, and amino acid sequencing of a Δ5-3-oxosteroid isomerase from Pseudomonas putida biotype B

    SciTech Connect

    Linden, K.G.

    1986-01-01

    Studies were performed on the Δ5-3-oxosteroid isomerase from Pseudomonas putida biotype B. The studies involved three broad areas: improvement in the purification of the enzyme, further characterization of the purified enzyme, and completion of the amino acid sequence of the enzyme. For the purification of the enzyme, techniques for removing the isomerase from whole cells were studied, the effects of ionic strength on the binding of the isomerase to steroidal affinity resins were explored, and a new affinity resin was developed. Absorption spectra and the proton NMR spectra of the isomerase were obtained. Amino acid sequencing of the oxosteroid isomerase indicates that the enzyme is a dimeric protein consisting of two identical subunits, each consisting of a polypeptide chain of 131 residues with Mr = 14,536.

  11. Identification of novel rice low phytic acid mutations via TILLING by sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Phytic acid (myo-inositol-1,2,3,4,5,6-hexakisphosphate or InsP6) accounts for 75-85% of the total phosphorus in seeds. Low phytic acid (lpa) mutants exhibit decreases in seed InsP6 with corresponding increases in inorganic P which, unlike phytic acid P, is readily utilized by humans and monogastric ...

  12. Improved sequence-based prediction of disordered regions with multilayer fusion of multiple information sources

    PubMed Central

    Mizianty, Marcin J.; Stach, Wojciech; Chen, Ke; Kedarisetti, Kanaka Durga; Disfani, Fatemeh Miri; Kurgan, Lukasz

    2010-01-01

    Motivation: Intrinsically disordered proteins play a crucial role in numerous regulatory processes. Their abundance and ubiquity combined with a relatively low quantity of their annotations motivate research toward the development of computational models that predict disordered regions from protein sequences. Although the prediction quality of these methods continues to rise, novel and improved predictors are urgently needed. Results: We propose a novel method, named MFDp (Multilayered Fusion-based Disorder predictor), that aims to improve over the current disorder predictors. MFDp is an ensemble of 3 Support Vector Machines specialized for the prediction of short, long and generic disordered regions. It combines three complementary disorder predictors, sequence, sequence profiles, predicted secondary structure, solvent accessibility, backbone dihedral torsion angles, residue flexibility and B-factors. Our method utilizes a custom-designed set of features that are based on raw predictions and aggregated raw values and recognizes various types of disorder. The MFDp is compared at the residue level on two datasets against eight recent disorder predictors and top-performing methods from the most recent CASP8 experiment. In spite of using training chains with ≤25% similarity to the test sequences, our method consistently and significantly outperforms the other methods based on the MCC index. The MFDp outperforms modern disorder predictors for the binary disorder assignment and provides competitive real-valued predictions. The MFDp's outputs are also shown to outperform the other methods in the identification of proteins with long disordered regions. Availability: http://biomine.ece.ualberta.ca/MFDp.html Supplementary information: Supplementary data are available at Bioinformatics online. Contact: lkurgan@ece.ualberta.ca PMID:20823312
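
    The record above ranks predictors by the MCC index. As a rough illustration of that metric only (not of MFDp itself), the sketch below computes the Matthews correlation coefficient from per-residue binary labels. The label encoding ('1' for disordered, '0' for ordered) and the function names are assumptions made for this example.

```python
import math


def confusion_counts(predicted: str, observed: str):
    """Count TP, TN, FP, FN for per-residue labels.

    Assumed encoding (not from the record): '1' = disordered, '0' = ordered.
    """
    if len(predicted) != len(observed):
        raise ValueError("label strings must have equal length")
    if set(predicted + observed) - {"0", "1"}:
        raise ValueError("labels must be '0' or '1'")
    tp = tn = fp = fn = 0
    for p, o in zip(predicted, observed):
        if p == "1" and o == "1":
            tp += 1
        elif p == "0" and o == "0":
            tn += 1
        elif p == "1" and o == "0":
            fp += 1
        else:  # p == "0" and o == "1"
            fn += 1
    return tp, tn, fp, fn


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    counts = confusion_counts("1100", "1010")
    print(counts, mcc(*counts))
```

    MCC ranges from -1 (every residue mislabeled) through 0 (no better than chance) to +1 (perfect agreement), which makes it less sensitive than plain accuracy to the imbalance between ordered and disordered residues.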

  13. Snake venoms. The amino-acid sequence of trypsin inhibitor E of Dendroaspis polylepis polylepis (Black Mamba) venom.

    PubMed

    Joubert, F J; Strydom, D J

    1978-06-01

    Trypsin inhibitor E from black mamba venom comprises 59 amino acid residues in a single polypeptide chain, cross-linked by three intrachain disulphide bridges. The complete primary structure of inhibitor E was elucidated. The sequence is homologous with trypsin inhibitors from different sources. Unique among this homologous series of proteinase inhibitors, inhibitor E has an affinity for transition metal ions, exemplified here by Cu2+ and Co2+. PMID:668688

  14. Draft Genome Sequence of Escherichia coli Strain VKPM B-10182, Producing the Enzyme for Synthesis of Cephalosporin Acids

    PubMed Central

    Mardanov, Andrey V.; Eldarov, Mikhail A.; Sklyarenko, Anna V.; Dumina, Maria V.; Beletsky, Alexey V.; Yarotsky, Sergey V.

    2014-01-01

    Escherichia coli strain VKPM B-10182, obtained by chemical mutagenesis from E. coli strain ATCC 9637, produces cephalosporin acid synthetase employed in the synthesis of β-lactam antibiotics, such as cefazolin. The draft genome sequence of strain VKPM B-10182 revealed 32 indels and 1,780 point mutations that might account for the improvement in antibiotic synthesis that we observed. PMID:25414512

  15. The Cucurbitaceae of India: Accepted names, synonyms, geographic distribution, and information on images and DNA sequences

    PubMed Central

    Renner, Susanne S.; Pandey, Arun K.

    2013-01-01

    Abstract The most recent critical checklists of the Cucurbitaceae of India are 30 years old. Since then, botanical exploration, online availability of specimen images and taxonomic literature, and molecular-phylogenetic studies have led to modified taxon boundaries and geographic ranges. We present a checklist of the Cucurbitaceae of India that treats 400 relevant names and provides information on the collecting locations and herbaria for all types. We accept 94 species (10 of them endemic) in 31 genera. For accepted species, we provide their geographic distribution inside and outside India, links to online images of herbarium or living specimens, and information on publicly available DNA sequences to highlight gaps in the current understanding of Indian cucurbit diversity. Of the 94 species, 79% have DNA sequences in GenBank, albeit rarely from Indian material. The most species-rich genera are Trichosanthes with 22 species, Cucumis with 11 (all but two wild), Momordica with 8, and Zehneria with 5. From an evolutionary point of view, India is of special interest because it harbors a wide range of lineages, many of them relatively old and phylogenetically isolated. Phytogeographically, the north eastern and peninsular regions are richest in species, while the Jammu Kashmir and Himachal regions have few Cucurbitaceae. Our checklist probably underestimates the true diversity of Indian Cucurbitaceae, but should help focus efforts towards the least known species and regions. PMID:23717193

  16. Accessible surface area of proteins from purely sequence information and the importance of global features

    NASA Astrophysics Data System (ADS)

    Faraggi, Eshel; Zhou, Yaoqi; Kloczkowski, Andrzej

    2014-03-01

    We present a new approach for predicting the accessible surface area of proteins. The novelty of this approach lies in not using residue mutation profiles generated by multiple sequence alignments as descriptive inputs. Rather, sequential window information and the global monomer and dimer compositions of the chain are used. We find that much of the lost accuracy due to the elimination of evolutionary information is recouped by the use of global features. Furthermore, this new predictor produces similar results for proteins with or without sequence homologs deposited in the Protein Data Bank, and hence shows generalizability. Finally, these predictions are obtained in a small fraction (1/1000) of the time required to run mutation profile based prediction. All these factors indicate the possible usability of this work in de-novo protein structure prediction and in de-novo protein design using iterative searches. Funded in part by the financial support of the National Institutes of Health through Grants R01GM072014 and R01GM073095, and the National Science Foundation through Grant NSF MCB 1071785.
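
    The global features mentioned in the record above (monomer and dimer compositions of the chain) can be illustrated with a short sketch. It assumes that "dimer composition" means the frequency of adjacent residue pairs and that only the 20 standard amino acids are tracked; both are assumptions for this example, and the function names are hypothetical rather than taken from the authors' predictor.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def monomer_composition(seq: str) -> dict:
    """Fraction of the chain made up by each standard amino acid."""
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {aa: counts.get(aa, 0) / len(seq) for aa in AMINO_ACIDS}


def dimer_composition(seq: str) -> dict:
    """Fraction of adjacent residue pairs for each of the 400 ordered pairs.

    Assumption: a 'dimer' is two neighbouring residues in the chain.
    """
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = len(seq) - 1
    return {a + b: pairs.get(a + b, 0) / total
            for a, b in product(AMINO_ACIDS, repeat=2)}


if __name__ == "__main__":
    toy = "ACAC"  # toy chain, not a real protein
    print(monomer_composition(toy)["A"], dimer_composition(toy)["AC"])
```

    Together the two dictionaries give a fixed-length vector of 420 values per chain regardless of its length, which is what lets such features be computed without a multiple sequence alignment.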

  17. First draft genome sequencing of indole acetic acid producing and plant growth promoting fungus Preussia sp. BSL10.

    PubMed

    Khan, Abdul Latif; Asaf, Sajjad; Khan, Abdur Rahim; Al-Harrasi, Ahmed; Al-Rawahi, Ahmed; Lee, In-Jung

    2016-05-10

    Preussia sp. BSL10, family Sporormiaceae, was actively producing phytohormone (indole-3-acetic acid) and extra-cellular enzymes (phosphatases and glucosidases). The fungus was also promoting the growth of the arid-land tree Boswellia sacra. In view of these prospects, we sequenced its draft genome for the first time. The Illumina-based sequence analysis reveals an approximate genome size of 31.4 Mbp for Preussia sp. BSL10. Based on ab initio gene prediction, a total of 32,312 coding sequences were annotated, consisting of 11,967 coding genes, pseudogenes, and 221 tRNA genes. Furthermore, 321 carbohydrate-active enzymes were predicted and classified into many functional families. PMID:26995610

  18. Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering

    PubMed Central

    Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor

    2015-01-01

    Abstract To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice. PMID:25560745

  19. A multilevel ant colony optimization algorithm for classical and isothermic DNA sequencing by hybridization with multiplicity information available.

    PubMed

    Kwarciak, Kamil; Radom, Marcin; Formanowicz, Piotr

    2016-04-01

    Classical sequencing by hybridization takes into account only binary information about sequence composition: a given element from an oligonucleotide library either is or is not a part of the target sequence. However, DNA chip technology has developed to the point where it can provide partial information about the multiplicity of each oligonucleotide of which the analyzed sequence consists. Currently, it is not possible to obtain exact data of this type, but even partial information should be very useful. Two realistic multiplicity information models are taken into consideration in this paper. The first one, called "one and many", assumes that it is possible to determine whether a given oligonucleotide occurs in a reconstructed sequence once or more than once. According to the second model, called "one, two and many", a biochemical experiment can reveal whether a given oligonucleotide is present in an analyzed sequence once, twice or at least three times. An ant colony optimization algorithm has been implemented to verify the above models and to compare with existing algorithms for sequencing by hybridization that utilize the additional information. The proposed algorithm solves the problem with any kind of hybridization errors. Computational experiment results confirm that using even partial information about multiplicity leads to increased quality of reconstructed sequences. Moreover, they also show that the more precise model yields better solutions and that the ant colony optimization algorithm outperforms the existing ones. Test data sets and the proposed ant colony optimization algorithm are available on: http://bioserver.cs.put.poznan.pl/download/ACO4mSBH.zip. PMID:26878124
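
    The two multiplicity models described in the record above can be illustrated on an idealized, error-free spectrum. The sketch below only shows what information each model would expose for a known toy sequence; it is not the ant colony optimization algorithm, it ignores hybridization errors, and the function names are hypothetical.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Exact number of occurrences of every k-mer in seq (ideal spectrum)."""
    if k <= 0 or k > len(seq):
        raise ValueError("k must be between 1 and len(seq)")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def one_and_many(counts: Counter) -> dict:
    """'one and many' model: only once vs. more than once is observable."""
    return {kmer: "one" if c == 1 else "many" for kmer, c in counts.items()}


def one_two_and_many(counts: Counter) -> dict:
    """'one, two and many' model: once, twice, or at least three times."""
    labels = {1: "one", 2: "two"}
    return {kmer: labels.get(c, "many") for kmer, c in counts.items()}


if __name__ == "__main__":
    toy = "ACGACGACGT"  # toy target sequence, chosen for illustration
    counts = kmer_counts(toy, 3)
    print(one_and_many(counts))
    print(one_two_and_many(counts))
```

    In classical sequencing by hybridization all four 3-mers of the toy sequence would simply be reported as present; the second model additionally separates the twice-occurring oligonucleotides from the one that occurs three times.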

  20. A novel T-cell-defined HLA-DR polymorphism not predicted from the linear amino acid sequence.

    PubMed

    Termijtelen, A; van den Elsen, P; Koning, F; de Koster, S; Schroeijers, W; Vanderkerckhove, B

    1989-09-01

    Recent investigations have shown that alloreactive T cells are capable of responding to structures defined by specific linear amino acid sequences on class II molecules. In the present study we show that also a polymorphism can be recognized that is not defined by such linear amino acid sequences. Two human T-cell clones, sensitized to DRw13 haplotypes, are described. The description of clone c50 serves to exemplify the first model. This DRB1-specific clone responds to stimulator cells that carry DR molecules, different in their DRB1 first and second hypervariable regions (HV1 and HV2) but identical in their HV3 regions (i.e., DRw13,Dw18; DRw13,Dw19; DR4,Dw10; and DRw11,LDVII). The second clone, c1443, behaves nonconventionally. It responds to DRw13,Dw18; DRw13,Dw19; and DR4,Dw4 stimulator cells, although no specific amino acid sequence is shared between these specificities. The latter pattern of reactivity suggests the existence of a novel polymorphism recognized by alloreactive T cells. This particular polymorphism may also be biologically significant. PMID:2476425

  1. cDNA-derived amino-acid sequence of a land turtle (Geochelone carbonaria) beta-chain hemoglobin.

    PubMed

    Bordin, S; Meza, A N; Saad, S T; Ogo, S H; Costa, F F

    1997-06-01

    The cDNA sequence encoding the beta-chain of the turtle Geochelone carbonaria was determined. The isolation of hemoglobin mRNA was based on PCR with degenerate primers in combination with 5'- and 3'-RACE protocols. The full-length cDNA is 615 bp, with the ATG start codon at position 53 and the TGA stop codon at position 495; the AATAAA polyadenylation signal is found at position 599. The deduced polypeptide contains 146 amino-acid residues. The predicted amino acid sequence shares 83% identity with the beta-globin of a related species, the aquatic turtle C. p. belli. Otherwise, identity is higher when compared with chicken beta-Hb (80%) than with other reptilian orders (Squamata, 69%, and Crocodilia, 61%). Compared with human HbA, there is 67% identity, and at least three amino acid substitutions could be of some functional significance (Glu43 beta-->Ser, His116 beta-->Thr and His143 beta-->Leu). To our knowledge this represents the first cDNA sequence of a reptile globin gene described. PMID:9238523

  2. Amino acid sequence of the serine-repeat antigen (SERA) of Plasmodium falciparum determined from cloned cDNA.

    PubMed

    Bzik, D J; Li, W B; Horii, T; Inselburg, J

    1988-09-01

    We report the isolation of cDNA clones for a Plasmodium falciparum gene that encodes the complete amino acid sequence of a previously identified exported blood stage antigen. The Mr of this antigen protein had been determined by sodium dodecylsulphate-polyacrylamide gel electrophoresis analysis, by different workers, to be 113,000, 126,000, and 140,000. We show, by cDNA nucleotide sequence analysis, that this antigen gene encodes a 989 amino acid protein (111 kDa) that contains a potential signal peptide, but not a membrane anchor domain. In the FCR3 strain the serine content of the protein was 11%, of which 57% of the serine residues were localized within a 201 amino acid sequence that included 35 consecutive serine residues. The protein also contained three possible N-linked glycosylation sites and numerous possible O-linked glycosylation sites. The mRNA was abundant during late trophozoite-schizont parasite stages. We propose to identify this antigen, which had been called p126, by the acronym SERA, serine-repeat antigen, based on its complete structure. The usefulness of the cloned cDNA as a source of a possible malaria vaccine is considered in view of the previously demonstrated ability of the antigen to induce parasite-inhibitory antibodies and a protective immune response in Saimiri monkeys. PMID:2847041

  3. T-Coffee: a web server for the multiple sequence alignment of protein and RNA sequences using structural information and homology extension

    PubMed Central

    Di Tommaso, Paolo; Moretti, Sebastien; Xenarios, Ioannis; Orobitg, Miquel; Montanyola, Alberto; Chang, Jia-Ming; Taly, Jean-François; Notredame, Cedric

    2011-01-01

    This article introduces a new interface for T-Coffee, a consistency-based multiple sequence alignment program. This interface provides an easy and intuitive access to the most popular functionality of the package. These include the default T-Coffee mode for protein and nucleic acid sequences, the M-Coffee mode that allows combining the output of any other aligners, and template-based modes of T-Coffee that deliver high accuracy alignments while using structural or homology derived templates. These three available template modes are Expresso for the alignment of protein with a known 3D-Structure, R-Coffee to align RNA sequences with conserved secondary structures and PSI-Coffee to accurately align distantly related sequences using homology extension. The new server benefits from recent improvements of the T-Coffee algorithm and can align up to 150 sequences as long as 10 000 residues and is available from both http://www.tcoffee.org and its main mirror http://tcoffee.crg.cat. PMID:21558174

  4. Amino acid sequences of lysozymes newly purified from invertebrates imply wide distribution of a novel class in the lysozyme family.

    PubMed

    Ito, Y; Yoshikawa, A; Hotani, T; Fukuda, S; Sugimura, K; Imoto, T

    1999-01-01

    Lysozymes were purified from three invertebrates: a marine bivalve, a marine conch, and an earthworm. The purified lysozymes all showed a similar molecular weight of 13 kDa on SDS/PAGE. Their N-terminal sequences up to the 33rd residue determined here were apparently homologous among them; in addition, they had a homology with a partial sequence of a starfish lysozyme which had been reported before. The complete sequence of the bivalve lysozyme was determined by peptide mapping and subsequent sequence analysis. This was composed of 123 amino acids including as many as 14 cysteine residues and did not show a clear homology with the known types of lysozymes. However, the homology search of this protein on the protein or nucleic acid database revealed two homologous proteins. One of them was a gene product, CELF22 A3.6 of C. elegans, which was a functionally unknown protein. The other was an isopeptidase of a medicinal leech, named destabilase. Thus, a new type of lysozyme found in at least four species across the three classes of the invertebrates demonstrates a novel class of protein/lysozyme family in invertebrates. The bivalve lysozyme, first characterized here, showed extremely high protein stability and hen lysozyme-like enzymatic features. PMID:9914527

  5. Complete Genome Sequences of Escherichia coli O157:H7 Strains SRCC 1675 and 28RC, Which Vary in Acid Resistance

    PubMed Central

    Baranzoni, Gian Marco; Reichenberger, Erin R.; Kim, Gwang-Hee; Breidt, Frederick; Kay, Kathryn; Oh, Deog-Hwan

    2016-01-01

    The level of acid resistance among Escherichia coli O157:H7 strains varies, and strains with higher resistance to acid may have a lower infectious dose. The complete genome sequences belonging to two strains of Escherichia coli O157:H7 with different levels of acid resistance are presented here. PMID:27469964

  6. Complete Genome Sequences of Escherichia coli O157:H7 Strains SRCC 1675 and 28RC, Which Vary in Acid Resistance.

    PubMed

    Baranzoni, Gian Marco; Fratamico, Pina M; Reichenberger, Erin R; Kim, Gwang-Hee; Breidt, Frederick; Kay, Kathryn; Oh, Deog-Hwan

    2016-01-01

    The level of acid resistance among Escherichia coli O157:H7 strains varies, and strains with higher resistance to acid may have a lower infectious dose. The complete genome sequences belonging to two strains of Escherichia coli O157:H7 with different levels of acid resistance are presented here. PMID:27469964

  7. Complete genome sequences of Escherichia coli O157:H7 strains SRCC 1675 and 28RC that vary in acid resistance

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The level of acid resistance among Escherichia coli O157:H7 strains varies, and strains with higher resistance to acid may have a lower infectious dose. The complete genome sequences belonging to two strains of Escherichia coli O157:H7 with different levels of acid resistance are presented....

  8. A Scaffold Analysis Tool Using Mate-Pair Information in Genome Sequencing

    PubMed Central

    Kim, Pan-Gyu; Cho, Hwan-Gue; Park, Kiejung

    2008-01-01

    We have developed a Windows-based program, ConPath, as a scaffold analyzer. ConPath constructs scaffolds by ordering and orienting separate sequence contigs by exploiting the mate-pair information between contig-pairs. Our algorithm builds directed graphs from link information and traverses them to find the longest acyclic graphs. Using end read pairs of fixed-sized mate-pair libraries, ConPath determines relative orientations of all contigs, estimates the gap size of each adjacent contig pair, and reports wrong assembly information by validating orientations and gap sizes. We have utilized ConPath in more than 10 microbial genome projects, including Mannheimia succiniciproducens and Vibrio vulnificus, where we verified contig assembly and identified several erroneous contigs using the four types of error defined in ConPath. Also, ConPath supports some convenient features and viewers that permit investigation of each contig in detail; these include contig viewer, scaffold viewer, edge information list, mate-pair list, and the printing of complex scaffold structures. PMID:18414585
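
    The graph step described in the record above (directed graphs built from link information, then traversed for the longest acyclic structure) can be sketched in simplified form. The example below is not ConPath's algorithm: it assumes the links already form a directed acyclic graph, ignores contig orientation and gap sizes, and uses hypothetical contig names. It finds one longest chain of contigs by dynamic programming over a topological order.

```python
from collections import defaultdict, deque


def longest_contig_path(links):
    """Return one longest path through a DAG of contig links.

    links: iterable of (upstream_contig, downstream_contig) pairs.
    Raises ValueError if the links contain a cycle.
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for a, b in links:
        succ[a].append(b)
        indeg[b] += 1
        nodes.update((a, b))
    if not nodes:
        return []

    # Kahn's algorithm: repeatedly emit contigs with no unprocessed predecessor.
    queue = deque(sorted(n for n in nodes if indeg[n] == 0))
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    if len(order) != len(nodes):
        raise ValueError("link graph contains a cycle")

    # best[n] = number of contigs on the longest path that ends at n.
    best = {n: 1 for n in nodes}
    prev = {n: None for n in nodes}
    for n in order:
        for m in succ[n]:
            if best[n] + 1 > best[m]:
                best[m] = best[n] + 1
                prev[m] = n

    end = max(order, key=lambda n: best[n])
    path = []
    while end is not None:
        path.append(end)
        end = prev[end]
    return path[::-1]


if __name__ == "__main__":
    toy_links = [("c1", "c2"), ("c2", "c3"), ("c1", "c3"), ("c3", "c4")]
    print(longest_contig_path(toy_links))
```

    In the toy input the direct link from c1 to c3 is bypassed in favour of the longer chain through c2. A real scaffolder would also have to weigh how many mate pairs support each link and check that the implied gap sizes are consistent.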

  9. iCataly-PseAAC: Identification of Enzymes Catalytic Sites Using Sequence Evolution Information with Grey Model GM (2,1).

    PubMed

    Xiao, Xuan; Hui, Meng-Juan; Liu, Zi; Qiu, Wang-Ren

    2015-12-01

    Enzymes play pivotal roles in most biological reactions. The catalytic residues of an enzyme are defined as the amino acids that are directly involved in chemical catalysis; knowledge of these residues is important for understanding enzyme function. Given an enzyme, which residues are the catalytic sites, and which residues are not? This is the first important problem for in-depth understanding of the catalytic mechanism and for drug development. With the explosion of protein sequences generated during the post-genomic era, it is highly desirable for both basic research and drug design to develop fast and reliable methods for identifying the catalytic sites of enzymes according to their sequences. To address this problem, we proposed a new predictor, called iCataly-PseAAC. In the prediction system, the peptide sample was formulated with sequence evolution information via the grey system model GM(2,1). The rigorous jackknife test and an independent dataset test showed that iCataly-PseAAC was superior to existing predictors even though it uses only sequence information. As a user-friendly web server, iCataly-PseAAC is freely accessible at http://www.jci-bioinfo.cn/iCataly-PseAAC. A step-by-step guide has been provided on how to use the web server to get the desired results for the convenience of most experimental scientists. PMID:26077845

  10. Fad7 gene identification and fatty acids phenotypic variation in an olive collection by EcoTILLING and sequencing approaches.

    PubMed

    Sabetta, Wilma; Blanco, Antonio; Zelasco, Samanta; Lombardo, Luca; Perri, Enzo; Mangini, Giacomo; Montemurro, Cinzia

    2013-08-01

    The ω-3 fatty acid desaturases (FADs) are enzymes responsible for catalyzing the conversion of linoleic acid to α-linolenic acid localized in the plastid or in the endoplasmic reticulum. In this research we report the genotypic and phenotypic variation of Italian Olea europaea L. germplasm for the fatty acid composition. The phenotypic oil characterization was followed by the molecular analysis of the plastidial-type ω-3 FAD gene (fad7) (EC 1.14.19), whose full-length sequence has been here identified in cultivar Leccino. The gene consisted of 2635 bp with 8 exons and 5'- and 3'-UTRs of 336 and 282 bp respectively, and showed a high level of heterozygosity (1/110 bp). The natural allelic variation was investigated both by a LiCOR EcoTILLING assay and the PCR product direct sequencing. Only three haplotypes were identified among the 96 analysed cultivars, highlighting the strong degree of conservation of this gene. PMID:23685785

  11. Sequence-independent and reversible photocontrol of transcription/expression systems using a photosensitive nucleic acid binder

    PubMed Central

    Estévez-Torres, André; Crozatier, Cécile; Diguet, Antoine; Hara, Tomoaki; Saito, Hirohide; Yoshikawa, Kenichi; Baigl, Damien

    2009-01-01

    To understand non-trivial biological functions, it is crucial to develop minimal synthetic models that capture their basic features. Here, we demonstrate a sequence-independent, reversible control of transcription and gene expression using a photosensitive nucleic acid binder (pNAB). By introducing a pNAB whose affinity for nucleic acids is tuned by light, in vitro RNA production, EGFP translation, and GFP expression (a set of reactions including both transcription and translation) were successfully inhibited in the dark and recovered after a short illumination at 365 nm. Our results indicate that the accessibility of the protein machinery to one or several nucleic acid binding sites can be efficiently regulated by changing the conformational/condensation state of the nucleic acid (DNA conformation or mRNA aggregation), thus regulating gene activity in an efficient, reversible, and sequence-independent manner. The possibility offered by our approach to use light to trigger various gene expression systems in a system-independent way opens interesting perspectives to study gene expression dynamics as well as to develop photocontrolled biotechnological procedures. PMID:19617550

  12. BEAUTY: an enhanced BLAST-based search tool that integrates multiple biological information resources into sequence similarity search results.

    PubMed

    Worley, K C; Wiese, B A; Smith, R F

    1995-09-01

    BEAUTY (BLAST enhanced alignment utility) is an enhanced version of the NCBI's BLAST data base search tool that facilitates identification of the functions of matched sequences. We have created new data bases of conserved regions and functional domains for protein sequences in NCBI's Entrez data base, and BEAUTY allows this information to be incorporated directly into BLAST search results. A Conserved Regions Data Base, containing the locations of conserved regions within Entrez protein sequences, was constructed by (1) clustering the entire data base into families, (2) aligning each family using our PIMA multiple sequence alignment program, and (3) scanning the multiple alignments to locate the conserved regions within each aligned sequence. A separate Annotated Domains Data Base was constructed by extracting the locations of all annotated domains and sites from sequences represented in the Entrez, PROSITE, BLOCKS, and PRINTS data bases. BEAUTY performs a BLAST search of those Entrez sequences with conserved regions and/or annotated domains. BEAUTY then uses the information from the Conserved Regions and Annotated Domains data bases to generate, for each matched sequence, a schematic display that allows one to directly compare the relative locations of (1) the conserved regions, (2) annotated domains and sites, and (3) the locally aligned regions matched in the BLAST search. In addition, BEAUTY search results include World-Wide Web hypertext links to a number of external data bases that provide a variety of additional types of information on the function of matched sequences. This convenient integration of protein families, conserved regions, annotated domains, alignment displays, and World-Wide Web resources greatly enhances the biological informativeness of sequence similarity searches. BEAUTY searches can be performed remotely on our system using the "BCM Search Launcher" World-Wide Web pages (URL is < http:/ /gc.bcm.tmc.edu:8088/ search

  13. Expression profiling without genome sequence information in a non-model species, Pandalid shrimp (Pandalus latirostris), by next-generation sequencing.

    PubMed

    Kawahara-Miki, Ryouka; Wada, Kenta; Azuma, Noriko; Chiba, Susumu

    2011-01-01

    While the study of phenotypic variation is a central theme in evolutionary biology, the genetic approaches available to understanding this variation are usually limited because of a lack of genomic information in non-model organisms. This study explored the utility of next-generation sequencing (NGS) technologies for studying phenotypic variations between 2 populations of a non-model species, the Hokkai shrimp (Pandalus latirostris; Decapoda, Pandalidae). Before we performed transcriptome analyses using NGS, we examined the genetic and phenotypic differentiation between the populations. Analyses using microsatellite DNA markers suggested that these populations genetically differed from one another and that gene flow is restricted between them. Moreover, the results of our 4-year field observations indicated that the egg traits varied genetically between the populations. Using mRNA extracted from the ovaries of 5 females in each population of Hokkai shrimp, we then performed a transcriptome analysis of the 2 populations. A total of 13.66 gigabases (Gb) of 75-bp reads was obtained. Further, 58,804 and 33,548 contigs for the first and second population, respectively, and 47,467 contigs for both populations were produced by de novo assembly. We detected 552 sequences with the former approach and 702 sequences with the latter one; both sets of sequences showed greater than twofold differences in the expression levels between the 2 populations. Twenty-nine sequences were found in both approaches and were considered to be differentially expressed genes. Among them, 9 sequences showed significant similarity to functional genes. The present study showed a de novo assembly approach for the transcriptome of a non-model species using only short-read sequence data, and provides a strategy for identifying sequences showing significantly different expression levels between populations. PMID:22016807

  14. Expression Profiling without Genome Sequence Information in a Non-Model Species, Pandalid Shrimp (Pandalus latirostris), by Next-Generation Sequencing

    PubMed Central

    Kawahara-Miki, Ryouka; Wada, Kenta; Azuma, Noriko; Chiba, Susumu

    2011-01-01

    While the study of phenotypic variation is a central theme in evolutionary biology, the genetic approaches available to understanding this variation are usually limited because of a lack of genomic information in non-model organisms. This study explored the utility of next-generation sequencing (NGS) technologies for studying phenotypic variations between 2 populations of a non-model species, the Hokkai shrimp (Pandalus latirostris; Decapoda, Pandalidae). Before we performed transcriptome analyses using NGS, we examined the genetic and phenotypic differentiation between the populations. Analyses using microsatellite DNA markers suggested that these populations genetically differed from one another and that gene flow is restricted between them. Moreover, the results of our 4-year field observations indicated that the egg traits varied genetically between the populations. Using mRNA extracted from the ovaries of 5 females in each population of Hokkai shrimp, we then performed a transcriptome analysis of the 2 populations. A total of 13.66 gigabases (Gb) of 75-bp reads was obtained. Further, 58,804 and 33,548 contigs for the first and second population, respectively, and 47,467 contigs for both populations were produced by de novo assembly. We detected 552 sequences with the former approach and 702 sequences with the latter one; both sets of sequences showed greater than twofold differences in the expression levels between the 2 populations. Twenty-nine sequences were found in both approaches and were considered to be differentially expressed genes. Among them, 9 sequences showed significant similarity to functional genes. The present study showed a de novo assembly approach for the transcriptome of a non-model species using only short-read sequence data, and provides a strategy for identifying sequences showing significantly different expression levels between populations. PMID:22016807

  15. Site-directed gene mutation at mixed sequence targets by psoralen-conjugated pseudo-complementary peptide nucleic acids.

    PubMed

    Kim, Ki-Hyun; Nielsen, Peter E; Glazer, Peter M

    2007-01-01

    Sequence-specific DNA-binding molecules such as triple helix-forming oligonucleotides (TFOs) provide a means for inducing site-specific mutagenesis and recombination at chromosomal sites in mammalian cells. However, the utility of TFOs is limited by the requirement for homopurine stretches in the target duplex DNA. Here, we report the use of pseudo-complementary peptide nucleic acids (pcPNAs) for intracellular gene targeting at mixed sequence sites. Due to steric hindrance, pcPNAs are unable to form pcPNA-pcPNA duplexes but can bind to complementary DNA sequences by Watson-Crick pairing via double duplex-invasion complex formation. We show that psoralen-conjugated pcPNAs can deliver site-specific photoadducts and mediate targeted gene modification within both episomal and chromosomal DNA in mammalian cells without detectable off-target effects. Most of the induced psoralen-pcPNA mutations were single-base substitutions and deletions at the predicted pcPNA-binding sites. The pcPNA-directed mutagenesis was found to be dependent on PNA concentration and UVA dose and required matched pairs of pcPNAs. Neither of the individual pcPNAs alone had any effect nor did complementary PNA pairs of the same sequence. These results identify pcPNAs as new tools for site-specific gene modification in mammalian cells without purine sequence restriction, thereby providing a general strategy for designing gene targeting molecules. PMID:17977869

  16. Comparison of the nucleotide and amino acid sequences of the RsrI and EcoRI restriction endonucleases.

    PubMed

    Stephenson, F H; Ballard, B T; Boyer, H W; Rosenberg, J M; Greene, P J

    1989-12-21

    The RsrI endonuclease, a type-II restriction endonuclease (ENase) found in Rhodobacter sphaeroides, is an isoschizomer of the EcoRI ENase. A clone containing an 11-kb BamHI fragment was isolated from an R. sphaeroides genomic DNA library by hybridization with synthetic oligodeoxyribonucleotide probes based on the N-terminal amino acid (aa) sequence of RsrI. Extracts of E. coli containing a subclone of the 11-kb fragment display RsrI activity. Nucleotide sequence analysis reveals an 831-bp open reading frame encoding a polypeptide of 277 aa. A 50% identity exists within a 266-aa overlap between the deduced aa sequences of RsrI and EcoRI. Regions of 75-100% aa sequence identity correspond to key structural and functional regions of EcoRI. The type-II ENases have many common properties, and a common origin might have been expected. Nevertheless, this is the first demonstration of aa sequence similarity between ENases produced by different organisms. PMID:2695392
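
    The two figures quoted above (an 831-bp open reading frame for 277 aa, and percent identity over an overlap) can be checked with a few lines of arithmetic. The sketch below is only illustrative: the function names and the toy alignment are hypothetical, and it assumes that the 831-bp figure excludes the stop codon, which is what 3 x 277 = 831 implies.

```python
def orf_length_bp(n_residues, include_stop=False):
    """Length in base pairs of a reading frame coding for n_residues."""
    codons = n_residues + (1 if include_stop else 0)
    return 3 * codons


def percent_identity(aligned_a, aligned_b):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences carry a residue ("-" marks a gap).
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# 277 codons without a stop codon give the 831 bp quoted in the abstract.
rsri_orf_bp = orf_length_bp(277)

# Toy alignment (not real RsrI/EcoRI sequence): 5 gap-free columns, 3 identical.
toy_identity = percent_identity("MKT-LV", "MRTALI")
```

    Applied to a real RsrI/EcoRI alignment, a value of 50% over a 266-aa overlap would correspond to about 133 identical positions.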

  17. Complete amino acid sequence of human plasma Zn-α2-glycoprotein and its homology to histocompatibility antigens

    SciTech Connect

    Araki, T.; Gejyo, F.; Takagaki, K.; Haupt, H.; Schwick, H.G.; Buergi, W.; Marti, T.; Schaller, J.; Rickli, E.; Brossmer, R.

    1988-02-01

    In the present study the complete amino acid sequence of human plasma Zn-α2-glycoprotein was determined. This protein, whose biological function is unknown, consists of a single polypeptide chain of 276 amino acid residues including 8 tryptophan residues and has a pyroglutamyl residue at the amino terminus. The location of the two disulfide bonds in the polypeptide chain was also established. The three glycans, whose structure was elucidated with the aid of 500 MHz ¹H NMR spectroscopy, were sialylated N-biantennas. The molecular weight calculated from the polypeptide and carbohydrate structure is 38,478, which is close to the reported value of approximately 41,000 based on physicochemical measurements. The predicted secondary structure appeared to comprise 23% α-helix, 27% β-sheet, and 22% β-turns. The three N-glycans were found to be located in β-turn regions. An unexpected finding was made by computer analysis of the sequence data; this revealed that Zn-α2-glycoprotein is closely related to antigens of the major histocompatibility complex in amino acid sequence and in domain structure. There was an unusually high degree of sequence homology with the α chains of class I histocompatibility antigens. Moreover, this plasma protein was shown to be a member of the immunoglobulin gene superfamily. Zn-α2-glycoprotein appears to be a truncated secretory major histocompatibility complex-related molecule, and it may have a role in the expression of the immune response.

  18. Clinical application of whole-genome sequencing to inform treatment for multidrug-resistant tuberculosis cases.

    PubMed

    Witney, Adam A; Gould, Katherine A; Arnold, Amber; Coleman, David; Delgado, Rachel; Dhillon, Jasvir; Pond, Marcus J; Pope, Cassie F; Planche, Tim D; Stoker, Neil G; Cosgrove, Catherine A; Butcher, Philip D; Harrison, Thomas S; Hinds, Jason

    2015-05-01

    The treatment of drug-resistant tuberculosis cases is challenging, as drug options are limited, and the existing diagnostics are inadequate. Whole-genome sequencing (WGS) has been used in a clinical setting to investigate six cases of suspected extensively drug-resistant Mycobacterium tuberculosis (XDR-TB) encountered at a London teaching hospital between 2008 and 2014. Sixteen isolates from six suspected XDR-TB cases were sequenced; five cases were analyzed in a clinically relevant time frame, with one case sequenced retrospectively. WGS identified mutations in the M. tuberculosis genes associated with antibiotic resistance that are likely to be responsible for the phenotypic resistance. Thus, an evidence base was developed to inform the clinical decisions made around antibiotic treatment over prolonged periods. All strains in this study belonged to the East Asian (Beijing) lineage, and the strain relatedness was consistent with the expectations from the case histories, confirming one contact transmission event. We demonstrate that WGS data can be produced in a clinically relevant time scale some weeks before drug sensitivity testing (DST) data are available, and they actively help clinical decision-making through the assessment of whether an isolate (i) has a particular resistance mutation where there are absent or contradictory DST results, (ii) has no further resistance markers and therefore is unlikely to be XDR, or (iii) is identical to an isolate of known resistance (i.e., a likely transmission event). A small number of discrepancies between the genotypic predictions and phenotypic DST results are discussed in the wider context of the interpretation and reporting of WGS results. PMID:25673793

  19. Groupwise registration of cardiac perfusion MRI sequences using normalized mutual information in high dimension

    NASA Astrophysics Data System (ADS)

    Hamrouni, Sameh; Rougon, Nicolas; Prêteux, Françoise

    2011-03-01

    In perfusion MRI (p-MRI) exams, short-axis (SA) image sequences are captured at multiple slice levels along the long-axis of the heart during the transit of a vascular contrast agent (Gd-DTPA) through the cardiac chambers and muscle. Compensating cardio-thoracic motions is a requirement for enabling computer-aided quantitative assessment of myocardial ischaemia from contrast-enhanced p-MRI sequences. The classical paradigm consists of registering each sequence frame on a reference image using some intensity-based matching criterion. In this paper, we introduce a novel unsupervised method for the spatio-temporal groupwise registration of cardiac p-MRI exams based on normalized mutual information (NMI) between high-dimensional feature distributions. Here, local contrast enhancement curves are used as a dense set of spatio-temporal features, and statistically matched through variational optimization to a target feature distribution derived from a registered reference template. The hard issue of probability density estimation in high-dimensional state spaces is bypassed by using consistent geometric entropy estimators, allowing NMI to be computed directly from feature samples. Specifically, a computationally efficient kth-nearest neighbor (kNN) estimation framework is retained, leading to closed-form expressions for the gradient flow of NMI over finite- and infinite-dimensional motion spaces. This approach is applied to the groupwise alignment of cardiac p-MRI exams using a free-form deformation (FFD) model for cardio-thoracic motions. Experiments on simulated and natural datasets suggest its accuracy and robustness for registering p-MRI exams comprising more than 30 frames.
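
    To illustrate how an entropy can be estimated directly from samples without first estimating a density, the sketch below implements the Kozachenko-Leonenko kNN estimator of differential entropy. This is an assumption about the general family of estimators the abstract refers to, not the authors' implementation; the function names, the brute-force neighbor search, and the Gaussian test data are all hypothetical choices made for a small demonstration.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma function at a positive integer: psi(n) = -gamma + sum_{j<n} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def knn_entropy(samples, k=4):
    """Kozachenko-Leonenko estimate of differential entropy, in nats.

    samples: list of equal-length tuples. Assumes no duplicate points,
    because a zero neighbor distance would make the logarithm undefined.
    """
    n, d = len(samples), len(samples[0])
    # Log-volume of the d-dimensional Euclidean unit ball.
    log_unit_ball = (d / 2.0) * math.log(math.pi) - math.lgamma(d / 2.0 + 1.0)
    log_dist_sum = 0.0
    for i, p in enumerate(samples):
        # Brute-force distances to every other point; adequate for a small demo.
        dists = sorted(math.dist(p, q) for j, q in enumerate(samples) if j != i)
        log_dist_sum += math.log(dists[k - 1])  # distance to the kth neighbor
    return digamma_int(n) - digamma_int(k) + log_unit_ball + d * log_dist_sum / n


# Demo on 1-D standard normal samples, whose entropy is known in closed form.
random.seed(0)
gaussian = [(random.gauss(0.0, 1.0),) for _ in range(500)]
estimate = knn_entropy(gaussian)
exact = 0.5 * math.log(2.0 * math.pi * math.e)
```

    Marginal and joint entropies estimated this way are the ingredients of mutual-information criteria such as NMI; how the paper combines them and differentiates the result is not stated in the abstract.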

  20. In vitro, long-range sequence information for de novo genome assembly via transposase contiguity.

    PubMed

    Adey, Andrew; Kitzman, Jacob O; Burton, Joshua N; Daza, Riza; Kumar, Akash; Christiansen, Lena; Ronaghi, Mostafa; Amini, Sasan; Gunderson, Kevin L; Steemers, Frank J; Shendure, Jay

    2014-12-01

    We describe a method that exploits contiguity preserving transposase sequencing (CPT-seq) to facilitate the scaffolding of de novo genome assemblies. CPT-seq is an entirely in vitro means of generating libraries comprised of 9216 indexed pools, each of which contains thousands of sparsely sequenced long fragments ranging from 5 kilobases to > 1 megabase. These pools are "subhaploid," in that the lengths of fragments contained in each pool sums to ∼5% to 10% of the full genome. The scaffolding approach described here, termed fragScaff, leverages coincidences between the content of different pools as a source of contiguity information. Specifically, CPT-seq data is mapped to a de novo genome assembly, followed by the identification of pairs of contigs or scaffolds whose ends disproportionately co-occur in the same indexed pools, consistent with true adjacency in the genome. Such candidate "joins" are used to construct a graph, which is then resolved by a minimum spanning tree. As a proof-of-concept, we apply CPT-seq and fragScaff to substantially boost the contiguity of de novo assemblies of the human, mouse, and fly genomes, increasing the scaffold N50 of de novo assemblies by eight- to 57-fold with high accuracy. We also demonstrate that fragScaff is complementary to Hi-C-based contact probability maps, providing midrange contiguity to support robust, accurate chromosome-scale de novo genome assemblies without the need for laborious in vivo cloning steps. Finally, we demonstrate CPT-seq as a means of anchoring unplaced novel human contigs to the reference genome as well as for detecting misassembled sequences. PMID:25327137
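
    The scaffolding idea described above (score pairs of contig ends by how often they share indexed pools, keep strong candidate joins, resolve the graph with a minimum spanning tree) can be sketched as follows. This is a simplified illustration, not fragScaff itself: the Jaccard score, the 0.3 threshold, the function names, and the toy pool assignments are all assumptions.

```python
from itertools import combinations


def jaccard(pools_a, pools_b):
    """Fraction of indexed pools shared by two contig ends."""
    union = pools_a | pools_b
    return len(pools_a & pools_b) / len(union) if union else 0.0


def candidate_joins(end_pools, min_score=0.3):
    """Score every pair of ends from different contigs and keep the strong ones."""
    joins = []
    for (end_a, pools_a), (end_b, pools_b) in combinations(sorted(end_pools.items()), 2):
        if end_a[0] == end_b[0]:  # both ends belong to the same contig
            continue
        score = jaccard(pools_a, pools_b)
        if score >= min_score:
            joins.append((1.0 - score, end_a, end_b))  # low cost = well supported
    return joins


def minimum_spanning_tree(joins):
    """Kruskal's algorithm over contigs, with union-find to reject cycles."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for cost, end_a, end_b in sorted(joins):
        root_a, root_b = find(end_a[0]), find(end_b[0])
        if root_a != root_b:
            parent[root_a] = root_b
            tree.append((end_a, end_b, round(1.0 - cost, 3)))
    return tree


# Toy input: (contig, end) -> set of pool indices in which that end was observed.
toy_end_pools = {
    ("c1", "L"): {20, 21},
    ("c1", "R"): {1, 2, 3, 4},
    ("c2", "L"): {1, 2, 3, 5},
    ("c2", "R"): {7, 8, 9},
    ("c3", "L"): {7, 8, 9, 10},
    ("c3", "R"): {30, 31},
}
toy_tree = minimum_spanning_tree(candidate_joins(toy_end_pools))
```

    In the toy data the right end of c1 shares pools with the left end of c2, and the right end of c2 with the left end of c3, so the tree links the three contigs in that order.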

  1. In vitro, long-range sequence information for de novo genome assembly via transposase contiguity

    PubMed Central

    Adey, Andrew; Kitzman, Jacob O.; Burton, Joshua N.; Daza, Riza; Kumar, Akash; Christiansen, Lena; Ronaghi, Mostafa; Amini, Sasan; Gunderson, Kevin L.; Steemers, Frank J.

    2014-01-01

    We describe a method that exploits contiguity preserving transposase sequencing (CPT-seq) to facilitate the scaffolding of de novo genome assemblies. CPT-seq is an entirely in vitro means of generating libraries comprised of 9216 indexed pools, each of which contains thousands of sparsely sequenced long fragments ranging from 5 kilobases to >1 megabase. These pools are “subhaploid,” in that the lengths of fragments contained in each pool sums to ∼5% to 10% of the full genome. The scaffolding approach described here, termed fragScaff, leverages coincidences between the content of different pools as a source of contiguity information. Specifically, CPT-seq data is mapped to a de novo genome assembly, followed by the identification of pairs of contigs or scaffolds whose ends disproportionately co-occur in the same indexed pools, consistent with true adjacency in the genome. Such candidate “joins” are used to construct a graph, which is then resolved by a minimum spanning tree. As a proof-of-concept, we apply CPT-seq and fragScaff to substantially boost the contiguity of de novo assemblies of the human, mouse, and fly genomes, increasing the scaffold N50 of de novo assemblies by eight- to 57-fold with high accuracy. We also demonstrate that fragScaff is complementary to Hi-C-based contact probability maps, providing midrange contiguity to support robust, accurate chromosome-scale de novo genome assemblies without the need for laborious in vivo cloning steps. Finally, we demonstrate CPT-seq as a means of anchoring unplaced novel human contigs to the reference genome as well as for detecting misassembled sequences. PMID:25327137

  2. From metaphor to practices: The introduction of "information engineers" into the first DNA sequence database.

    PubMed

    García-Sancho, Miguel

    2011-01-01

    This paper explores the introduction of professional systems engineers and information management practices into the first centralized DNA sequence database, developed at the European Molecular Biology Laboratory (EMBL) during the 1980s. In so doing, it complements the literature on the emergence of an information discourse after World War II and its subsequent influence in biological research. By analyzing the careers of the database creators and the computer algorithms they designed, I show that from the mid-1960s onwards information in biology gradually shifted from a pervasive metaphor to being embodied in practices and professionals such as those incorporated at the EMBL. I then investigate the reception of these database professionals by the EMBL biological staff, which evolved from initial disregard to necessary collaboration as the relationship between DNA, genes, and proteins turned out to be more complex than expected. The trajectories of the database professionals at the EMBL suggest that the initial subject matter of the historiography of genomics should be the long-standing practices that emerged after World War II and to a large extent originated outside biomedicine and academia. Only after addressing these practices may historians turn to their further disciplinary assemblage in fields such as bioinformatics or biotechnology. PMID:21789956

  3. The sequence diversity and expression among genes of the folic acid biosynthesis pathway in industrial Saccharomyces strains.

    PubMed

    Goncerzewicz, Anna; Misiewicz, Anna

    2015-01-01

    Folic acid is an important vitamin in human nutrition and its deficiency in pregnant women's diets results in neural tube defects and other neurological damage to the fetus. Additionally, DNA synthesis, cell division and intestinal absorption are inhibited in adults. Since this discovery, governments and health organizations worldwide have made recommendations concerning folic acid supplementation of food for women planning to become pregnant. In many countries this has led to the introduction of fortifications, where synthetic folic acid is added to flour. It is known that Saccharomyces strains (brewing and bakers' yeast) are one of the main producers of folic acid and they can be used as a natural source of this vitamin. Proper selection of the most efficient strains may enhance the folate content in bread, fermented vegetables, dairy products and beer by 100% and may be used in the food industry. The objective of this study was to select the optimal producing yeast strain by determining the differences in nucleotide sequences in the FOL2, FOL3 and DFR1 genes of the folic acid biosynthesis pathway. The Multitemperature Single Strand Conformation Polymorphism (MSSCP) method and further nucleotide sequencing for selected strains were applied to indicate SNPs in selected gene fragments. The RT qPCR technique was also applied to examine relative expression of the FOL3 gene. Furthermore, this is the first time that industrial yeast strains have been analysed regarding genes of the folic acid biosynthesis pathway. It was observed that a correlation exists between the folic acid amount produced by industrial yeast strains and changes in the nucleotide sequence of the corresponding genes. The most significant changes occur in the DFR1 gene, mostly in the first part, which causes major protein structure modifications in KKP 232, KKP 222 and KKP 277 strains. Our study shows that the large number of SNPs contributes to impairment of the selected enzymes and S. cerevisiae and S

  4. Fatty acid profile and Unigene-derived simple sequence repeat markers in tung tree (Vernicia fordii)

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Tung tree (Vernicia fordii) provides the sole source of tung oil widely used in industry. Lack of fatty acid composition and molecular markers hinders biochemical, genetic and breeding research. The objectives of this study were to determine fatty acid profiles and develop unigene-derived simple se...

  5. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... Director of the Federal Register in accordance with 5 U.S.C. 552(a) and 1 CFR part 51. Copies of WIPO... 37 Patents, Trademarks, and Copyrights 1 2010-07-01 2010-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid...

  6. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... Director of the Federal Register in accordance with 5 U.S.C. 552(a) and 1 CFR part 51. Copies of WIPO... 37 Patents, Trademarks, and Copyrights 1 2011-07-01 2011-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid...

  7. Clinical genomics information management software linking cancer genome sequence and clinical decisions.

    PubMed

    Watt, Stuart; Jiao, Wei; Brown, Andrew M K; Petrocelli, Teresa; Tran, Ben; Zhang, Tong; McPherson, John D; Kamel-Reid, Suzanne; Bedard, Philippe L; Onetto, Nicole; Hudson, Thomas J; Dancey, Janet; Siu, Lillian L; Stein, Lincoln; Ferretti, Vincent

    2013-09-01

    Using sequencing information to guide clinical decision-making requires coordination of a diverse set of people and activities. In clinical genomics, the process typically includes sample acquisition, template preparation, genome data generation, analysis to identify and confirm variant alleles, interpretation of clinical significance, and reporting to clinicians. We describe a software application developed within a clinical genomics study, to support this entire process. The software application tracks patients, samples, genomic results, decisions and reports across the cohort, monitors progress and sends reminders, and works alongside an electronic data capture system for the trial's clinical and genomic data. It incorporates systems to read, store, analyze and consolidate sequencing results from multiple technologies, and provides a curated knowledge base of tumor mutation frequency (from the COSMIC database) annotated with clinical significance and drug sensitivity to generate reports for clinicians. By supporting the entire process, the application provides deep support for clinical decision making, enabling the generation of relevant guidance in reports for verification by an expert panel prior to forwarding to the treating physician. PMID:23603536

  8. Structure-Templated Predictions of Novel Protein Interactions from Sequence Information

    PubMed Central

    Betel, Doron; Breitkreuz, Kevin E; Isserlin, Ruth; Dewar-Darch, Danielle; Tyers, Mike; Hogue, Christopher W. V

    2007-01-01

    The multitude of functions performed in the cell are largely controlled by a set of carefully orchestrated protein interactions often facilitated by specific binding of conserved domains in the interacting proteins. Interacting domains commonly exhibit distinct binding specificity to short and conserved recognition peptides called binding profiles. Although many conserved domains are known in nature, only a few have well-characterized binding profiles. Here, we describe a novel predictive method known as domain–motif interactions from structural topology (D-MIST) for elucidating the binding profiles of interacting domains. A set of domains and their corresponding binding profiles were derived from extant protein structures and protein interaction data and then used to predict novel protein interactions in yeast. A number of the predicted interactions were verified experimentally, including new interactions of the mitotic exit network, RNA polymerases, nucleotide metabolism enzymes, and the chaperone complex. These results demonstrate that new protein interactions can be predicted exclusively from sequence information. PMID:17892321

  9. "De-novo" amino acid sequence elucidation of protein G'e by combined "Top-Down" and "Bottom-Up" mass spectrometry

    NASA Astrophysics Data System (ADS)

    Yefremova, Yelena; Al-Majdoub, Mahmoud; Opuni, Kwabena F. M.; Koy, Cornelia; Cui, Weidong; Yan, Yuetian; Gross, Michael L.; Glocker, Michael O.

    2015-03-01

    Mass spectrometric de-novo sequencing was applied to review the amino acid sequence of a commercially available recombinant protein G' with great scientific and economic importance. Substantial deviations to the published amino acid sequence (Uniprot Q54181) were found by the presence of 46 additional amino acids at the N-terminus, including a so-called "His-tag" as well as an N-terminal partial α-N-gluconoylation and α-N-phosphogluconoylation, respectively. The unexpected amino acid sequence of the commercial protein G' comprised 241 amino acids and resulted in a molecular mass of 25,998.9 ± 0.2 Da for the unmodified protein. Due to the higher mass that is caused by its extended amino acid sequence compared with the original protein G' (185 amino acids), we named this protein "protein G'e." By means of mass spectrometric peptide mapping, the suggested amino acid sequence, as well as the N-terminal partial α-N-gluconoylations, was confirmed with 100% sequence coverage. After the protein G'e sequence was determined, we were able to determine the expression vector pET-28b from Novagen with the Xho I restriction enzyme cleavage site as the best option that was used for cloning and expressing the recombinant protein G'e in E. coli. A dissociation constant (Kd) value of 9.4 nM for protein G'e was determined thermophoretically, showing that the N-terminal flanking sequence extension did not cause significant changes in the binding affinity to immunoglobulins.

  10. "De-novo" amino acid sequence elucidation of protein G'e by combined "top-down" and "bottom-up" mass spectrometry.

    PubMed

    Yefremova, Yelena; Al-Majdoub, Mahmoud; Opuni, Kwabena F M; Koy, Cornelia; Cui, Weidong; Yan, Yuetian; Gross, Michael L; Glocker, Michael O

    2015-03-01

    Mass spectrometric de-novo sequencing was applied to review the amino acid sequence of a commercially available recombinant protein G' with great scientific and economic importance. Substantial deviations to the published amino acid sequence (Uniprot Q54181) were found by the presence of 46 additional amino acids at the N-terminus, including a so-called "His-tag" as well as an N-terminal partial α-N-gluconoylation and α-N-phosphogluconoylation, respectively. The unexpected amino acid sequence of the commercial protein G' comprised 241 amino acids and resulted in a molecular mass of 25,998.9 ± 0.2 Da for the unmodified protein. Due to the higher mass that is caused by its extended amino acid sequence compared with the original protein G' (185 amino acids), we named this protein "protein G'e." By means of mass spectrometric peptide mapping, the suggested amino acid sequence, as well as the N-terminal partial α-N-gluconoylations, was confirmed with 100% sequence coverage. After the protein G'e sequence was determined, we were able to determine the expression vector pET-28b from Novagen with the Xho I restriction enzyme cleavage site as the best option that was used for cloning and expressing the recombinant protein G'e in E. coli. A dissociation constant (K(d)) value of 9.4 nM for protein G'e was determined thermophoretically, showing that the N-terminal flanking sequence extension did not cause significant changes in the binding affinity to immunoglobulins. PMID:25560987

  11. Dynamic behavior of an intrinsically unstructured linker domain is conserved in the face of negligible amino acid sequence conservation.

    PubMed

    Daughdrill, Gary W; Narayanaswami, Pranesh; Gilmore, Sara H; Belczyk, Agniezka; Brown, Celeste J

    2007-09-01

    Proteins or regions of proteins that do not form compact globular structures are classified as intrinsically unstructured proteins (IUPs). IUPs are common in nature and have essential molecular functions, but even a limited understanding of the evolution of their dynamic behavior is lacking. The primary objective of this work was to test the evolutionary conservation of dynamic behavior for a particular class of IUPs that form intrinsically unstructured linker domains (IULD) that tether flanking folded domains. This objective was accomplished by measuring the backbone flexibility of several IULD homologues using nuclear magnetic resonance (NMR) spectroscopy. The backbone flexibility of five IULDs, representing three kingdoms, was measured and analyzed. Two IULDs from animals, one IULD from fungi, and two IULDs from plants showed similar levels of backbone flexibility that were consistent with the absence of a compact globular structure. In contrast, the amino acid sequences of the IULDs from these three taxa showed no significant similarity. To investigate how the dynamic behavior of the IULDs could be conserved in the absence of detectable sequence conservation, evolutionary rate studies were performed on a set of nine mammalian IULDs. The results of this analysis showed that many sites in the IULD are evolving neutrally, suggesting that dynamic behavior can be maintained in the absence of natural selection. This work represents the first experimental test of the evolutionary conservation of dynamic behavior and demonstrates that amino acid sequence conservation is not required for the conservation of dynamic behavior and presumably molecular function. PMID:17721672

  12. Cloning and nucleotide sequencing of genes for three small, acid-soluble proteins from Bacillus subtilis spores.

    PubMed Central

    Connors, M J; Mason, J M; Setlow, P

    1986-01-01

    Three Bacillus subtilis genes (termed sspA, sspB, and sspD) which code for small, acid-soluble spore proteins (SASPs) have been cloned, and their complete nucleotide sequence has been determined. The amino acid sequences of the SASPs coded for by these genes are similar to each other and to those of the SASP-1 of B. subtilis (coded for by the sspC gene) and the SASP-A/C family of B. megaterium. The sspA and sspB genes are expressed only in sporulation, in parallel with each other and with the sspC gene. Two regions upstream of the postulated transcription start sites for the sspA and sspB genes have significant homology with the analogous regions of the sspC gene and the SASP-A/C gene family. Purification of two of the three major B. subtilis SASPs (alpha and beta) and determination of their amino-terminal sequences indicated that the sspA gene codes for SASP-alpha and that the sspB gene codes for SASP-beta. This was confirmed by the introduction of deletion mutations into the cloned sspA and sspB genes and transfer of these deletions into the B. subtilis chromosome with concomitant loss of the wild-type gene. PMID:3009398

  13. Nucleotide sequence of the fadR gene, a multifunctional regulator of fatty acid metabolism in Escherichia coli.

    PubMed Central

    DiRusso, C C

    1988-01-01

    The Escherichia coli fadR gene is a multifunctional regulator of fatty acid and acetate metabolism. In the present work the nucleotide sequence of the 1.3 kb DNA fragment which encodes FadR has been determined. The coding sequence of the fadR gene is 714 nucleotides long and is preceded by a typical E. coli ribosome binding site and is followed by a sequence predicted to be sufficient for factor-independent chain termination. Primer extension experiments demonstrated that the transcription of the fadR gene initiates with an adenine nucleotide 33 nucleotides upstream from the predicted start of translation. The derived fadR peptide has a calculated molecular weight of 26,972. This is in reasonable agreement with the apparent molecular weight of 29,000 previously estimated on the basis of maxi-cell analysis of plasmid encoded proteins. There is a segment of twenty amino acids within the predicted peptide which resembles the DNA recognition and binding site of many transcriptional regulatory proteins. PMID:2843809

  14. The amino acid sequence of protein SCMK-B2C from the high-sulphur fraction of wool keratin

    PubMed Central

    Elleman, T. C.

    1972-01-01

    1. The amino acid sequence of a protein from the reduced and carboxymethylated high-sulphur fraction of wool has been determined. 2. The sequence of this S-carboxymethylkerateine (SCMK-B2C) of 151 amino acid residues displays much internal homology and an unusual residue distribution. Thus a ten-residue sequence occurs four times near the N-terminus and five times near the C-terminus with few changes. These regions contain much of the molecule's half-cystine, whereas between them there is a region of 19 residues that are mainly small and devoid of cystine and proline. 3. Certain models of the wool fibre based on its mechanical and physical properties propose a matrix of small compact globular units linked together to form beaded chains. The unusual distribution of the component residues of protein SCMK-B2C suggests structures in the wool-fibre matrix compatible with certain features of the proposed models. PMID:4678578

  15. Nuclear gene for mitochondrial leucyl-tRNA synthetase of Neurospora crassa: isolation, sequence, chromosomal mapping, and evidence that the leu-5 locus specifies structural information.

    PubMed Central

    Chow, C M; Metzenberg, R L; Rajbhandary, U L

    1989-01-01

    We have isolated and characterized the nuclear gene for the mitochondrial leucyl-tRNA synthetase (LeuRS) of Neurospora crassa and have established that a defect in this structural gene is responsible for the leu-5 phenotype. We have purified mitochondrial LeuRS protein, determined its N-terminal sequence, and used this sequence information to identify and isolate a full-length genomic DNA clone. The 3.7-kilobase-pair region representing the structural gene and flanking regions has been sequenced. The 5' ends of the mRNA were mapped by S1 nuclease protection, and the 3' ends were determined from the sequence of cDNA clones. The gene contains a single short intron, 60 base pairs long. The methionine-initiated open reading frame specifies a 52-amino-acid mitochondrial targeting sequence followed by a 942-amino-acid protein. Restriction fragment length polymorphism analyses mapped the mitochondrial LeuRS structural gene to linkage group V, exactly where the leu-5 mutation had been mapped before. We show that the leu-5 strain has a defect in the structural gene for mitochondrial LeuRS by restoring growth under restrictive conditions for this strain after transformation with a wild-type copy of the mitochondrial LeuRS gene. We have cloned the mutant allele present in the leu-5 strain and identified the defect as being due to a Thr-to-Pro change in mitochondrial LeuRS. Finally, we have used immunoblotting to show that despite the apparent lack of mitochondrial LeuRS activity in leu-5 extracts, the leu-5 strain contains levels of mitochondrial LeuRS protein similar to those of the wild-type strain. PMID:2574823

  16. The nucleotide sequence of HLA-B*2704 reveals a new amino acid substitution in exon 4 which is also present in HLA-B*2706

    SciTech Connect

    Rudwaleit, M.; Bowness, P.; Wordsworth, P.

    1996-12-31

    The HLA-B27 subtype HLA-B*2704 is virtually absent in Caucasians but common in Orientals, where it is associated with ankylosing spondylitis. The amino acid sequence of HLA-B*2704 has been established by peptide mapping and was shown to differ by two amino acids from HLA-B*2705. HLA-B*2704 is characterized by a serine for aspartic acid substitution at position 77 and glutamic acid for valine at position 152. To date, however, no nucleotide sequence confirming these changes at the DNA level has been published. 13 refs., 2 figs.

  17. Correction of projective distortion in long-image-sequence mosaics without prior information

    NASA Astrophysics Data System (ADS)

    Yang, Chenhui; Mao, Hongwei; Abousleman, Glen; Si, Jennie

    2010-04-01

    Image mosaicking is the process of piecing together multiple video frames or still images from a moving camera to form a wide-area or panoramic view of the scene being imaged. Mosaics have widespread applications in many areas such as security surveillance, remote sensing, geographical exploration, agricultural field surveillance, virtual reality, digital video, and medical image analysis, among others. When mosaicking a large number of still images or video frames, the quality of the resulting mosaic is compromised by projective distortion. That is, during the mosaicking process, the image frames that are transformed and pasted to the mosaic become significantly scaled down and appear out of proportion with respect to the mosaic. As more frames continue to be transformed, important target information in the frames can be lost since the transformed frames become too small, which eventually leads to the inability to continue further. Some projective distortion correction techniques make use of prior information such as GPS information embedded within the image, or camera internal and external parameters. Alternatively, this paper proposes a new algorithm to reduce the projective distortion without using any prior information whatsoever. Based on the analysis of the projective distortion, we approximate the projective matrix that describes the transformation between image frames using an affine model. Using singular value decomposition, we can deduce the affine model scaling factor that is usually very close to 1. By resetting the image scale of the affine model to 1, the transformed image size remains unchanged. Even though the proposed correction introduces some error in the image matching, this error is typically acceptable and more importantly, the final mosaic preserves the original image size after transformation. We demonstrate the effectiveness of this new correction algorithm on two real-world unmanned air vehicle (UAV) sequences. The proposed method is

  18. A molecular mechanism realizing sequence-specific recognition of nucleic acids by TDP-43

    PubMed Central

    Furukawa, Yoshiaki; Suzuki, Yoh; Fukuoka, Mami; Nagasawa, Kenichi; Nakagome, Kenta; Shimizu, Hideaki; Mukaiyama, Atsushi; Akiyama, Shuji

    2016-01-01

    TAR DNA-binding protein 43 (TDP-43) is a DNA/RNA-binding protein containing two consecutive RNA recognition motifs (RRM1 and RRM2) in tandem. Functional abnormality of TDP-43 has been proposed to cause neurodegeneration, but it remains obscure how the physiological functions of this protein are regulated. Here, we show distinct roles of RRM1 and RRM2 in the sequence-specific substrate recognition of TDP-43. RRM1 was found to bind a wide spectrum of ssDNA sequences, while no binding was observed between RRM2 and ssDNA. When two RRMs are fused in tandem as in native TDP-43, the fused construct almost exclusively binds ssDNA with a TG-repeat sequence. In contrast, such sequence-specificity was not observed in a simple mixture of RRM1 and RRM2. We thus propose that the spatial arrangement of multiple RRMs in DNA/RNA binding proteins provides steric effects on the substrate-binding site and thereby controls the specificity of its substrate nucleotide sequences. PMID:26838063

  19. Peptide mapping and amino acid sequencing of two catechol 1,2-dioxygenases (CD I1 and CD I2) from Acinetobacter lwoffii K24.

    PubMed

    Kim, S I; Ha, K S

    1997-10-31

    The partial amino acid sequences of two catechol 1,2-dioxygenases (CD I1 and CD I2) from Acinetobacter lwoffii K24 have been determined by analysis of peptides after cleavages with endopeptidase Lys-C, endopeptidase Glu-C, trypsin, and chemicals (cyanogen bromide and BNPS-skatole). They include 248 amino acid sequences (4 fragments) of CD I1 and 211 amino acid sequences (5 fragments) of CD I2. Two enzymes have more than 50% sequence homology with type I catechol 1,2-dioxygenases and less than 30% sequence homology with type II catechol 1,2-dioxygenases. Two enzymes have similar hydropathy profiles in the N-terminal region, suggesting that they have similar secondary structures. PMID:9387151

  20. The Role of HIV-1 gp41 Glycoprotein in Infectious Tropism Inferred from Physico-Chemical Properties of its Amino Acid Sequence

    NASA Astrophysics Data System (ADS)

    Figueroa, E.; Villarreal, C.; Huerta, L.; Cocho, G.

    2006-09-01

    We performed a statistical analysis of the amino acid sequence of the gp41 ectodomain of the Human Immunodeficiency Virus type 1. We found strong correlations between physicochemical properties of highly variable residues and the viral infectious tropism.

  1. Characterization of fatty acid-producing wastewater microbial communities using next generation sequencing technologies

    EPA Science Inventory

    While wastewater represents a viable source of bacterial biodiesel production, very little is known on the composition of these microbial communities. We studied the taxonomic diversity and succession of microbial communities in bioreactors accumulating fatty acids using 454-pyro...

  2. De novo Sequencing and Transcriptome Analysis of Pinellia ternata Identify the Candidate Genes Involved in the Biosynthesis of Benzoic Acid and Ephedrine

    PubMed Central

    Zhang, Guang-hui; Jiang, Ni-hao; Song, Wan-ling; Ma, Chun-hua; Yang, Sheng-chao; Chen, Jun-wen

    2016-01-01

    Background: The medicinal herb, Pinellia ternata, is purported to be an anti-emetic with analgesic and sedative effects. Alkaloids are the main biologically active compounds in P. ternata, especially ephedrine, a phenylpropylamino alkaloid specifically produced by Ephedra and Catha edulis. However, how ephedrine is synthesized in plants is uncertain. Only the phenylalanine ammonia lyase (PAL) and relevant genes in this pathway have been characterized. Genomic information of P. ternata is also unavailable. Results: We analyzed the transcriptome of the tuber of P. ternata with the Illumina HiSeq™ 2000 sequencing platform. 66,813,052 high-quality reads were generated, and these reads were assembled de novo into 89,068 unigenes. Most known genes involved in benzoic acid biosynthesis were identified in the unigene dataset of P. ternata, and the expression patterns of some ephedrine biosynthesis-related genes were analyzed by reverse transcription quantitative real-time PCR (RT-qPCR). Also, 14,468 simple sequence repeats (SSRs) were identified from 12,000 unigenes. Twenty primer pairs for SSRs were randomly selected for the validation of their amplification effect. Conclusion: RNA-seq data were used for the first time to provide comprehensive gene information on P. ternata at the transcriptional level. These data will advance molecular genetics in this valuable medicinal plant. PMID:27579029

  3. Complete Genome Sequence of Amino Acid-Utilizing Eubacterium acidaminophilum al-2 (DSM 3953)

    PubMed Central

    Poehlein, Anja; Andreesen, Jan R.

    2014-01-01

    Eubacterium acidaminophilum is a strictly anaerobic, Gram-positive, rod-shaped bacterium which belongs to cluster XI of the Clostridia. It ferments amino acids by a Stickland reaction. The genome harbors a chromosome (2.25 Mb) and a megaplasmid (0.8 Mb). It contains several gene clusters coding for selenocysteine-containing, glycine-derived, and amino acid-degrading reductases. PMID:24926057

  4. Using Chou's pseudo amino acid composition to predict protein quaternary structure: a sequence-segmented PseAAC approach.

    PubMed

    Zhang, Shao-Wu; Chen, Wei; Yang, Feng; Pan, Quan

    2008-10-01

    In the protein universe, many proteins are composed of two or more polypeptide chains, generally referred to as subunits, which associate through noncovalent interactions and, occasionally, disulfide bonds to form protein quaternary structures. It has long been known that the functions of proteins are closely related to their quaternary structures; some examples include enzymes, hemoglobin, DNA polymerase, and ion channels. However, it is extremely labor-expensive and even impossible to quickly determine the structures of hundreds of thousands of protein sequences solely from experiments. Since the number of protein sequences entering databanks is increasing rapidly, it is highly desirable to develop computational methods for classifying the quaternary structures of proteins from their primary sequences. Since the concept of Chou's pseudo amino acid composition (PseAAC) was introduced, a variety of approaches, such as residue conservation scores, von Neumann entropy, multiscale energy, autocorrelation function, moment descriptors, and cellular automata, have been utilized to formulate the PseAAC for predicting different attributes of proteins. Here, in a different approach, a sequence-segmented PseAAC is introduced to represent protein samples. Meanwhile, multiclass SVM classifier modules were adopted to classify protein quaternary structures. As a demonstration, the dataset constructed by Chou and Cai [(2003) Proteins 53:282-289] was adopted as a benchmark dataset. The overall jackknife success rates thus obtained were 88.2-89.1%, indicating that the new approach is quite promising for predicting protein quaternary structure. PMID:18427713
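
    The sequence-segmented idea in the abstract above can be illustrated with a minimal sketch. It assumes a simplified feature (plain amino acid composition per segment) rather than the full PseAAC with its sequence-order correlation terms, which the abstract does not spell out. The function names, the segment count, and the example sequence are hypothetical.

```python
# Simplified illustration: split a protein into contiguous segments and
# concatenate the 20-dimensional amino acid composition of each segment.
# This is NOT the published PseAAC formulation, only a sketch of segmentation.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment):
    """Fraction of each of the 20 standard amino acids in one segment."""
    n = len(segment)
    return [segment.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def segmented_features(sequence, n_segments):
    """Concatenate per-segment compositions into one feature vector."""
    size, extra = divmod(len(sequence), n_segments)
    features, start = [], 0
    for i in range(n_segments):
        # Spread any remainder over the first `extra` segments.
        end = start + size + (1 if i < extra else 0)
        features.extend(composition(sequence[start:end]))
        start = end
    return features


# Hypothetical 20-residue sequence split into two segments of 10 residues.
features = segmented_features("MKVLAAGIVGLLLAQPSFAC", 2)
print(len(features))  # 2 segments x 20 amino acids
```

    A vector of this kind could then be passed to a multiclass classifier such as the SVM modules the abstract mentions; that step is omitted here.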

  5. The Homeodomain Resource: a comprehensive collection of sequence, structure, interaction, genomic and functional information on the homeodomain protein family

    PubMed Central

    Moreland, R. Travis; Ryan, Joseph F.; Pan, Christopher; Baxevanis, Andreas D.

    2009-01-01

    The Homeodomain Resource is a curated collection of sequence, structure, interaction, genomic and functional information on the homeodomain family. The current version builds upon previous versions by the addition of new, complete sets of homeodomain sequences from fully sequenced genomes, the expansion of existing curated homeodomain information and the improvement of data accessibility through better search tools and more complete data integration. This release contains 1534 full-length homeodomain-containing sequences, 93 experimentally derived homeodomain structures, 101 homeodomain protein–protein interactions, 107 homeodomain DNA-binding sites and 206 homeodomain proteins implicated in human genetic disorders. Database URL: The Homeodomain Resource is freely available and can be accessed at http://research.nhgri.nih.gov/homeodomain/ PMID:20157477

  6. Coding in 2D: Using Intentional Dispersity to Enhance the Information Capacity of Sequence-Coded Polymer Barcodes.

    PubMed

    Laure, Chloé; Karamessini, Denise; Milenkovic, Olgica; Charles, Laurence; Lutz, Jean-François

    2016-08-26

    A 2D approach was studied for the design of polymer-based molecular barcodes. Uniform oligo(alkoxyamine amide)s, containing a monomer-coded binary message, were synthesized by orthogonal solid-phase chemistry. Sets of oligomers with different chain-lengths were prepared. The physical mixture of these uniform oligomers leads to an intentional dispersity (1st dimension fingerprint), which is measured by electrospray mass spectrometry. Furthermore, the monomer sequence of each component of the mass distribution can be analyzed by tandem mass spectrometry (2nd dimension sequencing). By summing the sequence information of all components, a binary message can be read. A 4-bytes extended ASCII-coded message was written on a set of six uniform oligomers. Alternatively, a 3-bytes sequence was written on a set of five oligomers. In both cases, the coded binary information was recovered. PMID:27484303
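
    As a rough illustration of the message sizes quoted above, the sketch below converts a short text into the bit string that a monomer-coded binary message would have to carry, and back. It assumes Latin-1 as the "extended ASCII" code page and uses a hypothetical 4-byte message; how the bits were distributed over the six oligomers is not stated in the abstract and is not modelled here.

```python
# Text <-> bit-string conversion for a byte-coded binary message.
# Assumption: "extended ASCII" is taken to mean Latin-1 (ISO 8859-1).
def to_bits(message, encoding="latin-1"):
    """Encode text as a string of 0/1 characters, 8 bits per byte."""
    return "".join(f"{byte:08b}" for byte in message.encode(encoding))


def from_bits(bits, encoding="latin-1"):
    """Decode a 0/1 string (length divisible by 8) back into text."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode(encoding)


bits = to_bits("Code")  # hypothetical 4-byte message
print(len(bits))        # 4 bytes x 8 bits
print(from_bits(bits))
```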

  7. Effects of the amino acid sequence on thermal conduction through β-sheet crystals of natural silk protein.

    PubMed

    Zhang, Lin; Bai, Zhitong; Ban, Heng; Liu, Ling

    2015-11-21

    Recent experiments have discovered very different thermal conductivities between the spider silk and the silkworm silk. Decoding the molecular mechanisms underpinning the distinct thermal properties may guide the rational design of synthetic silk materials and other biomaterials for multifunctionality and tunable properties. However, such an understanding is lacking, mainly due to the complex structure and phonon physics associated with the silk materials. Here, using non-equilibrium molecular dynamics, we demonstrate that the amino acid sequence plays a key role in the thermal conduction process through β-sheets, essential building blocks of natural silks and a variety of other biomaterials. Three representative β-sheet types, i.e. poly-A, poly-(GA), and poly-G, are shown to have distinct structural features and phonon dynamics leading to different thermal conductivities. A fundamental understanding of the sequence effects may stimulate the design and engineering of polymers and biopolymers for desired thermal properties. PMID:26455593

  8. Ribonuclease "XlaI," an activity from Xenopus laevis oocytes that excises intervening sequences from yeast transfer ribonucleic acid precursors.

    PubMed Central

    Otsuka, A; de Paolis, A; Tocchini-Valentini, G P

    1981-01-01

    A ribonuclease (RNase) activity, RNase "XlaI," responsible for the excision of intervening sequences from two yeast transfer ribonucleic acid (tRNA) precursors, pre-tRNA(Tyr) and pre-tRNA(3Leu), has been purified 54-fold from nuclear extracts of Xenopus laevis oocytes. The RNase preparation is essentially free of contaminating RNase. A quantitative assay for RNase XlaI was developed, and the reaction products were characterized. RNase XlaI cleavage sites in the yeast tRNA precursors were identical to those made by yeast extracts (including 3'-phosphate and 5'-hydroxyl termini). Cleavage of pre-tRNA(3Leu) by RNase XlaI and subsequent ligation of the half-tRNA molecules do not require removal of the 5' leader or 3' trailer sequences. PMID:6765601

  9. Two Molecular Information Processing Systems Based on Catalytic Nucleic Acids

    NASA Astrophysics Data System (ADS)

    Stojanovic, Milan

    Mixtures of molecules are capable of powerful information processing [1]. This statement is self-evident in the following way: it is a hierarchically organized complex mixture of molecules that is formulating it to other similarly organized mixtures of molecules. By making such a statement I am not endorsing the extreme forms of reductionism; rather, I am making what I think is a small first step towards harnessing the information-processing prowess of molecules and, hopefully, overcoming some limitations of more traditional computing paradigms. There are different ideas on how to understand and use molecular information processing abilities and I will list some below. My list is far from inclusive, and delineations are far from clear-cut; whenever available, I will provide examples from our research efforts. I should stress, for a computer science audience, that I am a chemist. Thus, my approach may have a much different focus and level of mathematical rigor than if it were taken by a computer scientist.

  10. Tetra-allelic SNPs: Informative forensic markers compiled from public whole-genome sequence data.

    PubMed

    Phillips, C; Amigo, J; Carracedo, Á; Lareu, M V

    2015-11-01

    Multiple-allele single nucleotide polymorphisms (SNPs) are potentially useful for forensic DNA analysis as they can provide more discrimination power than normal binary SNPs. In addition, the presence in a profile of more than two alleles per marker provides a clearer indication of mixed DNA than assessments of imbalanced signals in the peak pairs of binary SNPs. Using the 1000 Genomes Phase III human variant data release of 2014 as the starting point, this study collated 961 tetra-allelic SNPs that pass minimum sequence quality thresholds and where four separate nucleotide substitution alleles were detected. Although most of these loci had three of the four alleles in combined frequencies of 2% or less, 160 had high heterozygosities with 50 exceeding those of 'ideal' 0.5:0.5 binary SNPs. From this set of most polymorphic tetra-allelic SNPs, we identified markers most informative for forensic purposes and explored these loci in detail. Subsets of the most polymorphic tetra-allelic SNPs will make useful additions to current panels of forensic identification SNPs and ancestry-informative SNPs. The 24 most discriminatory tetra-allelic SNPs were estimated to detect more than two alleles in at least one marker per profile in 99.9% of mixtures of African contributors. In European contributor mixtures 99.4% of profiles would show multiple allele patterns, but this drops to 92.6% of East Asian contributor mixtures due to reduced levels of polymorphism for the 24 SNPs in this population group. PMID:26209763
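
    The comparison with 'ideal' 0.5:0.5 binary SNPs can be made concrete with a small sketch. It assumes the standard expected-heterozygosity formula H = 1 - sum(p_i^2); the abstract does not say which estimator the authors used, and the allele frequencies below are hypothetical.

```python
# Expected heterozygosity for a marker with allele frequencies p_1..p_k.
# Assumption: H = 1 - sum(p_i^2), the usual Hardy-Weinberg expectation.
def expected_heterozygosity(freqs):
    """Return 1 - sum of squared allele frequencies."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


# An 'ideal' binary SNP, an evenly balanced tetra-allelic SNP, and a
# tetra-allelic SNP whose three minor alleles total 2% (hypothetical values).
binary_ideal = expected_heterozygosity([0.5, 0.5])
tetra_even = expected_heterozygosity([0.25, 0.25, 0.25, 0.25])
tetra_skewed = expected_heterozygosity([0.98, 0.01, 0.005, 0.005])

print(binary_ideal, tetra_even, tetra_skewed)
```

    Under this formula a binary SNP cannot exceed 0.5, whereas a four-allele marker can reach 0.75, which is why only well-balanced tetra-allelic loci outperform the binary ideal.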

  11. Purification, amino acid sequence and mode of action of bifidocin B produced by Bifidobacterium bifidum NCFB 1454.

    PubMed

    Yildirim, Z; Winters, D K; Johnson, M G

    1999-01-01

    Bifidocin B produced by Bifidobacterium bifidum NCFB 1454 was purified to homogeneity by a rapid and simple three step purification procedure which included freeze drying, Micro-Cel adsorption/desorption and cation exchange chromatography. The purification resulted in 18% recovery and an approximately 1900-fold increase in the specific activity and purity of bifidocin B. Treatment with bifidocin B caused sensitive cells to lose high amounts of intracellular K+ ions and u.v.-absorbing materials, and to become more permeable to ONPG. Bifidocin B adsorbed to the Gram-positive bacteria but not the Gram-negative bacteria tested. Its adsorption was pH-dependent but not time-dependent. For sensitive cells, the adsorption and lethal action of bifidocin B was very rapid. In 5 min, 95% of bifidocin B adsorbed onto sensitive cells. Several salts inhibited the binding of bifidocin B, which could be overcome by increasing the amount of bifidocin B added. Pre-treatment of sensitive cells and cell walls with detergents, organic solvents or enzymes did not cause a reduction in subsequent cellular binding of bifidocin B, but cell wall preparations treated with methanol:chloroform and hot 20% (w/v) TCA lost the ability to adsorb bifidocin B. Also, the addition of purified heterologous lipoteichoic acid to sensitive cells completely blocked the adsorption of bifidocin B. The amino acid sequence indicated that the bacteriocin contained 36 residues. N-terminal amino acid sequence analysis yielded a sequence of KYYGNGVTCGLHDCRVDRGKATCGIINNGGMWGDIG. Curing experiments with 20 micrograms ml-1 acriflavine yielded cell derivatives that no longer produced bifidocin B but retained immunity to bifidocin B. Production of bifidocin B, but not immunity to bifidocin B, was associated with a plasmid of about 8 kb in this strain. PMID:10030011
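
    The residue count stated above can be checked directly against the quoted N-terminal sequence. The short sketch below only counts characters in the one-letter sequence as printed in the abstract; the variable names are illustrative.

```python
# Count residues in the bifidocin B sequence quoted in the abstract.
from collections import Counter

BIFIDOCIN_B = "KYYGNGVTCGLHDCRVDRGKATCGIINNGGMWGDIG"

length = len(BIFIDOCIN_B)            # total number of residues
composition = Counter(BIFIDOCIN_B)   # occurrences of each one-letter code

print(length)
print(composition.most_common(3))
```

    The length agrees with the 36 residues reported, and glycine is the most frequent residue in the quoted sequence.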

  12. Amino acid sequences of two novel long-chain neurotoxins from the venom of the sea snake Laticauda colubrina.

    PubMed

    Kim, H S; Tamiya, N

    1982-11-01

    From the venom of a population of the sea snake Laticauda colubrina from the Solomon Islands, a neurotoxic component, Laticauda colubrina a (toxin Lc a), was isolated in 16.6% (A280) yield. Similarly, from the venom of a population of L. colubrina from the Philippines, a neurotoxic component, Laticauda colubrina b (toxin Lc b), was obtained in 10.0% (A280) yield. The LD50 values of these toxins were 0.12 microgram/g body wt. on intramuscular injection in mice. Toxins Lc a and Lc b were each composed of molecules containing 69 amino acid residues with eight half-cystine residues. The complete amino acid sequences of these two toxins were elucidated. Toxins Lc a and Lc b are different from each other at five positions of their sequences, namely at positions 31 (Phe/Ser), 32 (Leu/Ile), 33 (Lys/Arg), 50 (Pro/Arg) and 53 (Asp/His) (residues in parentheses give the residues in toxins Lc a and Lc b respectively). Toxins Lc a and Lc b have a novel structure in that they have only four disulphide bridges, although the whole amino acid sequences are homologous to those of other known long-chain neurotoxins. It is remarkable that toxins Lc a and Lc b are not coexistent at the detection error of 6% of the other toxin. Populations of Laticauda colubrina from the Solomon Islands and from the Philippines have either toxin Lc a or toxin Lc b and not both of them. PMID:7159381

  13. The sequence of rat leukosialin (W3/13 antigen) reveals a molecule with O-linked glycosylation of one third of its extracellular amino acids.

    PubMed Central

    Killeen, N; Barclay, A N; Willis, A C; Williams, A F

    1987-01-01

    Leukosialin is one of the major glycoproteins of thymocytes and T lymphocytes and is notable for a very high content of O-linked carbohydrate structures. The full protein sequence for rat leukosialin as translated from cDNA clones is now reported. The molecule contains 371 amino acids with 224 residues outside the cell, one transmembrane sequence and 124 cytoplasmic residues. Data from the peptide sequence and carbohydrate composition suggest that one in three of the extracellular amino acids may be O-glycosylated with no N-linked glycosylation sites. The cDNA sequence contained a CpG rich region in the 3' coding sequence and a large 3' non-coding region which included tandem repeats of the sequence GGAT. PMID:2965006

  14. Amorphous/nanocrystalline silicon biosensor for the specific identification of unamplified nucleic acid sequences using gold nanoparticle probes

    NASA Astrophysics Data System (ADS)

    Martins, Rodrigo; Baptista, Pedro; Raniero, Leandro; Doria, Gonçalo; Silva, Leonardo; Franco, Ricardo; Fortunato, Elvira

    2007-01-01

    Amorphous/nanocrystalline silicon pi 'ii'n devices fabricated on micromachined glass substrates are integrated with oligonucleotide-derivatized gold nanoparticles for a colorimetric detection method. The method enables the specific detection and quantification of unamplified nucleic acid sequences (DNA and RNA) without the need to functionalize the glass surface, allowing for resolution of single nucleotide differences between DNA and RNA sequences—single nucleotide polymorphism and mutation detection. The detector's substrate is glass and the sample is directly applied on the back side of the biosensor, ensuring a direct optical coupling of the assays with a concomitant maximum photon capture and the possibility to reuse the sensor.

  15. An Interpretation of the Ancestral Codon from Miller’s Amino Acids and Nucleotide Correlations in Modern Coding Sequences

    PubMed Central

    Carels, Nicolas; de Leon, Miguel Ponce

    2015-01-01

    Purine bias, which is usually referred to as an “ancestral codon”, is known to result in short-range correlations between nucleotides in coding sequences, and it is common in all species. We demonstrate that RWY is a more appropriate pattern than the classical RNY, and purine bias (Rrr) is the product of a network of nucleotide compensations induced by functional constraints on the physicochemical properties of proteins. Through deductions from universal correlation properties, we also demonstrate that amino acids from Miller’s spark discharge experiment are compatible with functional primeval proteins at the dawn of living cell radiation on earth. These amino acids match the hydropathy and secondary structures of modern proteins. PMID:25922573
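
    The RNY and RWY patterns use IUPAC nucleotide codes (R = A or G, Y = C or T, W = A or T, N = any base). A minimal sketch of how one might measure the fraction of in-frame codons matching each pattern follows; the function names and the example coding sequence are hypothetical, and this is not the authors' analysis pipeline.

```python
# Fraction of in-frame codons matching an IUPAC codon pattern.
# Assumption: the input is an uppercase DNA coding sequence read in frame 0.
IUPAC = {"R": "AG", "Y": "CT", "W": "AT", "N": "ACGT"}


def matches(codon, pattern):
    """True if every base of the codon is allowed by the pattern symbol."""
    return all(base in IUPAC[sym] for base, sym in zip(codon, pattern))


def pattern_fraction(cds, pattern):
    """Share of complete codons in `cds` that match `pattern`."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        return 0.0
    return sum(matches(c, pattern) for c in codons) / len(codons)


cds = "ATGGCTGATAACGTC"  # hypothetical: ATG GCT GAT AAC GTC
rny = pattern_fraction(cds, "RNY")
rwy = pattern_fraction(cds, "RWY")
print(rny, rwy)
```

    Because W is a subset of N, every RWY codon is also an RNY codon, so the RWY fraction can never exceed the RNY fraction for the same sequence.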

  16. Rapid Nucleic Acid Sequencing Methods--Alternative Approaches to Facilitating Learning.

    ERIC Educational Resources Information Center

    Bryce, Charles F. A.

    1982-01-01

    Because advanced students had difficulty in interpreting cleavage patterns obtained by gel electrophoresis related to rapid sequencing techniques for DNA and RNA, several formats were developed to aid in understanding this topic. Formats included print, print plus scrambled print, interactive computer-based instruction, and high-resolution…

  17. Draft Genome Sequence of Ustilago trichophora RK089, a Promising Malic Acid Producer

    PubMed Central

    Zambanini, Thiemo; Buescher, Joerg M.; Meurer, Guido; Blank, Lars M.

    2016-01-01

    The basidiomycetous smut fungus Ustilago trichophora RK089 produces malate from glycerol. De novo genome sequencing revealed a 20.7-Mbp genome (301 gap-closed contigs, 246 scaffolds). A comparison to the genome of Ustilago maydis 521 revealed all essential genes for malate production from glycerol contributing to metabolic engineering for improving malate production. PMID:27469969

  18. Draft Genome Sequence of Ustilago trichophora RK089, a Promising Malic Acid Producer.

    PubMed

    Zambanini, Thiemo; Buescher, Joerg M; Meurer, Guido; Wierckx, Nick; Blank, Lars M

    2016-01-01

    The basidiomycetous smut fungus Ustilago trichophora RK089 produces malate from glycerol. De novo genome sequencing revealed a 20.7-Mbp genome (301 gap-closed contigs, 246 scaffolds). A comparison to the genome of Ustilago maydis 521 revealed all essential genes for malate production from glycerol contributing to metabolic engineering for improving malate production. PMID:27469969

  19. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... nucleotide and/or amino acid sequence submissions in computer readable form. 1.824 Section 1.824 Patents... submissions in computer readable form. (a) The computer readable form required by § 1.821(e) shall meet the following requirements: (1) The computer readable form shall contain a single “Sequence Listing” as either...

  20. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... nucleotide and/or amino acid sequence submissions in computer readable form. 1.824 Section 1.824 Patents... submissions in computer readable form. (a) The computer readable form required by § 1.821(e) shall meet the following requirements: (1) The computer readable form shall contain a single “Sequence Listing” as either...

  1. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... nucleotide and/or amino acid sequence submissions in computer readable form. 1.824 Section 1.824 Patents... submissions in computer readable form. (a) The computer readable form required by § 1.821(e) shall meet the following requirements: (1) The computer readable form shall contain a single “Sequence Listing” as either...

  2. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... nucleotide and/or amino acid sequence submissions in computer readable form. 1.824 Section 1.824 Patents... submissions in computer readable form. (a) The computer readable form required by § 1.821(e) shall meet the following requirements: (1) The computer readable form shall contain a single “Sequence Listing” as either...

  3. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... nucleotide and/or amino acid sequence submissions in computer readable form. 1.824 Section 1.824 Patents... submissions in computer readable form. (a) The computer readable form required by § 1.821(e) shall meet the following requirements: (1) The computer readable form shall contain a single “Sequence Listing” as either...

  4. Molecular characterization of the body site-specific human epidermal cytokeratin 9: cDNA cloning, amino acid sequence, and tissue specificity of gene expression.

    PubMed

    Langbein, L; Heid, H W; Moll, I; Franke, W W

    1993-12-01

    Differentiation of human plantar and palmar epidermis is characterized by the suprabasal synthesis of a major special intermediate-sized filament (IF) protein, the type I (acidic) cytokeratin 9 (CK 9). Using partial amino acid (aa) sequence information obtained by direct Edman sequencing of peptides resulting from proteolytic digestion of purified CK 9, we synthesized several redundant primers by 'back-translation'. Amplification by polymerase chain reaction (PCR) of cDNAs obtained by reverse transcription of mRNAs from human foot sole epidermis, including 5'-primer extension, resulted in multiple overlapping cDNA clones, from which the complete cDNA (2353 bp) could be constructed. This cDNA encoded the CK 9 polypeptide with a calculated molecular weight of 61,987 and an isoelectric point at about pH 5.0. The aa sequence deduced from cDNA was verified in several parts by comparison with the peptide sequences and showed the typical structure of type I CKs, with a head (153 aa), an alpha-helical coiled-coil-forming rod (306 aa), and a tail (163 aa) domain. The protein displayed the highest homology to human CK 10, not only in the highly conserved rod domain but also in large parts of the head and the tail domains. On the other hand, the aa sequence revealed some remarkable differences from CK 10 and other CKs, even in the most conserved segments of the rod domain. The nuclease digestion pattern seen on Southern blot analysis of human genomic DNA indicated the existence of a unique CK 9 gene. Using CK 9-specific riboprobes for hybridization on Northern blots of RNAs from various epithelia, a mRNA of about 2.4 kb in length could be identified only in foot sole epidermis, and a weaker cross-hybridization signal was seen in RNA from bovine heel pad epidermis at about 2.0 kb. A large number of tissues and cell cultures were examined by PCR of mRNA-derived cDNAs, using CK 9-specific primers. But even with this very sensitive signal amplification, only palmar

  5. Mutation-selection models of coding sequence evolution with site-heterogeneous amino acid fitness profiles.

    PubMed

    Rodrigue, Nicolas; Philippe, Hervé; Lartillot, Nicolas

    2010-03-01

    Modeling the interplay between mutation and selection at the molecular level is key to evolutionary studies. To this end, codon-based evolutionary models have been proposed as pertinent means of studying long-range evolutionary patterns and are widely used. However, these approaches have not yet consolidated results from amino acid level phylogenetic studies showing that selection acting on proteins displays strong site-specific effects, which translate into heterogeneous amino acid propensities across the columns of alignments; related codon-level studies have instead focused on either modeling a single selective context for all codon columns, or a separate selective context for each codon column, with the former strategy deemed too simplistic and the latter deemed overparameterized. Here, we integrate recent developments in nonparametric statistical approaches to propose a probabilistic model that accounts for the heterogeneity of amino acid fitness profiles across the coding positions of a gene. We apply the model to a dozen real protein-coding gene alignments and find it to produce biologically plausible inferences, for instance, as pertaining to site-specific amino acid constraints, as well as distributions of scaled selection coefficients. In their account of mutational features as well as the heterogeneous regimes of selection at the amino acid level, the modeling approaches studied here can form a backdrop for several extensions, accounting for other selective features, for variable population size, or for subtleties of mutational features, all with parameterizations couched within population-genetic theory. PMID:20176949

  6. Terminal sequence studies of high-molecular-weight ribonucleic acid. The 3′-termini of rabbit globin messenger ribonucleic acid

    PubMed Central

    Hunt, John A.

    1973-01-01

    Haemoglobin mRNA isolated from EDTA-treated polyribosomes has an apparent molecular weight of 120000–180000 estimated by condensation with 3H-labelled isoniazid after periodate oxidation. Analysis of the ribonuclease digests of isoniazid-labelled RNA by paper electrophoresis and column chromatography enables the amount of contaminating 18S, 7S, 5S and 4S RNA to be estimated, and a corrected molecular weight of globin mRNA as the acid is 161000 or 500 nucleotides in length. This molecule contains two groups of 3′-terminal sequences in equal yield; G-Y-A6 and G-Y-A7 in the ratio 3:2, and G-N9–16-Y-A2 and G-N9–16-Y-N3 in the ratio 3:2. The significance of these sequences is discussed in relation to the poly(A) content of globin mRNA, the specificity of the sequences, and possible function in processing and biosynthesis of mRNA. PMID:4737318

  7. In-silico design of computational nucleic acids for molecular information processing

    PubMed Central

    2013-01-01

    Within recent years nucleic acids have become a focus of interest for prototype implementations of molecular computing concepts. During the same period the importance of ribonucleic acids as components of the regulatory networks within living cells has increasingly been revealed. Molecular computers are attractive due to their ability to function within a biological system; an application area extraneous to the present information technology paradigm. The existence of natural information processing architectures (predominately exemplified by protein) demonstrates that computing based on physical substrates that are radically different from silicon is feasible. Two key principles underlie molecular level information processing in organisms: conformational dynamics of macromolecules and self-assembly of macromolecules. Nucleic acids support both principles, and moreover computational design of these molecules is practicable. This study demonstrates the simplicity with which one can construct a set of nucleic acid computing units using a new computational protocol. With the new protocol, diverse classes of nucleic acids imitating the complete set of boolean logical operators were constructed. These nucleic acid classes display favourable thermodynamic properties and are significantly similar to the approximation of successful candidates implemented in the laboratory. This new protocol would enable the construction of a network of interconnecting nucleic acids (as a circuit) for molecular information processing. PMID:23647621

  8. In-silico design of computational nucleic acids for molecular information processing.

    PubMed

    Ramlan, Effirul Ikhwan; Zauner, Klaus-Peter

    2013-01-01

    Within recent years nucleic acids have become a focus of interest for prototype implementations of molecular computing concepts. During the same period the importance of ribonucleic acids as components of the regulatory networks within living cells has increasingly been revealed. Molecular computers are attractive due to their ability to function within a biological system; an application area extraneous to the present information technology paradigm. The existence of natural information processing architectures (predominately exemplified by protein) demonstrates that computing based on physical substrates that are radically different from silicon is feasible. Two key principles underlie molecular level information processing in organisms: conformational dynamics of macromolecules and self-assembly of macromolecules. Nucleic acids support both principles, and moreover computational design of these molecules is practicable. This study demonstrates the simplicity with which one can construct a set of nucleic acid computing units using a new computational protocol. With the new protocol, diverse classes of nucleic acids imitating the complete set of boolean logical operators were constructed. These nucleic acid classes display favourable thermodynamic properties and are significantly similar to the approximation of successful candidates implemented in the laboratory. This new protocol would enable the construction of a network of interconnecting nucleic acids (as a circuit) for molecular information processing. PMID:23647621

  9. Sequence Learning in 4-Month-Old Infants: Do Infants Represent Ordinal Information?

    ERIC Educational Resources Information Center

    Lewkowicz, David J.; Berent, Iris

    2009-01-01

    This study investigated how 4-month-old infants represent sequences: Do they track the statistical relations among specific sequence elements (e.g., AB, BC) or do they encode abstract ordinal positions (i.e., B is second)? Infants were habituated to sequences of 4 moving and sounding elements--3 of the elements varied in their ordinal position…

  10. Integration of Temporal and Ordinal Information during Serial Interception Sequence Learning

    ERIC Educational Resources Information Center

    Gobel, Eric W.; Sanchez, Daniel J.; Reber, Paul J.

    2011-01-01

    The expression of expert motor skills typically involves learning to perform a precisely timed sequence of movements. Research examining incidental sequence learning has relied on a perceptually cued task that gives participants exposure to repeating motor sequences but does not require timing of responses for accuracy. In the 1st experiment, a…

  11. Identification of the amino acid sequence that targets peroxiredoxin 6 to lysosome-like structures of lung epithelial cells.

    PubMed

    Sorokina, Elena M; Feinstein, Sheldon I; Milovanova, Tatyana N; Fisher, Aron B

    2009-11-01

    Peroxiredoxin 6 (Prdx6), an enzyme with glutathione peroxidase and PLA2 (aiPLA2) activities, is highly expressed in respiratory epithelium, where it participates in phospholipid turnover and antioxidant defense. Prdx6 has been localized by immunocytochemistry and subcellular fractionation to acidic organelles (lung lamellar bodies and lysosomes) and cytosol. On the basis of their pH optima, we have postulated that protein subcellular localization determines the balance between the two activities of Prdx6. Using green fluorescent protein-labeled protein expression in alveolar epithelial cell lines, we showed Prdx6 localization to organellar structures resembling lamellar bodies in mouse lung epithelial (MLE-12) cells and lysosomes in A549 cells. Localization within lamellar bodies/lysosomes was in the luminal compartment. Targeting to lysosome-like organelles was abolished by the deletion of amino acids 31-40 from the Prdx6 NH2-terminal region; deletion of the COOH-terminal region had no effect. A green fluorescent protein-labeled peptide containing only amino acids 31-40 showed lysosomal targeting that was abolished by mutation of S32 or G34 within the peptide. Studies with mutated protein indicated that lipid binding was not necessary for Prdx6 targeting. This peptide sequence has no homology to known organellar targeting motifs. These studies indicate that the localization of Prdx6 in acidic organelles and consequent PLA2 activity depend on a novel 10-aa peptide located at positions 31-40 of the protein. PMID:19700648

  12. From Amino Acid to Glucosinolate Biosynthesis: Protein Sequence Changes in the Evolution of Methylthioalkylmalate Synthase in Arabidopsis

    PubMed Central

    de Kraker, Jan-Willem; Gershenzon, Jonathan

    2011-01-01

    Methylthioalkylmalate synthase (MAM) catalyzes the committed step in the side chain elongation of Met, yielding important precursors for glucosinolate biosynthesis in Arabidopsis thaliana and other Brassicaceae species. MAM is believed to have evolved from isopropylmalate synthase (IPMS), an enzyme involved in Leu biosynthesis, based on phylogenetic analyses and an overlap of catalytic abilities. Here, we investigated the changes in protein structure that have occurred during the recruitment of IPMS from amino acid to glucosinolate metabolism. The major sequence difference between IPMS and MAM is the absence of 120 amino acids at the C-terminal end of MAM that constitute a regulatory domain for Leu-mediated feedback inhibition. Truncation of this domain in Arabidopsis IPMS2 results in loss of Leu feedback inhibition and quaternary structure, two features common to MAM enzymes, plus an 8.4-fold increase in the kcat/Km for a MAM substrate. Additional exchange of two amino acids in the active site resulted in a MAM-like enzyme that had little residual IPMS activity. Hence, combination of the loss of the regulatory domain and a few additional amino acid exchanges can explain the evolution of MAM from IPMS during its recruitment from primary to secondary metabolism. PMID:21205930

  13. Complete Genome Sequence of a thermotolerant sporogenic lactic acid bacterium, Bacillus coagulans strain 36D1

    SciTech Connect

    Xie, Gary; Dalin, Eileen; Tice, Hope; Chertkov, Olga; Land, Miriam L

    2011-01-01

    Bacillus coagulans is a ubiquitous soil bacterium that grows at 50-55 °C and pH 5.0 and ferments various sugars that constitute plant biomass to L(+)-lactic acid. The ability of this sporogenic lactic acid bacterium to grow at 50-55 °C and pH 5.0 makes this organism an attractive microbial biocatalyst for production of optically pure lactic acid at industrial scale not only from glucose derived from cellulose but also from xylose, a major constituent of hemicellulose. This bacterium is also considered a potential probiotic. The complete genome sequence of a representative strain, B. coagulans strain 36D1, is presented and discussed.

  14. Complete Genome Sequence of a thermotolerant sporogenic lactic acid bacterium, Bacillus coagulans strain 36D1

    SciTech Connect

    Rhee, Mun Su; Moritz, Brelan E.; Xie, Gary; Glavina Del Rio, Tijana; Dalin, Eileen; Tice, Hope; Bruce, David; Goodwin, Lynne A.; Chertkov, Olga; Brettin, Thomas S; Han, Cliff; Detter, J. Chris; Pitluck, Sam; Land, Miriam L; Patel, Milind; Ou, Mark; Harbrucker, Roberta; Ingram, Lonnie O.; Shanmugam, Keelnathan T.

    2011-01-01

    Bacillus coagulans is a ubiquitous soil bacterium that grows at 50-55 °C and pH 5.0 and ferments various sugars that constitute plant biomass to L(+)-lactic acid. The ability of this sporogenic lactic acid bacterium to grow at 50-55 °C and pH 5.0 makes this organism an attractive microbial biocatalyst for production of optically pure lactic acid at industrial scale not only from glucose derived from cellulose but also from xylose, a major constituent of hemicellulose. This bacterium is also considered a potential probiotic. The complete genome sequence of a representative strain, B. coagulans strain 36D1, is presented and discussed.

  15. Diverse Bacterial PKS Sequences Derived From Okadaic Acid-Producing Dinoflagellates

    PubMed Central

    Perez, Roberto; Liu, Li; Lopez, Jose; An, Tianying; Rein, Kathleen S.

    2008-01-01

    Okadaic acid (OA) and the related dinophysistoxins are isolated from dinoflagellates of the genera Prorocentrum and Dinophysis. Bacteria of the Roseobacter group have been associated with okadaic acid-producing dinoflagellates and have been previously implicated in OA production. Analysis of 16S rRNA libraries reveals that Roseobacter are the most abundant bacteria associated with OA-producing dinoflagellates of the genus Prorocentrum and are not found in association with non-toxic dinoflagellates. While some polyketide synthase (PKS) genes form a highly supported Prorocentrum clade, most appear to be bacterial, but unrelated to Roseobacter or Alpha-Proteobacterial PKSs or those derived from the other Alveolates Karenia brevis or Cryptosporidium parvum. PMID:18728765

  16. Complete Genome Sequence of Moraxella osloensis Strain KMC41, a Producer of 4-Methyl-3-Hexenoic Acid, a Major Malodor Compound in Laundry.

    PubMed

    Goto, Takatsugu; Hirakawa, Hideki; Morita, Yuji; Tomida, Junko; Sato, Jun; Matsumura, Yuta; Mitani, Asako; Niwano, Yu; Takeuchi, Kohei; Kubota, Hiromi; Kawamura, Yoshiaki

    2016-01-01

    We report the complete genome sequence of Moraxella osloensis strain KMC41, isolated from laundry with malodor. The KMC41 genome comprises a 2,445,556-bp chromosome and three plasmids. A fatty acid desaturase and at least four β-oxidation-related genes putatively associated with 4-methyl-3-hexenoic acid generation were detected in the KMC41 chromosome. PMID:27445387

  17. Complete Genome Sequence of Moraxella osloensis Strain KMC41, a Producer of 4-Methyl-3-Hexenoic Acid, a Major Malodor Compound in Laundry

    PubMed Central

    Hirakawa, Hideki; Morita, Yuji; Tomida, Junko; Sato, Jun; Matsumura, Yuta; Mitani, Asako; Niwano, Yu; Takeuchi, Kohei; Kubota, Hiromi; Kawamura, Yoshiaki

    2016-01-01

    We report the complete genome sequence of Moraxella osloensis strain KMC41, isolated from laundry with malodor. The KMC41 genome comprises a 2,445,556-bp chromosome and three plasmids. A fatty acid desaturase and at least four β-oxidation-related genes putatively associated with 4-methyl-3-hexenoic acid generation were detected in the KMC41 chromosome. PMID:27445387

  18. The Fugitive Literature of Acid Rain: Making Use of Nonconventional Information Sources in a Vertical File.

    ERIC Educational Resources Information Center

    Lovenburg, Susan L.; Stoss, Frederick W.

    1988-01-01

    Discusses the advantages of vertical file collections for nonconventional literature, and describes the classification scheme used for fugitive literature by the Acid Rain Information Clearinghouse at the Center for Environmental Information. An annotated list of organizations and examples of titles they offer is provided. (8 notes with…

  19. cDNA cloning and structural characterization of a lectin from the mussel Crenomytilus grayanus with a unique amino acid sequence and antibacterial activity.

    PubMed

    Kovalchuk, Svetlana N; Chikalovets, Irina V; Chernikov, Oleg V; Molchanova, Valentina I; Li, Wei; Rasskazov, Valery A; Lukyanov, Pavel A

    2013-10-01

    The amino acid sequence of a GalNAc/Gal-specific lectin from the mussel Crenomytilus grayanus (CGL) was determined by cDNA sequencing. CGL consists of 150 amino acid residues, contains three tandem repeats with high sequence similarities to each other (up to 73%) and does not belong to any known lectin family. According to circular dichroism results, CGL is a β/α-protein with a predominance of β-structure. CGL was predicted to adopt a β-trefoil fold. The lectin exhibits antibacterial activity and might be involved in the recognition and clearance of bacterial pathogens in the shellfish. PMID:23886951

  20. Snake venom toxins. The amino acid sequence of toxin Vi2, a homologue of pancreatic trypsin inhibitor, from Dendroaspis polylepis polylepis (black mamba) venom.

    PubMed

    Strydom, D J

    1977-04-25

    The amino acid sequence of venom component Vi2, a protein of low toxicity from Dendroaspis polylepis polylepis venom was determined by automatic sequence analysis in combination with sequence studies on tryptic peptides. This protein, the most retarded fraction of this venom on a cation-exchange resin, is a homologue of bovine pancreatic trypsin inhibitor consisting of a single chain of 57 amino acid residues containing six half-cystine residues. The active site lysyl residue of bovine trypsin inhibitor is conserved in Vi2 although large differences are found in the rest of the molecule. PMID:857902

  1. The complete amino acid sequence of the major Kunitz trypsin inhibitor from the seeds of Prosopsis juliflora.

    PubMed

    Negreiros, A N; Carvalho, M M; Xavier Filho, J; Blanco-Labra, A; Shewry, P R; Richardson, M

    1991-01-01

    The major inhibitor of trypsin in seeds of Prosopsis juliflora was purified by precipitation with ammonium sulphate, ion-exchange column chromatography on DEAE- and CM-Sepharose and preparative reverse phase HPLC on a Vydac C-18 column. The protein inhibited trypsin in the stoichiometric ratio of 1:1, but had only weak activity against chymotrypsin and did not inhibit human salivary or porcine pancreatic alpha-amylases. SDS-PAGE indicated that the inhibitor has a Mr of ca 20,000, and IEF-PAGE showed that the pI is 8.8. The complete amino acid sequence was determined by automatic degradation, and by DABITC/PITC microsequence analysis of peptides obtained from enzyme digestions of the reduced and S-carboxymethylated protein with trypsin, chymotrypsin, elastase, the Glu-specific protease from S. aureus and the Lys-specific protease from Lysobacter enzymogenes. The inhibitor consisted of two polypeptide chains, of 137 residues (alpha chain) and 38 residues (beta chain) linked together by a single disulphide bond. The amino acid sequence of the protein exhibited homology with a number of Kunitz proteinase inhibitors from other legume seeds, the bifunctional subtilisin/alpha-amylase inhibitors from cereals and the taste-modifying protein miraculin. PMID:1367792

  2. Isolation and complete amino acid sequence of two fibrinolytic proteinases from the toxic Saturnid caterpillar Lonomia achelous.

    PubMed

    Amarant, T; Burkhart, W; LeVine, H; Arocha-Pinango, C L; Parikh, I

    1991-08-30

    The major toxic and fibrinolytic activity of the saliva and hemolymph of the larval form of Lonomia achelous was purified to homogeneity by a combination of metal chelate and affinity chromatography. Two apparent isozymes, Achelase I (213 amino acids, pIcalc = 10.55) and Achelase II (214 amino acids, pIcalc = 8.51), were sequenced by automated Edman degradation, and their C-termini confirmed by Fourier-transform mass spectrometry. The calculated molecular weights (22,473 and 22,727) correspond well to Mr estimates of 24,000 by SDS-PAGE. No carbohydrate was detected during sequencing. The enzymes degraded all three chains of fibrin, alpha greater than beta much greater than gamma, yielding a fragmentation pattern indistinguishable from that produced by trypsin. Chromogenic peptides S-2222 (Factor Xa and trypsin), S-2251 (plasmin), S-2302 (kallikrein) and S-2444 (urokinase) were substrates while S-2288 (broad range of serine proteinases including thrombin) was not hydrolyzed. Among a range of inhibitors Hg+2, aminophenylmercuriacetate, leupeptin, antipain and E-64 but not N-ethylmaleimide or iodoacetate abolished the activity of the purified isozymes against S-2444. Phenylmethylsulfonyl fluoride, soybean trypsin inhibitor and aprotinin were less effective. The presence of the classic catalytic triad (histidine-41, aspartate-86 and serine-189) suggests that Achelases I and II may be serine proteinases, but with a potentially free cysteine-185 which could react with thiol proteinase-directed reagents. PMID:1911844

  3. ISHAN: sequence homology analysis package.

    PubMed

    Shil, Pratip; Dudani, Niraj; Vidyasagar, Pandit B

    2006-01-01

    Sequence-based homology studies play an important role in evolutionary tracing and classification of proteins. Various methods are available to analyze biological sequence information. However, with the advent of the proteomics era, there is a growing demand for analysis of huge amounts of biological sequence information, and it has become necessary to have programs that provide speedy analysis. ISHAN has been developed as a homology analysis package, built on various sequence analysis tools, viz. FASTA, ALIGN, CLUSTALW, PHYLIP and CODONW (for DNA sequences). This JAVA application offers the user a choice of analysis tools. For testing, ISHAN was applied to perform phylogenetic analysis for sets of Caspase 3 DNA sequences and NF-kappaB p105 amino acid sequences. By integrating several tools it has made analysis much faster and reduced manual intervention. PMID:17274766

  4. Cloning and nucleotide sequencing of a novel 7 beta-(4-carboxybutanamido)cephalosporanic acid acylase gene of Bacillus laterosporus and its expression in Escherichia coli and Bacillus subtilis.

    PubMed

    Aramori, I; Fukagawa, M; Tsumura, M; Iwami, M; Ono, H; Kojo, H; Kohsaka, M; Ueda, Y; Imanaka, H

    1991-12-01

    A strain of Bacillus species which produced an enzyme named glutaryl 7-ACA acylase which converts 7 beta-(4-carboxybutanamido)cephalosporanic acid (glutaryl 7-ACA) to 7-amino cephalosporanic acid (7-ACA) was isolated from soil. The gene for the glutaryl 7-ACA acylase was cloned with pHSG298 in Escherichia coli JM109, and the nucleotide sequence was determined by the M13 dideoxy chain termination method. The DNA sequence revealed only one large open reading frame composed of 1,902 bp corresponding to 634 amino acid residues. The deduced amino acid sequence contained a potential signal sequence in its amino-terminal region. Expression of the gene for glutaryl 7-ACA acylase was performed in both E. coli and Bacillus subtilis. The enzyme preparations purified from either recombinant strain of E. coli or B. subtilis were shown to be identical with each other as regards the profile of sodium dodecyl sulfate-polyacrylamide gel electrophoresis and were composed of a single peptide with the molecular size of 70 kDa. Determination of the amino-terminal sequence of the two enzyme preparations revealed that both amino-terminal sequences (the first nine amino acids) were identical and completely coincided with residues 28 to 36 of the open reading frame. Extracellular excretion of the enzyme was observed in a recombinant strain of B. subtilis. PMID:1744041

  5. Genotyping by sequencing resolves shallow population structure to inform conservation of Chinook salmon (Oncorhynchus tshawytscha)

    PubMed Central

    Larson, Wesley A; Seeb, Lisa W; Everett, Meredith V; Waples, Ryan K; Templin, William D; Seeb, James E

    2014-01-01

    Recent advances in population genomics have made it possible to detect previously unidentified structure, obtain more accurate estimates of demographic parameters, and explore adaptive divergence, potentially revolutionizing the way genetic data are used to manage wild populations. Here, we identified 10 944 single-nucleotide polymorphisms using restriction-site-associated DNA (RAD) sequencing to explore population structure, demography, and adaptive divergence in five populations of Chinook salmon (Oncorhynchus tshawytscha) from western Alaska. Patterns of population structure were similar to those of past studies, but our ability to assign individuals back to their region of origin was greatly improved (>90% accuracy for all populations). We also calculated effective size with and without removing physically linked loci identified from a linkage map, a novel method for nonmodel organisms. Estimates of effective size were generally above 1000 and were biased downward when physically linked loci were not removed. Outlier tests based on genetic differentiation identified 733 loci and three genomic regions under putative selection. These markers and genomic regions are excellent candidates for future research and can be used to create high-resolution panels for genetic monitoring and population assignment. This work demonstrates the utility of genomic data to inform conservation in highly exploited species with shallow population structure. PMID:24665338

  6. Discrimination of soluble and aggregation-prone proteins based on sequence information.

    PubMed

    Fang, Yaping; Fang, Jianwen

    2013-04-01

    Understanding the factors governing protein solubility is key to grasping its mechanisms and may provide insight into diseases related to protein aggregation and misfolding, such as Alzheimer's disease. In this work, we attempt to identify factors important to protein solubility using feature selection. First, we calculate 1438 features, including physicochemical properties and statistics, for each protein. The Random Forest algorithm is used to select the most informative minimal subset of features based on their predictive performance. A predictive model is built based on 17 selected features. Compared with previous models, our model achieves better performance with a sensitivity of 0.82, specificity 0.85, ACC 0.84, AUC 0.91 and MCC 0.67. Furthermore, a model using a redundancy-reduced dataset (sequence identity <= 30%) achieves the same performance as the model without redundancy reduction. Our results provide not only a reliable model for predicting protein solubility but also a list of features important to protein solubility. The predictive model is implemented as a freely available web application at . PMID:23440081

  7. Genotyping by sequencing resolves shallow population structure to inform conservation of Chinook salmon (Oncorhynchus tshawytscha).

    PubMed

    Larson, Wesley A; Seeb, Lisa W; Everett, Meredith V; Waples, Ryan K; Templin, William D; Seeb, James E

    2014-03-01

    Recent advances in population genomics have made it possible to detect previously unidentified structure, obtain more accurate estimates of demographic parameters, and explore adaptive divergence, potentially revolutionizing the way genetic data are used to manage wild populations. Here, we identified 10 944 single-nucleotide polymorphisms using restriction-site-associated DNA (RAD) sequencing to explore population structure, demography, and adaptive divergence in five populations of Chinook salmon (Oncorhynchus tshawytscha) from western Alaska. Patterns of population structure were similar to those of past studies, but our ability to assign individuals back to their region of origin was greatly improved (>90% accuracy for all populations). We also calculated effective size with and without removing physically linked loci identified from a linkage map, a novel method for nonmodel organisms. Estimates of effective size were generally above 1000 and were biased downward when physically linked loci were not removed. Outlier tests based on genetic differentiation identified 733 loci and three genomic regions under putative selection. These markers and genomic regions are excellent candidates for future research and can be used to create high-resolution panels for genetic monitoring and population assignment. This work demonstrates the utility of genomic data to inform conservation in highly exploited species with shallow population structure. PMID:24665338

  8. Genomic signatures in viral sequences by in-frame and out-frame mutual information.

    PubMed

    Serrano-Solís, Víctor; Cocho, Germinal; José, Marco V

    2016-08-21

    In order to understand the unique biology of viruses, we use the Mutual Information Function (MIF) to characterize 792 viral sequences comprising 458 viral whole genomes. A 3-base periodicity (3-bp) was observed only in DNA viruses, whereas RNA viruses showed irregular patterns. The correlation of MIF values at frequencies of 3 bp (in-frame) with frequencies of 4 and 5 bp (out-frame) turned out to be useful to distinguish viruses according to their respective taxonomic order, and whether they pertain to any of the three different kingdoms, Eubacteria, Archaea and Eukarya. The clustering of viruses was carried out by the use of a new statistic, namely, the pair of in- and out-frame values of the MIF. The clustering thus obtained turned out to be entirely consistent with the current viral taxonomy. As a result we were able to compare in a single plot both viral and cellular genomes, unlike any given phylogenetic reconstruction. PMID:27178876

  9. Nucleic acid-binding molecules with high affinity and base sequence specificity: intercalating agents covalently linked to oligodeoxynucleotides.

    PubMed Central

    Asseline, U; Delarue, M; Lancelot, G; Toulmé, F; Thuong, N T; Montenay-Garestier, T; Hélène, C

    1984-01-01

    Oligodeoxyribonucleotides covalently linked to an intercalating agent via a polymethylene linker were synthesized. Oligothymidylates attached to an acridine dye (Acr) through the 3'-phosphate group [(Tp)n(CH2)mAcr] specifically interact with the complementary sequence. The interaction is strongly stabilized by the intercalating agent. By using absorption and fluorescence spectroscopies, it is shown that complex formation between (Tp)n(CH2)mAcr and poly(rA) involves the formation of n A·T base pairs, where n is the number of thymines in the oligonucleotide. The acridine ring intercalates between A·T base pairs. Fluorescence excitation spectra reveal the existence of two environments for the acridine ring, whose relative contributions depend on the linker length (m). The binding of (Tp)4(CH2)mAcr to poly(rA) is analyzed in terms of site binding and cooperative interactions between oligonucleotides along the polynucleotide lattice. Thermodynamic parameters show that the covalent attachment of the acridine ring strongly stabilizes the binding of the oligonucleotide to its complementary sequence. The stabilization depends on the linker length; the compound with m = 5 gives a more stable complex than that with m = 3. These results open the way to the synthesis of a family of molecules exhibiting both high affinity and high specificity for a nucleic acid base sequence. PMID:6587350

  10. Pseudomonas sp. strain CA5 (a selenite-reducing bacterium) 16S rRNA gene complete sequence. National Institute of Health, National Center for Biotechnology Information, GenBank sequence. Accession FJ422810.1.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    This study used 1321 base pair 16S rRNA gene sequence methods to confirm the phylogenetic position of a soil isolate as a bacterium belonging to the genus Pseudomonas. Morphological and biochemical characteristics and fatty acid profiles are consistent with the 16S rRNA gene sequence identification...

  11. Amino acid sequence and posttranslational modifications of human factor VIIa from plasma and transfected baby hamster kidney cells

    SciTech Connect

    Thim, L.; Bjoern, S.; Christensen, M.; Nicolaisen, E.M.; Lund-Hansen, T.; Pedersen, A.H.; Hedner, U.

    1988-10-04

    Blood coagulation factor VII is a vitamin K dependent glycoprotein which in its activated form, factor VIIa, participates in the coagulation process by activating factor X and/or factor IX in the presence of Ca2+ and tissue factor. Three types of potential posttranslational modifications exist in the human factor VIIa molecule, namely, 10 γ-carboxylated, N-terminally located glutamic acid residues, 1 β-hydroxylated aspartic acid residue, and 2 N-glycosylated asparagine residues. In the present study, the amino acid sequence and posttranslational modifications of recombinant factor VIIa as purified from the culture medium of a transfected baby hamster kidney cell line have been compared to human plasma factor VIIa. By use of HPLC, amino acid analysis, peptide mapping, and automated Edman degradation, the protein backbone of recombinant factor VIIa was found to be identical with human factor VIIa. Asparagine residues 145 and 322 were found to be fully N-glycosylated in human plasma factor VIIa. In the recombinant factor VIIa, asparagine residue 322 was fully glycosylated whereas asparagine residue 145 was only partially (approximately 66%) glycosylated. Besides minor differences in the sialic acid and fucose contents, the overall carbohydrate compositions were nearly identical in recombinant factor VIIa and human plasma factor VIIa. These results show that factor VIIa as produced in the transfected baby hamster kidney cells is very similar to human plasma factor VIIa and that this cell line thus might represent an alternative source for human factor VIIa.

  12. Complete Genome Sequence of the Amino Acid-Fermenting Clostridium propionicum X2 (DSM 1682)

    PubMed Central

    Poehlein, Anja; Schlien, Katja; Chowdhury, Nilanjan Pal; Gottschalk, Gerhard; Buckel, Wolfgang

    2016-01-01

    Clostridium propionicum is a strictly anaerobic, Gram-positive, rod-shaped bacterium that belongs to the clostridial cluster XIVb. The genome consists of one replicon (3.1 Mb) and harbors 2,936 predicted protein-encoding genes. The genome encodes all enzymes required for fermentation of the amino acids α-alanine, β-alanine, serine, threonine, and methionine. PMID:27081148

  13. Purification, characterization, and complete amino acid sequence of a trypsin inhibitor from amaranth (Amaranthus hypochondriacus) seeds.

    PubMed Central

    Valdes-Rodriguez, S; Segura-Nieto, M; Chagolla-Lopez, A; Verver y Vargas-Cortina, A; Martinez-Gallardo, N; Blanco-Labra, A

    1993-01-01

    A protein proteinase inhibitor was purified from a seed extract of amaranth (Amaranthus hypochondriacus) by precipitation with (NH4)2SO4, gel-filtration chromatography, ion-exchange chromatography, and reverse-phase high-performance liquid chromatography. It is a 69-amino acid protein with a high content of valine, arginine, and glutamic acid, but lacking in methionine. The inhibitor has a relative molecular weight of 7400 and an isoelectric point of 7.5. It is a serine proteinase inhibitor that recognizes chymotrypsin, trypsin, and trypsin-like proteinase activities extracted from larvae of the insect Prostephanus truncatus. This inhibitor belongs to the potato-I inhibitor family, showing the closest homology (59.5%) with the Lycopersicum peruvianum trypsin inhibitor, and (51%) with the proteinase inhibitor 5 extracted from the seeds of Cucurbita maxima. The lysine-aspartic acid residues present in the active site of the amaranth inhibitor are found in almost the same relative position as in the inhibitor from C. maxima. PMID:8290633

  14. An efficient and scalable graph modeling approach for capturing information at different levels in next generation sequencing reads

    PubMed Central

    2013-01-01

    Background Next generation sequencing technologies have greatly advanced many research areas of the biomedical sciences through their capability to generate massive amounts of genetic information at unprecedented rates. The advent of next generation sequencing has led to the development of numerous computational tools to analyze and assemble the millions to billions of short sequencing reads produced by these technologies. While these tools filled an important gap, current approaches for storing, processing, and analyzing short read datasets generally have remained simple and lack the complexity needed to efficiently model the produced reads and assemble them correctly. Results Previously, we presented an overlap graph coarsening scheme for modeling read overlap relationships on multiple levels. Most current read assembly and analysis approaches use a single graph or set of clusters to represent the relationships among a read dataset. Instead, we use a series of graphs to represent the reads and their overlap relationships across a spectrum of information granularity. At each information level our algorithm is capable of generating clusters of reads from the reduced graph, forming an integrated graph modeling and clustering approach for read analysis and assembly. Previously we applied our algorithm to simulated and real 454 datasets to assess its ability to efficiently model and cluster next generation sequencing data. In this paper we extend our algorithm to large simulated and real Illumina datasets to demonstrate that our algorithm is practical for both sequencing technologies. Conclusions Our overlap graph theoretic algorithm is able to model next generation sequencing reads at various levels of granularity through the process of graph coarsening. Additionally, our model allows for efficient representation of the read overlap relationships, is scalable for large datasets, and is practical for both Illumina and 454 sequencing technologies. PMID:24564333

  15. Peptide vaccine against canine parvovirus: identification of two neutralization subsites in the N terminus of VP2 and optimization of the amino acid sequence.

    PubMed

    Casal, J I; Langeveld, J P; Cortés, E; Schaaper, W W; van Dijk, E; Vela, C; Kamstrup, S; Meloen, R H

    1995-11-01

    The N-terminal domain of the major capsid protein VP2 of canine parvovirus was shown to be an excellent target for development of a synthetic peptide vaccine, but detailed information about number of epitopes, optimal length, sequence choice, and site of coupling to the carrier protein was lacking. Therefore, several overlapping peptides based on this N terminus were synthesized to establish conditions for optimal and reproducible induction of neutralizing antibodies in rabbits. The specificity and neutralizing ability of the antibody response for these peptides were determined. Within the N-terminal 23 residues of VP2, two subsites able to induce neutralizing antibodies and which overlapped by only two glycine residues at positions 10 and 11 could be discriminated. The shortest sequence sufficient for neutralization induction was nine residues. Peptides longer than 13 residues consistently induced neutralization, provided that their N termini were located between positions 1 and 11 of VP2. The orientation of the peptides at the carrier protein was also of importance, being more effective when coupled through the N terminus than through the C terminus to keyhole limpet hemocyanin. The results suggest that the presence of amino acid residues 2 to 21 (and probably 3 to 17) of VP2 in a single peptide is preferable for a synthetic peptide vaccine. PMID:7474152

  16. High Genetic Diversity among Strains of the Unindustrialized Lactic Acid Bacterium Carnobacterium maltaromaticum in Dairy Products as Revealed by Multilocus Sequence Typing

    PubMed Central

    Rahman, Abdur; Cailliez-Grimal, Catherine; Bontemps, Cyril; Payot, Sophie; Chaillou, Stéphane; Revol-Junelles, Anne-Marie

    2014-01-01

    Dairy products are colonized with three main classes of lactic acid bacteria (LAB): opportunistic bacteria, traditional starters, and industrial starters. Most of the population structure studies were previously performed with LAB species belonging to these three classes and give interesting knowledge about the population structure of LAB at the stage where they are already industrialized. However, these studies give little information about the population structure of LAB prior to their use as an industrial starter. Carnobacterium maltaromaticum is a LAB colonizing diverse environments, including dairy products. Since this bacterium was discovered relatively recently, it is not yet commercialized as an industrial starter, which makes C. maltaromaticum an interesting model for the study of unindustrialized LAB population structure in dairy products. A multilocus sequence typing scheme based on an analysis of fragments of the genes dapE, ddlA, glpQ, ilvE, pyc, pyrE, and leuS was applied to a collection of 47 strains, including 28 strains isolated from dairy products. The scheme allowed detecting 36 sequence types with a discriminatory index of 0.98. The whole population was clustered in four deeply branched lineages, in which the dairy strains were spread. Moreover, the dairy strains could exhibit a high diversity within these lineages, leading to an overall dairy population with a diversity level as high as that of the nondairy population. These results are in agreement with the hypothesis according to which the industrialization of LAB leads to a diversity reduction in dairy products. PMID:24747901
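
    The abstract above reports a discriminatory index without naming the formula. A minimal sketch, assuming the commonly used Hunter-Gaston form of Simpson's index of diversity, D = 1 - sum(n_j(n_j - 1)) / (N(N - 1)), where n_j is the number of strains of sequence type j and N is the total number of strains. The function name and the toy strain list are hypothetical and are not data from the study.

```python
from collections import Counter


def discriminatory_index(type_assignments):
    """Hunter-Gaston index (assumed formula, not confirmed by the abstract).

    type_assignments: one sequence-type label per strain.
    Returns the probability that two strains drawn without replacement
    belong to different types.
    """
    n = len(type_assignments)
    if n < 2:
        raise ValueError("need at least two strains")
    counts = Counter(type_assignments)
    # Ordered pairs of strains that share a type.
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n * (n - 1))


# Hypothetical toy data: 6 strains falling into 4 sequence types.
example = ["ST1", "ST1", "ST2", "ST3", "ST3", "ST4"]
print(round(discriminatory_index(example), 3))
```

    A value close to 1 means that almost any two strains picked at random carry different sequence types.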

  17. Inferences from protein and nucleic acid sequences - Early molecular evolution, divergence of kingdoms and rates of change

    NASA Technical Reports Server (NTRS)

    Dayhoff, M. O.; Barker, W. C.; Mclaughlin, P. J.

    1974-01-01

    Description of new sensitive, objective methods for establishing the probable common ancestry of very distantly related sequences and the quantitative evolutionary change which has taken place. These methods are applied to four families of proteins and nucleic acids, and evolutionary trees are derived where possible. Of the four families containing duplications of genetic material, two are nucleic acids: transfer RNA and 5S ribosomal RNA. Both of these structures are functional in the synthesis of coded proteins, and prototypes must have been present in the cell at the inception of the fundamental coding process that all living things share. There are many types of tRNA which recognize the various nucleotide triplets and the 20 amino acids. These types are thought to have arisen as a result of many gene duplications. Relationships among these types are discussed. The 5S ribosomal RNA, presently functional in both eukaryotes and prokaryotes, is very likely descended from an early form incorporating almost a complete duplication of genetic material. The amount of evolution in the various lines can again be compared. The other two families containing duplications are proteins: ferredoxin and cytochrome c.

  18. Amino acid sequence alignment of bacterial and mammalian pancreatic serine proteases based on topological equivalences.

    PubMed

    James, M N; Delbaere, L T; Brayer, G D

    1978-06-01

    The three-dimensional structures of the bacterial serine proteases SGPA, SGPB, and alpha-lytic protease have been compared with those of the pancreatic enzymes alpha-chymotrypsin and elastase. This comparison shows that approximately 60% (55-64%) of the alpha-carbon atom positions of the bacterial serine proteases are topologically equivalent to the alpha-carbon atom positions of the pancreatic enzymes. The corresponding value for a comparison of the bacterial enzymes among themselves is approximately 84%. The results of these topological comparisons have been used to deduce an experimentally sound sequence alignment for these several enzymes. This alignment shows that there is extensive tertiary structural homology among the bacterial and pancreatic enzymes without significant primary sequence identity (less than 21%). The acquisition of a zymogen function by the pancreatic enzymes is accompanied by two major changes to the bacterial enzymes' architecture: an insertion of 9 residues to increase the length of the N-terminal loop, and one of 12 residues to a loop near the activation salt bridge. In addition, in these two enzyme families, the methionine loop (residues 164-182) adopts very different conformations which are associated with their altered substrate specificities. PMID:96920

  19. Genome Sequence of Anaerobacillus macyae JMM-4T (DSM 16346), the First Genomic Information of the Newly Established Genus Anaerobacillus

    PubMed Central

    Wang, Jie-ping; Liu, Guo-hong; Ge, Ci-bin; Chen, Qian-qian; Zhu, Yu-jing; Chen, Zheng

    2015-01-01

    Anaerobacillus macyae JMM-4T (DSM 16346) is a Gram-positive, spore-forming, strictly anaerobic, and arsenate-respiring bacterium. Here, we report the 4.26-Mb genome sequence of A. macyae JMM-4T, which is the first genome information of the newly established genus Anaerobacillus. PMID:26272580

  20. Genome Sequence of Anaerobacillus macyae JMM-4T (DSM 16346), the First Genomic Information of the Newly Established Genus Anaerobacillus.

    PubMed

    Wang, Jie-Ping; Liu, Bo; Liu, Guo-Hong; Ge, Ci-Bin; Chen, Qian-Qian; Zhu, Yu-Jing; Chen, Zheng

    2015-01-01

    Anaerobacillus macyae JMM-4(T) (DSM 16346) is a Gram-positive, spore-forming, strictly anaerobic, and arsenate-respiring bacterium. Here, we report the 4.26-Mb genome sequence of A. macyae JMM-4(T), which is the first genome information of the newly established genus Anaerobacillus. PMID:26272580

  1. Automated extraction of typing information for bacterial pathogens from whole genome sequence data: Neisseria meningitidis as an exemplar.

    PubMed

    Jolley, K A; Maiden, M C

    2013-01-01

    Whole genome sequence (WGS) data are increasingly used to characterise bacterial pathogens. These data provide detailed information on the genotypes and likely phenotypes of aetiological agents, enabling the relationships of samples from potential disease outbreaks to be established precisely. However, the generation of increasing quantities of sequence data does not, in itself, resolve the problems that many microbiological typing methods have addressed over the last 100 years or so; indeed, providing large volumes of unstructured data can confuse rather than resolve these issues. Here we review the nascent field of storage of WGS data for clinical application and show how curated sequence-based typing schemes on websites have generated an infrastructure that can exploit WGS for bacterial typing efficiently. We review the tools that have been implemented within the PubMLST website to extract clinically useful, strain-characterisation information that can be provided to physicians and public health professionals in a timely, concise and understandable way. These data can be used to inform medical decisions such as how to treat a patient, whether to instigate public health action, and what action might be appropriate. The information is compatible both with previous sequence-based typing data and also with data obtained in the absence of WGS, providing a flexible infrastructure for WGS-based clinical microbiology. PMID:23369391

  2. DNA sequence of the control region of phage D108: the N-terminal amino acid sequences of repressor and transposase are similar both in phage D108 and in its relative, phage Mu.

    PubMed Central

    Mizuuchi, M; Weisberg, R A; Mizuuchi, K

    1986-01-01

    We have determined the DNA sequence of the control region of phage D108 up to position 1419 at the left end of the phage genome. Open reading frames for the repressor gene, ner gene, and the 5' part of the A gene (which codes for transposase) are found in the sequence. The genetic organization of this region of phage D108 is quite similar to that of phage Mu in spite of considerable divergence, both in the nucleotide sequence and in the amino acid sequences of the regulatory proteins of the two phages. The N-terminal amino acid sequences of the transposases of the two phages also share only limited homology. On the other hand, a significant amino acid sequence homology was found within each phage between the N-terminal parts of the repressor and transposase. We propose that the N-terminal domains of the repressor and transposase of each phage interact functionally in the process of making the decision between the lytic and the lysogenic mode of growth. PMID:3012481

  3. Genome-Wide Prediction and Analysis of 3D-Domain Swapped Proteins in the Human Genome from Sequence Information

    PubMed Central

    Upadhyay, Atul Kumar; Sowdhamini, Ramanathan

    2016-01-01

    3D-domain swapping is one of the mechanisms of protein oligomerization and the proteins exhibiting this phenomenon have many biological functions. These proteins, which undergo domain swapping, have acquired much attention owing to their involvement in human diseases, such as conformational diseases, amyloidosis, serpinopathies, proteinopathies, etc. Early identification of proteins in the whole human genome that retain a tendency to domain swap will enable many aspects of disease control management. Predictive models were developed by using machine learning approaches with an average accuracy of 78% (85.6% of sensitivity, 87.5% of specificity and an MCC value of 0.72) to predict putative domain swapping in protein sequences. These models were applied to many complete genomes with special emphasis on the human genome. Nearly 44% of the protein sequences in the human genome were predicted positive for domain swapping. Enrichment analysis was performed on the positively predicted sequences from human genome for their domain distribution, disease association and functional importance based on Gene Ontology (GO). Enrichment analysis was also performed to infer a better understanding of the functional importance of these sequences. Finally, we developed hinge region prediction, in the given putative domain swapped sequence, by using important physicochemical properties of amino acids. PMID:27467780

  4. Analysis of a nucleotide-binding site of 5-lipoxygenase by affinity labelling: binding characteristics and amino acid sequences.

    PubMed Central

    Zhang, Y Y; Hammarberg, T; Radmark, O; Samuelsson, B; Ng, C F; Funk, C D; Loscalzo, J

    2000-01-01

    5-Lipoxygenase (5LO) catalyses the first two steps in the biosynthesis of leukotrienes, which are inflammatory mediators derived from arachidonic acid. 5LO activity is stimulated by ATP; however, a consensus ATP-binding site or nucleotide-binding site has not been found in its protein sequence. In the present study, affinity and photoaffinity labelling of 5LO with 5'-p-fluorosulphonylbenzoyladenosine (FSBA) and 2-azido-ATP showed that 5LO bound to the ATP analogues quantitatively and specifically and that the incorporation of either analogue inhibited ATP stimulation of 5LO activity. The stoichiometry of the labelling was 1.4 mol of FSBA/mol of 5LO (of which ATP competed with 1 mol/mol) or 0.94 mol of 2-azido-ATP/mol of 5LO (of which ATP competed with 0.77 mol/mol). Labelling with FSBA prevented further labelling with 2-azido-ATP, indicating that the same binding site was occupied by both analogues. Other nucleotides (ADP, AMP, GTP, CTP and UTP) also competed with 2-azido-ATP labelling, suggesting that the site was a general nucleotide-binding site rather than a strict ATP-binding site. Ca(2+), which also stimulates 5LO activity, had no effect on the labelling of the nucleotide-binding site. Digestion with trypsin and peptide sequencing showed that two fragments of 5LO were labelled by 2-azido-ATP. These fragments correspond to residues 73-83 (KYWLNDDWYLK, in single-letter amino acid code) and 193-209 (FMHMFQSSWNDFADFEK) in the 5LO sequence. Trp-75 and Trp-201 in these peptides were modified by the labelling, suggesting that they were immediately adjacent to the C-2 position of the adenine ring of ATP. Given the stoichiometry of the labelling, the two peptide sequences of 5LO were probably near each other in the enzyme's tertiary structure, composing or surrounding the ATP-binding site of 5LO. PMID:11042125
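
    The residue numbers quoted in this abstract can be checked against the two peptide sequences it gives. A short sketch, assuming only that each fragment's first residue carries the stated start number (73 and 193); the helper name is hypothetical.

```python
def residue_numbers(fragment, start, residue):
    """Sequence numbers of every occurrence of `residue` in `fragment`.

    `start` is the sequence number of the fragment's first residue.
    """
    return [start + i for i, aa in enumerate(fragment) if aa == residue]


# Fragments and start positions as quoted in the abstract.
fragments = {73: "KYWLNDDWYLK", 193: "FMHMFQSSWNDFADFEK"}

for start, peptide in fragments.items():
    end = start + len(peptide) - 1
    trp = residue_numbers(peptide, start, "W")
    print(f"residues {start}-{end}: Trp at {trp}")
```

    The first fragment contains two tryptophans (positions 75 and 80), of which the abstract names Trp-75 as modified; the second contains a single tryptophan at position 201.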

  5. Internally Recurring Hippocampal Sequences as a Population Template of Spatiotemporal Information

    PubMed Central

    Villette, Vincent; Malvache, Arnaud; Tressard, Thomas; Dupuy, Nathalie; Cossart, Rosa

    2015-01-01

    Summary The hippocampus is essential for spatiotemporal cognition. Sequences of neuronal activation provide a substrate for this fundamental function. At the behavioral timescale, these sequences have been shown to occur either in the presence of successive external landmarks or through internal mechanisms within an episodic memory task. In both cases, activity is externally constrained by the organization of the task and by the size of the environment explored. Therefore, it remains unknown whether hippocampal activity can self-organize into a default mode in the absence of any external memory demand or spatiotemporal boundary. Here we show that, in the presence of self-motion cues, a population code integrating distance naturally emerges in the hippocampus in the form of recurring sequences. These internal dynamics clamp spontaneous travel since run distance distributes into integer multiples of the span of these sequences. These sequences may thus guide navigation when external landmarks are reduced. PMID:26494280

  6. Isolation and amino acid sequences of opossum vasoactive intestinal polypeptide and cholecystokinin octapeptide.

    PubMed Central

    Eng, J; Yu, J; Rattan, S; Yalow, R S

    1992-01-01

    Evolutionary history suggests that the marsupials entered South America from North America about 75 million years ago and subsequently dispersed into Australia before the separation between South America and Antarctica-Australia. A question of interest is whether marsupial peptides resemble the corresponding peptides of Old or New World mammals. Previous studies had shown that "little" gastrin of the North American marsupial, the opossum, is identical in length to that of the New World mammals, the guinea pig and chinchilla. In this report, we demonstrate that opossum cholecystokinin octapeptide, like that of the Australian marsupials, the Eastern quoll and the Tamar wallaby, is identical to the cholecystokinin octapeptide of Old World mammals and differs from that of the guinea pig and chinchilla. However, opossum vasoactive intestinal polypeptide differs from the usual Old World mammalian vasoactive intestinal polypeptide in five sites: [sequence; see text]. PMID:1542675

  7. Evolution of early life inferred from protein and ribonucleic acid sequences

    NASA Technical Reports Server (NTRS)

    Dayhoff, M. O.; Schwartz, R. M.

    1978-01-01

    The chemical structures of ferredoxin, 5S ribosomal RNA, and c-type cytochrome sequences have been employed to construct a phylogenetic tree which connects all major photosynthesizing organisms: the three types of bacteria, blue-green algae, and chloroplasts. Anaerobic and aerobic bacteria, eukaryotic cytoplasmic components and mitochondria are also included in the phylogenetic tree. Anaerobic nonphotosynthesizing bacteria similar to Clostridium were the earliest organisms, arising more than 3.2 billion years ago. Bacterial photosynthesis evolved nearly 3.0 billion years ago, while oxygen-evolving photosynthesis, originating in the blue-green algal line, came into being about 2.0 billion years ago. The phylogenetic tree supports the symbiotic theory of the origin of eukaryotes.

  8. The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure information on human protein variants.

    PubMed

    Yip, Yum L; Scheib, Holger; Diemand, Alexander V; Gattiker, Alexandre; Famiglietti, Livia M; Gasteiger, Elisabeth; Bairoch, Amos

    2004-05-01

    Missense mutation leading to single amino acid polymorphism (SAP) is the type of mutation most frequently related to human diseases. The Swiss-Prot protein knowledgebase records information on such mutations in various sections of a protein entry, namely in the "feature," "comment," and "reference" fields. To facilitate users in obtaining the most relevant information about each human SAP recorded in the knowledgebase, the Swiss-Prot Variant web pages were created to provide a summary of available sequence information, as well as additional structural information on each variant. In particular, the ModSNP database was set up to store information related to SAPs and to manage the modeling of SAPs onto protein structures via an automatic homology modeling pipeline. Currently, among the 16,566 human SAPs recorded in the Swiss-Prot knowledgebase (release 42.5, 21 November 2003), more than 25% have corresponding 3D-models. Of these variants, 47% are related to disease, 26% are polymorphisms, and 27% are not yet clearly classified. The ModSNP database is updated and the subsequent model construction pipeline is launched with each weekly Swiss-Prot release. Thus, the ModSNP database represents a valuable resource for the structural analysis of protein variation. The Swiss-Prot variant pages are accessible from the NiceProt view of a Swiss-Prot entry on the ExPASy server (www.expasy.org/), via a hyperlink created for the stable and unique identifier FTId of each human SAP. PMID:15108278

  9. A Possible Mechanism of Zika Virus Associated Microcephaly: Imperative Role of Retinoic Acid Response Element (RARE) Consensus Sequence Repeats in the Viral Genome.

    PubMed

    Kumar, Ashutosh; Singh, Himanshu N; Pareek, Vikas; Raza, Khursheed; Dantham, Subrahamanyam; Kumar, Pavan; Mochan, Sankat; Faiq, Muneeb A

    2016-01-01

    Owing to the reports of microcephaly as a consistent outcome in the fetuses of pregnant women infected with ZIKV in Brazil, Zika virus (ZIKV)-microcephaly etiomechanistic relationship has recently been implicated. Researchers, however, are still struggling to establish an embryological basis for this interesting causal handcuff. The present study reveals robust evidence in favor of a plausible ZIKV-microcephaly cause-effect liaison. The rationale is based on: (1) sequence homology between ZIKV genome and the response element of an early neural tube developmental marker "retinoic acid" in human DNA and (2) comprehensive similarities between the details of brain defects in ZIKV-microcephaly and retinoic acid embryopathy. Retinoic acid is considered as the earliest factor for regulating anteroposterior axis of neural tube and positioning of structures in developing brain through retinoic acid response elements (RARE) consensus sequence (5'-AGGTCA-3') in promoter regions of retinoic acid-dependent genes. We screened genomic sequences of already reported virulent ZIKV strains (including those linked to microcephaly) and other viruses available in National Institutes of Health genetic sequence database (GenBank) for the RARE consensus repeats and obtained results strongly bolstering our hypothesis that ZIKV strains associated with microcephaly may act through precipitation of dysregulation in retinoic acid-dependent genes by introducing extra stretches of RARE consensus sequence repeats in the genome of developing brain cells. Additional support to our hypothesis comes from our findings that screening of other viruses for RARE consensus sequence repeats is positive only for those known to display neurotropism and cause fetal brain defects (for which maternal-fetal transmission during developing stage may be required). The numbers of RARE sequence repeats appeared to match with the virulence of screened positive viruses. Although, bioinformatic evidence and embryological
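
    Screening a sequence for the RARE consensus hexamer 5'-AGGTCA-3' amounts to a substring search. The sketch below is an illustration only: the abstract does not describe the authors' software, nor whether both strands were searched, so the function names, the both-strand count, and the toy sequence are all assumptions.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string over the alphabet A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif(seq, motif="AGGTCA"):
    """0-based start positions of every (possibly overlapping) match."""
    seq = seq.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def count_both_strands(seq, motif="AGGTCA"):
    """Matches on the given strand plus matches on its reverse complement."""
    forward = find_motif(seq, motif)
    reverse = find_motif(reverse_complement(seq.upper()), motif)
    return len(forward) + len(reverse)


# Hypothetical toy sequence, not taken from any viral genome.
toy = "TTAGGTCAGGTGACCTAA"
print(find_motif(toy))
print(count_both_strands(toy))
```

    For real records, the sequence string would first be read from a GenBank or FASTA file; that step is omitted here.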

  10. Analyses of mitochondrial amino acid sequence datasets support the proposal that specimens of Hypodontus macropi from three species of macropodid hosts represent distinct species

    PubMed Central

    2013-01-01

    Background Hypodontus macropi is a common intestinal nematode of a range of kangaroos and wallabies (macropodid marsupials). Based on previous multilocus enzyme electrophoresis (MEE) and nuclear ribosomal DNA sequence data sets, H. macropi has been proposed to be a complex of species. To test this proposal using independent molecular data, we sequenced the whole mitochondrial (mt) genomes of individuals of H. macropi from three different species of hosts (Macropus robustus robustus, Thylogale billardierii and Macropus [Wallabia] bicolor) as well as that of Macropicola ocydromi (a related nematode), and undertook a comparative analysis of the amino acid sequence datasets derived from these genomes. Results The mt genomes sequenced by next-generation (454) technology from H. macropi from the three host species varied from 13,634 bp to 13,699 bp in size. Pairwise comparisons of the amino acid sequences predicted from these three mt genomes revealed differences of 5.8% to 18%. Phylogenetic analysis of the amino acid sequence data sets using Bayesian Inference (BI) showed that H. macropi from the three different host species formed distinct, well-supported clades. In addition, sliding window analysis of the mt genomes defined variable regions for future population genetic studies of H. macropi in different macropodid hosts and geographical regions around Australia. Conclusions The present analyses of inferred mt protein sequence datasets clearly supported the hypothesis that H. macropi from M. robustus robustus, M. bicolor and T. billardierii represent distinct species. PMID:24261823
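
    Pairwise percentage differences such as those reported in this abstract are typically computed over aligned sequences. A minimal sketch, assuming pre-aligned sequences of equal length and assuming that columns with a gap in either sequence are skipped; the abstract does not state how gaps were treated. The labels and sequences are hypothetical toy data.

```python
from itertools import combinations


def percent_difference(a, b):
    """Percentage of differing residues between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored (assumption).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    differences = sum(x != y for x, y in compared)
    return 100.0 * differences / len(compared)


# Hypothetical aligned amino acid fragments, not data from the study.
aligned = {
    "isolate_A": "MKTLLVAGLA",
    "isolate_B": "MKTLIVAGLA",
    "isolate_C": "MRTLIVSG-A",
}

for (name1, seq1), (name2, seq2) in combinations(aligned.items(), 2):
    print(f"{name1} vs {name2}: {percent_difference(seq1, seq2):.1f}%")
```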

  11. QuShape: Rapid, accurate, and best-practices quantification of nucleic acid probing information, resolved by capillary electrophoresis

    PubMed Central

    Karabiber, Fethullah; McGinnis, Jennifer L.; Favorov, Oleg V.; Weeks, Kevin M.

    2013-01-01

    Chemical probing of RNA and DNA structure is a widely used and highly informative approach for examining nucleic acid structure and for evaluating interactions with protein and small-molecule ligands. Use of capillary electrophoresis to analyze chemical probing experiments yields hundreds of nucleotides of information per experiment and can be performed on automated instruments. Extraction of the information from capillary electrophoresis electropherograms is a computationally intensive multistep analytical process, and no current software provides rapid, automated, and accurate data analysis. To overcome this bottleneck, we developed a platform-independent, user-friendly software package, QuShape, that yields quantitatively accurate nucleotide reactivity information with minimal user supervision. QuShape incorporates newly developed algorithms for signal decay correction, alignment of time-varying signals within and across capillaries and relative to the RNA nucleotide sequence, and signal scaling across channels or experiments. An analysis-by-reference option enables multiple, related experiments to be fully analyzed in minutes. We illustrate the usefulness and robustness of QuShape by analysis of RNA SHAPE (selective 2′-hydroxyl acylation analyzed by primer extension) experiments. PMID:23188808

  12. Using Triple Helix Forming Peptide Nucleic Acids for Sequence-selective Recognition of Double-stranded RNA

    PubMed Central

    Hnedzko, Dziyana; Cheruiyot, Samwel K.; Rozners, Eriks

    2014-01-01

    Non-coding RNAs play important roles in regulation of gene expression. Specific recognition and inhibition of these biologically important RNAs that form complex double-helical structures will be highly useful for fundamental studies in biology and practical applications in medicine. This protocol describes a strategy developed in our laboratory for sequence-selective recognition of double-stranded RNA (dsRNA) using triple helix forming peptide nucleic acids (PNAs) that bind in the major groove of the RNA helix. The strategy developed uses chemically modified nucleobases, such as 2-aminopyridine (M) that enables strong triple helical binding at physiologically relevant conditions, and 2-pyrimidinone (P) and 3-oxo-2,3-dihydropyridazine (E) that enable recognition of isolated pyrimidines in the purine rich strand of the RNA duplex. Detailed protocols for preparation of modified PNA monomers, solid-phase synthesis and HPLC purification of PNA oligomers, and measuring dsRNA binding affinity using isothermal titration calorimetry are included. PMID:25199637

  13. Nucleic acid sequences encoding D1 and D1/D2 domains of human coxsackievirus and adenovirus receptor (CAR)

    DOEpatents

    Freimuth, Paul I.

    2010-04-06

    The invention provides recombinant human CAR (coxsackievirus and adenovirus receptor) polypeptides which bind adenovirus. Specifically, polypeptides corresponding to adenovirus binding domain D1 and the entire extracellular domain of human CAR protein comprising D1 and D2 are provided. In another aspect, the invention provides nucleic acid sequences encoding these domains and expression vectors for producing the domains and bacterial cells containing such vectors. The invention also includes an isolated fusion protein comprised of the D1 polypeptide fused to a polypeptide which facilitates folding of D1 when expressed in bacteria. The functional D1 domain finds application in a therapeutic method for treating a patient infected with a CAR D1-binding virus, and also in a method for identifying an antiviral compound which interferes with viral attachment. The invention also provides a method for specifically targeting a cell for infection by a virus which binds to D1.

  14. Prediction of Residue Status to Be Protected or Not Protected From Hydrogen Exchange Using Amino Acid Sequence Only.

    PubMed

    Dovidchenko, Nikita V; Galzitskaya, Oxana V

    2008-01-01

    We have outlined here some structural aspects of local flexibility. Important functional properties are related to flexible segments. We try to predict regions that have been shown to exhibit the highest probability of being folded in the equilibrium intermediate or native state and will be protected from hydrogen exchange using amino acid sequence only. Our approach FoldUnfold for the prediction of unstructured regions has been applied to seven different proteins. For 80% of the residues considered in this paper we can predict correctly their status: will they be protected or not from hydrogen exchange. An additional goal of our study is to assess whether properties inferred using the bioinformatics approach are easily applicable to predict behavior of proteins in solution. PMID:18949078

  15. Prediction of Residue Status to Be Protected or Not Protected From Hydrogen Exchange Using Amino Acid Sequence Only

    PubMed Central

    Dovidchenko, Nikita V; Galzitskaya, Oxana V

    2008-01-01

    We have outlined here some structural aspects of local flexibility. Important functional properties are related to flexible segments. We try to predict regions that have been shown to exhibit the highest probability of being folded in the equilibrium intermediate or native state and will be protected from hydrogen exchange using amino acid sequence only. Our approach FoldUnfold for the prediction of unstructured regions has been applied to seven different proteins. For 80% of the residues considered in this paper we can predict correctly their status: will they be protected or not from hydrogen exchange. An additional goal of our study is to assess whether properties inferred using the bioinformatics approach are easily applicable to predict behavior of proteins in solution. PMID:18949078

  16. The enzymatic nature of an anonymous protein sequence cannot reliably be inferred from superfamily level structural information alone.

    PubMed

    Roche, Daniel Barry; Brüls, Thomas

    2015-05-01

    As the largest fraction of any proteome does not carry out enzymatic functions, and in order to leverage 3D structural data for the annotation of increasingly higher volumes of sequence data, we wanted to assess the strength of the link between coarse grained structural data (i.e., homologous superfamily level) and the enzymatic versus non-enzymatic nature of protein sequences. To probe this relationship, we took advantage of 41 phylogenetically diverse (encompassing 11 distinct phyla) genomes recently sequenced within the GEBA initiative, for which we integrated structural information, as defined by CATH, with enzyme level information, as defined by Enzyme Commission (EC) numbers. This analysis revealed that only a very small fraction (about 1%) of domain sequences occurring in the analyzed genomes was found to be associated with homologous superfamilies strongly indicative of enzymatic function. Resorting to less stringent criteria to define enzyme versus non-enzyme biased structural classes or excluding highly prevalent folds from the analysis had only modest effect on this proportion. Thus, the low genomic coverage by structurally anchored protein domains strongly associated to catalytic activities indicates that, on its own, the power of coarse grained structural information to infer the general property of being an enzyme is rather limited. PMID:25559918

  17. Primary structure of a histidine-rich proteolytic fragment of human ceruloplasmin. II. Amino acid sequence of the tryptic peptides.

    PubMed

    Kingston, I B; Kingston, B L; Putnam, F W

    1980-04-10

    Amino acid sequence studies of tryptic peptides isolated from a histidine-rich fragment (Cp F5) of human ceruloplasmin are described. Nineteen tryptic peptides were isolated from unmodified Cp F5 and five tryptic peptides were isolated from citraconylated Cp F5. These peptides, together with the cyanogen bromide fragments reported previously, allowed the assembly of the complete sequence of Cp F5. The fragment has 159 residues and a molecular weight of 18,650; it lacks carbohydrate, is rich in histidine, and contains 1 free cysteine that may be part of a copper-binding site. Human ceruloplasmin is a single polypeptide chain with a molecular weight of about 130,000 that is readily cleaved to large fragments by proteolytic enzymes; the relationships of Cp F5 to intact ceruloplasmin and to structural subunits earlier proposed is described. Cp F5 probably is an intact globular domain that is attached to the COOH-terminal end of ceruloplasmin by a labile interdomain peptide bond. PMID:6987230

  18. Immunoreactivity of polyclonal antibodies generated against the carboxy terminus of the predicted amino acid sequence of the Huntington disease gene

    SciTech Connect

    Alkatib, G.; Graham, R.; Pelmear-Telenius, A.

    1994-09-01

    A cDNA fragment spanning the 3′-end of the Huntington disease gene (from 8052 to 9252) was cloned into a prokaryotic expression vector containing the E. coli lac promoter and a portion of the coding sequence for β-galactosidase. The truncated β-galactosidase gene was cleaved with BamHI and fused in frame to the BamHI fragment of the Huntington disease gene 3′-end. Expression analysis of proteins made in E. coli revealed that 20-30% of the total cellular proteins was represented by the β-galactosidase-huntingtin fusion protein. The identity of the Huntington disease protein amino acid sequences was confirmed by protein sequence analysis. Affinity chromatography was used to purify large quantities of the fusion protein from bacterial cell lysates. Affinity-purified proteins were used to immunize New Zealand white rabbits for antibody production. The generated polyclonal antibodies were used to immunoprecipitate the Huntington disease gene product expressed in a neuroblastoma cell line. In this cell line the antibodies precipitated two protein bands of apparent gel migrations of 200 and 150 kd which together correspond to the calculated molecular weight of the Huntington disease gene product (350 kd). Immunoblotting experiments revealed the presence of a large precursor protein in the range of 350-750 kd which is in agreement with the predicted molecular weight of the protein without post-translational modifications. These results indicate that the huntingtin protein is cleaved into two subunits in this neuroblastoma cell line and imply that cleavage of a large precursor protein may contribute to its biological activity. Experiments are ongoing to determine the precursor-product relationship, to examine the synthesis of the huntingtin protein in freshly isolated rat brains, and to determine the cellular and subcellular distribution of the gene product.

  19. Ambient temperature detection of PCR amplicons with a novel sequence-specific nucleic acid lateral flow biosensor.

    PubMed

    Ang, Geik Yong; Yu, Choo Yee; Yean, Chan Yean

    2012-01-01

    In the field of diagnostics, molecular amplification targeting unique genetic signature sequences has been widely used for rapid identification of infectious agents, which significantly aids physicians in determining the choice of treatment as well as providing important epidemiological data for surveillance and disease control assessment. We report the development of a rapid nucleic acid lateral flow biosensor (NALFB) in a dry-reagent strip format for the sequence-specific detection of single-stranded polymerase chain reaction (PCR) amplicons at ambient temperature (22-25°C). The NALFB was developed in combination with a linear-after-the-exponential PCR assay and the applicability of this biosensor was demonstrated through detection of the cholera toxin gene from diarrhea-causing toxigenic Vibrio cholerae. Amplification using the advanced asymmetric PCR boosts the production of fluorescein-labeled single-stranded amplicons, allowing capture probes immobilized on the NALFB to hybridize specifically with complementary targets in situ on the strip. Subsequent visual formation of red lines is achieved through the binding of conjugated gold nanoparticles to the fluorescein label of the captured amplicons. The visual detection limit observed was 0.3 ng with synthetic target DNA and 1 pg with pure genomic DNA. Evaluation of the NALFB with 164 strains of V. cholerae and non-V. cholerae bacteria recorded 100% for both sensitivity and specificity. The whole procedure of the low-cost NALFB, which is performed at ambient temperature, eliminates the need for preheated buffers or additional equipment, greatly simplifying the protocol for sequence-specific PCR amplicon analysis. PMID:22705404
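
    The sensitivity and specificity figures quoted above follow the usual diagnostic definitions. The sketch below shows how such figures are computed from a confusion table. The abstract does not report how the 164 strains split into V. cholerae and non-V. cholerae, so the counts in the usage lines are hypothetical placeholders, not data from the study.

```python
def sensitivity(tp, fn):
    """Fraction of true target strains that the assay flags as positive."""
    if tp + fn == 0:
        raise ValueError("no positive reference samples")
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-target strains that the assay correctly reports as negative."""
    if tn + fp == 0:
        raise ValueError("no negative reference samples")
    return tn / (tn + fp)


# Hypothetical split of a strain panel (NOT taken from the abstract).
hypothetical = {"tp": 100, "fn": 0, "tn": 64, "fp": 0}
print(sensitivity(hypothetical["tp"], hypothetical["fn"]))
print(specificity(hypothetical["tn"], hypothetical["fp"]))
```

    With no false negatives and no false positives, both ratios equal 1.0, which corresponds to the 100% values reported.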

  20. EPSE Project 2: Designing and Evaluating Short Teaching Sequences, Informed by Research Evidence.

    ERIC Educational Resources Information Center

    Leach, John; Hind, Andy; Lewis, Jenny; Scott, Phil

    2002-01-01

    Reports on Project 2 from the Evidence-based Practice in Science Education (EPSE) Research Network. In this project, teachers and researchers worked collaboratively on the design of three short teaching sequences on electric circuits. (DDR)

  1. Primary structure of a histidine-rich proteolytic fragment of human ceruloplasmin. I. Amino acid sequence of the cyanogen bromide peptides.

    PubMed

    Kingston, I B; Kingston, B L; Putnam, F W

    1980-04-10

    A histidine-rich fragment, Cp F5, with a molecular weight of 18,650 was isolated from human ceruloplasmin. It consists of 159 amino acids and contains a possible copper-binding site. The sequence of the first 18 NH2-terminal residues of Cp F5 was determined by automated Edman degradation. Cp F5 was cleaved by cyanogen bromide to produce nine fragments of from 2 to 63 residues. The amino acid sequence of all of the cyanogen bromide fragments was investigated using automated and manual Edman degradation, the fragments being digested with trypsin, chymotrypsin, thermolysin, staphylococcal protease, and pepsin as appropriate. The results, in conjunction with the data on the tryptic peptides reported in the accompanying paper (Kingston, I.B., Kingston, B.L., and Putnam, F.W. (1980) J. Biol. Chem. 255, 2886-2896), establish the complete amino acid sequence of Cp F5. PMID:6987229

  2. Protective immunogenicity of two synthetic peptides selected from the amino acid sequence of Bordetella pertussis toxin subunit S1.

    PubMed Central

    Askelöf, P; Rodmalm, K; Wrangsell, G; Larsson, U; Svenson, S B; Cowell, J L; Undén, A; Bartfai, T

    1990-01-01

    Two peptides, corresponding to amino acids 1-17 and 169-186 of the amino acid sequence of pertussis toxin (PT) subunit S1, were synthesized and coupled to the diphtheria toxin cross-reactive mutant protein CRM 197 and evaluated for immunogenicity and protective capacity against PT challenge in vivo. The peptide-CRM conjugates induced high antibody titers against native toxin in mice (BALB/c, C57/Black, and outbred NMRI) as measured by ELISA. Upon PT challenge (0.5 microgram of toxin) of the NMRI mice, the CRM conjugates of peptides 1-17 and 169-186 fully protected the mice from PT-induced leukocytosis. Immunization with the corresponding bovine serum albumin conjugates of these two peptides also fully protected mice. Rabbit antiserum to the peptide 1-17-CRM conjugate was highly efficient in inhibiting the ADP-ribosylating activity of PT but did not neutralize the clustering effect of PT on Chinese hamster ovary cells. In contrast, the rabbit antiserum raised against the peptide 169-186-CRM conjugate neutralized the clustering effect of PT on Chinese hamster ovary cells but did not inhibit the enzymatic activity of PT. Peptide 169-186-CRM conjugates mimic the immunoglobulin binding properties of PT and also cause clustering of Chinese hamster ovary cells. The CRM conjugates of these two peptides constitute a synthetic pertussis vaccine candidate with the ability to provide a chemically well-defined, safe, and efficient pertussis vaccine. PMID:2304902

  3. Hybridization properties of long nucleic acid probes for detection of variable target sequences, and development of a hybridization prediction algorithm

    PubMed Central

    Öhrmalm, Christina; Jobs, Magnus; Eriksson, Ronnie; Golbob, Sultan; Elfaitouri, Amal; Benachenhou, Farid; Strømme, Maria; Blomberg, Jonas

    2010-01-01

    One of the main problems in nucleic acid-based techniques for detection of infectious agents, such as influenza viruses, is that of nucleic acid sequence variation. DNA probes, 70-nt long, some including the nucleotide analog deoxyribose-Inosine (dInosine), were analyzed for hybridization tolerance to different amounts and distributions of mismatching bases, e.g. synonymous mutations, in target DNA. Microsphere-linked 70-mer probes were hybridized in 3M TMAC buffer to biotinylated single-stranded (ss) DNA for subsequent analysis in a Luminex® system. When mismatches interrupted contiguous matching stretches of 6 nt or longer, it had a strong impact on hybridization. Contiguous matching stretches are more important than the same number of matching nucleotides separated by mismatches into several regions. dInosine, but not 5-nitroindole, substitutions at mismatching positions stabilized hybridization remarkably well, comparable to N (4-fold) wobbles in the same positions. In contrast to shorter probes, 70-nt probes with judiciously placed dInosine substitutions and/or wobble positions were remarkably mismatch tolerant, with preserved specificity. An algorithm, NucZip, was constructed to model the nucleation and zipping phases of hybridization, integrating both local and distant binding contributions. It predicted hybridization more exactly than previous algorithms, and has the potential to guide the design of variation-tolerant yet specific probes. PMID:20864443
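
    The observation that contiguous matching stretches matter more than the raw number of matches can be illustrated with a small helper that reports the run lengths of matching positions between a probe and a target. This is only a sketch for two pre-aligned, equal-length sequences. It is not the NucZip algorithm, whose details the abstract does not give, and the function names and example sequences are made up for illustration.

```python
def matching_stretches(probe, target):
    """Return the lengths of contiguous matching runs between two aligned sequences."""
    if len(probe) != len(target):
        raise ValueError("sequences must be pre-aligned and of equal length")
    runs, current = [], 0
    for p, t in zip(probe.upper(), target.upper()):
        if p == t:
            current += 1          # extend the current matching run
        else:
            if current:
                runs.append(current)
            current = 0           # a mismatch ends the run
    if current:
        runs.append(current)
    return runs


def summarize(probe, target):
    """Total matches, longest contiguous run, and all run lengths."""
    runs = matching_stretches(probe, target)
    return {"matches": sum(runs), "longest_run": max(runs, default=0), "runs": runs}


probe = "ACGTACGTACGT"
clustered = "ACGTACGTATTA"   # three mismatches grouped at one end
scattered = "ACTTACTTACTT"   # three mismatches spread along the probe
print(summarize(probe, clustered))
print(summarize(probe, scattered))
```

    Both toy targets have nine matching positions, but the first keeps them in a single run of nine while the second breaks them into runs of 2, 3, 3 and 1. That difference is the kind of distribution effect the abstract describes.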

  4. Nucleic acid amplification in vitro: detection of sequences with low copy numbers and application to diagnosis of human immunodeficiency virus type 1 infection.

    PubMed Central

    Guatelli, J C; Gingeras, T R; Richman, D D

    1989-01-01

    The enzymatic amplification of specific nucleic acid sequences in vitro has revolutionized the use of nucleic acid hybridization assays for viral detection. With this method, the copy number of a pathogen-specific sequence is increased several orders of magnitude before detection is attempted. The sensitivity and specificity of detection are thus markedly improved. Mullis and Faloona devised the first method of sequence amplification in vitro, the polymerase chain reaction (K.B. Mullis and F.A. Faloona, Methods Enzymol. 155:335-350, 1987). By this method, synthetic oligonucleotide primers direct repeated, target-specific, deoxyribonucleic acid-synthetic reactions, resulting in an exponential increase in the amount of the specific target sequence. The application of sequence amplification to viral detection was initially performed with human immunodeficiency virus type 1 and human T-cell lymphoma virus type I. In principle, however, this approach can be applied to the detection of any deoxyribonucleic or ribonucleic acid virus; the only requirement is that sufficient nucleotide sequence data exist to allow the synthesis of target-specific oligonucleotide primers. The use of target amplification in vitro will permit a variety of studies of viral pathogenesis which have not been feasible because of the low copy number of the viral nucleic acids in infected material. This approach is particularly applicable to the study of human retroviral infections, which are chronic and persistent and are characterized by low titers of virus in tissues. In addition, target amplification in vitro will facilitate the development of new methods of sequence detection, which will be useful for rapid viral diagnosis in the clinical laboratory. PMID:2650862
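
    The exponential increase mentioned above can be made concrete with the standard idealized model, in which each cycle multiplies the target by (1 + efficiency). The sketch below assumes a constant per-cycle efficiency, which is a simplification not stated in the abstract. The function names and the example numbers are illustrative only.

```python
import math


def copies_after_cycles(initial_copies, cycles, efficiency=1.0):
    """Idealized copy number after a number of cycles.

    efficiency = 1.0 means perfect doubling in every cycle (an assumption).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_for_fold_increase(fold, efficiency=1.0):
    """Smallest whole number of cycles that reaches at least the given fold increase."""
    if fold <= 1:
        return 0
    return math.ceil(math.log(fold) / math.log(1.0 + efficiency))


print(copies_after_cycles(10, 30))          # ten starting copies, thirty ideal cycles
print(cycles_for_fold_increase(1e6))        # cycles needed for a million-fold increase
print(cycles_for_fold_increase(1e6, 0.8))   # same target at 80% per-cycle efficiency
```

    Under perfect doubling, a million-fold increase (six orders of magnitude) needs about 20 cycles. Lower efficiencies need more, which is why real assays are calibrated rather than predicted from this formula alone.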

  5. Amino acid sequence homology between Piv, an essential protein in site-specific DNA inversion in Moraxella lacunata, and transposases of an unusual family of insertion elements.

    PubMed Central

    Lenich, A G; Glasgow, A C

    1994-01-01

    Deletion analysis of the subcloned DNA inversion region of Moraxella lacunata indicates that Piv is the only M. lacunata-encoded factor required for site-specific inversion of the tfpQ/tfpI pilin segment. The predicted amino acid sequence of Piv shows significant homology solely with the transposases/integrases of a family of insertion sequence elements, suggesting that Piv is a novel site-specific recombinase. PMID:8021196

  6. Efficient transfer of information from hexitol nucleic acids to RNA during nonenzymatic oligomerization

    NASA Technical Reports Server (NTRS)

    Kozlov, I. A.; De Bouvere, B.; Van Aerschot, A.; Herdewijn, P.; Orgel, L. E.

    1999-01-01

    Hexitol nucleic acids (HNAs) are DNA analogues that contain the standard nucleoside bases attached to a phosphorylated 1,5-anhydrohexitol backbone. We find that HNAs support efficient information transfer in nonenzymatic template-directed reactions. HNA heterosequences appeared to be superior to the corresponding DNA heterosequences in facilitating synthesis of complementary oligonucleotides from nucleoside-5'-phosphoro-2-methyl imidazolides.

  7. Toward allotetraploid cotton genome assembly: integration of a high-density molecular genetic linkage map with DNA sequence information

    PubMed Central

    2012-01-01

    Background: Cotton is the world’s most important natural textile fiber and a significant oilseed crop. Decoding cotton genomes will provide the ultimate reference and resource for research and utilization of the species. Integration of high-density genetic maps with genomic sequence information will largely accelerate the process of whole-genome assembly in cotton. Results: In this paper, we update a high-density interspecific genetic linkage map of allotetraploid cultivated cotton. An additional 1,167 marker loci have been added to our previously published map of 2,247 loci. Three new marker types, InDel (insertion-deletion) and SNP (single nucleotide polymorphism) developed from gene information, and REMAP (retrotransposon-microsatellite amplified polymorphism), were used to increase map density. The updated map consists of 3,414 loci in 26 linkage groups covering 3,667.62 cM with an average inter-locus distance of 1.08 cM. Furthermore, genome-wide sequence analysis was finished using 3,324 informative sequence-based markers and publicly-available Gossypium DNA sequence information. A total of 413,113 EST and 195 BAC sequences were physically anchored and clustered by 3,324 sequence-based markers. Of these, 14,243 ESTs and 188 BACs from different species of Gossypium were clustered and specifically anchored to the high-density genetic map. A total of 2,748 candidate unigenes from 2,111 ESTs clusters and 63 BACs were mined for functional annotation and classification. The 337 ESTs/genes related to fiber quality traits were integrated with 132 previously reported cotton fiber quality quantitative trait loci, which demonstrated the important roles in fiber quality of these genes. Higher-level sequence conservation between different cotton species and between the A- and D-subgenomes in tetraploid cotton was found, indicating a common evolutionary origin for orthologous and paralogous loci in Gossypium. Conclusion: This study will serve as a valuable genomic resource

  8. The outer capsid protein VP4 of equine rotavirus strain H-2 represents a unique VP4 type by amino acid sequence analysis.

    PubMed

    Hardy, M E; Gorziglia, M; Woode, G N

    1993-03-01

    The nucleotide and deduced amino acid sequence of G serotype 3 equine rotavirus strain H-2 was determined. A predicted 776-amino-acid H-2 VP4 shows less than or equal to 85.3% identity to other rotavirus VP4 types sequenced to date and thus represents a new P serotype. A PCR-generated probe derived from a cDNA clone of H-2 gene 4 hybridized to gene 4 of several tissue-culture-adapted equine rotavirus isolates, demonstrating that the gene 4 allele present in the H-2 strain is present in the equine rotavirus population. PMID:8382410

  9. Single Amino Acid Substitutions in the Chemotactic Sequence of Urokinase Receptor Modulate Cell Migration and Invasion

    PubMed Central

    Franco, Paola; Pavone, Vincenzo; Mugione, Pietro; Di Carluccio, Gioconda; Masucci, Maria Teresa; Arra, Claudio; Pirozzi, Giuseppe; Stoppelli, Maria Patrizia; Carriero, Maria Vincenza

    2012-01-01

    The receptor for urokinase-type plasminogen activator (uPAR) plays an important role in controlling cell migration. uPAR binds urokinase and vitronectin extracellular ligands, and signals in complex with transmembrane receptors such as Formyl-peptide Receptors (FPR)s and integrins. Previous work from this laboratory has shown that synthetic peptides, corresponding to the uPAR88–92 chemotactic sequence, when carrying the S90P or S90E substitutions, up- or down-regulate cell migration, respectively. To gain mechanistic insights into these opposite cell responses, the functional consequences of S90P and S90E mutations in full-length uPAR were evaluated. First, (HEK)-293 embryonic kidney cells expressing uPARS90P exhibit enhanced FPR activation, increased random and directional cell migration, long-lasting Akt phosphorylation, and increased adhesion to vitronectin, as well as uPAR/vitronectin receptor association. In contrast, the S90E substitution prevents agonist-triggered FPR activation and internalization, decreases binding and adhesion to vitronectin, and inhibits uPAR/vitronectin receptor association. Also, 293/uPARS90P cells appear quite elongated and their cytoskeleton well organized, whereas 293/uPARS90E cells assume a large flattened morphology, with random orientation of actin filaments. Interestingly, when HT1080 cells co-express wild type uPAR with uPAR S90E, the latter behaves as a dominant-negative, impairing uPAR-mediated signaling and reducing cell wound repair as well as lung metastasis in nude mice. In contrast, signaling, wound repair and in vivo lung metastasis of HT1080 cells bearing wild type uPAR are enhanced when they co-express uPARS90P. In conclusion, our findings indicate that Ser90 is a critical residue for uPAR signaling and that the S90P and S90E exert opposite effects on uPAR activities. These findings may be accommodated in a molecular model, in which uPARS90E and uPARS90P are forced into inactive and active forms, respectively

  10. Targeted next-generation sequencing of deafness genes in hearing-impaired individuals uncovers informative mutations

    PubMed Central

    Vona, Barbara; Müller, Tobias; Nanda, Indrajit; Neuner, Cordula; Hofrichter, Michaela A. H.; Schröder, Jörg; Bartsch, Oliver; Läßig, Anne; Keilmann, Annerose; Schraven, Sebastian; Kraus, Fabian; Shehata-Dieler, Wafaa; Haaf, Thomas

    2014-01-01

    Purpose: Targeted next-generation sequencing provides a remarkable opportunity to identify variants in known disease genes, particularly in extremely heterogeneous disorders such as nonsyndromic hearing loss. The present study attempts to shed light on the complexity of hearing impairment. Methods: Using one of two next-generation sequencing panels containing either 80 or 129 deafness genes, we screened 30 individuals with nonsyndromic hearing loss (from 23 unrelated families) and analyzed 9 normal-hearing controls. Results: Overall, we found an average of 3.7 variants (in 80 genes) with deleterious prediction outcome, including a number of novel variants, in individuals with nonsyndromic hearing loss and 1.4 in controls. By next-generation sequencing alone, 12 of 23 (52%) probands were diagnosed with monogenic forms of nonsyndromic hearing loss; one individual displayed a DNA sequence mutation together with a microdeletion. Two (9%) probands have Usher syndrome. In the undiagnosed individuals (10/23; 43%) we detected a significant enrichment of potentially pathogenic variants as compared to controls. Conclusion: Next-generation sequencing combined with microarrays provides the diagnosis for approximately half of the GJB2 mutation–negative individuals. Usher syndrome was found to be more frequent in the study cohort than anticipated. The conditions in a proportion of individuals with nonsyndromic hearing loss, particularly in the undiagnosed group, may have been caused or modified by an accumulation of unfavorable variants across multiple genes. PMID:24875298

  11. Complete Genome Sequence of the d-Amino Acid Catabolism Bacterium Phaeobacter sp. Strain JL2886, Isolated from Deep Seawater of the South China Sea.

    PubMed

    Fu, Yingnan; Wang, Rui; Zhang, Zilian; Jiao, Nianzhi

    2016-01-01

    Phaeobacter sp. strain JL2886, isolated from deep seawater of the South China Sea, can catabolize d-amino acids. Here, we report the complete genome sequence of Phaeobacter sp. JL2886. It comprises ~4.06 Mbp, with a G+C content of 61.52%. A total of 3,913 protein-coding genes and 10 genes related to d-amino acid catabolism were obtained. PMID:27587825
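
    The G+C content reported above is the percentage of G and C bases among all unambiguous bases of the genome. A minimal sketch of that calculation follows. It ignores ambiguous symbols such as N, which is an assumption about how to treat them, and the short input strings are toy examples rather than JL2886 sequence.

```python
def gc_content(sequence):
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    bases = [b for b in sequence.upper() if b in "ACGT"]   # skip N and other symbols
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


print(gc_content("GGCCAATT"))     # toy sequence with equal GC and AT
print(gc_content("ggccNNaatt"))   # case-insensitive, N ignored
```

    For a whole genome the same function would be applied to the concatenated replicon sequences, for example after reading them from a FASTA file.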

  12. Complete Genome Sequence of the d-Amino Acid Catabolism Bacterium Phaeobacter sp. Strain JL2886, Isolated from Deep Seawater of the South China Sea

    PubMed Central

    Fu, Yingnan; Wang, Rui

    2016-01-01

    Phaeobacter sp. strain JL2886, isolated from deep seawater of the South China Sea, can catabolize d-amino acids. Here, we report the complete genome sequence of Phaeobacter sp. JL2886. It comprises ~4.06 Mbp, with a G+C content of 61.52%. A total of 3,913 protein-coding genes and 10 genes related to d-amino acid catabolism were obtained. PMID:27587825

  13. Ion Torrent Personal Genome Machine Sequencing for Genomic Typing of Neisseria meningitidis for Rapid Determination of Multiple Layers of Typing Information

    PubMed Central

    Szczepanowski, Rafael; Claus, Heike; Jünemann, Sebastian; Prior, Karola; Harmsen, Dag

    2012-01-01

    Neisseria meningitidis causes invasive meningococcal disease in infants, toddlers, and adolescents worldwide. DNA sequence-based typing, including multilocus sequence typing, analysis of genetic determinants of antibiotic resistance, and sequence typing of vaccine antigens, has become the standard for molecular epidemiology of the organism. However, PCR of multiple targets and consecutive Sanger sequencing provide logistic constraints to reference laboratories. Taking advantage of the recent development of benchtop next-generation sequencers (NGSs) and of BIGSdb, a database accommodating and analyzing genome sequence data, we therefore explored the feasibility and accuracy of Ion Torrent Personal Genome Machine (PGM) sequencing for genomic typing of meningococci. Three strains from a previous meningococcus serogroup B community outbreak were selected to compare conventional typing results with data generated by semiconductor chip-based sequencing. In addition, sequencing of the meningococcal type strain MC58 provided information about the general performance of the technology. The PGM technology generated sequence information for all target genes addressed. The results were 100% concordant with conventional typing results, with no further editing being necessary. In addition, the amount of typing information, i.e., nucleotides and target genes analyzed, could be substantially increased by the combined use of genome sequencing and BIGSdb compared to conventional methods. In the near future, affordable and fast benchtop NGS machines like the PGM might enable reference laboratories to switch to genomic typing on a routine basis. This will reduce workloads and rapidly provide information for laboratory surveillance, outbreak investigation, assessment of vaccine preventability, and antibiotic resistance gene monitoring. PMID:22461678

  14. A Solution for Establishing the Information Technology Service Management Processes Implementation Sequence

    NASA Astrophysics Data System (ADS)

    Arcilla, Magdalena; Calvo-Manzano, Jose; Cuevas, Gonzalo; Gómez, Gerzon; Ruiz, Elena; San Feliu, Tomás

    This paper addresses the implementation sequence of the Service Management processes defined in ITIL v2 from a topological perspective. Graph theory is used to represent the existing dependencies among the ITIL v2 processes in order to find clusters of strongly connected processes. These clusters help to determine the implementation priority of the service management processes. To this end, OPreSSD (Organizational Procedure for Service Support and Service Delivery) is proposed to identify the implementation sequence of the processes in the Service Support (SS) and Service Delivery (SD) areas.
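
    Finding clusters of strongly connected processes in a dependency graph is a standard graph problem. The sketch below uses Kosaraju's two-pass algorithm, which is one common choice, since the abstract does not say which method the authors used. The dependency edges in the example are hypothetical and only borrow ITIL v2 process names for readability.

```python
from collections import defaultdict


def strongly_connected_components(graph):
    """Return the strongly connected components of a directed graph.

    graph maps each node to the list of nodes it depends on (outgoing edges).
    """
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    visited = set()

    def dfs(start, adjacency, finished):
        # Iterative depth-first search; appends nodes in order of completion.
        visited.add(start)
        stack = [(start, iter(adjacency.get(start, ())))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(adjacency.get(nxt, ()))))
                    break
            else:
                stack.pop()
                finished.append(node)

    # Pass 1: record finishing order on the original graph.
    order = []
    for node in sorted(nodes):
        if node not in visited:
            dfs(node, graph, order)

    # Pass 2: explore the reversed graph in decreasing finishing order.
    reverse = defaultdict(list)
    for source, targets in graph.items():
        for target in targets:
            reverse[target].append(source)
    visited.clear()
    components = []
    for node in reversed(order):
        if node not in visited:
            members = []
            dfs(node, reverse, members)
            components.append(set(members))
    return components


# Hypothetical dependencies, for illustration only.
dependencies = {
    "Incident": ["Problem"],
    "Problem": ["Change"],
    "Change": ["Incident", "Release"],
    "Release": ["Configuration"],
    "Configuration": ["Release"],
    "ServiceLevel": ["Incident"],
}
for cluster in strongly_connected_components(dependencies):
    print(sorted(cluster))
```

    Each printed cluster is a group of processes that all depend on one another, directly or indirectly, so they would be natural candidates for joint implementation.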

  15. Extremely Acidophilic Protists from Acid Mine Drainage Host Rickettsiales-Lineage Endosymbionts That Have Intervening Sequences in Their 16S rRNA Genes

    PubMed Central

    Baker, Brett J.; Hugenholtz, Philip; Dawson, Scott C.; Banfield, Jillian F.

    2003-01-01

    During a molecular phylogenetic survey of extremely acidic (pH < 1), metal-rich acid mine drainage habitats in the Richmond Mine at Iron Mountain, Calif., we detected 16S rRNA gene sequences of a novel bacterial group belonging to the order Rickettsiales in the Alphaproteobacteria. The closest known relatives of this group (92% 16S rRNA gene sequence identity) are endosymbionts of the protist Acanthamoeba. Oligonucleotide 16S rRNA probes were designed and used to observe members of this group within acidophilic protists. To improve visualization of eukaryotic populations in the acid mine drainage samples, broad-specificity probes for eukaryotes were redesigned and combined to highlight this component of the acid mine drainage community. Approximately 4% of protists in the acid mine drainage samples contained endosymbionts. Measurements of internal pH of the protists showed that their cytosol is close to neutral, indicating that the endosymbionts may be neutrophilic. The endosymbionts had a conserved 273-nucleotide intervening sequence (IVS) in variable region V1 of their 16S rRNA genes. The IVS does not match any sequence in current databases, but the predicted secondary structure forms well-defined stem loops. IVSs are uncommon in rRNA genes and appear to be confined to bacteria living in close association with eukaryotes. Based on the phylogenetic novelty of the endosymbiont sequences and initial culture-independent characterization, we propose the name “Candidatus Captivus acidiprotistae.” To our knowledge, this is the first report of an endosymbiotic relationship in an extremely acidic habitat. PMID:12957940

  16. SNP-based genotyping in lentil: linking sequence information with phenotypes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Lentil (Lens culinaris) has been late to enter the world of high throughput molecular analysis due to a general lack of genomic resources. Using a 454 sequencing-based approach, SNPs have been identified in genes across the lentil genome. Several hundred have been turned into single SNP KASP assay...

  17. Whole genome sequencing of elite rice cultivars as a comprehensive information resource for marker assisted selection

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Current advances in sequencing technologies and bioinformatics make it possible to determine a nearly complete genomic background of rice, a staple food for the poor. Consequently, comprehensive databases of variation among thousands of varieties are currently being assembled and released. Proper analysi...

  18. Factors Related to Developing Instructional Information Sequences: Phase I. Final Report.

    ERIC Educational Resources Information Center

    Dansereau, D. R.; And Others

    A two-phased research project sought to develop instructional sequences for computer-assisted technical training materials which would reduce student learning time. In Phase I, technical concepts embedded in training materials were subjected to Inscal Multidimensional Scaling to determine their complexity and relationships; the materials were then…

  19. Next-generation re-sequencing of genes involved in increased platelet reactivity in diabetic patients on acetylsalicylic acid.

    PubMed

    Postula, Marek; Janicki, Piotr K; Eyileten, Ceren; Rosiak, Marek; Kaplon-Cieslicka, Agnieszka; Sugino, Shigekazu; Wilimski, Radosław; Kosior, Dariusz A; Opolski, Grzegorz; Filipiak, Krzysztof J; Mirowska-Guzel, Dagmara

    2016-06-01

    The objective of this study was to investigate whether rare missense genetic variants in several genes related to platelet function and acetylsalicylic acid (ASA) response are associated with platelet reactivity in patients with type 2 diabetes (T2D) on ASA therapy. Fifty-eight exons and corresponding introns of eight selected genes, including PTGS1, PTGS2, TBXAS1, PTGIS, ADRA2A, ADRA2B, TBXA2R, and P2RY1, were re-sequenced in 230 DNA samples from T2D patients by using pooled PCR amplification and next-generation sequencing on the Illumina HiSeq2000. The observed non-synonymous variants were confirmed by individual genotyping of 384 DNA samples comprising the individuals from the original discovery pools and an additional verification cohort of 154 ASA-treated T2D patients. The association between the investigated phenotypes (ASA-induced changes in platelet reactivity measured by PFA-100, VerifyNow, and serum thromboxane B2 level [sTxB2]) and the accumulation of rare missense variants (genetic burden) in the investigated genes was tested using statistical collapsing tests. We identified a total of 35 exonic variants, including 3 common missense variants, 15 rare missense variants, and 17 synonymous variants in the 8 investigated genes. The rare missense variants exhibited a statistically significant difference in accumulation pattern between the groups of patients with increased and normal platelet reactivity based on the PFA-100 assay. Our study suggests that the genetic burden of rare functional variants in the eight genes may contribute to differences in platelet reactivity measured with the PFA-100 assay in T2D patients treated with ASA. PMID:26599574

  20. Identification of G and P genotype-specific motifs in the predicted VP7 and VP4 amino acid sequences.

    PubMed

    Ma, Yongping

    2015-12-01

    Equine rotavirus (ERV) strain L338 (G13P[18]) has a unique G and P genotype. However, the evolutionary relationship of L338 with other ERVs is still unknown. Here, whole-genome analysis of the L338 ERV strain was independently performed. Its genotype constellation was determined as G13-P[18]-I6-R9-C9-M6-A6-N9-T12-E14-H11, confirming previous genotype assignments. The L338 strain shared only the P[18] and I6 genotypes with other ERVs. The nucleotide sequences of the other 9 RNA segments were different from those of the cognate genes of all other group A rotavirus (RVA) strains, including ERVs, and formed unique phylogenetic lineages. The L338 evolutionary footprints were tentatively identified in both the VP7 and VP4 amino acid sequences: two regions were found in VP7 and twelve in VP4. The conserved regions shared between L338 and other RVA strains indicated that L338 was more closely related genomically to animal and human RVAs than to ERVs, suggesting that L338 may not be an endogenous equine RV but may have emerged as an interspecies reassortant with other RVA strains. Furthermore, genotype-specific motifs of all 27 G and 37 P types were identified in region 7-1a (aa 91-100) of VP7 and regions 8-1 (aa 146-151) and 8-3 (aa 113-118 and 125-135) of VP4 (VP8*). PMID:26321159