Science.gov

Sample records for acid sequence predicts

  1. Predicting intrinsic disorder from amino acid sequence.

    PubMed

    Obradovic, Zoran; Peng, Kang; Vucetic, Slobodan; Radivojac, Predrag; Brown, Celeste J; Dunker, A Keith

    2003-01-01

    Blind predictions of intrinsic order and disorder were made on 42 proteins subsequently revealed to contain 9,044 ordered residues, 284 disordered residues in 26 segments of length 30 residues or less, and 281 disordered residues in 2 disordered segments of length greater than 30 residues. The accuracies of the six predictors used in this experiment ranged from 77% to 91% for the ordered regions and from 56% to 78% for the disordered segments. The average of the order and disorder predictions ranged from 73% to 77%. The prediction of disorder in the shorter segments was poor, from 25% to 66% correct, while the prediction of disorder in the longer segments was better, from 75% to 95% correct. Four of the predictors were composed of ensembles of neural networks. This enabled them to deal more efficiently with the large asymmetry in the training data through diversified sampling from the significantly larger ordered set and achieve better accuracy on ordered and long disordered regions. The exclusive use of long disordered regions for predictor training likely contributed to the disparity of the predictions on long versus short disordered regions, while averaging the output values over 61-residue windows to eliminate short predictions of order or disorder probably contributed to the even greater disparity for three of the predictors. This experiment supports the predictability of intrinsic disorder from amino acid sequence. PMID:14579347

  2. Predicting protein disorder by analyzing amino acid sequence

    PubMed Central

    Yang, Jack Y; Yang, Mary Qu

    2008-01-01

    Background Many protein regions and some entire proteins have no definite tertiary structure, presenting instead as dynamic, disorder ensembles under different physiochemical circumstances. These proteins and regions are known as Intrinsically Unstructured Proteins (IUP). IUP have been associated with a wide range of protein functions, along with roles in diseases characterized by protein misfolding and aggregation. Results Identifying IUP is important task in structural and functional genomics. We exact useful features from sequences and develop machine learning algorithms for the above task. We compare our IUP predictor with PONDRs (mainly neural-network-based predictors), disEMBL (also based on neural networks) and Globplot (based on disorder propensity). Conclusion We find that augmenting features derived from physiochemical properties of amino acids (such as hydrophobicity, complexity etc.) and using ensemble method proved beneficial. The IUP predictor is a viable alternative software tool for identifying IUP protein regions and proteins. PMID:18831799

  3. The value of short amino acid sequence matches for prediction of protein allergenicity.

    PubMed

    Silvanovich, Andre; Nemeth, Margaret A; Song, Ping; Herman, Rod; Tagliani, Laura; Bannon, Gary A

    2006-03-01

    Typically, genetically engineered crops contain traits encoded by one or a few newly expressed proteins. The allergenicity assessment of newly expressed proteins is an important component in the safety evaluation of genetically engineered plants. One aspect of this assessment involves sequence searches that compare the amino acid sequence of the protein to all known allergens. Analyses are performed to determine the potential for immunologically based cross-reactivity where IgE directed against a known allergen could bind to the protein and elicit a clinical reaction in sensitized individuals. Bioinformatic searches are designed to detect global sequence similarity and short contiguous amino acid sequence identity. It has been suggested that potential allergen cross-reactivity may be predicted by identifying matches as short as six to eight contiguous amino acids between the protein of interest and a known allergen. A series of analyses were performed, and match probabilities were calculated for different size peptides to determine if there was a scientifically justified search window size that identified allergen sequence characteristics. Four probability modeling methods were tested: (1) a mock protein and a mock allergen database, (2) a mock protein and genuine allergen database, (3) a genuine allergen and genuine protein database, and (4) a genuine allergen and genuine protein database combined with a correction for repeating peptides. These analyses indicated that searches for short amino acid sequence matches of eight amino acids or fewer to identify proteins as potential cross-reactive allergens is a product of chance and adds little value to allergy assessments for newly expressed proteins.

  4. Fast computational methods for predicting protein structure from primary amino acid sequence

    DOEpatents

    Agarwal, Pratul Kumar

    2011-07-19

    The present invention provides a method utilizing primary amino acid sequence of a protein, energy minimization, molecular dynamics and protein vibrational modes to predict three-dimensional structure of a protein. The present invention also determines possible intermediates in the protein folding pathway. The present invention has important applications to the design of novel drugs as well as protein engineering. The present invention predicts the three-dimensional structure of a protein independent of size of the protein, overcoming a significant limitation in the prior art.

  5. ENTPRISE: An Algorithm for Predicting Human Disease-Associated Amino Acid Substitutions from Sequence Entropy and Predicted Protein Structures

    PubMed Central

    Zhou, Hongyi; Gao, Mu; Skolnick, Jeffrey

    2016-01-01

    The advance of next-generation sequencing technologies has made exome sequencing rapid and relatively inexpensive. A major application of exome sequencing is the identification of genetic variations likely to cause Mendelian diseases. This requires processing large amounts of sequence information and therefore computational approaches that can accurately and efficiently identify the subset of disease-associated variations are needed. The accuracy and high false positive rates of existing computational tools leave much room for improvement. Here, we develop a boosted tree regression machine-learning approach to predict human disease-associated amino acid variations by utilizing a comprehensive combination of protein sequence and structure features. On comparing our method, ENTPRISE, to the state-of-the-art methods SIFT, PolyPhen-2, MUTATIONASSESSOR, MUTATIONTASTER, FATHMM, ENTPRISE exhibits significant improvement. In particular, on a testing dataset consisting of only proteins with balanced disease-associated and neutral variations defined as having the ratio of neutral/disease-associated variations between 0.3 and 3, the Mathews Correlation Coefficient by ENTPRISE is 0.493 as compared to 0.432 by PPH2-HumVar, 0.406 by SIFT, 0.403 by MUTATIONASSESSOR, 0.402 by PPH2-HumDiv, 0.305 by MUTATIONTASTER, and 0.181 by FATHMM. ENTPRISE is then applied to nucleic acid binding proteins in the human proteome. Disease-associated predictions are shown to be highly correlated with the number of protein-protein interactions. Both these predictions and the ENTPRISE server are freely available for academic users as a web service at http://cssb.biology.gatech.edu/entprise/. PMID:26982818

  6. ENTPRISE: An Algorithm for Predicting Human Disease-Associated Amino Acid Substitutions from Sequence Entropy and Predicted Protein Structures.

    PubMed

    Zhou, Hongyi; Gao, Mu; Skolnick, Jeffrey

    2016-01-01

    The advance of next-generation sequencing technologies has made exome sequencing rapid and relatively inexpensive. A major application of exome sequencing is the identification of genetic variations likely to cause Mendelian diseases. This requires processing large amounts of sequence information and therefore computational approaches that can accurately and efficiently identify the subset of disease-associated variations are needed. The accuracy and high false positive rates of existing computational tools leave much room for improvement. Here, we develop a boosted tree regression machine-learning approach to predict human disease-associated amino acid variations by utilizing a comprehensive combination of protein sequence and structure features. On comparing our method, ENTPRISE, to the state-of-the-art methods SIFT, PolyPhen-2, MUTATIONASSESSOR, MUTATIONTASTER, FATHMM, ENTPRISE exhibits significant improvement. In particular, on a testing dataset consisting of only proteins with balanced disease-associated and neutral variations defined as having the ratio of neutral/disease-associated variations between 0.3 and 3, the Mathews Correlation Coefficient by ENTPRISE is 0.493 as compared to 0.432 by PPH2-HumVar, 0.406 by SIFT, 0.403 by MUTATIONASSESSOR, 0.402 by PPH2-HumDiv, 0.305 by MUTATIONTASTER, and 0.181 by FATHMM. ENTPRISE is then applied to nucleic acid binding proteins in the human proteome. Disease-associated predictions are shown to be highly correlated with the number of protein-protein interactions. Both these predictions and the ENTPRISE server are freely available for academic users as a web service at http://cssb.biology.gatech.edu/entprise/.

  7. Fusion protein predicted amino acid sequence of the first US avian pneumovirus isolate and lack of heterogeneity among other US isolates.

    PubMed

    Seal, B S; Sellers, H S; Meinersmann, R J

    2000-02-01

    Avian pneumovirus (APV) was first isolated from turkeys in the west-central US following emergence of turkey rhinotracheitis (TRT) during 1996. Subsequently, several APV isolates were obtained from the north-central US. Matrix (M) and fusion (F) protein genes of these isolates were examined for sequence heterogeneity and compared with European APV subtypes A and B. Among US isolates the M gene shared greater than 98% nucleotide sequence identity with only one nonsynonymous change occurring in a single US isolate. Although the F gene among US APV isolates shared 98% nucleotide sequence identity, nine conserved substitutions were detected in the predicted amino acid sequence. The predicted amino acid sequence of the US APV isolate's F protein had 72% sequence identity to the F protein of APV subtype A and 71% sequence identity to the F protein of APV subtype B. This compares with 83% sequence identity between the APV subtype A and B predicted amino acid sequences of the F protein. The US isolates were phylogenetically distinguishable from their European counterparts based on F gene nucleotide or predicted amino acid sequences. Lack of sequence heterogeneity among US APV subtypes indicates these viruses have maintained a relatively stable population since the first outbreak of TRT. Phylogenetic analysis of the F protein among APV isolates supports classification of US isolates as a new APV subtype C.

  8. Hybridization properties of long nucleic acid probes for detection of variable target sequences, and development of a hybridization prediction algorithm

    PubMed Central

    Öhrmalm, Christina; Jobs, Magnus; Eriksson, Ronnie; Golbob, Sultan; Elfaitouri, Amal; Benachenhou, Farid; Strømme, Maria; Blomberg, Jonas

    2010-01-01

    One of the main problems in nucleic acid-based techniques for detection of infectious agents, such as influenza viruses, is that of nucleic acid sequence variation. DNA probes, 70-nt long, some including the nucleotide analog deoxyribose-Inosine (dInosine), were analyzed for hybridization tolerance to different amounts and distributions of mismatching bases, e.g. synonymous mutations, in target DNA. Microsphere-linked 70-mer probes were hybridized in 3M TMAC buffer to biotinylated single-stranded (ss) DNA for subsequent analysis in a Luminex® system. When mismatches interrupted contiguous matching stretches of 6 nt or longer, it had a strong impact on hybridization. Contiguous matching stretches are more important than the same number of matching nucleotides separated by mismatches into several regions. dInosine, but not 5-nitroindole, substitutions at mismatching positions stabilized hybridization remarkably well, comparable to N (4-fold) wobbles in the same positions. In contrast to shorter probes, 70-nt probes with judiciously placed dInosine substitutions and/or wobble positions were remarkably mismatch tolerant, with preserved specificity. An algorithm, NucZip, was constructed to model the nucleation and zipping phases of hybridization, integrating both local and distant binding contributions. It predicted hybridization more exactly than previous algorithms, and has the potential to guide the design of variation-tolerant yet specific probes. PMID:20864443

  9. Immunoreactivity of polyclonal antibodies generated against the carboxy terminus of the predicted amino acid sequence of the Huntington disease gene

    SciTech Connect

    Alkatib, G.; Graham, R.; Pelmear-Telenius, A.

    1994-09-01

    A cDNA fragment spanning the 3{prime}-end of the Huntington disease gene (from 8052 to 9252) was cloned into a prokaryotic expression vector containing the E. Coli lac promoter and a portion of the coding sequence for {beta}-galactosidase. The truncated {beta}-galactosidase gene was cleaved with BamHl and fused in frame to the BamHl fragment of the Huntington disease gene 3{prime}-end. Expression analysis of proteins made in E. Coli revealed that 20-30% of the total cellular proteins was represented by the {beta}-galactosidase-huntingtin fusion protein. The identity of the Huntington disease protein amino acid sequences was confirmed by protein sequence analysis. Affinity chromatography was used to purify large quantities of the fusion protein from bacterial cell lysates. Affinity-purified proteins were used to immunize New Zealand white rabbits for antibody production. The generated polyclonal antibodies were used to immunoprecipitate the Huntington disease gene product expressed in a neuroblastoma cell line. In this cell line the antibodies precipitated two protein bands of apparent gel migrations of 200 and 150 kd which together, correspond to the calculated molecular weight of the Huntington disease gene product (350 kd). Immunoblotting experiments revealed the presence of a large precursor protein in the range of 350-750 kd which is in agreement with the predicted molecular weight of the protein without post-translational modifications. These results indicate that the huntingtin protein is cleaved into two subunits in this neuroblastoma cell line and implicate that cleavage of a large precursor protein may contribute to its biological activity. Experiments are ongoing to determine the precursor-product relationship and to examine the synthesis of the huntingtin protein in freshly isolated rat brains, and to determine cellular and subcellular distribution of the gene product.

  10. Predicting Secretory Proteins of Malaria Parasite by Incorporating Sequence Evolution Information into Pseudo Amino Acid Composition via Grey System Model

    PubMed Central

    Lin, Wei-Zhong; Fang, Jian-An; Xiao, Xuan; Chou, Kuo-Chen

    2012-01-01

    The malaria disease has become a cause of poverty and a major hindrance to economic development. The culprit of the disease is the parasite, which secretes an array of proteins within the host erythrocyte to facilitate its own survival. Accordingly, the secretory proteins of malaria parasite have become a logical target for drug design against malaria. Unfortunately, with the increasing resistance to the drugs thus developed, the situation has become more complicated. To cope with the drug resistance problem, one strategy is to timely identify the secreted proteins by malaria parasite, which can serve as potential drug targets. However, it is both expensive and time-consuming to identify the secretory proteins of malaria parasite by experiments alone. To expedite the process for developing effective drugs against malaria, a computational predictor called “iSMP-Grey” was developed that can be used to identify the secretory proteins of malaria parasite based on the protein sequence information alone. During the prediction process a protein sample was formulated with a 60D (dimensional) feature vector formed by incorporating the sequence evolution information into the general form of PseAAC (pseudo amino acid composition) via a grey system model, which is particularly useful for solving complicated problems that are lack of sufficient information or need to process uncertain information. It was observed by the jackknife test that iSMP-Grey achieved an overall success rate of 94.8%, remarkably higher than those by the existing predictors in this area. As a user-friendly web-server, iSMP-Grey is freely accessible to the public at http://www.jci-bioinfo.cn/iSMP-Grey. Moreover, for the convenience of most experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results without the need to follow the complicated mathematical equations involved in this paper. PMID:23189138

  11. Predicting secretory proteins of malaria parasite by incorporating sequence evolution information into pseudo amino acid composition via grey system model.

    PubMed

    Lin, Wei-Zhong; Fang, Jian-An; Xiao, Xuan; Chou, Kuo-Chen

    2012-01-01

    The malaria disease has become a cause of poverty and a major hindrance to economic development. The culprit of the disease is the parasite, which secretes an array of proteins within the host erythrocyte to facilitate its own survival. Accordingly, the secretory proteins of malaria parasite have become a logical target for drug design against malaria. Unfortunately, with the increasing resistance to the drugs thus developed, the situation has become more complicated. To cope with the drug resistance problem, one strategy is to timely identify the secreted proteins by malaria parasite, which can serve as potential drug targets. However, it is both expensive and time-consuming to identify the secretory proteins of malaria parasite by experiments alone. To expedite the process for developing effective drugs against malaria, a computational predictor called "iSMP-Grey" was developed that can be used to identify the secretory proteins of malaria parasite based on the protein sequence information alone. During the prediction process a protein sample was formulated with a 60D (dimensional) feature vector formed by incorporating the sequence evolution information into the general form of PseAAC (pseudo amino acid composition) via a grey system model, which is particularly useful for solving complicated problems that are lack of sufficient information or need to process uncertain information. It was observed by the jackknife test that iSMP-Grey achieved an overall success rate of 94.8%, remarkably higher than those by the existing predictors in this area. As a user-friendly web-server, iSMP-Grey is freely accessible to the public at http://www.jci-bioinfo.cn/iSMP-Grey. Moreover, for the convenience of most experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results without the need to follow the complicated mathematical equations involved in this paper.

  12. The predicted amino acid sequence of alpha-internexin is that of a novel neuronal intermediate filament protein.

    PubMed Central

    Fliegner, K H; Ching, G Y; Liem, R K

    1990-01-01

    Our laboratory recently isolated and began to characterize a 66 kd rat brain cytoskeletal protein, dubbed alpha-internexin for its interactions in vitro with several other cytoskeletal proteins. Although alpha-internexin bore several of the characteristics of intermediate filament (IF) proteins, including the recognition by an antibody reactive with all IF proteins, it did not polymerize into 10 nm filaments under the conditions tested. Here we show that the predicted amino acid sequence of a cDNA encoding alpha-internexin shows the latter to be an IF protein, probably most closely related to the neurofilament proteins. Northern blotting shows that alpha-internexin expression is brain specific, and that rat brain alpha-internexin mRNA levels are maximal prior to birth and decline into adulthood, while the converse is seen for NF-L, the low molecular weight neurofilament subunit, suggesting that these two proteins play different roles in the developing brain. Images Fig. 1. Fig. 3. Fig. 5. PMID:2311576

  13. ASTRO-FOLD: A Combinatorial and Global Optimization Framework for Ab Initio Prediction of Three-Dimensional Structures of Proteins from the Amino Acid Sequence

    PubMed Central

    Klepeis, J. L.; Floudas, C. A.

    2003-01-01

    The field of computational biology has been revolutionized by recent advances in genomics. The completion of a number of genome projects, including that of the human genome, has paved the way toward a variety of challenges and opportunities in bioinformatics and biological systems engineering. One of the first challenges has been the determination of the structures of proteins encoded by the individual genes. This problem, which represents the progression from sequence to structure (genomics to structural genomics), has been widely known as the structure-prediction-in-protein-folding problem. We present the development and application of ASTRO-FOLD, a novel and complete approach for the ab initio prediction of protein structures given only the amino acid sequences of the proteins. The approach exhibits many novel components and the merits of its application are examined for a suite of protein systems, including a number of targets from several critical-assessment-of-structure-prediction experiments. PMID:14507680

  14. Composition for nucleic acid sequencing

    SciTech Connect

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2008-08-26

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  15. High speed nucleic acid sequencing

    SciTech Connect

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid. Each type of labeled nucleotide comprises an acceptor fluorophore attached to a phosphate portion of the nucleotide such that the fluorophore is removed upon incorporation into a growing strand. Fluorescent signal is emitted via fluorescent resonance energy transfer between the donor fluorophore and the acceptor fluorophore as each nucleotide is incorporated into the growing strand. The sequence is deduced by identifying which base is being incorporated into the growing strand.

  16. Predicting the molecular complexity of sequencing libraries.

    PubMed

    Daley, Timothy; Smith, Andrew D

    2013-04-01

    Predicting the molecular complexity of a genomic sequencing library is a critical but difficult problem in modern sequencing applications. Methods to determine how deeply to sequence to achieve complete coverage or to predict the benefits of additional sequencing are lacking. We introduce an empirical bayesian method to accurately characterize the molecular complexity of a DNA sample for almost any sequencing application on the basis of limited preliminary sequencing. PMID:23435259

  17. Discrete sequence prediction and its applications

    NASA Technical Reports Server (NTRS)

    Laird, Philip

    1992-01-01

    Learning from experience to predict sequences of discrete symbols is a fundamental problem in machine learning with many applications. We apply sequence prediction using a simple and practical sequence-prediction algorithm, called TDAG. The TDAG algorithm is first tested by comparing its performance with some common data compression algorithms. Then it is adapted to the detailed requirements of dynamic program optimization, with excellent results.

  18. Nucleotide and Predicted Amino Acid Sequence-Based Analysis of the Avian Metapneumovirus Type C Cell Attachment Glycoprotein Gene: Phylogenetic Analysis and Molecular Epidemiology of U.S. Pneumoviruses

    PubMed Central

    Alvarez, Rene; Lwamba, Humphrey M.; Kapczynski, Darrell R.; Njenga, M. Kariuki; Seal, Bruce S.

    2003-01-01

    A serologically distinct avian metapneumovirus (aMPV) was isolated in the United States after an outbreak of turkey rhinotracheitis (TRT) in February 1997. The newly recognized U.S. virus was subsequently demonstrated to be genetically distinct from European subtypes and was designated aMPV serotype C (aMPV/C). We have determined the nucleotide sequence of the gene encoding the cell attachment glycoprotein (G) of aMPV/C (Colorado strain and three Minnesota isolates) and predicted amino acid sequence by sequencing cloned cDNAs synthesized from intracellular RNA of aMPV/C-infected cells. The nucleotide sequence comprised 1,321 nucleotides with only one predicted open reading frame encoding a protein of 435 amino acids, with a predicted Mr of 48,840. The structural characteristics of the predicted G protein of aMPV/C were similar to those of the human respiratory syncytial virus (hRSV) attachment G protein, including two mucin-like regions (heparin-binding domains) flanking both sides of a CX3C chemokine motif present in a conserved hydrophobic pocket. Comparison of the deduced G-protein amino acid sequence of aMPV/C with those of aMPV serotypes A, B, and D, as well as hRSV revealed overall predicted amino acid sequence identities ranging from 4 to 16.5%, suggesting a distant relationship. However, G-protein sequence identities ranged from 72 to 97% when aMPV/C was compared to other members within the aMPV/C subtype or 21% for the recently identified human MPV (hMPV) G protein. Ratios of nonsynonymous to synonymous nucleotide changes were greater than one in the G gene when comparing the more recent Minnesota isolates to the original Colorado isolate. Epidemiologically, this indicates positive selection among U.S. isolates since the first outbreak of TRT in the United States. PMID:12682171

  19. KM+, a mannose-binding lectin from Artocarpus integrifolia: amino acid sequence, predicted tertiary structure, carbohydrate recognition, and analysis of the beta-prism fold.

    PubMed

    Rosa, J C; De Oliveira, P S; Garratt, R; Beltramini, L; Resing, K; Roque-Barreira, M C; Greene, L J

    1999-01-01

    The complete amino acid sequence of the lectin KM+ from Artocarpus integrifolia (jackfruit), which contains 149 residues/mol, is reported and compared to those of other members of the Moraceae family, particularly that of jacalin, also from jackfruit, with which it shares 52% sequence identity. KM+ presents an acetyl-blocked N-terminus and is not posttranslationally modified by proteolytic cleavage as is the case for jacalin. Rather, it possesses a short, glycine-rich linker that unites the regions homologous to the alpha- and beta-chains of jacalin. The results of homology modeling implicate the linker sequence in sterically impeding rotation of the side chain of Asp141 within the binding site pocket. As a consequence, the aspartic acid is locked into a conformation adequate only for the recognition of equatorial hydroxyl groups on the C4 epimeric center (alpha-D-mannose, alpha-D-glucose, and their derivatives). In contrast, the internal cleavage of the jacalin chain permits free rotation of the homologous aspartic acid, rendering it capable of accepting hydrogen bonds from both possible hydroxyl configurations on C4. We suggest that, together with direct recognition of epimeric hydroxyls and the steric exclusion of disfavored ligands, conformational restriction of the lectin should be considered to be a new mechanism by which selectivity may be built into carbohydrate binding sites. Jacalin and KM+ adopt the beta-prism fold already observed in two unrelated protein families. Despite presenting little or no sequence similarity, an analysis of the beta-prism reveals a canonical feature repeatedly present in all such structures, which is based on six largely hydrophobic residues within a beta-hairpin containing two classic-type beta-bulges. We suggest the term beta-prism motif to describe this feature.

  20. Chip-based sequencing nucleic acids

    SciTech Connect

    Beer, Neil Reginald

    2014-08-26

    A system for fast DNA sequencing by amplification of genetic material within microreactors, denaturing, demulsifying, and then sequencing the material, while retaining it in a PCR/sequencing zone by a magnetic field. One embodiment includes sequencing nucleic acids on a microchip that includes a microchannel flow channel in the microchip. The nucleic acids are isolated and hybridized to magnetic nanoparticles or to magnetic polystyrene-coated beads. Microreactor droplets are formed in the microchannel flow channel. The microreactor droplets containing the nucleic acids and the magnetic nanoparticles are retained in a magnetic trap in the microchannel flow channel and sequenced.

  1. Distinguishing proteins from arbitrary amino acid sequences.

    PubMed

    Yau, Stephen S-T; Mao, Wei-Guang; Benson, Max; He, Rong Lucy

    2015-01-01

    What kinds of amino acid sequences could possibly be protein sequences? From all existing databases that we can find, known proteins are only a small fraction of all possible combinations of amino acids. Beginning with Sanger's first detailed determination of a protein sequence in 1952, previous studies have focused on describing the structure of existing protein sequences in order to construct the protein universe. No one, however, has developed a criteria for determining whether an arbitrary amino acid sequence can be a protein. Here we show that when the collection of arbitrary amino acid sequences is viewed in an appropriate geometric context, the protein sequences cluster together. This leads to a new computational test, described here, that has proved to be remarkably accurate at determining whether an arbitrary amino acid sequence can be a protein. Even more, if the results of this test indicate that the sequence can be a protein, and it is indeed a protein sequence, then its identity as a protein sequence is uniquely defined. We anticipate our computational test will be useful for those who are attempting to complete the job of discovering all proteins, or constructing the protein universe. PMID:25609314

  2. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-06-06

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  3. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-05-30

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In its principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  4. Temporal and kinematic consistency predict sequence awareness.

    PubMed

    Jaynes, Molly J; Schieber, Marc H; Mink, Jonathan W

    2016-10-01

    Many human motor skills can be represented as a hierarchical series of movement patterns. Awareness of underlying patterns can improve performance and decrease cognitive load. Subjects (n = 30) tapped a finger sequence with changing stimulus-to-response mapping and a common movement sequence. Thirteen subjects (43 %) became aware that they were tapping a familiar movement sequence during the experiment. Subjects who became aware of the underlying motor pattern tapped with greater kinematic and temporal consistency from task onset, but consistency was not sufficient for awareness. We found no effect of age, musical experience, tapping evenness, or inter-key-interval on awareness of the pattern in the motor response. We propose that temporal or kinematic consistency reinforces a pattern representation, but cognitive engagement with the contents of the sequence is necessary to bring the pattern to conscious awareness. These findings predict benefit for movement strategies that limit temporal and kinematic variability during motor learning. PMID:27324192

  5. Bovine Parathyroid Hormone: Amino Acid Sequence

    PubMed Central

    Brewer, H. Bryan; Ronan, Rosemary

    1970-01-01

    Bovine parathyroid hormone has been isolated in homogeneous form, and its complete amino acid sequence determined. The bovine hormone is a single chain, 84 amino acids long. It contains amino-terminal alanine, and carboxyl-terminal glutamine. The bovine parathyroid hormone is approximately three times the length of the newly discovered hormone, thyrocalcitonin, whose action is reciprocal to parathyroid hormone. Images PMID:5275384

  6. Lossless Video Sequence Compression Using Adaptive Prediction

    NASA Technical Reports Server (NTRS)

    Li, Ying; Sayood, Khalid

    2007-01-01

    We present an adaptive lossless video compression algorithm based on predictive coding. The proposed algorithm exploits temporal, spatial, and spectral redundancies in a backward adaptive fashion with extremely low side information. The computational complexity is further reduced by using a caching strategy. We also study the relationship between the operational domain for the coder (wavelet or spatial) and the amount of temporal and spatial redundancy in the sequence being encoded. Experimental results show that the proposed scheme provides significant improvements in compression efficiencies.

  7. Predicting the aggregation propensity of prion sequences.

    PubMed

    Espargaró, Alba; Busquets, Maria Antònia; Estelrich, Joan; Sabate, Raimon

    2015-09-01

    The presence of prions can result in debilitating and neurodegenerative diseases in mammals and protein-based genetic elements in fungi. Prions are defined as a subclass of amyloids in which the self-aggregation process becomes self-perpetuating and infectious. Like all amyloids, prions polymerize into fibres with a common core formed of β-sheet structures oriented perpendicular to the fibril axes which form a structure known as a cross-β structure. The intermolecular β-sheet propensity, a characteristic of the amyloid pattern, as well as other key parameters of amyloid fibril formation can be predicted. Mathematical algorithms have been proposed to predict both amyloid and prion propensities. However, it has been shown that the presence of amyloid-prone regions in a polypeptide sequence could be insufficient for amyloid formation. It has also often been stated that the formation of amyloid fibrils does not imply that these are prions. Despite these limitations, in silico prediction of amyloid and prion propensities should help detect potential new prion sequences in mammals. In addition, the determination of amyloid-prone regions in prion sequences could be very useful in understanding the effect of sporadic mutations and polymorphisms as well as in the search for therapeutic targets.

  8. CRYSTALP2: sequence-based protein crystallization propensity prediction

    PubMed Central

    Kurgan, Lukasz; Razib, Ali A; Aghakhani, Sara; Dick, Scott; Mizianty, Marcin; Jahandideh, Samad

    2009-01-01

    Background Current protocols yield crystals for <30% of known proteins, indicating that automatically identifying crystallizable proteins may improve high-throughput structural genomics efforts. We introduce CRYSTALP2, a kernel-based method that predicts the propensity of a given protein sequence to produce diffraction-quality crystals. This method utilizes the composition and collocation of amino acids, isoelectric point, and hydrophobicity, as estimated from the primary sequence, to generate predictions. CRYSTALP2 extends its predecessor, CRYSTALP, by enabling predictions for sequences of unrestricted size and provides improved prediction quality. Results A significant majority of the collocations used by CRYSTALP2 include residues with high conformational entropy, or low entropy and high potential to mediate crystal contacts; notably, such residues are utilized by surface entropy reduction methods. We show that the collocations provide complementary information to the hydrophobicity and isoelectric point. Tests on four datasets show that CRYSTALP2 outperforms several existing sequence-based predictors (CRYSTALP, OB-score, and SECRET). CRYSTALP2's accuracy, MCC, and AROC range between 69.3 and 77.5%, 0.39 and 0.55, and 0.72 and 0.79, respectively. Our predictions are similar in quality and are complementary to the predictions of the most recent ParCrys and XtalPred methods. Our results also suggest that, as work in protein crystallization continues (thereby enlarging the population of proteins with known crystallization propensities), the prediction quality of the CRYSTALP2 method should increase. The prediction model and the datasets used in this contribution can be downloaded from . Conclusion CRYSTALP2 provides relatively accurate crystallization propensity predictions for a given protein chain that either outperform or complement the existing approaches. The proposed method can be used to support current efforts towards improving the success rate in obtaining

  9. The predictive capacity of personal genome sequencing.

    PubMed

    Roberts, Nicholas J; Vogelstein, Joshua T; Parmigiani, Giovanni; Kinzler, Kenneth W; Vogelstein, Bert; Velculescu, Victor E

    2012-05-01

    New DNA sequencing methods will soon make it possible to identify all germline variants in any individual at a reasonable cost. However, the ability of whole-genome sequencing to predict predisposition to common diseases in the general population is unknown. To estimate this predictive capacity, we use the concept of a "genometype." A specific genometype represents the genomes in the population conferring a specific level of genetic risk for a specified disease. Using this concept, we estimated the maximum capacity of whole-genome sequencing to identify individuals at clinically significant risk for 24 different diseases. Our estimates were derived from the analysis of large numbers of monozygotic twin pairs; twins of a pair share the same genometype and therefore identical genetic risk factors. Our analyses indicate that (i) for 23 of the 24 diseases, most of the individuals will receive negative test results; (ii) these negative test results will, in general, not be very informative, because the risk of developing 19 of the 24 diseases in those who test negative will still be, at minimum, 50 to 80% of that in the general population; and (iii) on the positive side, in the best-case scenario, more than 90% of tested individuals might be alerted to a clinically significant predisposition to at least one disease. These results have important implications for the valuation of genetic testing by industry, health insurance companies, public policy-makers, and consumers. PMID:22472521

  10. Analysis and Annotation of Nucleic Acid Sequence

    SciTech Connect

    States, David J.

    2004-07-28

    The aims of this project were to develop improved methods for computational genome annotation and to apply these methods to improve the annotation of genomic sequence data with a specific focus on human genome sequencing. The project resulted in a substantial body of published work. Notable contributions of this project were the identification of basecalling and lane tracking as error processes in genome sequencing and contributions to improved methods for these steps in genome sequencing. This technology improved the accuracy and throughput of genome sequence analysis. Probabilistic methods for physical map construction were developed. Improved methods for sequence alignment, alternative splicing analysis, promoter identification and NF kappa B response gene prediction were also developed.

  11. The DynaMine webserver: predicting protein dynamics from sequence.

    PubMed

    Cilia, Elisa; Pancsa, Rita; Tompa, Peter; Lenaerts, Tom; Vranken, Wim F

    2014-07-01

    Protein dynamics are important for understanding protein function. Unfortunately, accurate protein dynamics information is difficult to obtain: here we present the DynaMine webserver, which provides predictions for the fast backbone movements of proteins directly from their amino-acid sequence. DynaMine rapidly produces a profile describing the statistical potential for such movements at residue-level resolution. The predicted values have meaning on an absolute scale and go beyond the traditional binary classification of residues as ordered or disordered, thus allowing for direct dynamics comparisons between protein regions. Through this webserver, we provide molecular biologists with an efficient and easy to use tool for predicting the dynamical characteristics of any protein of interest, even in the absence of experimental observations. The prediction results are visualized and can be directly downloaded. The DynaMine webserver, including instructive examples describing the meaning of the profiles, is available at http://dynamine.ibsquare.be.

  12. Phenolic acid esterases, coding sequences and methods

    DOEpatents

    Blum, David L.; Kataeva, Irina; Li, Xin-Liang; Ljungdahl, Lars G.

    2002-01-01

    Described herein are four phenolic acid esterases, three of which correspond to domains of previously unknown function within bacterial xylanases, from XynY and XynZ of Clostridium thermocellum and from a xylanase of Ruminococcus. The fourth specifically exemplified xylanase is a protein encoded within the genome of Orpinomyces PC-2. The amino acids of these polypeptides and nucleotide sequences encoding them are provided. Recombinant host cells, expression vectors and methods for the recombinant production of phenolic acid esterases are also provided.

  13. Better prediction of functional effects for sequence variants

    PubMed Central

    2015-01-01

    Elucidating the effects of naturally occurring genetic variation is one of the major challenges for personalized health and personalized medicine. Here, we introduce SNAP2, a novel neural network based classifier that improves over the state-of-the-art in distinguishing between effect and neutral variants. Our method's improved performance results from screening many potentially relevant protein features and from refining our development data sets. Cross-validated on >100k experimentally annotated variants, SNAP2 significantly outperformed other methods, attaining a two-state accuracy (effect/neutral) of 83%. SNAP2 also outperformed combinations of other methods. Performance increased for human variants but much more so for other organisms. Our method's carefully calibrated reliability index informs selection of variants for experimental follow up, with the most strongly predicted half of all effect variants predicted at over 96% accuracy. As expected, the evolutionary information from automatically generated multiple sequence alignments gave the strongest signal for the prediction. However, we also optimized our new method to perform surprisingly well even without alignments. This feature reduces prediction runtime by over two orders of magnitude, enables cross-genome comparisons, and renders our new method as the best solution for the 10-20% of sequence orphans. SNAP2 is available at: https://rostlab.org/services/snap2web Definitions used Delta, input feature that results from computing the difference feature scores for native amino acid and feature scores for variant amino acid; nsSNP, non-synoymous SNP; PMD, Protein Mutant Database; SNAP, Screening for non-acceptable polymorphisms; SNP, single nucleotide polymorphism; variant, any amino acid changing sequence variant. PMID:26110438

  14. Why do Sequence Signatures Predict Enzyme Mechanism? Homology versus Chemistry

    PubMed Central

    Beattie, Kirsten E.; De Ferrari, Luna; Mitchell, John B. O.

    2015-01-01

    First, we identify InterPro sequence signatures representing evolutionary relatedness and, second, signatures identifying specific chemical machinery. Thus, we predict the chemical mechanisms of enzyme-catalyzed reactions from catalytic and non-catalytic subsets of InterPro signatures. We first scanned our 249 sequences using InterProScan and then used the MACiE database to identify those amino acid residues that are important for catalysis. The sequences were mutated in silico to replace these catalytic residues with glycine and then again scanned using InterProScan. Those signature matches from the original scan that disappeared on mutation were called catalytic. Mechanism was predicted using all signatures, only the 78 “catalytic” signatures, or only the 519 “non-catalytic” signatures. The non-catalytic signatures gave indistinguishable results from those for the whole feature set, with precision of 0.991 and sensitivity of 0.970. The catalytic signatures alone gave less impressive predictivity, with precision and sensitivity of 0.791 and 0.735, respectively. These results show that our successful prediction of enzyme mechanism is mostly by homology rather than by identifying catalytic machinery. PMID:26740739

  15. Method for identifying and quantifying nucleic acid sequence aberrations

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1998-07-21

    A method is disclosed for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe. 11 figs.

  16. Method for identifying and quantifying nucleic acid sequence aberrations

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1998-01-01

    A method for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe.

  17. Optimization of short amino acid sequences classifier

    NASA Astrophysics Data System (ADS)

    Barcz, Aleksy; Szymański, Zbigniew

    This article describes processing methods used for short amino acid sequences classification. The data processed are 9-symbols string representations of amino acid sequences, divided into 49 data sets - each one containing samples labeled as reacting or not with given enzyme. The goal of the classification is to determine for a single enzyme, whether an amino acid sequence would react with it or not. Each data set is processed separately. Feature selection is performed to reduce the number of dimensions for each data set. The method used for feature selection consists of two phases. During the first phase, significant positions are selected using Classification and Regression Trees. Afterwards, symbols appearing at the selected positions are substituted with numeric values of amino acid properties taken from the AAindex database. In the second phase the new set of features is reduced using a correlation-based ranking formula and Gram-Schmidt orthogonalization. Finally, the preprocessed data is used for training LS-SVM classifiers. SPDE, an evolutionary algorithm, is used to obtain optimal hyperparameters for the LS-SVM classifier, such as error penalty parameter C and kernel-specific hyperparameters. A simple score penalty is used to adapt the SPDE algorithm to the task of selecting classifiers with best performance measures values.

  18. Methods for analyzing nucleic acid sequences

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid. The method provides a complex comprising a polymerase enzyme, a target nucleic acid molecule, and a primer, wherein the complex is immobilized on a support Fluorescent label is attached to a terminal phosphate group of the nucleotide or nucleotide analog. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The time duration of the signal from labeled nucleotides or nucleotide analogs that become incorporated is distinguished from freely diffusing labels by a longer retention in the observation volume for the nucleotides or nucleotide analogs that become incorporated than for the freely diffusing labels.

  19. Prediction of neddylation sites from protein sequences and sequence-derived properties

    PubMed Central

    2015-01-01

    Background Neddylation is a reversible post-translational modification that plays a vital role in maintaining cellular machinery. It is shown to affect localization, binding partners and structure of target proteins. Disruption of protein neddylation was observed in various diseases such as Alzheimer's and cancer. Therefore, understanding the neddylation mechanism and determining neddylation targets possibly bears a huge importance in further understanding the cellular processes. This study is the first attempt to predict neddylated sites from protein sequences by using several sequence and sequence-based structural features. Results We have developed a neddylation site prediction method using a support vector machine based on various sequence properties, position-specific scoring matrices, and disorder. Using 21 amino acid long lysine-centred windows, our model was able to predict neddylation sites successfully, with an average 5-fold stratified cross validation performance of 0.91, 0.91, 0.75, 0.44, 0.95 for accuracy, specificity, sensitivity, Matthew's correlation coefficient and area under curve, respectively. Independent test set results validated the robustness of reported new method. Additionally, we observed that neddylation sites are commonly flexible and there is a significant positively charged amino acid presence in neddylation sites. Conclusions In this study, a neddylation site prediction method was developed for the first time in literature. Common characteristics of neddylation sites and their discriminative properties were explored for further in silico studies on neddylation. Lastly, up-to-date neddylation dataset was provided for researchers working on post-translational modifications in the accompanying supplementary material of this article. PMID:26679222

  20. Predictive uncertainty in auditory sequence processing.

    PubMed

    Hansen, Niels Chr; Pearce, Marcus T

    2014-01-01

    Previous studies of auditory expectation have focused on the expectedness perceived by listeners retrospectively in response to events. In contrast, this research examines predictive uncertainty-a property of listeners' prospective state of expectation prior to the onset of an event. We examine the information-theoretic concept of Shannon entropy as a model of predictive uncertainty in music cognition. This is motivated by the Statistical Learning Hypothesis, which proposes that schematic expectations reflect probabilistic relationships between sensory events learned implicitly through exposure. Using probability estimates from an unsupervised, variable-order Markov model, 12 melodic contexts high in entropy and 12 melodic contexts low in entropy were selected from two musical repertoires differing in structural complexity (simple and complex). Musicians and non-musicians listened to the stimuli and provided explicit judgments of perceived uncertainty (explicit uncertainty). We also examined an indirect measure of uncertainty computed as the entropy of expectedness distributions obtained using a classical probe-tone paradigm where listeners rated the perceived expectedness of the final note in a melodic sequence (inferred uncertainty). Finally, we simulate listeners' perception of expectedness and uncertainty using computational models of auditory expectation. A detailed model comparison indicates which model parameters maximize fit to the data and how they compare to existing models in the literature. The results show that listeners experience greater uncertainty in high-entropy musical contexts than low-entropy contexts. This effect is particularly apparent for inferred uncertainty and is stronger in musicians than non-musicians. Consistent with the Statistical Learning Hypothesis, the results suggest that increased domain-relevant training is associated with an increasingly accurate cognitive model of probabilistic structure in music. PMID:25295018

  1. Predictive uncertainty in auditory sequence processing.

    PubMed

    Hansen, Niels Chr; Pearce, Marcus T

    2014-01-01

    Previous studies of auditory expectation have focused on the expectedness perceived by listeners retrospectively in response to events. In contrast, this research examines predictive uncertainty-a property of listeners' prospective state of expectation prior to the onset of an event. We examine the information-theoretic concept of Shannon entropy as a model of predictive uncertainty in music cognition. This is motivated by the Statistical Learning Hypothesis, which proposes that schematic expectations reflect probabilistic relationships between sensory events learned implicitly through exposure. Using probability estimates from an unsupervised, variable-order Markov model, 12 melodic contexts high in entropy and 12 melodic contexts low in entropy were selected from two musical repertoires differing in structural complexity (simple and complex). Musicians and non-musicians listened to the stimuli and provided explicit judgments of perceived uncertainty (explicit uncertainty). We also examined an indirect measure of uncertainty computed as the entropy of expectedness distributions obtained using a classical probe-tone paradigm where listeners rated the perceived expectedness of the final note in a melodic sequence (inferred uncertainty). Finally, we simulate listeners' perception of expectedness and uncertainty using computational models of auditory expectation. A detailed model comparison indicates which model parameters maximize fit to the data and how they compare to existing models in the literature. The results show that listeners experience greater uncertainty in high-entropy musical contexts than low-entropy contexts. This effect is particularly apparent for inferred uncertainty and is stronger in musicians than non-musicians. Consistent with the Statistical Learning Hypothesis, the results suggest that increased domain-relevant training is associated with an increasingly accurate cognitive model of probabilistic structure in music.

  2. Predictive uncertainty in auditory sequence processing

    PubMed Central

    Hansen, Niels Chr.; Pearce, Marcus T.

    2014-01-01

    Previous studies of auditory expectation have focused on the expectedness perceived by listeners retrospectively in response to events. In contrast, this research examines predictive uncertainty—a property of listeners' prospective state of expectation prior to the onset of an event. We examine the information-theoretic concept of Shannon entropy as a model of predictive uncertainty in music cognition. This is motivated by the Statistical Learning Hypothesis, which proposes that schematic expectations reflect probabilistic relationships between sensory events learned implicitly through exposure. Using probability estimates from an unsupervised, variable-order Markov model, 12 melodic contexts high in entropy and 12 melodic contexts low in entropy were selected from two musical repertoires differing in structural complexity (simple and complex). Musicians and non-musicians listened to the stimuli and provided explicit judgments of perceived uncertainty (explicit uncertainty). We also examined an indirect measure of uncertainty computed as the entropy of expectedness distributions obtained using a classical probe-tone paradigm where listeners rated the perceived expectedness of the final note in a melodic sequence (inferred uncertainty). Finally, we simulate listeners' perception of expectedness and uncertainty using computational models of auditory expectation. A detailed model comparison indicates which model parameters maximize fit to the data and how they compare to existing models in the literature. The results show that listeners experience greater uncertainty in high-entropy musical contexts than low-entropy contexts. This effect is particularly apparent for inferred uncertainty and is stronger in musicians than non-musicians. Consistent with the Statistical Learning Hypothesis, the results suggest that increased domain-relevant training is associated with an increasingly accurate cognitive model of probabilistic structure in music. PMID:25295018

  3. 77 FR 65537 - Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-29

    ... Amino Acid Sequence Disclosures ACTION: Proposed collection; comment request. SUMMARY: The United States....'' SUPPLEMENTARY INFORMATION: I. Abstract Patent applications that contain nucleotide and/or amino acid...

  4. Prediction of Secondary Structures Conserved in Multiple RNA Sequences.

    PubMed

    Xu, Zhenjiang Zech; Mathews, David H

    2016-01-01

    RNA structure is conserved by evolution to a greater extent than sequence. Predicting the conserved structure for multiple homologous sequences can be much more accurate than predicting the structure for a single sequence. RNAstructure is a software package that includes the programs Dynalign, Multilign, TurboFold, and PARTS for predicting conserved RNA secondary structure. This chapter provides protocols for using these programs. PMID:27665591

  5. Sequence-Based Prediction of Type III Secreted Proteins

    PubMed Central

    Arnold, Roland; Brandmaier, Stefan; Kleine, Frederick; Tischler, Patrick; Heinz, Eva; Behrens, Sebastian; Niinikoski, Antti; Mewes, Hans-Werner; Horn, Matthias; Rattei, Thomas

    2009-01-01

    The type III secretion system (TTSS) is a key mechanism for host cell interaction used by a variety of bacterial pathogens and symbionts of plants and animals including humans. The TTSS represents a molecular syringe with which the bacteria deliver effector proteins directly into the host cell cytosol. Despite the importance of the TTSS for bacterial pathogenesis, recognition and targeting of type III secreted proteins has up until now been poorly understood. Several hypotheses are discussed, including an mRNA-based signal, a chaperon-mediated process, or an N-terminal signal peptide. In this study, we systematically analyzed the amino acid composition and secondary structure of N-termini of 100 experimentally verified effector proteins. Based on this, we developed a machine-learning approach for the prediction of TTSS effector proteins, taking into account N-terminal sequence features such as frequencies of amino acids, short peptides, or residues with certain physico-chemical properties. The resulting computational model revealed a strong type III secretion signal in the N-terminus that can be used to detect effectors with sensitivity of ∼71% and selectivity of ∼85%. This signal seems to be taxonomically universal and conserved among animal pathogens and plant symbionts, since we could successfully detect effector proteins if the respective group was excluded from training. The application of our prediction approach to 739 complete bacterial and archaeal genome sequences resulted in the identification of between 0% and 12% putative TTSS effector proteins. Comparison of effector proteins with orthologs that are not secreted by the TTSS showed no clear pattern of signal acquisition by fusion, suggesting convergent evolutionary processes shaping the type III secretion signal. The newly developed program EffectiveT3 (http://www.chlamydiaedb.org) is the first universal in silico prediction program for the identification of novel TTSS effectors. Our findings will

  6. Computational methods in sequence and structure prediction

    NASA Astrophysics Data System (ADS)

    Lang, Caiyi

    This dissertation is organized into two parts. In the first part, we will discuss three computational methods for cis-regulatory element recognition in three different gene regulatory networks as the following: (a) Using a comprehensive "Phylogenetic Footprinting Comparison" method, we will investigate the promoter sequence structures of three enzymes (PAL, CHS and DFR) that catalyze sequential steps in the pathway from phenylalanine to anthocyanins in plants. Our result shows there exists a putative cis-regulatory element "AC(C/G)TAC(C)" in the upstream of these enzyme genes. We propose this cis-regulatory element to be responsible for the genetic regulation of these three enzymes and this element, might also be the binding site for MYB class transcription factor PAP1. (b) We will investigate the role of the Arabidopsis gene glutamate receptor 1.1 (AtGLR1.1) in C and N metabolism by utilizing the microarray data we obtained from AtGLR1.1 deficient lines (antiAtGLR1.1). We focus our investigation on the putatively co-regulated transcript profile of 876 genes we have collected in antiAtGLR1.1 lines. By (a) scanning the occurrence of several groups of known abscisic acid (ABA) related cisregulatory elements in the upstream regions of 876 Arabidopsis genes; and (b) exhaustive scanning of all possible 6-10 bps motif occurrence in the upstream regions of the same set of genes, we are able to make a quantative estimation on the enrichment level of each of the cis-regulatory element candidates. We finally conclude that one specific cis-regulatory element group, called "ABRE" elements, are statistically highly enriched within the 876-gene group as compared to their occurrence within the genome. (c) We will introduce a new general purpose algorithm, called "fuzzy REDUCE1", which we have developed recently for automated cis-regulatory element identification. In the second part, we will discuss our newly devised protein design framework. With this framework we have developed

  7. Detection of nucleic acid sequences by invader-directed cleavage

    DOEpatents

    Brow, Mary Ann D.; Hall, Jeff Steven Grotelueschen; Lyamichev, Victor; Olive, David Michael; Prudent, James Robert

    1999-01-01

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The 5' nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based by charge.

  8. Sequence-only evolutionary and predicted structural features for the prediction of stability changes in protein mutants

    PubMed Central

    2013-01-01

    Background Even a single amino acid substitution in a protein sequence may result in significant changes in protein stability, structure, and therefore in protein function as well. In the post-genomic era, computational methods for predicting stability changes from only the sequence of a protein are of importance. While evolutionary relationships of protein mutations can be extracted from large protein databases holding millions of protein sequences, relevant evolutionary features for the prediction of stability changes have not been proposed. Also, the use of predicted structural features in situations when a protein structure is not available has not been explored. Results We proposed a number of evolutionary and predicted structural features for the prediction of stability changes and analysed which of them capture the determinants of protein stability the best. We trained and evaluated our machine learning method on a non-redundant data set of experimentally measured stability changes. When only the direction of the stability change was predicted, we found that the best performance improvement can be achieved by the combination of the evolutionary features mutation likelihood and SIFTscore in conjunction with the predicted structural feature secondary structure. The same two evolutionary features in the combination with the predicted structural feature accessible surface area achieved the lowest error when the prediction of actual values of stability changes was assessed. Compared to similar studies, our method achieved improvements in prediction performance. Conclusion Although the strongest feature for the prediction of stability changes appears to be the vector of amino acid identities in the sequential neighbourhood of the mutation, the most relevant combination of evolutionary and predicted structural features further improves prediction performance. Even the predicted structural features, which did not perform well on their own, turn out to be beneficial

  9. Gene and translation initiation site prediction in metagenomic sequences

    SciTech Connect

    Hyatt, Philip Douglas; LoCascio, Philip F; Hauser, Loren John; Uberbacher, Edward C

    2012-01-01

    Gene prediction in metagenomic sequences remains a difficult problem. Current sequencing technologies do not achieve sufficient coverage to assemble the individual genomes in a typical sample; consequently, sequencing runs produce a large number of short sequences whose exact origin is unknown. Since these sequences are usually smaller than the average length of a gene, algorithms must make predictions based on very little data. We present MetaProdigal, a metagenomic version of the gene prediction program Prodigal, that can identify genes in short, anonymous coding sequences with a high degree of accuracy. The novel value of the method consists of enhanced translation initiation site identification, ability to identify sequences that use alternate genetic codes and confidence values for each gene call. We compare the results of MetaProdigal with other methods and conclude with a discussion of future improvements.

  10. Characterization and amino acid sequence of a fatty acid-binding protein from human heart.

    PubMed Central

    Offner, G D; Brecher, P; Sawlivich, W B; Costello, C E; Troxler, R F

    1988-01-01

    The complete amino acid sequence of a fatty acid-binding protein from human heart was determined by automated Edman degradation of CNBr, BNPS-skatole [3'-bromo-3-methyl-2-(2-nitrobenzenesulphenyl)indolenine], hydroxylamine, Staphylococcus aureus V8 proteinase, tryptic and chymotryptic peptides, and by digestion of the protein with carboxypeptidase A. The sequence of the blocked N-terminal tryptic peptide from citraconylated protein was determined by collisionally induced decomposition mass spectrometry. The protein contains 132 amino acid residues, is enriched with respect to threonine and lysine, lacks cysteine, has an acetylated valine residue at the N-terminus, and has an Mr of 14768 and an isoelectric point of 5.25. This protein contains two short internal repeated sequences from residues 48-54 and from residues 114-119 located within regions of predicted beta-structure and decreasing hydrophobicity. These short repeats are contained within two longer repeated regions from residues 48-60 and residues 114-125, which display 62% sequence similarity. These regions could accommodate the charged and uncharged moieties of long-chain fatty acids and may represent fatty acid-binding domains consistent with the finding that human heart fatty acid-binding protein binds 2 mol of oleate or palmitate/mol of protein. Detailed evidence for the amino acid sequences of the peptides has been deposited as Supplementary Publication SUP 50143 (23 pages) at the British Library Lending Division, Boston Spa, Yorkshire LS23 7BQ, U.K., from whom copies may be obtained as indicated in Biochem. J. (1988) 249, 5. PMID:3421901

  11. Hybridization and sequencing of nucleic acids using base pair mismatches

    DOEpatents

    Fodor, Stephen P. A.; Lipshutz, Robert J.; Huang, Xiaohua

    2001-01-01

    Devices and techniques for hybridization of nucleic acids and for determining the sequence of nucleic acids. Arrays of nucleic acids are formed by techniques, preferably high resolution, light-directed techniques. Positions of hybridization of a target nucleic acid are determined by, e.g., epifluorescence microscopy. Devices and techniques are proposed to determine the sequence of a target nucleic acid more efficiently and more quickly through such synthesis and detection techniques.

  12. On combining protein sequences and nucleic acid sequences in phylogenetic analysis: the homeobox protein case.

    PubMed

    Agosti, D; Jacobs, D; DeSalle, R

    1996-01-01

    Amino acid encoding genes contain character state information that may be useful for phylogenetic analysis on at least two levels. The nucleotide sequence and the translated amino acid sequences have both been employed separately as character states for cladistic studies of various taxa, including studies of the genealogy of genes in multigene families. In essence, amino acid sequences and nucleic acid sequences are two different ways of character coding the information in a gene. Silent positions in the nucleotide sequence (first or third positions in codons that can accrue change without changing the identity of the amino acid that the triplet codes for) may accrue change relatively rapidly and become saturated, losing the pattern of historical divergence. On the other hand, non-silent nucleotide alterations and their accompanying amino acid changes may evolve too slowly to reveal relationships among closely related taxa. In general, the dynamics of sequence change in silent and non-silent positions in protein coding genes result in homoplasy and lack of resolution, respectively. We suggest that the combination of nucleic acid and the translated amino acid coded character states into the same data matrix for phylogenetic analysis addresses some of the problems caused by the rapid change of silent nucleotide positions and overall slow rate of change of non-silent nucleotide positions and slowly changing amino acid positions. One major theoretical problem with this approach is the apparent non-independence of the two sources of characters. However, there are at least three possible outcomes when comparing protein coding nucleic acid sequences with their translated amino acids in a phylogenetic context on a codon by codon basis. First, the two character sets for a codon may be entirely congruent with respect to the information they convey about the relationships of a certain set of taxa. Second, one character set may display no information concerning a phylogenetic

  13. Selection of sequence variants to improve dairy cattle genomic predictions

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genomic prediction reliabilities improved when adding selected sequence variants from run 5 of the 1,000 bull genomes project. High density (HD) imputed genotypes for 26,970 progeny tested Holstein bulls were combined with sequence variants for 444 Holstein animals. The first test included 481,904 c...

  14. Methods and compositions for efficient nucleic acid sequencing

    DOEpatents

    Drmanac, Radoje

    2002-01-01

    Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.

  15. Methods and compositions for efficient nucleic acid sequencing

    DOEpatents

    Drmanac, Radoje

    2006-07-04

    Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.

  16. Kit for detecting nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    2001-01-01

    A kit is provided for detecting a target nucleic acid sequence in a sample, the kit comprising: a first hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the first hybridization probe including a first complexing agent for forming a binding pair with a second complexing agent; and a second hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the first hybridization probe does not selectively hybridize, the second hybridization probe including a detectable marker; a third hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the third hybridization probe including the same detectable marker as the second hybridization probe; and a fourth hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the third hybridization probe does not selectively hybridize, the fourth hybridization probe including the first complexing agent for forming a binding pair with the second complexing agent; wherein the first and second hybridization probes are capable of simultaneously hybridizing to the target sequence and the third and fourth hybridization probes are capable of simultaneously hybridizing to the target sequence, the detectable marker is not present on the first or fourth hybridization probes and the first, second, third, and fourth hybridization probes each include a competitive nucleic acid sequence which is sufficiently complementary to a third portion of the target sequence that the competitive sequences of the first, second, third, and fourth hybridization probes compete with each other to hybridize to the third portion of the

  17. Analysis and prediction of RNA-binding residues using sequence, evolutionary conservation, and predicted secondary structure and solvent accessibility.

    PubMed

    Zhang, Tuo; Zhang, Hua; Chen, Ke; Ruan, Jishou; Shen, Shiyi; Kurgan, Lukasz

    2010-11-01

    Identification and prediction of RNA-binding residues (RBRs) provides valuable insights into the mechanisms of protein-RNA interactions. We analyzed the contributions of a wide range of factors including amino acid sequence, evolutionary conservation, secondary structure and solvent accessibility, to the prediction/characterization of RBRs. Five feature sets were designed and feature selection was performed to find and investigate relevant features. We demonstrate that (1) interactions with positively charged amino acids Arg and Lys are preferred by the egatively charged nucleotides; (2) Gly provides flexibility for the RNA binding sites; (3) Glu with negatively charged side chain and several hydrophobic residues such as Leu, Val, Ala and Phe are disfavored in the RNA-binding sites; (4) coil residues, especially in long segments, are more flexible (than other secondary structures) and more likely to interact with RNA; (5) helical residues are more rigid and consequently they are less likely to bind RNA; and (6) residues partially exposed to the solvent are more likely to form RNA-binding sites. We introduce a novel sequence-based predictor of RBRs, RBRpred, which utilizes the selected features. RBRpred is comprehensively tested on three datasets with varied atom distance cutoffs by performing both five-fold cross validation and jackknife tests and achieves Matthew's correlation coefficient (MCC) of 0.51, 0.48 and 0.42, respectively. The quality is comparable to or better than that for state-of-the-art predictors that apply the distancebased cutoff definition. We show that the most important factor for RBRs prediction is evolutionary conservation, followed by the amino acid sequence, predicted secondary structure and predicted solvent accessibility. We also investigate the impact of using native vs. predicted secondary structure and solvent accessibility. The predictions are sufficient for the RBR prediction and the knowledge of the actual solvent accessibility

  18. tax and rex Sequences of bovine leukaemia virus from globally diverse isolates: rex amino acid sequence more variable than tax.

    PubMed

    McGirr, K M; Buehring, G C

    2005-02-01

    Bovine leukaemia virus (BLV) is an important agricultural problem with high costs to the dairy industry. Here, we examine the variation of the tax and rex genes of BLV. The tax and rex genes share 420 bases and have overlapping reading frames. The tax gene encodes a protein that functions as a transactivator of the BLV promoter, is required for viral replication, acts on cellular promoters, and is responsible for oncogenesis. The rex facilitates the export of viral mRNAs from the nucleus and regulates transcription. We have sequenced five new isolates of the tax/rex gene. We examined the five new and three previously published tax/rex DNA and predicted amino acid sequences of BLV isolates from cattle in representative regions worldwide. The highest variation among nucleic acid sequences for tax and rex was 7% and 5%, respectively; among predicted amino acid sequences for Tax and Rex, 9% and 11%, respectively. Significantly more nucleotide changes resulted in predicted amino acid changes in the rex gene than in the tax gene (P < or = 0.0006). This variability is higher than previously reported for any region of the viral genome. This research may also have implications for the development of Tax-based vaccines. PMID:15702995

  19. Comparison of the amino acid sequence of the major immunogen from three serotypes of foot and mouth disease virus.

    PubMed Central

    Makoff, A J; Paynter, C A; Rowlands, D J; Boothroyd, J C

    1982-01-01

    Cloned cDNA molecules from three serotypes of FMDV have been sequenced around the VP1-coding region. The predicted amino acid sequences for VP1 were compared with the published sequences and variable regions identified. The amino acid sequences were also analysed for hydrophilic regions. Two of the variable regions, numbered 129-160 and 193-204 overlapped hydrophilic regions, and were therefore identified as potentially immunogenic. These regions overlap regions shown by others to be immunogenic. PMID:6298715

  20. Solid phase sequencing of double-stranded nucleic acids

    DOEpatents

    Fu, Dong-Jing; Cantor, Charles R.; Koster, Hubert; Smith, Cassandra L.

    2002-01-01

    This invention relates to methods for detecting and sequencing of target double-stranded nucleic acid sequences, to nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probe comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectroscopy, and the sequence of the target determined from the molecular weights of the fragments. Nucleic acids whose sequences can be determined include nucleic acids in biological samples such as patient biopsies and environmental samples. Probes may be fixed to a solid support such as a hybridization chip to facilitate automated determination of molecular weights and identification of the target sequence.

  1. CRYSpred: accurate sequence-based protein crystallization propensity prediction using sequence-derived structural characteristics.

    PubMed

    Mizianty, Marcin J; Kurgan, Lukasz A

    2012-01-01

    Relatively low success rates of X-ray crystallography, which is the most popular method for solving proteins structures, motivate development of novel methods that support selection of tractable protein targets. This aspect is particularly important in the context of the current structural genomics efforts that allow for a certain degree of flexibility in the target selection. We propose CRYSpred, a novel in-silico crystallization propensity predictor that uses a set of 15 novel features which utilize a broad range of inputs including charge, hydrophobicity, and amino acid composition derived from the protein chain, and the solvent accessibility and disorder predicted from the protein sequence. Our method outperforms seven modern crystallization propensity predictors on three, independent from training dataset, benchmark test datasets. The strong predictive performance offered by the CRYSpred is attributed to the careful design of the features, utilization of the comprehensive set of inputs, and the usage of the Support Vector Machine classifier. The inputs utilized by CRYSpred are well-aligned with the existing rules-of-thumb that are used in the structural genomics studies. PMID:21919861

  2. CRYSpred: accurate sequence-based protein crystallization propensity prediction using sequence-derived structural characteristics.

    PubMed

    Mizianty, Marcin J; Kurgan, Lukasz A

    2012-01-01

    Relatively low success rates of X-ray crystallography, which is the most popular method for solving proteins structures, motivate development of novel methods that support selection of tractable protein targets. This aspect is particularly important in the context of the current structural genomics efforts that allow for a certain degree of flexibility in the target selection. We propose CRYSpred, a novel in-silico crystallization propensity predictor that uses a set of 15 novel features which utilize a broad range of inputs including charge, hydrophobicity, and amino acid composition derived from the protein chain, and the solvent accessibility and disorder predicted from the protein sequence. Our method outperforms seven modern crystallization propensity predictors on three, independent from training dataset, benchmark test datasets. The strong predictive performance offered by the CRYSpred is attributed to the careful design of the features, utilization of the comprehensive set of inputs, and the usage of the Support Vector Machine classifier. The inputs utilized by CRYSpred are well-aligned with the existing rules-of-thumb that are used in the structural genomics studies.

  3. Cerebellar sequencing: a trick for predicting the future.

    PubMed

    Leggio, M; Molinari, M

    2015-02-01

    "Looking into the future" well depicts one of the most significant concepts in cognitive neuroscience: the brain is constantly predicting future events. Such directedness toward the future has been recognized to be relevant to and beneficial for many aspects of information processing in humans, such as perception, motor and cognitive control, decision-making, theory of mind, and other cognitive processes. Because one of the most adaptive characteristics of the brain is to correct errors, the ability to look into the future represents the best chance to avoid repeating errors. Within the structures that constitute the "predictive brain," the cerebellum has been proposed to have a central function, based on its ability to generate internal models. We suggested that "sequence detection" is the operational mode of the cerebellum in predictive processing. According to this hypothesis, the cerebellum detects and simulates repetitive patterns of temporally or spatially structured events and generates internal models that can be used to make predictions. Consequently, we demonstrate that the cerebellum recognizes serial events as a sequence, detects a sequence violation, and successfully reconstructs the correct sequence of events. Thus, we hypothesize that pattern detection and prediction and processing of anticipation are cerebellum-specific functions within the brain and that the sequence detection hypothesis links the multifarious impairments that are reported in patients with cerebellar damage. We propose that this cerebellar operational mode can advance our understanding of the pathophysiological mechanisms in various clinical conditions, such as schizophrenia and autism.

  4. Cerebellar sequencing: a trick for predicting the future.

    PubMed

    Leggio, M; Molinari, M

    2015-02-01

    "Looking into the future" well depicts one of the most significant concepts in cognitive neuroscience: the brain is constantly predicting future events. Such directedness toward the future has been recognized to be relevant to and beneficial for many aspects of information processing in humans, such as perception, motor and cognitive control, decision-making, theory of mind, and other cognitive processes. Because one of the most adaptive characteristics of the brain is to correct errors, the ability to look into the future represents the best chance to avoid repeating errors. Within the structures that constitute the "predictive brain," the cerebellum has been proposed to have a central function, based on its ability to generate internal models. We suggested that "sequence detection" is the operational mode of the cerebellum in predictive processing. According to this hypothesis, the cerebellum detects and simulates repetitive patterns of temporally or spatially structured events and generates internal models that can be used to make predictions. Consequently, we demonstrate that the cerebellum recognizes serial events as a sequence, detects a sequence violation, and successfully reconstructs the correct sequence of events. Thus, we hypothesize that pattern detection and prediction and processing of anticipation are cerebellum-specific functions within the brain and that the sequence detection hypothesis links the multifarious impairments that are reported in patients with cerebellar damage. We propose that this cerebellar operational mode can advance our understanding of the pathophysiological mechanisms in various clinical conditions, such as schizophrenia and autism. PMID:25331541

  5. Prediction of Human Activity by Discovering Temporal Sequence Patterns.

    PubMed

    Li, Kang; Fu, Yun

    2014-08-01

    Early prediction of ongoing human activity has become more valuable in a large variety of time-critical applications. To build an effective representation for prediction, human activities can be characterized by a complex temporal composition of constituent simple actions and interacting objects. Different from early detection on short-duration simple actions, we propose a novel framework for long -duration complex activity prediction by discovering three key aspects of activity: Causality, Context-cue, and Predictability. The major contributions of our work include: (1) a general framework is proposed to systematically address the problem of complex activity prediction by mining temporal sequence patterns; (2) probabilistic suffix tree (PST) is introduced to model causal relationships between constituent actions, where both large and small order Markov dependencies between action units are captured; (3) the context-cue, especially interactive objects information, is modeled through sequential pattern mining (SPM), where a series of action and object co-occurrence are encoded as a complex symbolic sequence; (4) we also present a predictive accumulative function (PAF) to depict the predictability of each kind of activity. The effectiveness of our approach is evaluated on two experimental scenarios with two data sets for each: action-only prediction and context-aware prediction. Our method achieves superior performance for predicting global activity classes and local action units. PMID:26353344

  6. Prediction of fine-tuned promoter activity from DNA sequence.

    PubMed

    Siwo, Geoffrey; Rider, Andrew; Tan, Asako; Pinapati, Richard; Emrich, Scott; Chawla, Nitesh; Ferdig, Michael

    2016-01-01

    The quantitative prediction of transcriptional activity of genes using promoter sequence is fundamental to the engineering of biological systems for industrial purposes and understanding the natural variation in gene expression. To catalyze the development of new algorithms for this purpose, the Dialogue on Reverse Engineering Assessment and Methods (DREAM) organized a community challenge seeking predictive models of promoter activity given normalized promoter activity data for 90 ribosomal protein promoters driving expression of a fluorescent reporter gene. By developing an unbiased modeling approach that performs an iterative search for predictive DNA sequence features using the frequencies of various k-mers, inferred DNA mechanical properties and spatial positions of promoter sequences, we achieved the best performer status in this challenge. The specific predictive features used in the model included the frequency of the nucleotide G, the length of polymeric tracts of T and TA, the frequencies of 6 distinct trinucleotides and 12 tetranucleotides, and the predicted protein deformability of the DNA sequence. Our method accurately predicted the activity of 20 natural variants of ribosomal protein promoters (Spearman correlation r = 0.73) as compared to 33 laboratory-mutated variants of the promoters (r = 0.57) in a test set that was hidden from participants. Notably, our model differed substantially from the rest in 2 main ways: i) it did not explicitly utilize transcription factor binding information implying that subtle DNA sequence features are highly associated with gene expression, and ii) it was entirely based on features extracted exclusively from the 100 bp region upstream from the translational start site demonstrating that this region encodes much of the overall promoter activity. The findings from this study have important implications for the engineering of predictable gene expression systems and the evolution of gene expression in naturally occurring

  7. Prediction of fine-tuned promoter activity from DNA sequence

    PubMed Central

    Siwo, Geoffrey; Rider, Andrew; Tan, Asako; Pinapati, Richard; Emrich, Scott; Chawla, Nitesh; Ferdig, Michael

    2016-01-01

    The quantitative prediction of transcriptional activity of genes using promoter sequence is fundamental to the engineering of biological systems for industrial purposes and understanding the natural variation in gene expression. To catalyze the development of new algorithms for this purpose, the Dialogue on Reverse Engineering Assessment and Methods (DREAM) organized a community challenge seeking predictive models of promoter activity given normalized promoter activity data for 90 ribosomal protein promoters driving expression of a fluorescent reporter gene. By developing an unbiased modeling approach that performs an iterative search for predictive DNA sequence features using the frequencies of various k-mers, inferred DNA mechanical properties and spatial positions of promoter sequences, we achieved the best performer status in this challenge. The specific predictive features used in the model included the frequency of the nucleotide G, the length of polymeric tracts of T and TA, the frequencies of 6 distinct trinucleotides and 12 tetranucleotides, and the predicted protein deformability of the DNA sequence. Our method accurately predicted the activity of 20 natural variants of ribosomal protein promoters (Spearman correlation r = 0.73) as compared to 33 laboratory-mutated variants of the promoters (r = 0.57) in a test set that was hidden from participants. Notably, our model differed substantially from the rest in 2 main ways: i) it did not explicitly utilize transcription factor binding information implying that subtle DNA sequence features are highly associated with gene expression, and ii) it was entirely based on features extracted exclusively from the 100 bp region upstream from the translational start site demonstrating that this region encodes much of the overall promoter activity. The findings from this study have important implications for the engineering of predictable gene expression systems and the evolution of gene expression in naturally occurring

  8. From Artificial Amino Acids to Sequence-Defined Targeted Oligoaminoamides.

    PubMed

    Morys, Stephan; Wagner, Ernst; Lächelt, Ulrich

    2016-01-01

    Artificial oligoamino acids with appropriate protecting groups can be used for the sequential assembly of oligoaminoamides on solid-phase. With the help of these oligoamino acids multifunctional nucleic acid (NA) carriers can be designed and produced in highly defined topologies. Here we describe the synthesis of the artificial oligoamino acid Fmoc-Stp(Boc3)-OH, the subsequent assembly into sequence-defined oligomers and the formulation of tumor-targeted plasmid DNA (pDNA) polyplexes. PMID:27436323

  9. Prediction of glutathionylation sites in proteins using minimal sequence information and their experimental validation.

    PubMed

    Pal, Debojyoti; Sharma, Deepak; Kumar, Mukesh; Sandur, Santosh K

    2016-09-01

    S-glutathionylation of proteins plays an important role in various biological processes and is known to be protective modification during oxidative stress. Since, experimental detection of S-glutathionylation is labor intensive and time consuming, bioinformatics based approach is a viable alternative. Available methods require relatively longer sequence information, which may prevent prediction if sequence information is incomplete. Here, we present a model to predict glutathionylation sites from pentapeptide sequences. It is based upon differential association of amino acids with glutathionylated and non-glutathionylated cysteines from a database of experimentally verified sequences. This data was used to calculate position dependent F-scores, which measure how a particular amino acid at a particular position may affect the likelihood of glutathionylation event. Glutathionylation-score (G-score), indicating propensity of a sequence to undergo glutathionylation, was calculated using position-dependent F-scores for each amino-acid. Cut-off values were used for prediction. Our model returned an accuracy of 58% with Matthew's correlation-coefficient (MCC) value of 0.165. On an independent dataset, our model outperformed the currently available model, in spite of needing much less sequence information. Pentapeptide motifs having high abundance among glutathionylated proteins were identified. A list of potential glutathionylation hotspot sequences were obtained by assigning G-scores and subsequent Protein-BLAST analysis revealed a total of 254 putative glutathionable proteins, a number of which were already known to be glutathionylated. Our model predicted glutathionylation sites in 93.93% of experimentally verified glutathionylated proteins. Outcome of this study may assist in discovering novel glutathionylation sites and finding candidate proteins for glutathionylation. PMID:27454891

  10. Segments of amino acid sequence similarity in beta-amylases.

    PubMed

    Friedberg, F; Rhodes, C

    1988-01-01

    In alpha-amylases from animals, plants and bacteria and in beta-amylases from plants and bacteria a number of segments exhibit amino acid sequence similarity specific to the alpha or to the beta type, respectively. In the case of the beta-amylases the similar sequence regions are extensive and they are disrupted only by short interspersed dissimilar regions. Close to the C terminus, however, no such sequence similarity exist. PMID:2464171

  11. Bayesian classification for promoter prediction in human DNA sequences

    NASA Astrophysics Data System (ADS)

    Bercher, J.-F.; Jardin, P.; Duriez, B.

    2006-11-01

    Many Computational methods are yet available for data retrieval and analysis of genomic sequences, but some functional sites are difficult to characterize. In this work, we examine the problem of promoter localization in human DNA sequences. Promoters are regulatory regions that governs the expression of genes, and their prediction is reputed difficult, so that this issue is still open. We present the Chaos Game representation (CGR) of DNA sequences which has many interesting properties, and the notion of `genomic signature' that proved relevant in phylogeny applications. Based on this notion, we develop a (naïve) bayesian classifier, evaluate its performances, and show that its adaptive implementation enable to reveal or assess core-promoter positions along a DNA sequence.

  12. Amino acid sequence of a new mitochondrially synthesized proteolipid of the ATP synthase of Saccharomyces cerevisiae.

    PubMed Central

    Velours, J; Esparza, M; Hoppe, J; Sebald, W; Guerin, B

    1984-01-01

    The purification and the amino acid sequence of a proteolipid translated on ribosomes in yeast mitochondria is reported. This protein, which is a subunit of the ATP synthase, was purified by extraction with chloroform/methanol (2/1) and subsequent chromatography on phosphocellulose and reverse phase h.p.l.c. A mol. wt. of 5500 was estimated by chromatography on Bio-Gel P-30 in 80% formic acid. The complete amino acid sequence of this protein was determined by automated solid phase Edman degradation of the whole protein and of fragments obtained after cleavage with cyanogen bromide. The sequence analysis indicates a length of 48 amino acid residues. The calculated mol. wt. of 5870 corresponds to the value found by gel chromatography. This polypeptide contains three basic residues and no negatively charged side chain. The three basic residues are clustered at the C terminus. The primary structure of this protein is in full agreement with the predicted amino acid sequence of the putative polypeptide encoded by the mitochondrial aap1 gene recently discovered in Saccharomyces cerevisiae. Moreover, this protein shows 50% homology with the amino acid sequence of a putative polypeptide encoded by an unidentified reading frame also discovered near the mitochondrial ATPase subunit 6 gene in Aspergillus nidulans. Images Fig. 2. PMID:6323165

  13. Translating HIV sequences into quantitative fitness landscapes predicts viral vulnerabilities for rational immunogen design.

    PubMed

    Ferguson, Andrew L; Mann, Jaclyn K; Omarjee, Saleha; Ndung'u, Thumbi; Walker, Bruce D; Chakraborty, Arup K

    2013-03-21

    A prophylactic or therapeutic vaccine offers the best hope to curb the HIV-AIDS epidemic gripping sub-Saharan Africa, but it remains elusive. A major challenge is the extreme viral sequence variability among strains. Systematic means to guide immunogen design for highly variable pathogens like HIV are not available. Using computational models, we have developed an approach to translate available viral sequence data into quantitative landscapes of viral fitness as a function of the amino acid sequences of its constituent proteins. Predictions emerging from our computationally defined landscapes for the proteins of HIV-1 clade B Gag were positively tested against new in vitro fitness measurements and were consistent with previously defined in vitro measurements and clinical observations. These landscapes chart the peaks and valleys of viral fitness as protein sequences change and inform the design of immunogens and therapies that can target regions of the virus most vulnerable to selection pressure.

  14. Cofactory: sequence-based prediction of cofactor specificity of Rossmann folds.

    PubMed

    Geertz-Hansen, Henrik Marcus; Blom, Nikolaj; Feist, Adam M; Brunak, Søren; Petersen, Thomas Nordahl

    2014-09-01

    Obtaining optimal cofactor balance to drive production is a challenge in metabolically engineered microbial production strains. To facilitate identification of heterologous enzymes with desirable altered cofactor requirements from native content, we have developed Cofactory, a method for prediction of enzyme cofactor specificity using only primary amino acid sequence information. The algorithm identifies potential cofactor binding Rossmann folds and predicts the specificity for the cofactors FAD(H2), NAD(H), and NADP(H). The Rossmann fold sequence search is carried out using hidden Markov models whereas artificial neural networks are used for specificity prediction. Training was carried out using experimental data from protein-cofactor structure complexes. The overall performance was benchmarked against an independent evaluation set obtaining Matthews correlation coefficients of 0.94, 0.79, and 0.65 for FAD(H2), NAD(H), and NADP(H), respectively. The Cofactory method is made publicly available at http://www.cbs.dtu.dk/services/Cofactory.

  15. Amino acid sequences of proteins from Leptospira serovar pomona.

    PubMed

    Alves, S F; Lefebvre, R B; Probert, W

    2000-01-01

    This report describes a partial amino acid sequences from three putative outer envelope proteins from Leptospira serovar pomona. In order to obtain internal fragments for protein sequencing, enzymatic and chemical digestion was performed. The enzyme clostripain was used to digest the proteins 32 and 45 kDa. In situ digestion of 40 kDa molecular weight protein was accomplished using cyanogen bromide. The 32 kDa protein generated two fragments, one of 21 kDa and another of 10 kDa that yielded five residues. A fragment of 24 kDa that yielded nineteen residues of amino acids was obtained from 45 kDa protein. A fragment with a molecular weight of 20 kDa, yielding a twenty amino acids sequence from the 40 kDa protein.

  16. The amino acid sequence of Staphylococcus aureus penicillinase.

    PubMed Central

    Ambler, R P

    1975-01-01

    The amino acid sequence of the penicillinase (penicillin amido-beta-lactamhydrolase, EC 3.5.2.6) from Staphylococcus aureus strain PC1 was determined. The protein consists of a single polypeptide chain of 257 residues, and the sequence was determined by characterization of tryptic, chymotryptic, peptic and CNBr peptides, with some additional evidence from thermolysin and S. aureus proteinase peptides. A mistake in the preliminary report of the sequence is corrected; residues 113-116 are now thought to be -Lys-Lys-Val-Lys- rather than -Lys-Val-Lys-Lys-. Detailed evidence for the amino acid sequence has been deposited as Supplementary Publication SUP 50056 (91 pages) at the British Library (Lending Division), Boston Spa, Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies may be obtained on the terms given in Biochem. J. (1975) 145, 5. PMID:1218078

  17. Draft genome sequence of the docosahexaenoic acid producing thraustochytrid Aurantiochytrium sp. T66.

    PubMed

    Liu, Bin; Ertesvåg, Helga; Aasen, Inga Marie; Vadstein, Olav; Brautaset, Trygve; Heggeset, Tonje Marita Bjerkan

    2016-06-01

    Thraustochytrids are unicellular, marine protists, and there is a growing industrial interest in these organisms, particularly because some species, including strains belonging to the genus Aurantiochytrium, accumulate high levels of docosahexaenoic acid (DHA). Here, we report the draft genome sequence of Aurantiochytrium sp. T66 (ATCC PRA-276), with a size of 43 Mbp, and 11,683 predicted protein-coding sequences. The data has been deposited at DDBJ/EMBL/Genbank under the accession LNGJ00000000. The genome sequence will contribute new insight into DHA biosynthesis and regulation, providing a basis for metabolic engineering of thraustochytrids. PMID:27222814

  18. Draft genome sequence of the docosahexaenoic acid producing thraustochytrid Aurantiochytrium sp. T66.

    PubMed

    Liu, Bin; Ertesvåg, Helga; Aasen, Inga Marie; Vadstein, Olav; Brautaset, Trygve; Heggeset, Tonje Marita Bjerkan

    2016-06-01

    Thraustochytrids are unicellular, marine protists, and there is a growing industrial interest in these organisms, particularly because some species, including strains belonging to the genus Aurantiochytrium, accumulate high levels of docosahexaenoic acid (DHA). Here, we report the draft genome sequence of Aurantiochytrium sp. T66 (ATCC PRA-276), with a size of 43 Mbp, and 11,683 predicted protein-coding sequences. The data has been deposited at DDBJ/EMBL/Genbank under the accession LNGJ00000000. The genome sequence will contribute new insight into DHA biosynthesis and regulation, providing a basis for metabolic engineering of thraustochytrids.

  19. Prediction of ribosome footprint profile shapes from transcript sequences

    PubMed Central

    Liu, Tzu-Yu; Song, Yun S.

    2016-01-01

    Motivation: Ribosome profiling is a useful technique for studying translational dynamics and quantifying protein synthesis. Applications of this technique have shown that ribosomes are not uniformly distributed along mRNA transcripts. Understanding how each transcript-specific distribution arises is important for unraveling the translation mechanism. Results: Here, we apply kernel smoothing to construct predictive features and build a sparse model to predict the shape of ribosome footprint profiles from transcript sequences alone. Our results on Saccharomyces cerevisiae data show that the marginal ribosome densities can be predicted with high accuracy. The proposed novel method has a wide range of applications, including inferring isoform-specific ribosome footprints, designing transcripts with fast translation speeds and discovering unknown modulation during translation. Availability and implementation: A software package called riboShape is freely available at https://sourceforge.net/projects/riboshape Contact: yss@berkeley.edu PMID:27307616

  20. Unique sequences and predicted functions of myosins in Tetrahymena thermophila.

    PubMed

    Sugita, Maki; Iwataki, Yoshinori; Nakano, Kentaro; Numata, Osamu

    2011-07-01

    Myosins are eukaryotic actin-dependent molecular motors that play important roles in many cellular events. The function of each myosin is determined by a variety of functional domains in its tail region. In some major model organisms, the functions and properties of myosins have been investigated based on their amino acid sequences. However, in protists, myosins have been little studied beyond the level of genome sequences. We therefore investigated the mRNA expression levels and amino acid sequences of 13 myosin genes in the ciliate Tetrahymena thermophila. This study is an overview of myosins in T. thermophila, which has no typical myosins, such as class I, II, or V myosins. We showed that all 13 myosins were expressed in vegetative cells. Furthermore, these myosins could be divided into 3 subclasses based on four functional domains in their tail regions. Subclass 1 comprised of 8 myosins has both MyTH4 and FERM domains, and has a potential to function in vesicle transport or anchoring between membrane and actin filaments. Subclass 2 comprised of 4 myosins has RCC1 (regulator of chromosome condensation 1) domains, which are found only in some protists, and may have unconventional features. Subclass 3 is comprised of one myosin, which has a long coiled-coil domain like class II myosin. In addition, phylogenetic analysis on the basis of motor domains showed that T. thermophila myosins are separated into two clusters: one consists of subclasses 1 and 2, and the other consists of subclass 3.

  1. Sequencing and computational analysis of complete genome sequences of Citrus yellow mosaic badna virus from acid lime and pummelo.

    PubMed

    Borah, Basanta K; Johnson, A M Anthony; Sai Gopal, D V R; Dasgupta, Indranil

    2009-08-01

    Citrus yellow mosaic badna virus (CMBV), a member of the Family Caulimoviridae, Genus Badnavirus, is the causative agent of Citrus mosaic disease in India. Although the virus has been detected in several citrus species, only two full-length genomes, one each from Sweet orange and Rangpur lime, are available in publicly accessible databases. In order to obtain a better understanding of the genetic variability of the virus in other citrus mosaic-affected citrus species, we performed the cloning and sequence analysis of complete genomes of CMBV from two additional citrus species, Acid lime and Pummelo. We show that CMBV genomes from the two hosts share high homology with previously reported CMBV sequences and hence conclude that the new isolates represent variants of the virus present in these species. Based on in silico sequence analysis, we predict the possible function of the protein encoded by one of the five ORFs.

  2. The amino-acid sequence of kangaroo pancreatic ribonuclease.

    PubMed

    Gaastra, W; Welling, G W; Beintema, J J

    1978-05-01

    Red kangaroo (Macropus rufus) ribonuclease was isolated from pancreatic tissue by affinity chromatography. The amino acid sequence was determined by automatic sequencing of overlapping large fragments and by analysis of shorter peptides obtained by digestion with a number of proteolytic enzymes. The polypeptide chain consists of 122 amino acid residues. Compared to other ribonucleases, the N-terminal residue and residue 114 are deleted. In other pancreatic ribonucleases position 114 is occupied by a cis proline residue in an external loop at the surface of the molecule. Other remarkable substitutions are the presence of a tyrosine residue at position 123 instead of a serine which forms a hydrogen bond with the pyrimidine ring of a nucleotide substrate, and a number of hydrophobichydrophilic interchanges in the sequence 51-55, which forms part of an alpha-helix in bovine ribonuclease and exhibits few substitutions in the placental mammals. Kangaroo ribonuclease contains no carbohydrate, although the enzyme possesses a recognition site for carbohydrate attachment in the sequence Asn-Val-Thr (62-64). The enzyme differs at about 35-40% of the positions from all other mammalian pancreatic ribonucleases sequenced to date, which is in agreement with the early divergence between the marsupials and the placental mammals. From fragmentary data a tentative sequence of red-necked wallaby (Macropus rufogriseus) pancreatic ribonuclease has been derived. Eight differences with the kangaroo sequence were found.

  3. Prebiotically plausible mechanisms increase compositional diversity of nucleic acid sequences

    PubMed Central

    Derr, Julien; Manapat, Michael L.; Rajamani, Sudha; Leu, Kevin; Xulvi-Brunet, Ramon; Joseph, Isaac; Nowak, Martin A.; Chen, Irene A.

    2012-01-01

    During the origin of life, the biological information of nucleic acid polymers must have increased to encode functional molecules (the RNA world). Ribozymes tend to be compositionally unbiased, as is the vast majority of possible sequence space. However, ribonucleotides vary greatly in synthetic yield, reactivity and degradation rate, and their non-enzymatic polymerization results in compositionally biased sequences. While natural selection could lead to complex sequences, molecules with some activity are required to begin this process. Was the emergence of compositionally diverse sequences a matter of chance, or could prebiotically plausible reactions counter chemical biases to increase the probability of finding a ribozyme? Our in silico simulations using a two-letter alphabet show that template-directed ligation and high concatenation rates counter compositional bias and shift the pool toward longer sequences, permitting greater exploration of sequence space and stable folding. We verified experimentally that unbiased DNA sequences are more efficient templates for ligation, thus increasing the compositional diversity of the pool. Our work suggests that prebiotically plausible chemical mechanisms of nucleic acid polymerization and ligation could predispose toward a diverse pool of longer, potentially structured molecules. Such mechanisms could have set the stage for the appearance of functional activity very early in the emergence of life. PMID:22319215

  4. The amino-acid sequence of kangaroo pancreatic ribonuclease.

    PubMed

    Gaastra, W; Welling, G W; Beintema, J J

    1978-05-01

    Red kangaroo (Macropus rufus) ribonuclease was isolated from pancreatic tissue by affinity chromatography. The amino acid sequence was determined by automatic sequencing of overlapping large fragments and by analysis of shorter peptides obtained by digestion with a number of proteolytic enzymes. The polypeptide chain consists of 122 amino acid residues. Compared to other ribonucleases, the N-terminal residue and residue 114 are deleted. In other pancreatic ribonucleases position 114 is occupied by a cis proline residue in an external loop at the surface of the molecule. Other remarkable substitutions are the presence of a tyrosine residue at position 123 instead of a serine which forms a hydrogen bond with the pyrimidine ring of a nucleotide substrate, and a number of hydrophobichydrophilic interchanges in the sequence 51-55, which forms part of an alpha-helix in bovine ribonuclease and exhibits few substitutions in the placental mammals. Kangaroo ribonuclease contains no carbohydrate, although the enzyme possesses a recognition site for carbohydrate attachment in the sequence Asn-Val-Thr (62-64). The enzyme differs at about 35-40% of the positions from all other mammalian pancreatic ribonucleases sequenced to date, which is in agreement with the early divergence between the marsupials and the placental mammals. From fragmentary data a tentative sequence of red-necked wallaby (Macropus rufogriseus) pancreatic ribonuclease has been derived. Eight differences with the kangaroo sequence were found. PMID:658039

  5. AcalPred: a sequence-based tool for discriminating between acidic and alkaline enzymes.

    PubMed

    Lin, Hao; Chen, Wei; Ding, Hui

    2013-01-01

    The structure and activity of enzymes are influenced by pH value of their surroundings. Although many enzymes work well in the pH range from 6 to 8, some specific enzymes have good efficiencies only in acidic (pH<5) or alkaline (pH>9) solution. Studies have demonstrated that the activities of enzymes correlate with their primary sequences. It is crucial to judge enzyme adaptation to acidic or alkaline environment from its amino acid sequence in molecular mechanism clarification and the design of high efficient enzymes. In this study, we developed a sequence-based method to discriminate acidic enzymes from alkaline enzymes. The analysis of variance was used to choose the optimized discriminating features derived from g-gap dipeptide compositions. And support vector machine was utilized to establish the prediction model. In the rigorous jackknife cross-validation, the overall accuracy of 96.7% was achieved. The method can correctly predict 96.3% acidic and 97.1% alkaline enzymes. Through the comparison between the proposed method and previous methods, it is demonstrated that the proposed method is more accurate. On the basis of this proposed method, we have built an online web-server called AcalPred which can be freely accessed from the website (http://lin.uestc.edu.cn/server/AcalPred). We believe that the AcalPred will become a powerful tool to study enzyme adaptation to acidic or alkaline environment.

  6. AcalPred: A Sequence-Based Tool for Discriminating between Acidic and Alkaline Enzymes

    PubMed Central

    Lin, Hao; Chen, Wei; Ding, Hui

    2013-01-01

    The structure and activity of enzymes are influenced by pH value of their surroundings. Although many enzymes work well in the pH range from 6 to 8, some specific enzymes have good efficiencies only in acidic (pH<5) or alkaline (pH>9) solution. Studies have demonstrated that the activities of enzymes correlate with their primary sequences. It is crucial to judge enzyme adaptation to acidic or alkaline environment from its amino acid sequence in molecular mechanism clarification and the design of high efficient enzymes. In this study, we developed a sequence-based method to discriminate acidic enzymes from alkaline enzymes. The analysis of variance was used to choose the optimized discriminating features derived from g-gap dipeptide compositions. And support vector machine was utilized to establish the prediction model. In the rigorous jackknife cross-validation, the overall accuracy of 96.7% was achieved. The method can correctly predict 96.3% acidic and 97.1% alkaline enzymes. Through the comparison between the proposed method and previous methods, it is demonstrated that the proposed method is more accurate. On the basis of this proposed method, we have built an online web-server called AcalPred which can be freely accessed from the website (http://lin.uestc.edu.cn/server/AcalPred). We believe that the AcalPred will become a powerful tool to study enzyme adaptation to acidic or alkaline environment. PMID:24130738

  7. Development of an expert system for amino acid sequence identification.

    PubMed

    Hu, L; Saulinskas, E F; Johnson, P; Harrington, P B

    1996-08-01

    An expert system for amino acid sequence identification has been developed. The algorithm uses heuristic rules developed by human experts in protein sequencing. The system is applied to the chromatographic data of phenylthiohydantoin-amino acids acquired from an automated sequencer. The peak intensities in the current cycle are compared with those in the previous cycle, while the calibration and succeeding cycles are used as ancillary identification criteria when necessary. The retention time for each chromatographic peak in each cycle is corrected by the corresponding peak in the calibration cycle at the same run. The main improvement of our system compared with the onboard software used by the Applied Biosystems 477A Protein/Peptide Sequencer is that each peak in each cycle is assigned an identification name according to the corrected retention time to be used for the comparison with different cycles. The system was developed from analyses of ribonuclease A and evaluated by runs of four other protein samples that were not used in rule development. This paper demonstrates that rules developed by human experts can be automatically applied to sequence assignment. The expert system performed more accurately than the onboard software of the protein sequencer, in that the misidentification rates for the expert system were around 7%, whereas those for the onboard software were between 13 and 21%.

  8. BOOGIE: Predicting Blood Groups from High Throughput Sequencing Data

    PubMed Central

    Giollo, Manuel; Minervini, Giovanni; Scalzotto, Marta; Leonardi, Emanuela; Ferrari, Carlo; Tosatto, Silvio C. E.

    2015-01-01

    Over the last decade, we have witnessed an incredible growth in the amount of available genotype data due to high throughput sequencing (HTS) techniques. This information may be used to predict phenotypes of medical relevance, and pave the way towards personalized medicine. Blood phenotypes (e.g. ABO and Rh) are a purely genetic trait that has been extensively studied for decades, with currently over thirty known blood groups. Given the public availability of blood group data, it is of interest to predict these phenotypes from HTS data which may translate into more accurate blood typing in clinical practice. Here we propose BOOGIE, a fast predictor for the inference of blood groups from single nucleotide variant (SNV) databases. We focus on the prediction of thirty blood groups ranging from the well known ABO and Rh, to the less studied Junior or Diego. BOOGIE correctly predicted the blood group with 94% accuracy for the Personal Genome Project whole genome profiles where good quality SNV annotation was available. Additionally, our tool produces a high quality haplotype phase, which is of interest in the context of ethnicity-specific polymorphisms or traits. The versatility and simplicity of the analysis make it easily interpretable and allow easy extension of the protocol towards other phenotypes. BOOGIE can be downloaded from URL http://protein.bio.unipd.it/download/. PMID:25893845

  9. Evaluation of genotypic prediction of HIV-1 tropism using population sequencing of replicates.

    PubMed

    Ferreira, Joao Leandro de Paula; Coelho, Luana Portes Ozorio; Rodrigues, Rosangela; Cabral, Gabriela Bastos; Cavalcanti, Jaqueline de Souza; Guimaraes, Paula Morena de Souza; Brigido, Luis Fernando de Macedo

    2012-02-01

    Determination of human immunodeficiency virus tropism has contributed to the understanding of the pathogenesis of HIV and is necessary prior to the use of CCR5 antagonists. Replicate V3 sequences may generate different sequences and improve viral tropism prediction. The diversity of HIV was evaluated to access its influence on prediction. Plasma RNA was retro-transcribed and amplified using a one-step protocol, followed by nested PCR and sequencing using an ABI3130XL. Eighty-one patients, 74% male and 26% female, with a median age of 44 years had either a single sequence (n=50) or 2-4 replicates (n=31) evaluated. Most patients (92%) had used multiple anti-retroviral regimens. Tropism prediction was performed using the Geno2pheno clonal option. The number of ambiguous nucleotides, the deduced non-synonymous amino acids at V3 and the genetic distance were quantified. Using a 20% false positive rate (FPR) cut-off, 41/81 (50.6%) was predicted as X4. TCD4 was lower, 226 cells/mm(3) (IQR 82-378), in patients infected with X4; TCD4 for R5 was 324 cells/mm(3) (IQR 200-538, p<0.05). The number of ambiguous nucleotides correlated with a lower FPR value (p<0.0027). Although different sequences may be generated, the number of replicates was not associated to a lower FPR or X4 assignment, and may allow a better prediction of this biological characteristic. Ambiguous nucleotides correlate inversely to a lower FPR.

  10. Prediction of Spontaneous Protein Deamidation from Sequence-Derived Secondary Structure and Intrinsic Disorder

    PubMed Central

    Lorenzo, J. Ramiro; Alonso, Leonardo G.; Sánchez, Ignacio E.

    2015-01-01

    Asparagine residues in proteins undergo spontaneous deamidation, a post-translational modification that may act as a molecular clock for the regulation of protein function and turnover. Asparagine deamidation is modulated by protein local sequence, secondary structure and hydrogen bonding. We present NGOME, an algorithm able to predict non-enzymatic deamidation of internal asparagine residues in proteins in the absence of structural data, using sequence-based predictions of secondary structure and intrinsic disorder. Compared to previous algorithms, NGOME does not require three-dimensional structures yet yields better predictions than available sequence-only methods. Four case studies of specific proteins show how NGOME may help the user identify deamidation-prone asparagine residues, often related to protein gain of function, protein degradation or protein misfolding in pathological processes. A fifth case study applies NGOME at a proteomic scale and unveils a correlation between asparagine deamidation and protein degradation in yeast. NGOME is freely available as a webserver at the National EMBnet node Argentina, URL: http://www.embnet.qb.fcen.uba.ar/ in the subpage “Protein and nucleic acid structure and sequence analysis”. PMID:26674530

  11. Prediction of Spontaneous Protein Deamidation from Sequence-Derived Secondary Structure and Intrinsic Disorder.

    PubMed

    Lorenzo, J Ramiro; Alonso, Leonardo G; Sánchez, Ignacio E

    2015-01-01

    Asparagine residues in proteins undergo spontaneous deamidation, a post-translational modification that may act as a molecular clock for the regulation of protein function and turnover. Asparagine deamidation is modulated by protein local sequence, secondary structure and hydrogen bonding. We present NGOME, an algorithm able to predict non-enzymatic deamidation of internal asparagine residues in proteins in the absence of structural data, using sequence-based predictions of secondary structure and intrinsic disorder. Compared to previous algorithms, NGOME does not require three-dimensional structures yet yields better predictions than available sequence-only methods. Four case studies of specific proteins show how NGOME may help the user identify deamidation-prone asparagine residues, often related to protein gain of function, protein degradation or protein misfolding in pathological processes. A fifth case study applies NGOME at a proteomic scale and unveils a correlation between asparagine deamidation and protein degradation in yeast. NGOME is freely available as a webserver at the National EMBnet node Argentina, URL: http://www.embnet.qb.fcen.uba.ar/ in the subpage "Protein and nucleic acid structure and sequence analysis".

  12. Enhanced Regulatory Sequence Prediction Using Gapped k-mer Features

    PubMed Central

    Mohammad-Noori, Morteza; Beer, Michael A.

    2014-01-01

    Abstract Oligomers of length k, or k-mers, are convenient and widely used features for modeling the properties and functions of DNA and protein sequences. However, k-mers suffer from the inherent limitation that if the parameter k is increased to resolve longer features, the probability of observing any specific k-mer becomes very small, and k-mer counts approach a binary variable, with most k-mers absent and a few present once. Thus, any statistical learning approach using k-mers as features becomes susceptible to noisy training set k-mer frequencies once k becomes large. To address this problem, we introduce alternative feature sets using gapped k-mers, a new classifier, gkm-SVM, and a general method for robust estimation of k-mer frequencies. To make the method applicable to large-scale genome wide applications, we develop an efficient tree data structure for computing the kernel matrix. We show that compared to our original kmer-SVM and alternative approaches, our gkm-SVM predicts functional genomic regulatory elements and tissue specific enhancers with significantly improved accuracy, increasing the precision by up to a factor of two. We then show that gkm-SVM consistently outperforms kmer-SVM on human ENCODE ChIP-seq datasets, and further demonstrate the general utility of our method using a Naïve-Bayes classifier. Although developed for regulatory sequence analysis, these methods can be applied to any sequence classification problem. PMID:25033408

  13. The genome of RNA tumor viruses contains polyadenylic acid sequences.

    PubMed

    Green, M; Cartas, M

    1972-04-01

    The 70S genome of two RNA tumor viruses, murine sarcoma virus and avian myeloblastosis virus, binds to Millipore filters in buffer with high salt concentration and to glass fiber filters containing poly(U). These observations suggest that 70S RNA contains adenylic acid-rich sequences. When digested by pancreatic RNase, 70S RNA of murine sarcoma virus yielded poly(A) sequences that contain 91% adenylic acid. These poly(A) sequences sedimented as a relatively homogenous peak in sucrose gradients with a sedimentation coefficient of 4-5 S, but had a mobility during polyacrylamide gel electrophoresis that corresponds to molecules that sediment at 6-7 S. If we estimate a molecular weight for each sequence of 30,000-60,000 (100-200 nucleotides) and a molecular weight for viral 70S RNA of 3-12 million, each viral genome could contain 1-8 poly(A) sequences. Possible functions of poly(A) in the infecting viral RNA may include a role in the initiation of viral DNA or RNA synthesis, in protein maturation, or in the assembly of the viral genome.

  14. Structure prediction and analysis of neuraminidase sequence variants.

    PubMed

    Thayer, Kelly M

    2016-07-01

    Analyzing protein structure has become an integral aspect of understanding systems of biochemical import. The laboratory experiment endeavors to introduce protein folding to ascertain structures of proteins for which the structure is unavailable, as well as to critically evaluate the quality of the prediction obtained. The model system used is the highly mutable influenza virus protein neuraminidase, which is the key target in the development of therapeutics. In light of recent pandemics, understanding how mutations confer drug resistance, which translates at the molecular level to understanding how different sequence variants differ, constitutes an area of great interest because of the ramifications in public health. This lab targets upper level undergraduate biochemistry students, and aims to introduce tools to be used to explore protein folding and protein visualization in the context of the neuraminidase case study. Students proceed to critically evaluate the folded models by comparison with crystallographic structures. When validity is established, they fold a neuraminidase sequence for which a structure is not available. Through structural alignment and visual inspection of the 150 loop, students gain molecular insight into two possible conformations of the protein, which are actively being studied. Folding the third chosen sequence mimics a true research environment in allowing students to generate a structure from a sequence for which a structure was not previously available, and to assess whether their particular variant has an open or closed loop. From this vantage, they are then challenged to speculate about the connection between loop conformation and drug susceptibility. © 2016 by The International Union of Biochemistry and Molecular Biology, 44(4):361-376, 2016. PMID:26900942

  15. Using next generation transcriptome sequencing to predict an ectomycorrhizal metablome.

    SciTech Connect

    Larsen, P. E.; Sreedasyam, A.; Trivedi, G; Podila, G. K.; Cseke, L. J.; Collart, F. R.

    2011-05-13

    Mycorrhizae, symbiotic interactions between soil fungi and tree roots, are ubiquitous in terrestrial ecosystems. The fungi contribute phosphorous, nitrogen and mobilized nutrients from organic matter in the soil and in return the fungus receives photosynthetically-derived carbohydrates. This union of plant and fungal metabolisms is the mycorrhizal metabolome. Understanding this symbiotic relationship at a molecular level provides important contributions to the understanding of forest ecosystems and global carbon cycling. We generated next generation short-read transcriptomic sequencing data from fully-formed ectomycorrhizae between Laccaria bicolor and aspen (Populus tremuloides) roots. The transcriptomic data was used to identify statistically significantly expressed gene models using a bootstrap-style approach, and these expressed genes were mapped to specific metabolic pathways. Integration of expressed genes that code for metabolic enzymes and the set of expressed membrane transporters generates a predictive model of the ectomycorrhizal metabolome. The generated model of mycorrhizal metabolome predicts that the specific compounds glycine, glutamate, and allantoin are synthesized by L. bicolor and that these compounds or their metabolites may be used for the benefit of aspen in exchange for the photosynthetically-derived sugars fructose and glucose. The analysis illustrates an approach to generate testable biological hypotheses to investigate the complex molecular interactions that drive ectomycorrhizal symbiosis. These models are consistent with experimental environmental data and provide insight into the molecular exchange processes for organisms in this complex ecosystem. The method used here for predicting metabolomic models of mycorrhizal systems from deep RNA sequencing data can be generalized and is broadly applicable to transcriptomic data derived from complex systems.

  16. Sequences Of Amino Acids For Human Serum Albumin

    NASA Technical Reports Server (NTRS)

    Carter, Daniel C.

    1992-01-01

    Sequences of amino acids defined for use in making polypeptides one-third to one-sixth as large as parent human serum albumin molecule. Smaller, chemically stable peptides have diverse applications including service as artificial human serum and as active components of biosensors and chromatographic matrices. In applications involving production of artificial sera from new sequences, little or no concern about viral contaminants. Smaller genetically engineered polypeptides more easily expressed and produced in large quantities, making commercial isolation and production more feasible and profitable.

  17. Prediction of Membrane Transport Proteins and Their Substrate Specificities Using Primary Sequence Information

    PubMed Central

    Mishra, Nitish K.; Chang, Junil; Zhao, Patrick X.

    2014-01-01

    Background Membrane transport proteins (transporters) move hydrophilic substrates across hydrophobic membranes and play vital roles in most cellular functions. Transporters represent a diverse group of proteins that differ in topology, energy coupling mechanism, and substrate specificity as well as sequence similarity. Among the functional annotations of transporters, information about their transporting substrates is especially important. The experimental identification and characterization of transporters is currently costly and time-consuming. The development of robust bioinformatics-based methods for the prediction of membrane transport proteins and their substrate specificities is therefore an important and urgent task. Results Support vector machine (SVM)-based computational models, which comprehensively utilize integrative protein sequence features such as amino acid composition, dipeptide composition, physico-chemical composition, biochemical composition, and position-specific scoring matrices (PSSM), were developed to predict the substrate specificity of seven transporter classes: amino acid, anion, cation, electron, protein/mRNA, sugar, and other transporters. An additional model to differentiate transporters from non-transporters was also developed. Among the developed models, the biochemical composition and PSSM hybrid model outperformed other models and achieved an overall average prediction accuracy of 76.69% with a Mathews correlation coefficient (MCC) of 0.49 and a receiver operating characteristic area under the curve (AUC) of 0.833 on our main dataset. This model also achieved an overall average prediction accuracy of 78.88% and MCC of 0.41 on an independent dataset. Conclusions Our analyses suggest that evolutionary information (i.e., the PSSM) and the AAIndex are key features for the substrate specificity prediction of transport proteins. In comparison, similarity-based methods such as BLAST, PSI-BLAST, and hidden Markov models do not provide

  18. Nucleic acid sequence design via efficient ensemble defect optimization.

    PubMed

    Zadeh, Joseph N; Wolfe, Brian R; Pierce, Niles A

    2011-02-01

    We describe an algorithm for designing the sequence of one or more interacting nucleic acid strands intended to adopt a target secondary structure at equilibrium. Sequence design is formulated as an optimization problem with the goal of reducing the ensemble defect below a user-specified stop condition. For a candidate sequence and a given target secondary structure, the ensemble defect is the average number of incorrectly paired nucleotides at equilibrium evaluated over the ensemble of unpseudoknotted secondary structures. To reduce the computational cost of accepting or rejecting mutations to a random initial sequence, candidate mutations are evaluated on the leaf nodes of a tree-decomposition of the target structure. During leaf optimization, defect-weighted mutation sampling is used to select each candidate mutation position with probability proportional to its contribution to the ensemble defect of the leaf. As subsequences are merged moving up the tree, emergent structural defects resulting from crosstalk between sibling sequences are eliminated via reoptimization within the defective subtree starting from new random subsequences. Using a Θ(N(3) ) dynamic program to evaluate the ensemble defect of a target structure with N nucleotides, this hierarchical approach implies an asymptotic optimality bound on design time: for sufficiently large N, the cost of sequence design is bounded below by 4/3 the cost of a single evaluation of the ensemble defect for the full sequence. Hence, the design algorithm has time complexity Ω(N(3) ). For target structures containing N ∈{100,200,400,800,1600,3200} nucleotides and duplex stems ranging from 1 to 30 base pairs, RNA sequence designs at 37°C typically succeed in satisfying a stop condition with ensemble defect less than N/100. Empirically, the sequence design algorithm exhibits asymptotic optimality and the exponent in the time complexity bound is sharp.

  19. Nanopores and nucleic acids: prospects for ultrarapid sequencing

    NASA Technical Reports Server (NTRS)

    Deamer, D. W.; Akeson, M.

    2000-01-01

    DNA and RNA molecules can be detected as they are driven through a nanopore by an applied electric field at rates ranging from several hundred microseconds to a few milliseconds per molecule. The nanopore can rapidly discriminate between pyrimidine and purine segments along a single-stranded nucleic acid molecule. Nanopore detection and characterization of single molecules represents a new method for directly reading information encoded in linear polymers. If single-nucleotide resolution can be achieved, it is possible that nucleic acid sequences can be determined at rates exceeding a thousand bases per second.

  20. The amino acid sequence of Escherichia coli cyanase.

    PubMed

    Chin, C C; Anderson, P M; Wold, F

    1983-01-10

    The amino acid sequence of the enzyme cyanase (cyanate hydrolase) from Escherichia coli has been determined by automatic Edman degradation of the intact protein and of its component peptides. The primary peptides used in the sequencing were produced by cyanogen bromide cleavage at the methionine residues, yielding 4 peptides plus free homoserine from the NH2-terminal methionine, and by trypsin cleavage at the 7 arginine residues after acetylation of the lysines. Secondary peptides required for overlaps and COOH-terminal sequences were produced by chymotrypsin or clostripain cleavage of some of the larger peptides. The complete sequence of the cyanase subunit consists of 156 amino acid residues (Mr 16,350). Based on the observation that the cysteine-containing peptide is obtained as a disulfide-linked dimer, it is proposed that the covalent structure of cyanase is made up of two subunits linked by a disulfide bond between the single cystine residue in each subunit. The native enzyme (Mr 150,000) then appears to be a complex of four or five such subunit dimers.

  1. Predictions of diagenetic reactions in the presence of organic acids

    NASA Astrophysics Data System (ADS)

    Harrison, Wendy J.; Thyne, Geoffrey D.

    1992-02-01

    Stability constants have been estimated for cation complexes with anions of monofunctional and difunctional acids (combinations of Ca, Mg, Fe, Al, Sr, Mn, U, Th, Pb, Cu, Zn with formate, acetate, propionate, oxalate, malonate, succinate, and salicylate) between 0 and 200°C. Difunctional acid anions form much more stable complexes than monofunctional acid anions with aluminum; the importance of the aluminum-acetate complex is relatively minor in comparison to aluminum oxalate and malonate complexes. Divalent metal cations such as Mg, Ca, and Fe form more stable complexes with acetate than with difunctional acid anions. Aluminum-oxalate can dominate the species distribution of aluminum under acidic pH conditions, whereas the divalent cation-acetate and oxalate complexes rarely account for more than 60% of the total dissolved cation, and then only in more alkaline waters. Mineral thermodynamic affinities were calculated using the reaction path model EQ3/6 for waters having variable organic acid anion (OAA) contents under conditions representative of those found during normal burial diagenesis. The following scenarios are possible: 1) K-feldspar and albite are stable, anorthite dissolves 2) All feldpars are stable 3) Carbonates can be very unstable to slightly unstable, but never increase in stability. Organic acid anions are ineffective at neutral to alkaline pH in modifying stabilities of aluminosilicate minerals whereas the anions are variably effective under a wide range of pH in modifying carbonate mineral stabilities. Reaction path calculations demonstrate that the sequence of mineral reactions occurring in an arkosic sandstone-fluid system is only slightly modified by the presence of OAA. A spectrum of possible sandstone alteration mineralogies can be obtained depending on the selected boundary conditions: EQ3/6 predictions include quartz overgrowth, calcite replacement of plagioclase, albitization of plagioclase, and the formation of porosity-occluding calcite

  2. Quantum-Sequencing: Biophysics of quantum tunneling through nucleic acids

    NASA Astrophysics Data System (ADS)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    Tunneling microscopy and spectroscopy has extensively been used in physical surface sciences to study quantum tunneling to measure electronic local density of states of nanomaterials and to characterize adsorbed species. Quantum-Sequencing (Q-Seq) is a new method based on tunneling microscopy for electronic sequencing of single molecule of nucleic acids. A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free single-molecule sequencing method. Here, we present the unique ``electronic fingerprints'' for all nucleotides on DNA and RNA using Q-Seq along their intrinsic biophysical parameters. We have analyzed tunneling spectra for the nucleotides at different pH conditions and analyzed the HOMO, LUMO and energy gap for all of them. In addition we show a number of biophysical parameters to further characterize all nucleobases (electron and hole transition voltage and energy barriers). These results highlight the robustness of Q-Seq as a technique for next-generation sequencing.

  3. Nucleic acid sequence detection using multiplexed oligonucleotide PCR

    SciTech Connect

    Nolan, John P.; White, P. Scott

    2006-12-26

    Methods for rapidly detecting single or multiple sequence alleles in a sample nucleic acid are described. Provided are all of the oligonucleotide pairs capable of annealing specifically to a target allele and discriminating among possible sequences thereof, and ligating to each other to form an oligonucleotide complex when a particular sequence feature is present (or, alternatively, absent) in the sample nucleic acid. The design of each oligonucleotide pair permits the subsequent high-level PCR amplification of a specific amplicon when the oligonucleotide complex is formed, but not when the oligonucleotide complex is not formed. The presence or absence of the specific amplicon is used to detect the allele. Detection of the specific amplicon may be achieved using a variety of methods well known in the art, including without limitation, oligonucleotide capture onto DNA chips or microarrays, oligonucleotide capture onto beads or microspheres, electrophoresis, and mass spectrometry. Various labels and address-capture tags may be employed in the amplicon detection step of multiplexed assays, as further described herein.

  4. Molecular cloning and amino acid sequence of human 5-lipoxygenase

    SciTech Connect

    Matsumoto, T.; Funk, C.D.; Radmark, O.; Hoeoeg, J.O.; Joernvall, H.; Samuelsson, B.

    1988-01-01

    5-Lipoxygenase (EC 1.13.11.34), a Ca/sup 2 +/- and ATP-requiring enzyme, catalyzes the first two steps in the biosynthesis of the peptidoleukotrienes and the chemotactic factor leukotriene B/sub 4/. A cDNA clone corresponding to 5-lipoxygenase was isolated from a human lung lambda gt11 expression library by immunoscreening with a polyclonal antibody. Additional clones from a human placenta lambda gt11 cDNA library were obtained by plaque hybridization with the /sup 32/P-labeled lung cDNA clone. Sequence data obtained from several overlapping clones indicate that the composite DNAs contain the complete coding region for the enzyme. From the deduced primary structure, 5-lipoxygenase encodes a 673 amino acid protein with a calculated molecular weight of 77,839. Direct analysis of the native protein and its proteolytic fragments confirmed the deduced composition, the amino-terminal amino acid sequence, and the structure of many internal segments. 5-Lipoxygenase has no apparent sequence homology with leukotriene A/sub 4/ hydrolase or Ca/sup 2 +/-binding proteins. RNA blot analysis indicated substantial amounts of an mRNA species of approx. = 2700 nucleotides in leukocytes, lung, and placenta.

  5. Coevolutionary modeling of protein sequences: Predicting structure, function, and mutational landscapes

    NASA Astrophysics Data System (ADS)

    Weigt, Martin

    Over the last years, biological research has been revolutionized by experimental high-throughput techniques, in particular by next-generation sequencing technology. Unprecedented amounts of data are accumulating, and there is a growing request for computational methods unveiling the information hidden in raw data, thereby increasing our understanding of complex biological systems. Statistical-physics models based on the maximum-entropy principle have, in the last few years, played an important role in this context. To give a specific example, proteins and many non-coding RNA show a remarkable degree of structural and functional conservation in the course of evolution, despite a large variability in amino acid sequences. We have developed a statistical-mechanics inspired inference approach - called Direct-Coupling Analysis - to link this sequence variability (easy to observe in sequence alignments, which are available in public sequence databases) to bio-molecular structure and function. In my presentation I will show, how this methodology can be used (i) to infer contacts between residues and thus to guide tertiary and quaternary protein structure prediction and RNA structure prediction, (ii) to discriminate interacting from non-interacting protein families, and thus to infer conserved protein-protein interaction networks, and (iii) to reconstruct mutational landscapes and thus to predict the phenotypic effect of mutations. References [1] M. Figliuzzi, H. Jacquier, A. Schug, O. Tenaillon and M. Weigt ''Coevolutionary landscape inference and the context-dependence of mutations in beta-lactamase TEM-1'', Mol. Biol. Evol. (2015), doi: 10.1093/molbev/msv211 [2] E. De Leonardis, B. Lutz, S. Ratz, S. Cocco, R. Monasson, A. Schug, M. Weigt ''Direct-Coupling Analysis of nucleotide coevolution facilitates RNA secondary and tertiary structure prediction'', Nucleic Acids Research (2015), doi: 10.1093/nar/gkv932 [3] F. Morcos, A. Pagnani, B. Lunt, A. Bertolino, D. Marks, C

  6. Propensity Scores for Prediction and Characterization of Bioluminescent Proteins from Sequences

    PubMed Central

    Huang, Hui-Ling

    2014-01-01

    Bioluminescent proteins (BLPs) are a class of proteins with various mechanisms of light emission such as bioluminescence and fluorescence from luminous organisms. While valuable for commercial and medical applications, identification of BLPs, including luciferases and fluorescent proteins (FPs), is rather challenging, owing to their high variety of protein sequences. Moreover, characterization of BLPs facilitates mutagenesis analysis to enhance bioluminescence and fluorescence. Therefore, this study proposes a novel methodological approach to estimating the propensity scores of 400 dipeptides and 20 amino acids in order to design two prediction methods and characterize BLPs based on a scoring card method (SCM). The SCMBLP method for predicting BLPs achieves an accuracy of 90.83% for 10-fold cross-validation higher than existing support vector machine based methods and a test accuracy of 82.85%. A dataset consisting of 269 luciferases and 216 FPs is also established to design the SCMLFP prediction method, which achieves training and test accuracies of 97.10% and 96.28%, respectively. Additionally, four informative physicochemical properties of 20 amino acids are identified using the estimated propensity scores to characterize BLPs as follows: 1) high transfer free energy from inside to the protein surface, 2) high occurrence frequency of residues in the transmembrane regions of the protein, 3) large hydrophobicity scale from the native protein structure, and 4) high correlation coefficient (R = 0.921) between the amino acid compositions of BLPs and integral membrane proteins. Further analyzing BLPs reveals that luciferases have a larger value of R (0.937) than FPs (0.635), suggesting that luciferases tend to locate near the cell membrane location rather than FPs for convenient receipt of extracellular ions. Importantly, the propensity scores of dipeptides and amino acids and the identified properties facilitate efforts to predict, characterize, and apply BLPs

  7. Propensity scores for prediction and characterization of bioluminescent proteins from sequences.

    PubMed

    Huang, Hui-Ling

    2014-01-01

    Bioluminescent proteins (BLPs) are a class of proteins with various mechanisms of light emission such as bioluminescence and fluorescence from luminous organisms. While valuable for commercial and medical applications, identification of BLPs, including luciferases and fluorescent proteins (FPs), is rather challenging, owing to their high variety of protein sequences. Moreover, characterization of BLPs facilitates mutagenesis analysis to enhance bioluminescence and fluorescence. Therefore, this study proposes a novel methodological approach to estimating the propensity scores of 400 dipeptides and 20 amino acids in order to design two prediction methods and characterize BLPs based on a scoring card method (SCM). The SCMBLP method for predicting BLPs achieves an accuracy of 90.83% for 10-fold cross-validation higher than existing support vector machine based methods and a test accuracy of 82.85%. A dataset consisting of 269 luciferases and 216 FPs is also established to design the SCMLFP prediction method, which achieves training and test accuracies of 97.10% and 96.28%, respectively. Additionally, four informative physicochemical properties of 20 amino acids are identified using the estimated propensity scores to characterize BLPs as follows: 1) high transfer free energy from inside to the protein surface, 2) high occurrence frequency of residues in the transmembrane regions of the protein, 3) large hydrophobicity scale from the native protein structure, and 4) high correlation coefficient (R = 0.921) between the amino acid compositions of BLPs and integral membrane proteins. Further analyzing BLPs reveals that luciferases have a larger value of R (0.937) than FPs (0.635), suggesting that luciferases tend to locate near the cell membrane location rather than FPs for convenient receipt of extracellular ions. Importantly, the propensity scores of dipeptides and amino acids and the identified properties facilitate efforts to predict, characterize, and apply BLPs

  8. Acid mine drainage prediction and remediation

    SciTech Connect

    Robb, G.; Robinson, J.

    1996-12-31

    The use of constructed wetlands for treatment of acid mine drainage is discussed in the article. Drainage characteristics and mine water flow rate are identified as important predictors of remediation success. Aerobic and anaerobic chemical reaction processes are described. Problems and potential uses of wetlands are briefly described.

  9. Predictive sequence analysis of the Candidatus Liberibacter asiaticus proteome.

    PubMed

    Cong, Qian; Kinch, Lisa N; Kim, Bong-Hyun; Grishin, Nick V

    2012-01-01

    Candidatus Liberibacter asiaticus (Ca. L. asiaticus) is a parasitic gram-negative bacterium that is closely associated with Huanglongbing (HLB), a worldwide citrus disease. Given the difficulty in culturing the bacterium and thus in its experimental characterization, computational analyses of the whole Ca. L. asiaticus proteome can provide much needed insights into the mechanisms of the disease and guide the development of treatment strategies. In this study, we applied state-of-the-art sequence analysis tools to every Ca. L. asiaticus protein. Our results are available as a public website at http://prodata.swmed.edu/liberibacter_asiaticus/. In particular, we manually curated the results to predict the subcellular localization, spatial structure and function of all Ca. L. asiaticus proteins (http://prodata.swmed.edu/liberibacter_asiaticus/curated/). This extensive information should facilitate the study of Ca. L. asiaticus proteome function and its relationship to disease. Pilot studies based on the information from our website have revealed several potential virulence factors, discussed herein. PMID:22815919

  10. Predictive Sequence Analysis of the Candidatus Liberibacter asiaticus Proteome

    PubMed Central

    Cong, Qian; Kinch, Lisa N.; Kim, Bong-Hyun; Grishin, Nick V.

    2012-01-01

    Candidatus Liberibacter asiaticus (Ca. L. asiaticus) is a parasitic Gram-negative bacterium that is closely associated with Huanglongbing (HLB), a worldwide citrus disease. Given the difficulty in culturing the bacterium and thus in its experimental characterization, computational analyses of the whole Ca. L. asiaticus proteome can provide much needed insights into the mechanisms of the disease and guide the development of treatment strategies. In this study, we applied state-of-the-art sequence analysis tools to every Ca. L. asiaticus protein. Our results are available as a public website at http://prodata.swmed.edu/liberibacter_asiaticus/. In particular, we manually curated the results to predict the subcellular localization, spatial structure and function of all Ca. L. asiaticus proteins (http://prodata.swmed.edu/liberibacter_asiaticus/curated/). This extensive information should facilitate the study of Ca. L. asiaticus proteome function and its relationship to disease. Pilot studies based on the information from our website have revealed several potential virulence factors, discussed herein. PMID:22815919

  11. New approaches for computer analysis of nucleic acid sequences.

    PubMed

    Karlin, S; Ghandour, G; Ost, F; Tavare, S; Korn, L J

    1983-09-01

    A new high-speed computer algorithm is outlined that ascertains within and between nucleic acid and protein sequences all direct repeats, dyad symmetries, and other structural relationships. Large repeats, repeats of high frequency, dyad symmetries of specified stem length and loop distance, and their distributions are determined. Significance of homologies is assessed by a hierarchy of permutation procedures. Applications are made to papovaviruses, the human papillomavirus HPV, lambda phage, the human and mouse mitochondrial genomes, and the human and mouse immunoglobulin kappa-chain genes. PMID:6577449

  12. Yeast prions and human prion-like proteins: sequence features and prediction methods.

    PubMed

    Cascarina, Sean M; Ross, Eric D

    2014-06-01

    Prions are self-propagating infectious protein isoforms. A growing number of prions have been identified in yeast, each resulting from the conversion of soluble proteins into an insoluble amyloid form. These yeast prions have served as a powerful model system for studying the causes and consequences of prion aggregation. Remarkably, a number of human proteins containing prion-like domains, defined as domains with compositional similarity to yeast prion domains, have recently been linked to various human degenerative diseases, including amyotrophic lateral sclerosis. This suggests that the lessons learned from yeast prions may help in understanding these human diseases. In this review, we examine what has been learned about the amino acid sequence basis for prion aggregation in yeast, and how this information has been used to develop methods to predict aggregation propensity. We then discuss how this information is being applied to understand human disease, and the challenges involved in applying yeast prediction methods to higher organisms.

  13. Improved peptide elution time prediction for reversed-phase liquid chromatography-MS by incorporating peptide sequence information

    SciTech Connect

    Petritis, Konstantinos; Kangas, Lars J.; Yan, Bo; Monroe, Matthew E.; Strittmatter, Eric F.; Qian, Weijun; Adkins, Joshua N.; Moore, Ronald J.; Xu, Ying; Lipton, Mary S.; Camp, David G.; Smith, Richard D.

    2006-07-15

    We describe an improved artificial neural network (ANN)-based method for predicting peptide retention times in reversed phase liquid chromatography. In addition to the peptide amino acid composition, this study investigated several other peptide descriptors to improve the predictive capability, such as peptide length, sequence, hydrophobicity and hydrophobic moment, and nearest neighbor amino acid, as well as peptide predicted structural configurations (i.e., helix, sheet, coil). An ANN architecture that consisted of 1052 input nodes, 24 hidden nodes, and 1 output node was used to fully consider the amino acid residue sequence in each peptide. The network was trained using {approx}345,000 non-redundant peptides identified from a total of 12,059 LC-MS/MS analyses of more than 20 different organisms, and the predictive capability of the model was tested using 1303 confidently identified peptides that were not included in the training set. The model demonstrated an average elution time precision of {approx}1.5% and was able to distinguish among isomeric peptides based upon the inclusion of peptide sequence information. The prediction power represents a significant improvement over our earlier report (Petritis et al., Anal. Chem. 2003, 75, 1039-1048) and other previously reported models.

  14. Deduced amino acid sequence of human pulmonary surfactant proteolipid: SPL(pVal)

    SciTech Connect

    Whitsett, J.A.; Glasser, S.W.; Korfhagen, T.R.; Weaver, T.E.; Clark, J.; Pilot-Matias, T.; Meuth, J.; Fox, J.L.

    1987-05-01

    Hydrophobic, proteolipid-like protein of Mr 6500 was isolated from ether/ethanol extracts of human, canine and bovine pulmonary surfactant. Amino acid composition of the protein demonstrated a remarkable abundance of hydrophobic residues, particularly valine and leucine. The N-terminal amino acid sequence of the human protein was determined: N-Leu-Ile-Pro-Cys-Cys-Pro-Val-Asn-Leu-Lys-Arg-Leu-Leu-Ile-Val4... An oligonucleotide probe was used to screen an adult human lung cDNA library and resulted in detection of cDNA clones with predicted amino acid sequence with close identity to the N-terminal amino acid sequence of the human peptide. SPL(pVal) was found within the reading frame of a larger peptide. SPL(pVal) results from proteolytic processing of a larger preprotein. Northern blot analysis detected in a single 1.0 kilobase SPL(pVal) RNA which was less abundant in fetal than in adult lung. Mixtures of purified canine and bovine SPL(pVal) and synthetic phospholipids display properties of rapid adsorption and surface tension lowering activity characteristic of surfactant. Human SPL(pVal) is a pulmonary surfactant proteolipid which may therefore be useful in combination with phospholipids and/or other surfactant proteins for the treatment of surfactant deficiency such as hyaline membrane disease in newborn infants.

  15. Predictive genomics: a cancer hallmark network framework for predicting tumor clinical phenotypes using genome sequencing data.

    PubMed

    Wang, Edwin; Zaman, Naif; Mcgee, Shauna; Milanese, Jean-Sébastien; Masoudi-Nejad, Ali; O'Connor-McCourt, Maureen

    2015-02-01

    Tumor genome sequencing leads to documenting thousands of DNA mutations and other genomic alterations. At present, these data cannot be analyzed adequately to aid in the understanding of tumorigenesis and its evolution. Moreover, we have little insight into how to use these data to predict clinical phenotypes and tumor progression to better design patient treatment. To meet these challenges, we discuss a cancer hallmark network framework for modeling genome sequencing data to predict cancer clonal evolution and associated clinical phenotypes. The framework includes: (1) cancer hallmarks that can be represented by a few molecular/signaling networks. 'Network operational signatures' which represent gene regulatory logics/strengths enable to quantify state transitions and measures of hallmark traits. Thus, sets of genomic alterations which are associated with network operational signatures could be linked to the state/measure of hallmark traits. The network operational signature transforms genotypic data (i.e., genomic alterations) to regulatory phenotypic profiles (i.e., regulatory logics/strengths), to cellular phenotypic profiles (i.e., hallmark traits) which lead to clinical phenotypic profiles (i.e., a collection of hallmark traits). Furthermore, the framework considers regulatory logics of the hallmark networks under tumor evolutionary dynamics and therefore also includes: (2) a self-promoting positive feedback loop that is dominated by a genomic instability network and a cell survival/proliferation network is the main driver of tumor clonal evolution. Surrounding tumor stroma and its host immune systems shape the evolutionary paths; (3) cell motility initiating metastasis is a byproduct of the above self-promoting loop activity during tumorigenesis; (4) an emerging hallmark network which triggers genome duplication dominates a feed-forward loop which in turn could act as a rate-limiting step for tumor formation; (5) mutations and other genomic alterations have

  16. Improving protein structure prediction using multiple sequence-based contact predictions

    PubMed Central

    Wu, Sitao; Szilagyi, Andras; Zhang, Yang

    2011-01-01

    Summary Although residue-residue contact maps dictate the topology of proteins, sequence-based ab initio contact predictions have been found little use in actual structure prediction due to the low accuracy. We developed a composite set of nine SVM-based contact predictors which are used in I-TASSER simulation in combination with sparse template contact restraints. When testing the strategy on 273 non-homologous targets, remarkable improvements of I-TASSER models were observed for both easy and hard targets, with P-value by student s t-test below 0.00001 and 0.001, respectively. In several cases, TM-score increases by >30%, which essentially converts “non-foldable” targets into “foldable” ones. In CASP9, I-TASSER employed ab initio contact predictions, and generated models for 26 FM targets with a GDT-score 16% and 44% higher than the second and third best servers from other groups, respectively. These findings demonstrate a new avenue to improve the accuracy of protein structure prediction especially for free-modeling targets. PMID:21827953

  17. Secondary Structure Prediction of Single Sequences Using RNAstructure.

    PubMed

    Xu, Zhenjiang Zech; Mathews, David H

    2016-01-01

    RNA secondary structure is often predicted using folding thermodynamics. RNAstructure is a software package that includes structure prediction by free energy minimization, prediction of base pairing probabilities, prediction of structures composed of highly probably base pairs, and prediction of structures with pseudoknots. A user-friendly graphical user interface is provided, and this interface works on Windows, Apple OS X, and Linux. This chapter provides protocols for using RNAstructure for structure prediction. PMID:27665590

  18. Predicting experimental properties of proteins from sequence by machine learning techniques.

    PubMed

    Smialowski, Pawel; Martin-Galiano, Antonio J; Cox, Jürgen; Frishman, Dmitrij

    2007-04-01

    Efficient target selection methods are an important prerequisite for increasing the success rate and reducing the cost of high-throughput structural genomics efforts. There is a high demand for sequence-based methods capable of predicting experimentally tractable proteins and filtering out potentially difficult targets at different stages of the structural genomic pipeline. Simple empirical rules based on anecdotal evidence are being increasingly superseded by rigorous machine-learning algorithms. Although the simplicity of less advanced methods makes them more human understandable, more sophisticated formalized algorithms possess superior classification power. The quickly growing corpus of experimental success and failure data gathered by structural genomics consortia creates a unique opportunity for retrospective data mining using machine learning techniques and results in increased quality of classifiers. For example, the current solubility prediction methods are reaching the accuracy of over 70%. Furthermore, automated feature selection leads to better insight into the nature of the correlation between amino acid sequence and experimental outcome. In this review we summarize methods for predicting experimental success in cloning, expression, soluble expression, purification and crystallization of proteins with a special focus on publicly available resources. We also describe experimental data repositories and machine learning techniques used for classification and feature selection. PMID:17430194

  19. Genome-Wide Prediction and Analysis of 3D-Domain Swapped Proteins in the Human Genome from Sequence Information

    PubMed Central

    Upadhyay, Atul Kumar; Sowdhamini, Ramanathan

    2016-01-01

    3D-domain swapping is one of the mechanisms of protein oligomerization and the proteins exhibiting this phenomenon have many biological functions. These proteins, which undergo domain swapping, have acquired much attention owing to their involvement in human diseases, such as conformational diseases, amyloidosis, serpinopathies, proteionopathies etc. Early realisation of proteins in the whole human genome that retain tendency to domain swap will enable many aspects of disease control management. Predictive models were developed by using machine learning approaches with an average accuracy of 78% (85.6% of sensitivity, 87.5% of specificity and an MCC value of 0.72) to predict putative domain swapping in protein sequences. These models were applied to many complete genomes with special emphasis on the human genome. Nearly 44% of the protein sequences in the human genome were predicted positive for domain swapping. Enrichment analysis was performed on the positively predicted sequences from human genome for their domain distribution, disease association and functional importance based on Gene Ontology (GO). Enrichment analysis was also performed to infer a better understanding of the functional importance of these sequences. Finally, we developed hinge region prediction, in the given putative domain swapped sequence, by using important physicochemical properties of amino acids. PMID:27467780

  20. [Prediction of lipases types by different scale pseudo-amino acid composition].

    PubMed

    Zhang, Guangya; Li, Hongchun; Gao, Jiaqiang; Fang, Baishan

    2008-11-01

    Lipases are widely used enzymes in biotechnology. Although they catalyze the same reaction, their sequences vary. Therefore, it is highly desired to develop a fast and reliable method to identify the types of lipases according to their sequences, or even just to confirm whether they are lipases or not. By proposing two scales based pseudo amino acid composition approaches to extract the features of the sequences, a powerful predictor based on k-nearest neighbor was introduced to address the problems. The overall success rates thus obtained by the 10-fold cross-validation test were shown as below: for predicting lipases and nonlipase, the success rates were 92.8%, 91.4% and 91.3%, respectively. For lipase types, the success rates were 92.3%, 90.3% and 89.7%, respectively. Among them, the Z scales based pseudo amino acid composition was the best, T scales was the second. They outperformed significantly than 6 other frequently used sequence feature extraction methods. The high success rates yielded for such a stringent dataset indicate predicting the types of lipases is feasible and the different scales pseudo amino acid composition might be a useful tool for extracting the features of protein sequences, or at lease can play a complementary role to many of the other existing approaches. PMID:19256347

  1. Stereochemical Sequence Ion Selectivity: Proline versus Pipecolic-acid-containing Protonated Peptides

    NASA Astrophysics Data System (ADS)

    Abutokaikah, Maha T.; Guan, Shanshan; Bythell, Benjamin J.

    2016-10-01

    Substitution of proline by pipecolic acid, the six-membered ring congener of proline, results in vastly different tandem mass spectra. The well-known proline effect is eliminated and amide bond cleavage C-terminal to pipecolic acid dominates instead. Why do these two ostensibly similar residues produce dramatically differing spectra? Recent evidence indicates that the proton affinities of these residues are similar, so are unlikely to explain the result [Raulfs et al., J. Am. Soc. Mass Spectrom. 25, 1705-1715 (2014)]. An additional hypothesis based on increased flexibility was also advocated. Here, we provide a computational investigation of the "pipecolic acid effect," to test this and other hypotheses to determine if theory can shed additional light on this fascinating result. Our calculations provide evidence for both the increased flexibility of pipecolic-acid-containing peptides, and structural changes in the transition structures necessary to produce the sequence ions. The most striking computational finding is inversion of the stereochemistry of the transition structures leading to "proline effect"-type amide bond fragmentation between the proline/pipecolic acid-congeners: R (proline) to S (pipecolic acid). Additionally, our calculations predict substantial stabilization of the amide bond cleavage barriers for the pipecolic acid congeners by reduction in deleterious steric interactions and provide evidence for the importance of experimental energy regime in rationalizing the spectra.

  2. Protein meta-functional signatures from combining sequence, structure, evolution, and amino acid property information.

    PubMed

    Wang, Kai; Horst, Jeremy A; Cheng, Gong; Nickle, David C; Samudrala, Ram

    2008-09-26

    Protein function is mediated by different amino acid residues, both their positions and types, in a protein sequence. Some amino acids are responsible for the stability or overall shape of the protein, playing an indirect role in protein function. Others play a functionally important role as part of active or binding sites of the protein. For a given protein sequence, the residues and their degree of functional importance can be thought of as a signature representing the function of the protein. We have developed a combination of knowledge- and biophysics-based function prediction approaches to elucidate the relationships between the structural and the functional roles of individual residues and positions. Such a meta-functional signature (MFS), which is a collection of continuous values representing the functional significance of each residue in a protein, may be used to study proteins of known function in greater detail and to aid in experimental characterization of proteins of unknown function. We demonstrate the superior performance of MFS in predicting protein functional sites and also present four real-world examples to apply MFS in a wide range of settings to elucidate protein sequence-structure-function relationships. Our results indicate that the MFS approach, which can combine multiple sources of information and also give biological interpretation to each component, greatly facilitates the understanding and characterization of protein function.

  3. Predicted Molecular Effects of Sequence Variants Link to System Level of Disease.

    PubMed

    Reeb, Jonas; Hecht, Maximilian; Mahlich, Yannick; Bromberg, Yana; Rost, Burkhard

    2016-08-01

    Developments in experimental and computational biology are advancing our understanding of how protein sequence variation impacts molecular protein function. However, the leap from the micro level of molecular function to the macro level of the whole organism, e.g. disease, remains barred. Here, we present new results emphasizing earlier work that suggested some links from molecular function to disease. We focused on non-synonymous single nucleotide variants, also referred to as single amino acid variants (SAVs). Building upon OMIA (Online Mendelian Inheritance in Animals), we introduced a curated set of 117 disease-causing SAVs in animals. Methods optimized to capture effects upon molecular function often correctly predict human (OMIM) and animal (OMIA) Mendelian disease-causing variants. We also predicted effects of human disease-causing variants in the mouse model, i.e. we put OMIM SAVs into mouse orthologs. Overall, fewer variants were predicted with effect in the model organism than in the original organism. Our results, along with other recent studies, demonstrate that predictions of molecular effects capture some important aspects of disease. Thus, in silico methods focusing on the micro level of molecular function can help to understand the macro system level of disease. PMID:27536940

  4. Predicted Molecular Effects of Sequence Variants Link to System Level of Disease

    PubMed Central

    Bromberg, Yana; Rost, Burkhard

    2016-01-01

    Developments in experimental and computational biology are advancing our understanding of how protein sequence variation impacts molecular protein function. However, the leap from the micro level of molecular function to the macro level of the whole organism, e.g. disease, remains barred. Here, we present new results emphasizing earlier work that suggested some links from molecular function to disease. We focused on non-synonymous single nucleotide variants, also referred to as single amino acid variants (SAVs). Building upon OMIA (Online Mendelian Inheritance in Animals), we introduced a curated set of 117 disease-causing SAVs in animals. Methods optimized to capture effects upon molecular function often correctly predict human (OMIM) and animal (OMIA) Mendelian disease-causing variants. We also predicted effects of human disease-causing variants in the mouse model, i.e. we put OMIM SAVs into mouse orthologs. Overall, fewer variants were predicted with effect in the model organism than in the original organism. Our results, along with other recent studies, demonstrate that predictions of molecular effects capture some important aspects of disease. Thus, in silico methods focusing on the micro level of molecular function can help to understand the macro system level of disease. PMID:27536940

  5. AMS 4.0: consensus prediction of post-translational modifications in protein sequences.

    PubMed

    Plewczynski, Dariusz; Basu, Subhadip; Saha, Indrajit

    2012-08-01

    We present here the 2011 update of the AutoMotif Service (AMS 4.0) that predicts the wide selection of 88 different types of the single amino acid post-translational modifications (PTM) in protein sequences. The selection of experimentally confirmed modifications is acquired from the latest UniProt and Phospho.ELM databases for training. The sequence vicinity of each modified residue is represented using amino acids physico-chemical features encoded using high quality indices (HQI) obtaining by automatic clustering of known indices extracted from AAindex database. For each type of the numerical representation, the method builds the ensemble of Multi-Layer Perceptron (MLP) pattern classifiers, each optimising different objectives during the training (for example the recall, precision or area under the ROC curve (AUC)). The consensus is built using brainstorming technology, which combines multi-objective instances of machine learning algorithm, and the data fusion of different training objects representations, in order to boost the overall prediction accuracy of conserved short sequence motifs. The performance of AMS 4.0 is compared with the accuracy of previous versions, which were constructed using single machine learning methods (artificial neural networks, support vector machine). Our software improves the average AUC score of the earlier version by close to 7 % as calculated on the test datasets of all 88 PTM types. Moreover, for the selected most-difficult sequence motifs types it is able to improve the prediction performance by almost 32 %, when compared with previously used single machine learning methods. Summarising, the brainstorming consensus meta-learning methodology on the average boosts the AUC score up to around 89 %, averaged over all 88 PTM types. Detailed results for single machine learning methods and the consensus methodology are also provided, together with the comparison to previously published methods and state-of-the-art software tools. The

  6. Correlations Between Amino Acids at Different Sites in Local Sequences of Protein Fragments with Given Structural Patterns

    NASA Astrophysics Data System (ADS)

    Lu, Wen; Liu, Hai-yan

    2007-02-01

    Ample evidence suggests that the local structures of peptide fragments in native proteins are to some extent encoded by their local sequences. Detecting such local correlations is important but it is still an open question what would be the most appropriate method. This is partly because conventional sequence analyses treat amino acid preferences at each site of a protein sequence independently, while it is often the inter-site interactions that bring about local sequence-structure correlations. Here a new scheme is introduced to capture the correlation between amino acid preferences at different sites for different local structure types. A library of nine-residue fragments is constructed, and the fragments are divided into clusters based on their local structures. For each local structure cluster or type, chi-square tests are used to identify correlated preferences of amino acid combinations at pairs of sites. A score function is constructed including both the single site amino acid preferences and the dual-site amino acid combination preferences, which can be used to identify whether a sequence fragment would have a strong tendency to form a particular local structure in native proteins. The results show that, given a local structure pattern, dual-site amino acid combinations contain different information from single site amino acid preferences. Representative examples show that many of the statistically identified correlations agree with previously-proposed heuristic rules about local sequence-structure correlations, or are consistent with physical-chemical interactions required to stabilize particular local structures. Results also show that such dual-site correlations in the score function significantly improves the Z-score matching a sequence fragment to its native local structure relative to non-native local structures, and certain local structure types are highly predictable from the local sequence alone if inter-site correlations are considered.

  7. The evolution of proteins from random amino acid sequences: II. Evidence from the statistical distributions of the lengths of modern protein sequences.

    PubMed

    White, S H

    1994-04-01

    This paper continues an examination of the hypothesis that modern proteins evolved from random heteropeptide sequences. In support of the hypothesis, White and Jacobs (1993, J Mol Evol 36:79-95) have shown that any sequence chosen randomly from a large collection of nonhomologous proteins has a 90% or better chance of having a lengthwise distribution of amino acids that is indistinguishable from the random expectation regardless of amino acid type. The goal of the present study was to investigate the possibility that the random-origin hypothesis could explain the lengths of modern protein sequences without invoking specific mechanisms such as gene duplication or exon splicing. The sets of sequences examined were taken from the 1989 PIR database and consisted of 1,792 "super-family" proteins selected to have little sequence identity, 623 E. coli sequences, and 398 human sequences. The length distributions of the proteins could be described with high significance by either of two closely related probability density functions: The gamma distribution with parameter 2 or the distribution for the sum of two exponential random independent variables. A simple theory for the distributions was developed which assumes that (1) protoprotein sequences had exponentially distributed random independent lengths, (2) the length dependence of protein stability determined which of these protoproteins could fold into compact primitive proteins and thereby attain the potential for biochemical activity, (3) the useful protein sequences were preserved by the primitive genome, and (4) the resulting distribution of sequence lengths is reflected by modern proteins. The theory successfully predicts the two observed distributions which can be distinguished by the functional form of the dependence of protein stability on length. The theory leads to three interesting conclusions. First, it predicts that a tetra-nucleotide was the signal for primitive translation termination. This prediction is

  8. Uric acid excretion predicts increased aggression in urban adolescents.

    PubMed

    Mrug, Sylvie; Mrug, Michal

    2016-09-01

    Elevated levels of uric acid have been linked with impulsive and disinhibited behavior in clinical and community populations of adults, but no studies have examined uric acid in relation to adolescent aggression. This study examined the prospective role of uric acid in aggressive behavior among urban, low income adolescents, and whether this relationship varies by gender. A total of 84 adolescents (M age 13.36years; 50% male; 95% African American) self-reported on their physical aggression at baseline and 1.5years later. At baseline, the youth also completed a 12-h (overnight) urine collection at home which was used to measure uric acid excretion. After adjusting for baseline aggression and age, greater uric acid excretion predicted more frequent aggressive behavior at follow up, with no significant gender differences. The results suggest that lowering uric acid levels may help reduce youth aggression. PMID:27180134

  9. Heterogeneity of amino acid sequence in hippopotamus cytochrome c.

    PubMed

    Thompson, R B; Borden, D; Tarr, G E; Margoliash, E

    1978-12-25

    The amino acid sequences of chymotryptic and tryptic peptides of Hippopotamus amphibius cytochrome c were determined by a recent modification of the manual Edman sequential degradation procedure. They were ordered by comparison with the structure of the hog protein. The hippopotamus protein differs in three positions: serine, alanine, and glutamine replace alanine, glutamic acid, and lysine in positions 43, 92, and 100, respectively. Since the artiodactyl suborders diverged in the mid-Eocene some 50 million years ago, the fact that representatives of some of them show no differences in their cytochromes c (cow, sheep, and hog), while another exhibits as many as three such differences, verifies that even in relatively closely related lines of descent the rate at which cytochrome c changes in the course of evolution is not constant. Furthermore, 10.6% of the hippopotamus cytochrome c preparation was shown to contain isoleucine instead of valine at position 3, indicating that one of the four animals from which the protein was obtained was heterozygous in the cytochrome c gene. Such heterogeneity is a necessary condition of evolutionary variation and has not been previously observed in the cytochrome c of a wild mammalian population.

  10. Thermodynamic prediction of hydrogen production from mixed-acid fermentations.

    PubMed

    Forrest, Andrea K; Wales, Melinda E; Holtzapple, Mark T

    2011-10-01

    The MixAlco™ process biologically converts biomass to carboxylate salts that may be chemically converted to a wide variety of chemicals and fuels. The process utilizes lignocellulosic biomass as feedstock (e.g., municipal solid waste, sewage sludge, and agricultural residues), creating an economic basis for sustainable biofuels. This study provides a thermodynamic analysis of hydrogen yield from mixed-acid fermentations from two feedstocks: paper and bagasse. During batch fermentations, hydrogen production, acid production, and sugar digestion were analyzed to determine the energy selectivity of each system. To predict hydrogen production during continuous operation, this energy selectivity was then applied to countercurrent fermentations of the same systems. The analysis successfully predicted hydrogen production from the paper fermentation to within 11% and the bagasse fermentation to within 21% of the actual production. The analysis was able to faithfully represent hydrogen production and represents a step forward in understanding and predicting hydrogen production from mixed-acid fermentations. PMID:21875794

  11. EASE-MM: Sequence-Based Prediction of Mutation-Induced Stability Changes with Feature-Based Multiple Models.

    PubMed

    Folkman, Lukas; Stantic, Bela; Sattar, Abdul; Zhou, Yaoqi

    2016-03-27

    Protein engineering and characterisation of non-synonymous single nucleotide variants (SNVs) require accurate prediction of protein stability changes (ΔΔGu) induced by single amino acid substitutions. Here, we have developed a new prediction method called Evolutionary, Amino acid, and Structural Encodings with Multiple Models (EASE-MM), which comprises five specialised support vector machine (SVM) models and makes the final prediction from a consensus of two models selected based on the predicted secondary structure and accessible surface area of the mutated residue. The new method is applicable to single-domain monomeric proteins and can predict ΔΔGu with a protein sequence and mutation as the only inputs. EASE-MM yielded a Pearson correlation coefficient of 0.53-0.59 in 10-fold cross-validation and independent testing and was able to outperform other sequence-based methods. When compared to structure-based energy functions, EASE-MM achieved a comparable or better performance. The application to a large dataset of human germline non-synonymous SNVs showed that the disease-causing variants tend to be associated with larger magnitudes of ΔΔGu predicted with EASE-MM. The EASE-MM web-server is available at http://sparks-lab.org/server/ease. PMID:26804571

  12. Human liver apolipoprotein B-100 cDNA: complete nucleic acid and derived amino acid sequence.

    PubMed Central

    Law, S W; Grant, S M; Higuchi, K; Hospattankar, A; Lackner, K; Lee, N; Brewer, H B

    1986-01-01

    Human apolipoprotein B-100 (apoB-100), the ligand on low density lipoproteins that interacts with the low density lipoprotein receptor and initiates receptor-mediated endocytosis and low density lipoprotein catabolism, has been cloned, and the complete nucleic acid and derived amino acid sequences have been determined. ApoB-100 cDNAs were isolated from normal human liver cDNA libraries utilizing immunoscreening as well as filter hybridization with radiolabeled apoB-100 oligodeoxynucleotides. The apoB-100 mRNA is 14.1 kilobases long encoding a mature apoB-100 protein of 4536 amino acids with a calculated amino acid molecular weight of 512,723. ApoB-100 contains 20 potential glycosylation sites, and 12 of a total of 25 cysteine residues are located in the amino-terminal region of the apolipoprotein providing a potential globular structure of the amino terminus of the protein. ApoB-100 contains relatively few regions of amphipathic helices, but compared to other human apolipoproteins it is enriched in beta-structure. The delineation of the entire human apoB-100 sequence will now permit a detailed analysis of the conformation of the protein, the low density lipoprotein receptor binding domain(s), and the structural relationship between apoB-100 and apoB-48 and will provide the basis for the study of genetic defects in apoB-100 in patients with dyslipoproteinemias. PMID:3464946

  13. Canine amino acid transport system Xc(-): cDNA sequence, distribution and cystine transport activity in lens epithelial cells.

    PubMed

    Maruo, Takuya; Kanemaki, Nobuyuki; Onda, Ken; Sato, Reiichiro; Ichihara, Nobuteru; Ochiai, Hideharu

    2014-04-01

    The cystine transport activity of a lens epithelial cell line originated from a canine mature cataract was investigated. The distinct cystine transport activity was observed, which was inhibited to 28% by extracellular 1 mM glutamate. The cDNA sequences of canine cysteine/glutamate exchanger (xCT) and 4F2hc were determined. The predicted amino acid sequences were 527 and 533 amino acid polypeptides, respectively. The amino acid sequences of canine xCT and 4F2hc showed high similarities (>80%) to those of humans. The expression of xCT in lens epithelial cell line was confirmed by western blot analysis. RT-PCR analysis revealed high level expression only in the brain, and it was below the detectable level in other tissues.

  14. Molecular cloning, coding nucleotides and the deduced amino acid sequence of P-450BM-1 from Bacillus megaterium.

    PubMed

    He, J S; Ruettinger, R T; Liu, H M; Fulco, A J

    1989-12-22

    The gene encoding barbiturate-inducible cytochrome P-450BM-1 from Bacillus megaterium ATCC 14581 has been cloned and sequenced. An open reading frame in the 1.9 kb of cloned DNA correctly predicted the NH2-terminal sequence of P-450BM-1 previously determined by protein sequencing, and, in toto, predicted a polypeptide of 410 amino acid residues with an Mr of 47,439. The sequence is most, but less than 27%, similar to that of P-450CAM from Pseudomonas putida, so that P-450BM-1 clearly belongs to a new P-450-gene family, distinct especially from that of the P-450 domain of P-450BM-3, a barbiturate-inducible single polypeptide cytochrome P-450:NADPH-P-450 reductase from the same strain of B. megaterium (Ruettinger, R.T., Wen, L.-P. and Fulco, A.J. (1989) J. Biol. Chem. 264, 10987-10995). PMID:2597681

  15. Nucleotide Sequence Analyses and Predicted Coding of Bunyavirus Genome RNA Species

    PubMed Central

    Clerx-van Haaster, Corrie M.; Akashi, Hiroomi; Auperin, David D.; Bishop, David H. L.

    1982-01-01

    We performed 3′ RNA sequence analyses of [32P]pCp-end-labeled La Crosse (LAC) virus, alternate LAC virus isolate L74, and snowshoe hare bunyavirus large (L), medium (M), and small (S) negative-stranded viral RNA species to determine the coding capabilities of these species. These analyses were confirmed by dideoxy primer extension studies in which we used a synthetic oligodeoxynucleotide primer complementary to the conserved 3′-terminal decanucleotide of the three viral RNA species (Clerx-van Haaster and Bishop, Virology 105:564-574, 1980). The deduced sequences predicted translation of two S-RNA gene products that were read in overlapping reading frames. So far, only single contiguous open reading frames have been identified for the viral M- and L-RNA species. For the negative-stranded M-RNA species of all three viruses, the single reading frame developed from the first 3′-proximal UAC triplet. Likewise, for the L-RNA of the alternate LAC isolate, a single open reading frame developed from the first 3′-proximal UAC triplet. The corresponding L-RNA sequences of prototype LAC and snowshoe hare viruses initiated open reading frames; however, for both viral L-RNA species there was a preceding 3′-proximal UAC triplet in another reading frame that was followed shortly afterward by a termination codon. A comparison of the sequence data obtained for snowshoe hare virus, LAC virus, and the alternate LAC virus isolate showed that the identified nucleotide substitutions were sufficient to account for some of the fingerprint differences in the L-, M-, and S-RNA species of the three viruses. Unlike the distribution of the L- and M-RNA substitutions, significantly fewer nucleotide substitutions occurred after the initial UAC triplet of the S-RNA species than before this triplet, implying that the overlapping genes of the S RNA provided a constraint against evolution by point mutation. The comparative sequence analyses predicted amino acid differences among the

  16. Human retroviruses and AIDS 1996. A compilation and analysis of nucleic acid and amino acid sequences

    SciTech Connect

    Myers, G.; Foley, B.; Korber, B.; Mellors, J.W.; Jeang, K.T.; Wain-Hobson, S.

    1997-04-01

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (1) Nuclear Acid Alignments and Sequences; (2) Amino Acid Alignments; (3) Analysis; (4) Related Sequences; and (5) Database Communications. Information within all the parts is updated throughout the year on the Web site, http://hiv-web.lanl.gov. While this publication could take the form of a review or sequence monograph, it is not so conceived. Instead, the literature from which the database is derived has simply been summarized and some elementary computational analyses have been performed upon the data. Interpretation and commentary have been avoided insofar as possible so that the reader can form his or her own judgments concerning the complex information. In addition to the general descriptions of the parts of the compendium, the user should read the individual introductions for each part.

  17. Verification of a Downscaling Sequence Applied to Medium Range Meteorological Predictions for Global Flood Prediction

    NASA Astrophysics Data System (ADS)

    Voisin, N.; Wood, A. W.; Lettenmaier, D. P.

    2007-12-01

    We describe a prototype system for medium range (up to two week lead) flood prediction in large rivers, which is intended for global implementation - particularly in river basins having limited in situ meteorological observations. The procedure draws from the experimental North American Land Data Assimilation System (NLDAS) and the University of Washington West-wide Seasonal Hydrologic Forecast System for streamflow prediction. Meteorological forecasts based on a numerical weather prediction model serve both as the forcing for hydrologic model initialization and forecasts for lead times up to fifteen days. The hydrologic component of the system is the Variable Infiltration Capacity (VIC) macroscale hydrology model. In the prototype, VIC is spun up for forecast initialization using daily ERA-40 precipitation, wind, and surface air temperature. In hindcast mode, VIC is driven by global NCEP ensemble 15-day re-forecasts (NOAA/ESRL) that are bias corrected with respect to ERA- 40 and spatially disaggregated using two higher spatial resolution satellite products: Global Precipitation Climatology Project (GPCP) 1DD daily precipitation and Tropical Rainfall Measuring System (TRMM) 3B42 precipitation are used to spatially disaggregate NCEP re-forecasts precipitation during the 15-day forecast period. The use of forecast models and satellite remote sensing data in this procedure reduces the need for in situ precipitation and other observations in parts of the world where surface networks are critically deficient, but where a global hydrologic forecast capability arguably would have the greatest value. The prototype system was implemented at one-half degree spatial resolution and tested during the 1979-August 2002 period. For the Mississippi R. Basin (where ample data for model evaluation exist) we evaluate the spatial disaggregation step in which observed precipitation products (NARR) are first aggregated to a coarser resolution (for the sole purpose of the evaluation) and

  18. Next generation sequencing in predicting gene function in podophyllotoxin biosynthesis.

    PubMed

    Marques, Joaquim V; Kim, Kye-Won; Lee, Choonseok; Costa, Michael A; May, Gregory D; Crow, John A; Davin, Laurence B; Lewis, Norman G

    2013-01-01

    Podophyllum species are sources of (-)-podophyllotoxin, an aryltetralin lignan used for semi-synthesis of various powerful and extensively employed cancer-treating drugs. Its biosynthetic pathway, however, remains largely unknown, with the last unequivocally demonstrated intermediate being (-)-matairesinol. Herein, massively parallel sequencing of Podophyllum hexandrum and Podophyllum peltatum transcriptomes and subsequent bioinformatics analyses of the corresponding assemblies were carried out. Validation of the assembly process was first achieved through confirmation of assembled sequences with those of various genes previously established as involved in podophyllotoxin biosynthesis as well as other candidate biosynthetic pathway genes. This contribution describes characterization of two of the latter, namely the cytochrome P450s, CYP719A23 from P. hexandrum and CYP719A24 from P. peltatum. Both enzymes were capable of converting (-)-matairesinol into (-)-pluviatolide by catalyzing methylenedioxy bridge formation and did not act on other possible substrates tested. Interestingly, the enzymes described herein were highly similar to methylenedioxy bridge-forming enzymes from alkaloid biosynthesis, whereas candidates more similar to lignan biosynthetic enzymes were catalytically inactive with the substrates employed. This overall strategy has thus enabled facile further identification of enzymes putatively involved in (-)-podophyllotoxin biosynthesis and underscores the deductive power of next generation sequencing and bioinformatics to probe and deduce medicinal plant biosynthetic pathways.

  19. Boolean Network Model Predicts Cell Cycle Sequence of Fission Yeast

    PubMed Central

    Davidich, Maria I.; Bornholdt, Stefan

    2008-01-01

    A Boolean network model of the cell-cycle regulatory network of fission yeast (Schizosaccharomyces Pombe) is constructed solely on the basis of the known biochemical interaction topology. Simulating the model in the computer faithfully reproduces the known activity sequence of regulatory proteins along the cell cycle of the living cell. Contrary to existing differential equation models, no parameters enter the model except the structure of the regulatory circuitry. The dynamical properties of the model indicate that the biological dynamical sequence is robustly implemented in the regulatory network, with the biological stationary state G1 corresponding to the dominant attractor in state space, and with the biological regulatory sequence being a strongly attractive trajectory. Comparing the fission yeast cell-cycle model to a similar model of the corresponding network in S. cerevisiae, a remarkable difference in circuitry, as well as dynamics is observed. While the latter operates in a strongly damped mode, driven by external excitation, the S. pombe network represents an auto-excited system with external damping. PMID:18301750

  20. Next Generation Sequencing in Predicting Gene Function in Podophyllotoxin Biosynthesis*

    PubMed Central

    Marques, Joaquim V.; Kim, Kye-Won; Lee, Choonseok; Costa, Michael A.; May, Gregory D.; Crow, John A.; Davin, Laurence B.; Lewis, Norman G.

    2013-01-01

    Podophyllum species are sources of (−)-podophyllotoxin, an aryltetralin lignan used for semi-synthesis of various powerful and extensively employed cancer-treating drugs. Its biosynthetic pathway, however, remains largely unknown, with the last unequivocally demonstrated intermediate being (−)-matairesinol. Herein, massively parallel sequencing of Podophyllum hexandrum and Podophyllum peltatum transcriptomes and subsequent bioinformatics analyses of the corresponding assemblies were carried out. Validation of the assembly process was first achieved through confirmation of assembled sequences with those of various genes previously established as involved in podophyllotoxin biosynthesis as well as other candidate biosynthetic pathway genes. This contribution describes characterization of two of the latter, namely the cytochrome P450s, CYP719A23 from P. hexandrum and CYP719A24 from P. peltatum. Both enzymes were capable of converting (−)-matairesinol into (−)-pluviatolide by catalyzing methylenedioxy bridge formation and did not act on other possible substrates tested. Interestingly, the enzymes described herein were highly similar to methylenedioxy bridge-forming enzymes from alkaloid biosynthesis, whereas candidates more similar to lignan biosynthetic enzymes were catalytically inactive with the substrates employed. This overall strategy has thus enabled facile further identification of enzymes putatively involved in (−)-podophyllotoxin biosynthesis and underscores the deductive power of next generation sequencing and bioinformatics to probe and deduce medicinal plant biosynthetic pathways. PMID:23161544

  1. Transcriptome Sequencing in Response to Salicylic Acid in Salvia miltiorrhiza.

    PubMed

    Zhang, Xiaoru; Dong, Juane; Liu, Hailong; Wang, Jiao; Qi, Yuexin; Liang, Zongsuo

    2016-01-01

    Salvia miltiorrhiza is a traditional Chinese herbal medicine, whose quality and yield are often affected by diseases and environmental stresses during its growing season. Salicylic acid (SA) plays a significant role in plants responding to biotic and abiotic stresses, but the involved regulatory factors and their signaling mechanisms are largely unknown. In order to identify the genes involved in SA signaling, the RNA sequencing (RNA-seq) strategy was employed to evaluate the transcriptional profiles in S. miltiorrhiza cell cultures. A total of 50,778 unigenes were assembled, in which 5,316 unigenes were differentially expressed among 0-, 2-, and 8-h SA induction. The up-regulated genes were mainly involved in stimulus response and multi-organism process. A core set of candidate novel genes coding SA signaling component proteins was identified. Many transcription factors (e.g., WRKY, bHLH and GRAS) and genes involved in hormone signal transduction were differentially expressed in response to SA induction. Detailed analysis revealed that genes associated with defense signaling, such as antioxidant system genes, cytochrome P450s and ATP-binding cassette transporters, were significantly overexpressed, which can be used as genetic tools to investigate disease resistance. Our transcriptome analysis will help understand SA signaling and its mechanism of defense systems in S. miltiorrhiza. PMID:26808150

  2. Transcriptome Sequencing in Response to Salicylic Acid in Salvia miltiorrhiza

    PubMed Central

    Zhang, Xiaoru; Dong, Juane; Liu, Hailong; Wang, Jiao; Qi, Yuexin; Liang, Zongsuo

    2016-01-01

    Salvia miltiorrhiza is a traditional Chinese herbal medicine, whose quality and yield are often affected by diseases and environmental stresses during its growing season. Salicylic acid (SA) plays a significant role in plants responding to biotic and abiotic stresses, but the involved regulatory factors and their signaling mechanisms are largely unknown. In order to identify the genes involved in SA signaling, the RNA sequencing (RNA-seq) strategy was employed to evaluate the transcriptional profiles in S. miltiorrhiza cell cultures. A total of 50,778 unigenes were assembled, in which 5,316 unigenes were differentially expressed among 0-, 2-, and 8-h SA induction. The up-regulated genes were mainly involved in stimulus response and multi-organism process. A core set of candidate novel genes coding SA signaling component proteins was identified. Many transcription factors (e.g., WRKY, bHLH and GRAS) and genes involved in hormone signal transduction were differentially expressed in response to SA induction. Detailed analysis revealed that genes associated with defense signaling, such as antioxidant system genes, cytochrome P450s and ATP-binding cassette transporters, were significantly overexpressed, which can be used as genetic tools to investigate disease resistance. Our transcriptome analysis will help understand SA signaling and its mechanism of defense systems in S. miltiorrhiza. PMID:26808150

  3. SeqAPASS (Sequence Alignment to Predict Across Species Susceptibility) software and documentation

    EPA Science Inventory

    SeqAPASS is a software application facilitates rapid and streamlined, yet transparent, comparisons of the similarity of toxicologically-significant molecular targets across species. The present application facilitates analysis of primary amino acid sequence similarity (including ...

  4. Update of PROFEAT: a web server for computing structural and physicochemical features of proteins and peptides from amino acid sequence.

    PubMed

    Rao, H B; Zhu, F; Yang, G B; Li, Z R; Chen, Y Z

    2011-07-01

    Sequence-derived structural and physicochemical features have been extensively used for analyzing and predicting structural, functional, expression and interaction profiles of proteins and peptides. PROFEAT has been developed as a web server for computing commonly used features of proteins and peptides from amino acid sequence. To facilitate more extensive studies of protein and peptides, numerous improvements and updates have been made to PROFEAT. We added new functions for computing descriptors of protein-protein and protein-small molecule interactions, segment descriptors for local properties of protein sequences, topological descriptors for peptide sequences and small molecule structures. We also added new feature groups for proteins and peptides (pseudo-amino acid composition, amphiphilic pseudo-amino acid composition, total amino acid properties and atomic-level topological descriptors) as well as for small molecules (atomic-level topological descriptors). Overall, PROFEAT computes 11 feature groups of descriptors for proteins and peptides, and a feature group of more than 400 descriptors for small molecules plus the derived features for protein-protein and protein-small molecule interactions. Our computational algorithms have been extensively tested and used in a number of published works for predicting proteins of specific structural or functional classes, protein-protein interactions, peptides of specific functions and quantitative structure activity relationships of small molecules. PROFEAT is accessible free of charge at http://bidd.cz3.nus.edu.sg/cgi-bin/prof/protein/profnew.cgi.

  5. Predicting RNA secondary structures from sequence and probing data.

    PubMed

    Lorenz, Ronny; Wolfinger, Michael T; Tanzer, Andrea; Hofacker, Ivo L

    2016-07-01

    RNA secondary structures have proven essential for understanding the regulatory functions performed by RNA such as microRNAs, bacterial small RNAs, or riboswitches. This success is in part due to the availability of efficient computational methods for predicting RNA secondary structures. Recent advances focus on dealing with the inherent uncertainty of prediction by considering the ensemble of possible structures rather than the single most stable one. Moreover, the advent of high-throughput structural probing has spurred the development of computational methods that incorporate such experimental data as auxiliary information.

  6. Natural vs. random protein sequences: Discovering combinatorics properties on amino acid words.

    PubMed

    Santoni, Daniele; Felici, Giovanni; Vergni, Davide

    2016-02-21

    Casual mutations and natural selection have driven the evolution of protein amino acid sequences that we observe at present in nature. The question about which is the dominant force of proteins evolution is still lacking of an unambiguous answer. Casual mutations tend to randomize protein sequences while, in order to have the correct functionality, one expects that selection mechanisms impose rigid constraints on amino acid sequences. Moreover, one also has to consider that the space of all possible amino acid sequences is so astonishingly large that it could be reasonable to have a well tuned amino acid sequence indistinguishable from a random one. In order to study the possibility to discriminate between random and natural amino acid sequences, we introduce different measures of association between pairs of amino acids in a sequence, and apply them to a dataset of 1047 natural protein sequences and 10,470 random sequences, carefully generated in order to preserve the relative length and amino acid distribution of the natural proteins. We analyze the multidimensional measures with machine learning techniques and show that, to a reasonable extent, natural protein sequences can be differentiated from random ones.

  7. Natural vs. random protein sequences: Discovering combinatorics properties on amino acid words.

    PubMed

    Santoni, Daniele; Felici, Giovanni; Vergni, Davide

    2016-02-21

    Casual mutations and natural selection have driven the evolution of protein amino acid sequences that we observe at present in nature. The question about which is the dominant force of proteins evolution is still lacking of an unambiguous answer. Casual mutations tend to randomize protein sequences while, in order to have the correct functionality, one expects that selection mechanisms impose rigid constraints on amino acid sequences. Moreover, one also has to consider that the space of all possible amino acid sequences is so astonishingly large that it could be reasonable to have a well tuned amino acid sequence indistinguishable from a random one. In order to study the possibility to discriminate between random and natural amino acid sequences, we introduce different measures of association between pairs of amino acids in a sequence, and apply them to a dataset of 1047 natural protein sequences and 10,470 random sequences, carefully generated in order to preserve the relative length and amino acid distribution of the natural proteins. We analyze the multidimensional measures with machine learning techniques and show that, to a reasonable extent, natural protein sequences can be differentiated from random ones. PMID:26656109

  8. Sequence and structure-based prediction of fructosyltransferase activity for functional subclassification of fungal GH32 enzymes.

    PubMed

    Trollope, Kim M; van Wyk, Niël; Kotjomela, Momo A; Volschenk, Heinrich

    2015-12-01

    Sucrolytic enzymes catalyse sucrose hydrolysis or the synthesis of fructooligosaccharides (FOSs), a prebiotic in human and animal nutrition. FOS synthesis capacity differs between sucrolytic enzymes. Amino-acid-sequence-based classification of FOS synthesizing enzymes would greatly facilitate the in silico identification of novel catalysts, as large amounts of sequence data lie untapped. The development of a bioinformatics tool to rapidly distinguish between high-level FOSs synthesizing predominantly sucrose hydrolysing enzymes from fungal genomic data is presented. Sequence comparison of functionally characterized enzymes displaying low- and high-level FOS synthesis revealed conserved motifs unique to each group. New light is shed on the sequence context of active site residues in three previously identified conserved motifs. We characterized two enzymes predicted to possess low- and high-level FOS synthesis activities based on their conserved motif sequences. FOS data for the enzymes confirmed our successful prediction of their FOS synthesis capacity. Structural comparison of enzymes displaying low- and high-level FOS synthesis identified steric hindrance between nystose and a long loop region present only in low-level FOS synthesizers. This loop is proposed to limit the synthesis of FOS species with higher degrees of polymerization, a phenomenon observed among enzymes displaying low-level FOS synthesis. Conserved sequence motifs surrounding catalytic residues and a distant structural determinant were identifiers of FOS synthesis capacity and allow for functional annotation of sucrolytic enzymes directly from amino acid sequence. The tool presented may also be useful to study the structure-function relationships of β-fructofuranosidases by identifying mutations present in a group of closely related enzymes displaying similar function.

  9. Sequence and structure-based prediction of fructosyltransferase activity for functional subclassification of fungal GH32 enzymes.

    PubMed

    Trollope, Kim M; van Wyk, Niël; Kotjomela, Momo A; Volschenk, Heinrich

    2015-12-01

    Sucrolytic enzymes catalyse sucrose hydrolysis or the synthesis of fructooligosaccharides (FOSs), a prebiotic in human and animal nutrition. FOS synthesis capacity differs between sucrolytic enzymes. Amino-acid-sequence-based classification of FOS synthesizing enzymes would greatly facilitate the in silico identification of novel catalysts, as large amounts of sequence data lie untapped. The development of a bioinformatics tool to rapidly distinguish between high-level FOSs synthesizing predominantly sucrose hydrolysing enzymes from fungal genomic data is presented. Sequence comparison of functionally characterized enzymes displaying low- and high-level FOS synthesis revealed conserved motifs unique to each group. New light is shed on the sequence context of active site residues in three previously identified conserved motifs. We characterized two enzymes predicted to possess low- and high-level FOS synthesis activities based on their conserved motif sequences. FOS data for the enzymes confirmed our successful prediction of their FOS synthesis capacity. Structural comparison of enzymes displaying low- and high-level FOS synthesis identified steric hindrance between nystose and a long loop region present only in low-level FOS synthesizers. This loop is proposed to limit the synthesis of FOS species with higher degrees of polymerization, a phenomenon observed among enzymes displaying low-level FOS synthesis. Conserved sequence motifs surrounding catalytic residues and a distant structural determinant were identifiers of FOS synthesis capacity and allow for functional annotation of sucrolytic enzymes directly from amino acid sequence. The tool presented may also be useful to study the structure-function relationships of β-fructofuranosidases by identifying mutations present in a group of closely related enzymes displaying similar function. PMID:26426731

  10. Characterization of Newcastle disease virus isolates by reverse transcription PCR coupled to direct nucleotide sequencing and development of sequence database for pathotype prediction and molecular epidemiological analysis.

    PubMed Central

    Seal, B S; King, D J; Bennett, J D

    1995-01-01

    Degenerate oligonucleotide primers were synthesized to amplify nucleotide sequences from portions of the fusion protein and matrix protein genes of Newcastle disease virus (NDV) genomic RNA that could be used diagnostically. These primers were used in a single-tube reverse transcription PCR of NDV genomic RNA coupled to direct nucleotide sequencing of the amplified product to characterize more than 30 NDV isolates. In agreement with previous reports, differences in the fusion protein cleavage sequence that correlated genotypically with virulence among various NDV pathotypes were detected. By using sequences generated from the matrix protein gene coding for the nuclear localization signal, lentogenic viruses were again grouped phylogenetically separate from other pathotypes. These techniques were applied to compare neurotropic velogenic viruses isolated from an outbreak of Newcastle disease in cormorants and turkeys. Cormorant NDV isolates and an NDV isolate from an infected turkey flock in North Dakota had the fusion protein cleavage sequence 109SRGRRQKRFVG119. The R-for-G substitution at position 110 may be unique for the cormorant-type isolates. Although the amino acid sequences from the fusion protein cleavage site were identical, nucleotide sequence data correlate the outbreak in turkeys to a cormorant virus isolate from Minnesota and not to a cormorant virus isolate from Michigan. On the basis of sequence information, the cormorant isolates are virulent viruses related to isolates of psittacine origin, possibly genotypically distinct from other velogenic NDV isolates. These techniques can be used reliably for Newcastle disease epidemiology and for prediction of pathotypes of NDV isolates without traditional live-bird inoculations. PMID:8567895

  11. Structure Prediction and Analysis of Neuraminidase Sequence Variants

    ERIC Educational Resources Information Center

    Thayer, Kelly M.

    2016-01-01

    Analyzing protein structure has become an integral aspect of understanding systems of biochemical import. The laboratory experiment endeavors to introduce protein folding to ascertain structures of proteins for which the structure is unavailable, as well as to critically evaluate the quality of the prediction obtained. The model system used is the…

  12. Robust prediction of B-factor profile from sequence using two-stage SVR based on random forest feature selection.

    PubMed

    Pan, Xiao-Yong; Shen, Hong-Bin

    2009-01-01

    B-factor is highly correlated with protein internal motion, which is used to measure the uncertainty in the position of an atom within a crystal structure. Although the rapid progress of structural biology in recent years makes more accurate protein structures available than ever, with the avalanche of new protein sequences emerging during the post-genomic Era, the gap between the known protein sequences and the known protein structures becomes wider and wider. It is urgent to develop automated methods to predict B-factor profile from the amino acid sequences directly, so as to be able to timely utilize them for basic research. In this article, we propose a novel approach, called PredBF, to predict the real value of B-factor. We firstly extract both global and local features from the protein sequences as well as their evolution information, then the random forests feature selection is applied to rank their importance and the most important features are inputted to a two-stage support vector regression (SVR) for prediction, where the initial predicted outputs from the 1(st) SVR are further inputted to the 2nd layer SVR for final refinement. Our results have revealed that a systematic analysis of the importance of different features makes us have deep insights into the different contributions of features and is very necessary for developing effective B-factor prediction tools. The two-layer SVR prediction model designed in this study further enhanced the robustness of predicting the B-factor profile. As a web server, PredBF is freely available at: http://www.csbio.sjtu.edu.cn/bioinf/PredBF for academic use.

  13. Genome Sequence Analysis of the Naphthenic Acid Degrading and Metal Resistant Bacterium Cupriavidus gilardii CR3

    PubMed Central

    Xiao, Jingfa; Hao, Lirui; Crowley, David E.; Zhang, Zhewen; Yu, Jun; Huang, Ning; Huo, Mingxin; Wu, Jiayan

    2015-01-01

    Cupriavidus sp. are generally heavy metal tolerant bacteria with the ability to degrade a variety of aromatic hydrocarbon compounds, although the degradation pathways and substrate versatilities remain largely unknown. Here we studied the bacterium Cupriavidus gilardii strain CR3, which was isolated from a natural asphalt deposit, and which was shown to utilize naphthenic acids as a sole carbon source. Genome sequencing of C. gilardii CR3 was carried out to elucidate possible mechanisms for the naphthenic acid biodegradation. The genome of C. gilardii CR3 was composed of two circular chromosomes chr1 and chr2 of respectively 3,539,530 bp and 2,039,213 bp in size. The genome for strain CR3 encoded 4,502 putative protein-coding genes, 59 tRNA genes, and many other non-coding genes. Many genes were associated with xenobiotic biodegradation and metal resistance functions. Pathway prediction for degradation of cyclohexanecarboxylic acid, a representative naphthenic acid, suggested that naphthenic acid undergoes initial ring-cleavage, after which the ring fission products can be degraded via several plausible degradation pathways including a mechanism similar to that used for fatty acid oxidation. The final metabolic products of these pathways are unstable or volatile compounds that were not toxic to CR3. Strain CR3 was also shown to have tolerance to at least 10 heavy metals, which was mainly achieved by self-detoxification through ion efflux, metal-complexation and metal-reduction, and a powerful DNA self-repair mechanism. Our genomic analysis suggests that CR3 is well adapted to survive the harsh environment in natural asphalts containing naphthenic acids and high concentrations of heavy metals. PMID:26301592

  14. Genome Sequence Analysis of the Naphthenic Acid Degrading and Metal Resistant Bacterium Cupriavidus gilardii CR3.

    PubMed

    Wang, Xiaoyu; Chen, Meili; Xiao, Jingfa; Hao, Lirui; Crowley, David E; Zhang, Zhewen; Yu, Jun; Huang, Ning; Huo, Mingxin; Wu, Jiayan

    2015-01-01

    Cupriavidus sp. are generally heavy metal tolerant bacteria with the ability to degrade a variety of aromatic hydrocarbon compounds, although the degradation pathways and substrate versatilities remain largely unknown. Here we studied the bacterium Cupriavidus gilardii strain CR3, which was isolated from a natural asphalt deposit, and which was shown to utilize naphthenic acids as a sole carbon source. Genome sequencing of C. gilardii CR3 was carried out to elucidate possible mechanisms for the naphthenic acid biodegradation. The genome of C. gilardii CR3 was composed of two circular chromosomes chr1 and chr2 of respectively 3,539,530 bp and 2,039,213 bp in size. The genome for strain CR3 encoded 4,502 putative protein-coding genes, 59 tRNA genes, and many other non-coding genes. Many genes were associated with xenobiotic biodegradation and metal resistance functions. Pathway prediction for degradation of cyclohexanecarboxylic acid, a representative naphthenic acid, suggested that naphthenic acid undergoes initial ring-cleavage, after which the ring fission products can be degraded via several plausible degradation pathways including a mechanism similar to that used for fatty acid oxidation. The final metabolic products of these pathways are unstable or volatile compounds that were not toxic to CR3. Strain CR3 was also shown to have tolerance to at least 10 heavy metals, which was mainly achieved by self-detoxification through ion efflux, metal-complexation and metal-reduction, and a powerful DNA self-repair mechanism. Our genomic analysis suggests that CR3 is well adapted to survive the harsh environment in natural asphalts containing naphthenic acids and high concentrations of heavy metals. PMID:26301592

  15. DNA Sequencing and Predictions of the Cosmic Theory of Life

    NASA Astrophysics Data System (ADS)

    Wickramasinghe, N. Chandra

    The theory of cometary panspermia, developed by the late Sir Fred Hoyle and the present author argues that life originated cosmically as a unique event in one of a great multitude of comets or planetary bodies in the Universe. Life on Earth did not originate here but was introduced by impacting comets, and its further evolution was driven by the subsequent acquisition of cosmically derived genes. Explicit predictions of this theory published in 1979-1981, stating how the acquisition of new genes drives evolution, are compared with recent developments in relation to horizontal gene transfer, and the role of retroviruses in evolution. Precisely-stated predictions of the theory of cometary panspermia are shown to have been verified.

  16. DNA sequencing and predictions of the cosmic theory of life

    NASA Astrophysics Data System (ADS)

    Wickramasinghe, N. Chandra

    2013-01-01

    The theory of cometary panspermia, developed by the late Sir Fred Hoyle and the present author argues that life originated cosmically as a unique event in one of a great multitude of comets or planetary bodies in the Universe. Life on Earth did not originate here but was introduced by impacting comets, and its further evolution was driven by the subsequent acquisition of cosmically derived genes. Explicit predictions of this theory published in 1979-1981, stating how the acquisition of new genes drives evolution, are compared with recent developments in relation to horizontal gene transfer, and the role of retroviruses in evolution. Precisely-stated predictions of the theory of cometary panspermia are shown to have been verified.

  17. Paroxysmal atrial fibrillation prediction method with shorter HRV sequences.

    PubMed

    Boon, K H; Khalil-Hani, M; Malarvili, M B; Sia, C W

    2016-10-01

    This paper proposes a method that predicts the onset of paroxysmal atrial fibrillation (PAF), using heart rate variability (HRV) segments that are shorter than those applied in existing methods, while maintaining good prediction accuracy. PAF is a common cardiac arrhythmia that increases the health risk of a patient, and the development of an accurate predictor of the onset of PAF is clinical important because it increases the possibility to stabilize (electrically) and prevent the onset of atrial arrhythmias with different pacing techniques. We investigate the effect of HRV features extracted from different lengths of HRV segments prior to PAF onset with the proposed PAF prediction method. The pre-processing stage of the predictor includes QRS detection, HRV quantification and ectopic beat correction. Time-domain, frequency-domain, non-linear and bispectrum features are then extracted from the quantified HRV. In the feature selection, the HRV feature set and classifier parameters are optimized simultaneously using an optimization procedure based on genetic algorithm (GA). Both full feature set and statistically significant feature subset are optimized by GA respectively. For the statistically significant feature subset, Mann-Whitney U test is used to filter non-statistical significance features that cannot pass the statistical test at 20% significant level. The final stage of our predictor is the classifier that is based on support vector machine (SVM). A 10-fold cross-validation is applied in performance evaluation, and the proposed method achieves 79.3% prediction accuracy using 15-minutes HRV segment. This accuracy is comparable to that achieved by existing methods that use 30-minutes HRV segments, most of which achieves accuracy of around 80%. More importantly, our method significantly outperforms those that applied segments shorter than 30 minutes. PMID:27480743

  18. Buffalo (Bubalus bubalis) interleukin-2: sequence analysis reveals high nucleotide and amino acid identity with interleukin-2 of cattle and other ruminants.

    PubMed

    Sreekumar, E; Premraj, A; Saravanakumar, M; Rasool, T J

    2002-08-01

    A 4400-bp genomic sequence and a 332-bp truncated cDNA sequence of the interleukin-2 (IL-2) gene of Indian water buffalo (Bubalus bubalis) were amplified by polymerase chain reaction and cloned. The coding sequence of the buffalo IL-2 gene was assembled from the 5' end of the genomic clone and the truncated cDNA clone. This sequence had 98.5% nucleotide identity and 98% amino acid identity with cattle IL-2. Three amino acid substitutions were observed at positions 63, 124 and 135. Comparison of the predicted protein structure of buffalo IL-2 with that of human and cattle IL-2 did not reveal significant differences. The putative amino acids responsible for IL-2 receptor binding were conserved in buffalo, cattle and human IL-2. The amino acid sequence of buffalo IL-2 also showed very high identity with that of other ruminants, indicating functional cross-reactivity.

  19. Amino acid network for prediction of catalytic residues in enzymes: a comparison survey.

    PubMed

    Zhou, Jianhong; Yan, Wenying; Hu, Guang; Shen, Bairong

    2016-01-01

    Catalytic residues play a significant role in enzyme functions. With the recent accumulation of experimentally determined enzyme 3D structures and network theory on protein structures, the prediction of catalytic residues by amino acid network (AAN, where nodes are residues and links are residue interactions) has gained much interest. Computational methods of identifying catalytic residues are traditionally divided into two groups: sequence-based and structure-based methods. Two new structure- based methods are proposed in current advances: AAN and Elastic Network Model (ENM) of enzyme structures. By concentrating on AAN-based approach, we herein summarized network properties for predictions of catalytic residues. AAN attributes were showed responsible for performance improvement, and therefore the combination of AAN with previous sequence and structural information will be a promising direction for further improvement. Advantages and limitations of AAN-based methods, future perspectives on the application of AAN to the study of protein structure-function relationships are discussed.

  20. Affinity regression predicts the recognition code of nucleic acid binding proteins

    PubMed Central

    Pelossof, Raphael; Singh, Irtisha; Yang, Julie L.; Weirauch, Matthew T.; Hughes, Timothy R.; Leslie, Christina S.

    2016-01-01

    Predicting the affinity profiles of nucleic acid-binding proteins directly from the protein sequence is a major unsolved problem. We present a statistical approach for learning the recognition code of a family of transcription factors (TFs) or RNA-binding proteins (RBPs) from high-throughput binding assays. Our method, called affinity regression, trains on protein binding microarray (PBM) or RNA compete experiments to learn an interaction model between proteins and nucleic acids, using only protein domain and probe sequences as inputs. By training on mouse homeodomain PBM profiles, our model correctly identifies residues that confer DNA-binding specificity and accurately predicts binding motifs for an independent set of divergent homeodomains. Similarly, learning from RNA compete profiles for diverse RBPs, our model can predict the binding affinities of held-out proteins and identify key RNA-binding residues. More broadly, we envision applying our method to model and predict biological interactions in any setting where there is a high-throughput ‘affinity’ readout. PMID:26571099

  1. Prediction of multi-drug resistance transporters using a novel sequence analysis method

    PubMed Central

    McDermott, Jason E.; Bruillard, Paul; Overall, Christopher C.; Gosink, Luke; Lindemann, Stephen R.

    2015-01-01

    There are many examples of groups of proteins that have similar function, but the determinants of functional specificity may be hidden by lack of sequence similarity, or by large groups of similar sequences with different functions. Transporters are one such protein group in that the general function, transport, can be easily inferred from the sequence, but the substrate specificity can be impossible to predict from sequence with current methods. In this paper we describe a linguistic-based approach to identify functional patterns from groups of unaligned protein sequences and its application to predict multi-drug resistance transporters (MDRs) from bacteria. We first show that our method can recreate known patterns from PROSITE for several motifs from unaligned sequences. We then show that the method, MDRpred, can predict MDRs with greater accuracy and positive predictive value than a collection of currently available family-based models from the Pfam database. Finally, we apply MDRpred to a large collection of protein sequences from an environmental microbiome study to make novel predictions about drug resistance in a potential environmental reservoir. PMID:26913187

  2. Prediction of influenza B vaccine effectiveness from sequence data.

    PubMed

    Pan, Yidan; Deem, Michael W

    2016-08-31

    Influenza is a contagious respiratory illness that causes significant human morbidity and mortality, affecting 5-15% of the population in a typical epidemic season. Human influenza epidemics are caused by types A and B, with roughly 25% of human cases due to influenza B. Influenza B is a single-stranded RNA virus with a high mutation rate, and both prior immune history and vaccination put significant pressure on the virus to evolve. Due to the high rate of viral evolution, the influenza B vaccine component of the annual influenza vaccine is updated, roughly every other year in recent years. To predict when an update to the vaccine is needed, an estimate of expected vaccine effectiveness against a range of viral strains is required. We here introduce a method to measure antigenic distance between the influenza B vaccine and circulating viral strains. The measure correlates well with effectiveness of the influenza B component of the annual vaccine in humans between 1979 and 2014. We discuss how this measure of antigenic distance may be used in the context of annual influenza vaccine design and prediction of vaccine effectiveness. PMID:27473305

  3. Detection and isolation of nucleic acid sequences using a bifunctional hybridization probe

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    2000-01-01

    A method for detecting and isolating a target sequence in a sample of nucleic acids is provided using a bifunctional hybridization probe capable of hybridizing to the target sequence that includes a detectable marker and a first complexing agent capable of forming a binding pair with a second complexing agent. A kit is also provided for detecting a target sequence in a sample of nucleic acids using a bifunctional hybridization probe according to this method.

  4. First draft genome sequencing of indole acetic acid producing and plant growth promoting fungus Preussia sp. BSL10.

    PubMed

    Khan, Abdul Latif; Asaf, Sajjad; Khan, Abdur Rahim; Al-Harrasi, Ahmed; Al-Rawahi, Ahmed; Lee, In-Jung

    2016-05-10

    Preussia sp. BSL10, family Sporormiaceae, was actively producing phytohormone (indole-3-acetic acid) and extra-cellular enzymes (phosphatases and glucosidases). The fungus was also promoting the growth of arid-land tree-Boswellia sacra. Looking at such prospects of this fungus, we sequenced its draft genome for the first time. The Illumina based sequence analysis reveals an approximate genome size of 31.4Mbp for Preussia sp. BSL10. Based on ab initio gene prediction, total 32,312 coding sequences were annotated consisting of 11,967 coding genes, pseudogenes, and 221 tRNA genes. Furthermore, 321 carbohydrate-active enzymes were predicted and classified into many functional families. PMID:26995610

  5. Complete amino acid sequence of a Lolium perenne (perennial rye grass) pollen allergen, Lol p II.

    PubMed

    Ansari, A A; Shenbagamurthi, P; Marsh, D G

    1989-07-01

    The complete amino acid sequence of a Lolium perenne (rye grass) pollen allergen, Lol p II was determined by automated Edman degradation of the protein and selected fragments. Cleavage of the protein by enzymatic and chemical techniques established an unambiguous sequence for the protein. Lol p II contains 97 amino acid residues, with a calculated molecular weight of 10,882. The protein lacks cysteine and glutamine and shows no evidence of glycosylation. Theoretical predictions by Fraga's (Fraga, S. (1982) Can. J. Chem. 60, 2606-2610) and Hopp and Woods' (Hopp, T. P., and Woods, K. R. (1981) Proc. Natl. Acad. Sci. U.S.A. 78, 3824-3828) methods indicate the presence of four hydrophilic regions, which may contribute to sequential or parts of conformational B-cell epitopes. Analysis of amphipathic regions by Berzofsky's method indicates the presence of a highly amphipathic region, which may contain, or contribute to, an Ia/T-cell epitope. This latter segment of Lol p II was found to be highly homologous with an antibody-binding segment of the major rye allergen Lol p I and may explain why immune responsiveness to both the allergens is associated with HLA-DR3.

  6. Colocalisation of predicted exonic splicing enhancers in BRCA2 with reported sequence variants.

    PubMed

    Pettigrew, Christopher A; Wayte, Nicola; Wronski, Ania; Lovelock, Paul K; Spurdle, Amanda B; Brown, Melissa A

    2008-07-01

    Disruption of the breast cancer susceptibility gene BRCA2 is associated with increased risk of developing breast and ovarian cancer. Over 1800 sequence changes in BRCA2 have been reported, although for many the pathogenicity is unclear. Classifying these changes remains a challenge, as they may disrupt regulatory sequences as well as the primary protein coding sequence. Sequence changes located in the splice site consensus sequences often disrupt splicing, however sequence changes located within exons are also able to alter splicing patterns. Unfortunately, the presence of these exonic splicing enhancers (ESEs) and the functional effect of variants within ESEs it is currently difficult to predict. We have previously developed a method of predicting which sequence changes within exons are likely to affect splicing, using BRCA1 as an example. In this paper, we have predicted ESEs in BRCA2 using the web-based tool ESEfinder and incorporated the same series of filters (increased threshold, 125 nt limit and evolutionary conservation of the motif) in order to identify predicted ESEs that are more likely to be functional. Initially 1114 ESEs were predicted for BRCA2, however after all the filters were included, this figure was reduced to 31, 3% of the original number of predicted ESEs. Reported unclassified sequence variants in BRCA2 were found to colocalise to 55% (17/31) of these conserved ESEs, while polymorphisms colocalised to 0 of the conserved ESEs. In summary, we have identified a subset of unclassified sequence variants in BRCA2 that may adversely affect splicing and thereby contribute to BRCA2 disruption.

  7. Sequence of the cDNA and 5'-flanking region for human acid alpha-glucosidase, detection of an intron in the 5' untranslated leader sequence, definition of 18-bp polymorphisms, and differences with previous cDNA and amino acid sequences.

    PubMed

    Martiniuk, F; Mehler, M; Tzall, S; Meredith, G; Hirschhorn, R

    1990-03-01

    Acid maltase or acid alpha-glucosidase (GAA) is a lysosomal enzyme that hydrolyzes glycogen to glucose and is deficient in glycogen storage disease type II. Previously, we isolated a partial cDNA (1.9 kb) for human GAA; we have now used this cDNA to isolate and determine sequence in longer cDNAs from four additional independent cDNA libraries. Primer extension studies indicated that the mRNA extended approximately 200 bp 5' of the cDNA sequence obtained. Therefore, we isolated a genomic fragment containing 5' cDNA sequences that overlapped the previous cDNA sequence and extended an additional 24 bp to an initiation codon within a Kozak consensus sequence. The sequence of the genomic clone revealed an intron-exon junction 32 bp 5' to the ATG, indicating that the 5' leader sequence was interrupted by an intron. The remaining 186 bp of 5' untranslated sequence was identified approximately 3 kb upstream. The promoter region upstream from the start site of transcription was GC rich and contained areas of homology to Sp1 binding sites but no identifiable CAAT or TATA box. The combined data gave a nucleotide sequence of 2,856 bp for the coding region from the ATG to a stop codon, predicting a protein of 952 amino acids. The 3' untranslated region contained 555 bp with a polyadenylation signal at 3,385 bp followed by 16 bp prior to a poly(A) tail. This sequence of the GAA coding region differs from that reported by Hoefsloot et al. (1988) in three areas that change a total of 42 amino acids. Direct determination of the amino acid sequence in one of these areas confirmed the nucleotide sequence reported here but also disagreed with the directly determined amino acid sequence reported by Hoefsloot et al. (1988). At two other areas, changes in base pairs predicted new restriction sites that were identified in cDNAs from several independent libraries. The amino acid changes in all three ares increased the homology to rabbit-human isomaltase. Therefore, we believe that our

  8. Amino acid sequence of horseshoe crab, Tachypleus tridentatus, striated muscle troponin C.

    PubMed

    Kobayashi, T; Kagami, O; Takagi, T; Konishi, K

    1989-05-01

    The amino acid sequence of troponin C obtained from horseshoe crab, Tachypleus tridentatus, striated muscle was determined by sequence analysis and alignments of chemically and enzymatically cleaved peptides. Troponin C is composed of 153 amino acid residues with a blocked N-terminus and contains no tryptophan or cysteine residue. The site I, one of the four Ca2+-binding sites, is considered to have lost its ability to bind Ca2+ owing to the replacements of certain amino acid residues.

  9. Predicting effects of noncoding variants with deep learning–based sequence model

    PubMed Central

    Zhou, Jian; Troyanskaya, Olga G

    2016-01-01

    Identifying functional effects of noncoding variants is a major challenge in human genetics. To predict the noncoding-variant effects de novo from sequence, we developed a deep learning–based algorithmic framework, DeepSEA (http://deepsea.princeton.edu/), that directly learns a regulatory sequence code from large-scale chromatin-profiling data, enabling prediction of chromatin effects of sequence alterations with single-nucleotide sensitivity. We further used this capability to improve prioritization of functional variants including expression quantitative trait loci (eQTLs) and disease-associated variants. PMID:26301843

  10. Isolation and amino acid sequences of squirrel monkey (Saimiri sciurea) insulin and glucagon.

    PubMed Central

    Yu, J H; Eng, J; Yalow, R S

    1990-01-01

    It was reported two decades ago that insulin was not detectable in the glucose-stimulated state in Saimiri sciurea, the New World squirrel monkey, by a radioimmunoassay system developed with guinea pig anti-pork insulin antibody and labeled pork insulin. With the same system, reasonable levels were observed in rhesus monkeys and chimpanzees. This suggested that New World monkeys, like the New World hystricomorph rodents such as the guinea pig and the coypu, might have insulins whose sequences differ markedly from those of Old World mammals. In this report we describe the purification and amino acid sequences of squirrel monkey insulin and glucagon. We demonstrate that the substitutions at B29, B27, A2, A4, and A17 of squirrel monkey insulin are identical with those previously found in another New World primate, the owl monkey (Aotus trivirgatus). The immunologic cross-reactivity of this insulin in our immunoassay system is only a few percent of that of human insulin. Squirrel monkey glucagon is identical with the usual glucagon found in Old World mammals, which predicts that the glucagons of other New World monkeys would not differ from the usual Old World mammalian glucagon. It appears that the peptides of the New World monkeys have diverged less from those of the Old World mammals than have those of the New World hystricomorph rodents. The striking improvements in peptide purification and sequencing have the potential for adding new information concerning the evolutionary divergence of species. PMID:2263627

  11. Disjoint combinations profiling (DCP): a new method for the prediction of antibody CDR conformation from sequence.

    PubMed

    Nikoloudis, Dimitris; Pitts, Jim E; Saldanha, José W

    2014-01-01

    The accurate prediction of the conformation of Complementarity-Determining Regions (CDRs) is important in modelling antibodies for protein engineering applications. Specifically, the Canonical paradigm has proved successful in predicting the CDR conformation in antibody variable regions. It relies on canonical templates which detail allowed residues at key positions in the variable region framework or in the CDR itself for 5 of the 6 CDRs. While no templates have as yet been defined for the hypervariable CDR-H3, instead, reliable sequence rules have been devised for predicting the base of the CDR-H3 loop. Here a new method termed Disjoint Combinations Profiling (DCP) is presented, which contributes a considerable advance in the prediction of CDR conformations. This novel method is explained and compared with canonical templates and sequence rules in a 3-way blind prediction. DCP achieved 93% accuracy over 951 blind predictions and showed an improvement in cumulative accuracy compared to predictions with canonical templates or sequence rules. In addition to its overall improvement in prediction accuracy, it is suggested that DCP is open to better implementations in the future and that it can improve as more antibody structures are deposited in the databank. In contrast, it is argued that canonical templates and sequence rules may have reached their peak.

  12. Sequence Prediction With Sparse Distributed Hyperdimensional Coding Applied to the Analysis of Mobile Phone Use Patterns.

    PubMed

    Rasanen, Okko J; Saarinen, Jukka P

    2016-09-01

    Modeling and prediction of temporal sequences is central to many signal processing and machine learning applications. Prediction based on sequence history is typically performed using parametric models, such as fixed-order Markov chains ( n -grams), approximations of high-order Markov processes, such as mixed-order Markov models or mixtures of lagged bigram models, or with other machine learning techniques. This paper presents a method for sequence prediction based on sparse hyperdimensional coding of the sequence structure and describes how higher order temporal structures can be utilized in sparse coding in a balanced manner. The method is purely incremental, allowing real-time online learning and prediction with limited computational resources. Experiments with prediction of mobile phone use patterns, including the prediction of the next launched application, the next GPS location of the user, and the next artist played with the phone media player, reveal that the proposed method is able to capture the relevant variable-order structure from the sequences. In comparison with the n -grams and the mixed-order Markov models, the sparse hyperdimensional predictor clearly outperforms its peers in terms of unweighted average recall and achieves an equal level of weighted average recall as the mixed-order Markov chain but without the batch training of the mixed-order model.

  13. Amino acid sequence analysis and characterization of a ribonuclease from starfish Asterias amurensis.

    PubMed

    Motoyoshi, Naomi; Kobayashi, Hiroko; Itagaki, Tadashi; Inokuchi, Norio

    2016-09-01

    The aim of this study was to phylogenetically characterize the location of the RNase T2 enzyme in the starfish (Asterias amurensis). We isolated an RNase T2 ribonuclease (RNase Aa) from the ovaries of starfish and determined its amino acid sequence by protein chemistry and cloning cDNA encoding RNase Aa. The isolated protein had 231 amino acid residues, a predicted molecular mass of 25,906 Da, and an optimal pH of 5.0. RNase Aa preferentially released guanylic acid from the RNA. The catalytic sites of the RNase T2 family are conserved in RNase Aa; furthermore, the distribution of the cysteine residues in RNase Aa is similar to that in other animal and plant T2 RNases. RNase Aa is cleaved at two points: 21 residues from the N-terminus and 29 residues from the C-terminus; however, both fragments may remain attached to the protein via disulfide bridges, leading to the maintenance of its conformation, as suggested by circular dichroism spectrum analysis. The phylogenetic analysis revealed that starfish RNase Aa is evolutionarily an intermediate between protozoan and oyster RNases. PMID:26920046

  14. Trichomonas vaginalis acidic phospholipase A2: isolation and partial amino acid sequence.

    PubMed

    Escobedo-Guajardo, Brenda L; González-Salazar, Francisco; Palacios-Corona, Rebeca; Torres de la Cruz, Víctor M; Morales-Vallarta, Mario; Mata-Cárdenas, Benito D; Garza-González, Jesús N; Rivera-Silva, Gerardo; Vargas-Villarreal, Javier

    2013-12-01

    Sexually transmitted diseases are a major cause of acute disease worldwide, and trichomoniasis is the most common and curable disease, generating more than 170 million cases annually worldwide. Trichomonas vaginalis is the causal agent of trichomoniasis and has the ability to destroy in vitro cell monolayers of the vaginal mucosa, where the phospholipases A2 (PLA2) have been reported as potential virulence factors. These enzymes have been partially characterized from the subcellular fraction S30 of pathogenic T. vaginalis strains. The main objective of this study was to purify a phospholipase A2 from T. vaginalis, make a partial characterization, obtain a partial amino acid sequence, and determine its enzymatic participation as hemolytic factor causing lysis of erythrocytes. Trichomonas S30, RF30 and UFF30 sub-fractions from GT-15 strain have the capacity to hydrolyze [2-(14)C-PA]-PC at pH 6.0. Proteins from the UFF30 sub-fraction were separated by affinity chromatography into two eluted fractions with detectable PLA A2 activity. The EDTA-eluted fraction was analyzed by HPLC using on-line HPLC-tandem mass spectrometry and two protein peaks were observed at 8.2 and 13 kDa. Peptide sequences were identified from the proteins present in the eluted EDTA UFF30 fraction; bioinformatic analysis using Protein Link Global Server charged with T. vaginalis protein database suggests that eluted peptides correspond a putative ubiquitin protein in the 8.2 kDa fraction and a phospholipase preserved in the 13 kDa fraction. The EDTA-eluted fraction hydrolyzed [2-(14)C-PA]-PC lyses erythrocytes from Sprague-Dawley in a time and dose-dependent manner. The acidic hemolytic activity decreased by 84% with the addition of 100 μM of Rosenthal's inhibitor. PMID:24338313

  15. Trichomonas vaginalis acidic phospholipase A2: isolation and partial amino acid sequence.

    PubMed

    Escobedo-Guajardo, Brenda L; González-Salazar, Francisco; Palacios-Corona, Rebeca; Torres de la Cruz, Víctor M; Morales-Vallarta, Mario; Mata-Cárdenas, Benito D; Garza-González, Jesús N; Rivera-Silva, Gerardo; Vargas-Villarreal, Javier

    2013-12-01

    Sexually transmitted diseases are a major cause of acute disease worldwide, and trichomoniasis is the most common and curable disease, generating more than 170 million cases annually worldwide. Trichomonas vaginalis is the causal agent of trichomoniasis and has the ability to destroy in vitro cell monolayers of the vaginal mucosa, where the phospholipases A2 (PLA2) have been reported as potential virulence factors. These enzymes have been partially characterized from the subcellular fraction S30 of pathogenic T. vaginalis strains. The main objective of this study was to purify a phospholipase A2 from T. vaginalis, make a partial characterization, obtain a partial amino acid sequence, and determine its enzymatic participation as hemolytic factor causing lysis of erythrocytes. Trichomonas S30, RF30 and UFF30 sub-fractions from GT-15 strain have the capacity to hydrolyze [2-(14)C-PA]-PC at pH 6.0. Proteins from the UFF30 sub-fraction were separated by affinity chromatography into two eluted fractions with detectable PLA A2 activity. The EDTA-eluted fraction was analyzed by HPLC using on-line HPLC-tandem mass spectrometry and two protein peaks were observed at 8.2 and 13 kDa. Peptide sequences were identified from the proteins present in the eluted EDTA UFF30 fraction; bioinformatic analysis using Protein Link Global Server charged with T. vaginalis protein database suggests that eluted peptides correspond a putative ubiquitin protein in the 8.2 kDa fraction and a phospholipase preserved in the 13 kDa fraction. The EDTA-eluted fraction hydrolyzed [2-(14)C-PA]-PC lyses erythrocytes from Sprague-Dawley in a time and dose-dependent manner. The acidic hemolytic activity decreased by 84% with the addition of 100 μM of Rosenthal's inhibitor.

  16. A nucleic acid sequence-based amplification system for detection of Listeria monocytogenes hlyA sequences.

    PubMed Central

    Blais, B W; Turner, G; Sooknanan, R; Malek, L T

    1997-01-01

    A nucleic acid sequence-based amplification system primarily targeting mRNA from the Listeria monocytogenes hlyA gene was developed. This system enabled the detection of low numbers (< 10 CFU/g) of L. monocytogenes cells inoculated into a variety of dairy and egg products after 48 h of enrichment in modified listeria enrichment broth. PMID:8979357

  17. Reaction sequences in simulated neutralized current acid waste slurry during processing with formic acid

    SciTech Connect

    Smith, H.D.; Wiemers, K.D.; Langowski, M.H.; Powell, M.R.; Larson, D.E.

    1993-11-01

    The Hanford Waste Vitrification Plant (HWVP) is being designed for the Department of Energy to immobilize high-level and transuranic wastes as glass for permanent disposal. Pacific Northwest Laboratory is supporting the HWVP design activities by conducting laboratory-scale studies using a HWVP simulated waste slurry. Conditions which affect the slurry processing chemistry were evaluated in terms of offgas composition and peak generation rate and changes in slurry composition. A standard offgas profile defined in terms of three reaction phases, decomposition of H{sub 2}CO{sub 3}, destruction of NO{sub 2}{sup {minus}}, and production of H{sub 2} and NH{sub 3} was used as a baseline against which changes were evaluated. The test variables include nitrite concentration, acid neutralization capacity, temperature, and formic acid addition rate. Results to date indicate that pH is an important parameter influencing the N{sub 2}O/NO{sub x} generation ratio; nitrite can both inhibit and activate rhodium as a catalyst for formic acid decomposition to CO{sub 2} and H{sub 2}; and a separate reduced metal phase forms in the reducing environment. These data are being compiled to provide a basis for predicting the HWVP feed processing chemistry as a function of feed composition and operation variables, recommending criteria for chemical adjustments, and providing guidelines with respect to important control parameters to consider during routine and upset plant operation.

  18. Amino acid sequence and structural properties of protein p12, an African swine fever virus attachment protein.

    PubMed Central

    Alcamí, A; Angulo, A; López-Otín, C; Muñoz, M; Freije, J M; Carrascosa, A L; Viñuela, E

    1992-01-01

    The gene encoding the African swine fever virus protein p12, which is involved in virus attachment to the host cell, has been mapped and sequenced in the genome of the Vero-adapted virus strain BA71V. The determination of the N-terminal amino acid sequence and the hybridization of oligonucleotide probes derived from this sequence to cloned restriction fragments allowed the mapping of the gene in fragment EcoRI-O, located in the central region of the viral genome. The DNA sequence of an EcoRI-XbaI fragment showed an open reading frame which is predicted to encode a polypeptide of 61 amino acids. The expression of this open reading frame in rabbit reticulocyte lysates and in Escherichia coli gave rise to a 12-kDa polypeptide that was immunoprecipitated with a monoclonal antibody specific for protein p12. The hydrophilicity profile indicated the existence of a stretch of 22 hydrophobic residues in the central part that may anchor the protein in the virus envelope. Three forms of the protein with apparent molecular masses of 17, 12, and 10 kDa in sodium dodecyl sulfate-polyacrylamide gel electrophoresis have been observed, depending on the presence of 2-mercaptoethanol and alkylation with 4-vinylpyridine, indicating that disulfide bonds are responsible for the multimerization of the protein. This result was in agreement with the existence of a cysteine-rich domain in the C-terminal region of the predicted amino acid sequence. The protein was synthesized at late times of infection, and no posttranslational modifications such as glycosylation, phosphorylation, or fatty acid acylation were detected. Images PMID:1583732

  19. Identification of random nucleic acid sequence aberrations using dual capture probes which hybridize to different chromosome regions

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1998-01-01

    A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration.

  20. Identification of random nucleic acid sequence aberrations using dual capture probes which hybridize to different chromosome regions

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1998-03-24

    A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration. 14 figs.

  1. Predicting backbone Cα angles and dihedrals from protein sequences by stacked sparse auto-encoder deep neural network.

    PubMed

    Lyons, James; Dehzangi, Abdollah; Heffernan, Rhys; Sharma, Alok; Paliwal, Kuldip; Sattar, Abdul; Zhou, Yaoqi; Yang, Yuedong

    2014-10-30

    Because a nearly constant distance between two neighbouring Cα atoms, local backbone structure of proteins can be represented accurately by the angle between C(αi-1)-C(αi)-C(αi+1) (θ) and a dihedral angle rotated about the C(αi)-C(αi+1) bond (τ). θ and τ angles, as the representative of structural properties of three to four amino-acid residues, offer a description of backbone conformations that is complementary to φ and ψ angles (single residue) and secondary structures (>3 residues). Here, we report the first machine-learning technique for sequence-based prediction of θ and τ angles. Predicted angles based on an independent test have a mean absolute error of 9° for θ and 34° for τ with a distribution on the θ-τ plane close to that of native values. The average root-mean-square distance of 10-residue fragment structures constructed from predicted θ and τ angles is only 1.9Å from their corresponding native structures. Predicted θ and τ angles are expected to be complementary to predicted ϕ and ψ angles and secondary structures for using in model validation and template-based as well as template-free structure prediction. The deep neural network learning technique is available as an on-line server called Structural Property prediction with Integrated DEep neuRal network (SPIDER) at http://sparks-lab.org.

  2. AGenDA: gene prediction by cross-species sequence comparison.

    PubMed

    Taher, Leila; Rinner, Oliver; Garg, Saurabh; Sczyrba, Alexander; Morgenstern, Burkhard

    2004-07-01

    Automatic gene prediction is one of the major challenges in computational sequence analysis. Traditional approaches to gene finding rely on statistical models derived from previously known genes. By contrast, a new class of comparative methods relies on comparing genomic sequences from evolutionary related organisms to each other. These methods are based on the concept of phylogenetic footprinting: they exploit the fact that functionally important regions in genomic sequences are usually more conserved than non-functional regions. We created a WWW-based software program for homology-based gene prediction at BiBiServ (Bielefeld Bioinformatics Server). Our tool takes pairs of evolutionary related genomic sequences as input data, e.g. from human and mouse. The server runs CHAOS and DIALIGN to create an alignment of the input sequences and subsequently searches for conserved splicing signals and start/stop codons near regions of local sequence conservation. Genes are predicted based on local homology information and splice signals. The server returns predicted genes together with a graphical representation of the underlying alignment. The program is available at http://bibiserv.TechFak.Uni-Bielefeld.DE/agenda/.

  3. The amino acid sequence of elephant (Elephas maximus) myoglobin and the phylogeny of Proboscidea.

    PubMed

    Dene, H; Goodman, M; Romero-Herrera, A E

    1980-02-13

    The complete amino acid sequence of skeletal myoglobin from the Asian elephant (Elephas maximus) is reported. The functional significance of variations seen when this sequence is compared with that of sperm whale myoglobin is explored in the light of the crystallographic model available for the latter molecule. The phylogenetic implications of the elephant myoglobin amino acid sequence are evaluated by using the maximum parsimony technique. A similar analysis is also presented which incorporates all of the proteins sequenced from the elephant. These results are discussed with respect to current views on proboscidean phylogeny.

  4. EST-PAC a web package for EST annotation and protein sequence prediction

    PubMed Central

    Strahm, Yvan; Powell, David; Lefèvre, Christophe

    2006-01-01

    With the decreasing cost of DNA sequencing technology and the vast diversity of biological resources, researchers increasingly face the basic challenge of annotating a larger number of expressed sequences tags (EST) from a variety of species. This typically consists of a series of repetitive tasks, which should be automated and easy to use. The results of these annotation tasks need to be stored and organized in a consistent way. All these operations should be self-installing, platform independent, easy to customize and amenable to using distributed bioinformatics resources available on the Internet. In order to address these issues, we present EST-PAC a web oriented multi-platform software package for expressed sequences tag (EST) annotation. EST-PAC provides a solution for the administration of EST and protein sequence annotations accessible through a web interface. Three aspects of EST annotation are automated: 1) searching local or remote biological databases for sequence similarities using Blast services, 2) predicting protein coding sequence from EST data and, 3) annotating predicted protein sequences with functional domain predictions. In practice, EST-PAC integrates the BLASTALL suite, EST-Scan2 and HMMER in a relational database system accessible through a simple web interface. EST-PAC also takes advantage of the relational database to allow consistent storage, powerful queries of results and, management of the annotation process. The system allows users to customize annotation strategies and provides an open-source data-management environment for research and education in bioinformatics. PMID:17147782

  5. kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets

    PubMed Central

    Fletez-Brant, Christopher; Lee, Dongwon; McCallion, Andrew S.; Beer, Michael A.

    2013-01-01

    Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, Chromatin immunoprecipitation followed by sequence assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167–80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server to allow the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic datasets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org. PMID:23771147

  6. Facile Analysis and Sequencing of Linear and Branched Peptide Boronic Acids by MALDI Mass Spectrometry

    PubMed Central

    Crumpton, Jason; Zhang, Wenyu; Santos, Webster

    2011-01-01

    Interest in peptides incorporating boronic acid moieties is increasing due to their potential as therapeutics/diagnostics for a variety of diseases such as cancer. The utility of peptide boronic acids may be expanded with access to vast libraries that can be deconvoluted rapidly and economically. Unfortunately, current detection protocols using mass spectrometry are laborious and confounded by boronic acid trimerization, which requires time consuming analysis of dehydration products. These issues are exacerbated when the peptide sequence is unknown, as with de novo sequencing, and especially when multiple boronic acid moieties are present. Thus, a rapid, reliable and simple method for peptide identification is of utmost importance. Herein, we report the identification and sequencing of linear and branched peptide boronic acids containing up to five boronic acid groups by matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS). Protocols for preparation of pinacol boronic esters were adapted for efficient MALDI analysis of peptides. Additionally, a novel peptide boronic acid detection strategy was developed in which 2,5-dihydroxybenzoic acid (DHB) served as both matrix and derivatizing agent in a convenient, in situ, on-plate esterification. Finally, we demonstrate that DHB-modified peptide boronic acids from a single bead can be analyzed by MALDI-MSMS analysis, validating our approach for the identification and sequencing of branched peptide boronic acid libraries. PMID:21449540

  7. Evolution of an Enzyme from a Noncatalytic Nucleic Acid Sequence.

    PubMed

    Gysbers, Rachel; Tram, Kha; Gu, Jimmy; Li, Yingfu

    2015-01-01

    The mechanism by which enzymes arose from both abiotic and biological worlds remains an unsolved natural mystery. We postulate that an enzyme can emerge from any sequence of any functional polymer under permissive evolutionary conditions. To support this premise, we have arbitrarily chosen a 50-nucleotide DNA fragment encoding for the Bos taurus (cattle) albumin mRNA and subjected it to test-tube evolution to derive a catalytic DNA (DNAzyme) with RNA-cleavage activity. After only a few weeks, a DNAzyme with significant catalytic activity has surfaced. Sequence comparison reveals that seven nucleotides are responsible for the conversion of the noncatalytic sequence into the enzyme. Deep sequencing analysis of DNA pools along the evolution trajectory has identified individual mutations as the progressive drivers of the molecular evolution. Our findings demonstrate that an enzyme can indeed arise from a sequence of a functional polymer via permissive molecular evolution, a mechanism that may have been exploited by nature for the creation of the enormous repertoire of enzymes in the biological world today. PMID:26091540

  8. Severe accident source term characteristics for selected Peach Bottom sequences predicted by the MELCOR Code

    SciTech Connect

    Carbajo, J.J.

    1993-09-01

    The purpose of this report is to compare in-containment source terms developed for NUREG-1159, which used the Source Term Code Package (STCP), with those generated by MELCOR to identify significant differences. For this comparison, two short-term depressurized station blackout sequences (with a dry cavity and with a flooded cavity) and a Loss-of-Coolant Accident (LOCA) concurrent with complete loss of the Emergency Core Cooling System (ECCS) were analyzed for the Peach Bottom Atomic Power Station (a BWR-4 with a Mark I containment). The results indicate that for the sequences analyzed, the two codes predict similar total in-containment release fractions for each of the element groups. However, the MELCOR/CORBH Package predicts significantly longer times for vessel failure and reduced energy of the released material for the station blackout sequences (when compared to the STCP results). MELCOR also calculated smaller releases into the environment than STCP for the station blackout sequences.

  9. Computer Simulation of the Determination of Amino Acid Sequences in Polypeptides

    ERIC Educational Resources Information Center

    Daubert, Stephen D.; Sontum, Stephen F.

    1977-01-01

    Describes a computer program that generates a random string of amino acids and guides the student in determining the correct sequence of a given protein by using experimental analytic data for that protein. (MLH)

  10. αIIbβ3 variants defined by next-generation sequencing: Predicting variants likely to cause Glanzmann thrombasthenia

    PubMed Central

    Buitrago, Lorena; Rendon, Augusto; Liang, Yupu; Simeoni, Ilenia; Negri, Ana; Filizola, Marta; Ouwehand, Willem H.; Coller, Barry S.; Alessi, Marie-Christine; Ballmaier, Matthias; Bariana, Tadbir; Bellissimo, Daniel; Bertoli, Marta; Bray, Paul; Bury, Loredana; Carrell, Robin; Cattaneo, Marco; Collins, Peter; French, Deborah; Favier, Remi; Freson, Kathleen; Furie, Bruce; Germeshausen, Manuela; Ghevaert, Cedric; Gomez, Keith; Goodeve, Anne; Gresele, Paolo; Guerrero, Jose; Hampshire, Dan J.; Hadinnapola, Charaka; Heemskerk, Johan; Henskens, Yvonne; Hill, Marian; Hogg, Nancy; Johnsen, Jill; Kahr, Walter; Kerr, Ron; Kunishima, Shinji; Laffan, Michael; Natwani, Amit; Neerman-Arbez, Marguerite; Nurden, Paquita; Nurden, Alan; Ormiston, Mark; Othman, Maha; Ouwehand, Willem; Perry, David; Vilk, Shoshana Ravel; Reitsma, Pieter; Rondina, Matthew; Simeoni, Ilenia; Smethurst, Peter; Stephens, Jonathan; Stevenson, William; Szkotak, Artur; Turro, Ernest; Van Geet, Christel; Vries, Minka; Ward, June; Waye, John; Westbury, Sarah; Whiteheart, Sidney; Wilcox, David; Zhang, Bi

    2015-01-01

    Next-generation sequencing is transforming our understanding of human genetic variation but assessing the functional impact of novel variants presents challenges. We analyzed missense variants in the integrin αIIbβ3 receptor subunit genes ITGA2B and ITGB3 identified by whole-exome or -genome sequencing in the ThromboGenomics project, comprising ∼32,000 alleles from 16,108 individuals. We analyzed the results in comparison with 111 missense variants in these genes previously reported as being associated with Glanzmann thrombasthenia (GT), 20 associated with alloimmune thrombocytopenia, and 5 associated with aniso/macrothrombocytopenia. We identified 114 novel missense variants in ITGA2B (affecting ∼11% of the amino acids) and 68 novel missense variants in ITGB3 (affecting ∼9% of the amino acids). Of the variants, 96% had minor allele frequencies (MAF) < 0.1%, indicating their rarity. Based on sequence conservation, MAF, and location on a complete model of αIIbβ3, we selected three novel variants that affect amino acids previously associated with GT for expression in HEK293 cells. αIIb P176H and β3 C547G severely reduced αIIbβ3 expression, whereas αIIb P943A partially reduced αIIbβ3 expression and had no effect on fibrinogen binding. We used receiver operating characteristic curves of combined annotation-dependent depletion, Polyphen 2-HDIV, and sorting intolerant from tolerant to estimate the percentage of novel variants likely to be deleterious. At optimal cut-off values, which had 69–98% sensitivity in detecting GT mutations, between 27% and 71% of the novel αIIb or β3 missense variants were predicted to be deleterious. Our data have implications for understanding the evolutionary pressure on αIIbβ3 and highlight the challenges in predicting the clinical significance of novel missense variants. PMID:25827233

  11. Predicting Mendelian Disease-Causing Non-Synonymous Single Nucleotide Variants in Exome Sequencing Studies

    PubMed Central

    Bao, Su-Ying; Yang, Wanling; Ho, Shu-Leong; Song, Yong-Qiang; Sham, Pak C.

    2013-01-01

    Exome sequencing is becoming a standard tool for mapping Mendelian disease-causing (or pathogenic) non-synonymous single nucleotide variants (nsSNVs). Minor allele frequency (MAF) filtering approach and functional prediction methods are commonly used to identify candidate pathogenic mutations in these studies. Combining multiple functional prediction methods may increase accuracy in prediction. Here, we propose to use a logit model to combine multiple prediction methods and compute an unbiased probability of a rare variant being pathogenic. Also, for the first time we assess the predictive power of seven prediction methods (including SIFT, PolyPhen2, CONDEL, and logit) in predicting pathogenic nsSNVs from other rare variants, which reflects the situation after MAF filtering is done in exome-sequencing studies. We found that a logit model combining all or some original prediction methods outperforms other methods examined, but is unable to discriminate between autosomal dominant and autosomal recessive disease mutations. Finally, based on the predictions of the logit model, we estimate that an individual has around 5% of rare nsSNVs that are pathogenic and carries ∼22 pathogenic derived alleles at least, which if made homozygous by consanguineous marriages may lead to recessive diseases. PMID:23341771

  12. Predicting mendelian disease-causing non-synonymous single nucleotide variants in exome sequencing studies.

    PubMed

    Li, Miao-Xin; Kwan, Johnny S H; Bao, Su-Ying; Yang, Wanling; Ho, Shu-Leong; Song, Yong-Qiang; Sham, Pak C

    2013-01-01

    Exome sequencing is becoming a standard tool for mapping Mendelian disease-causing (or pathogenic) non-synonymous single nucleotide variants (nsSNVs). Minor allele frequency (MAF) filtering approach and functional prediction methods are commonly used to identify candidate pathogenic mutations in these studies. Combining multiple functional prediction methods may increase accuracy in prediction. Here, we propose to use a logit model to combine multiple prediction methods and compute an unbiased probability of a rare variant being pathogenic. Also, for the first time we assess the predictive power of seven prediction methods (including SIFT, PolyPhen2, CONDEL, and logit) in predicting pathogenic nsSNVs from other rare variants, which reflects the situation after MAF filtering is done in exome-sequencing studies. We found that a logit model combining all or some original prediction methods outperforms other methods examined, but is unable to discriminate between autosomal dominant and autosomal recessive disease mutations. Finally, based on the predictions of the logit model, we estimate that an individual has around 5% of rare nsSNVs that are pathogenic and carries ~22 pathogenic derived alleles at least, which if made homozygous by consanguineous marriages may lead to recessive diseases. PMID:23341771

  13. The amino acid sequence of monal pheasant lysozyme and its activity.

    PubMed

    Araki, T; Matsumoto, T; Torikata, T

    1998-10-01

    The amino acid sequence of monal pheasant lysozyme and its activity were analyzed. Carboxymethylated lysozyme was digested with trypsin and the resulting peptides were sequenced. The established amino acid sequence had one amino acid substitution at position 102 (Arg to Gly) comparing with Indian peafowl lysozyme and four amino acid substitutions at positions 3 (Phe to Tyr), 15 (His to Leu), 41 (Gln to His), and 121 (Gln to His) with chicken lysozyme. Analysis of the time-courses of reaction using N-acetylglucosamine pentamer as a substrate showed a difference of binding free energy change (-0.4 kcal/mol) at subsites A between monal pheasant and Indian peafowl lysozyme. This was assumed to be caused by the amino acid substitution at subsite A with loss of a positive charge at position 102 (Arg102 to Gly).

  14. The amino acid sequence of monal pheasant lysozyme and its activity.

    PubMed

    Araki, T; Matsumoto, T; Torikata, T

    1998-10-01

    The amino acid sequence of monal pheasant lysozyme and its activity were analyzed. Carboxymethylated lysozyme was digested with trypsin and the resulting peptides were sequenced. The established amino acid sequence had one amino acid substitution at position 102 (Arg to Gly) comparing with Indian peafowl lysozyme and four amino acid substitutions at positions 3 (Phe to Tyr), 15 (His to Leu), 41 (Gln to His), and 121 (Gln to His) with chicken lysozyme. Analysis of the time-courses of reaction using N-acetylglucosamine pentamer as a substrate showed a difference of binding free energy change (-0.4 kcal/mol) at subsites A between monal pheasant and Indian peafowl lysozyme. This was assumed to be caused by the amino acid substitution at subsite A with loss of a positive charge at position 102 (Arg102 to Gly). PMID:9836434

  15. cDNA-derived amino acid sequences of myoglobins from nine species of whales and dolphins.

    PubMed

    Iwanami, Kentaro; Mita, Hajime; Yamamoto, Yasuhiko; Fujise, Yoshihiro; Yamada, Tadasu; Suzuki, Tomohiko

    2006-10-01

    We determined the myoglobin (Mb) cDNA sequences of nine cetaceans, of which six are the first reports of Mb sequences: sei whale (Balaenoptera borealis), Bryde's whale (Balaenoptera edeni), pygmy sperm whale (Kogia breviceps), Stejneger's beaked whale (Mesoplodon stejnegeri), Longman's beaked whale (Indopacetus pacificus), and melon-headed whale (Peponocephala electra), and three confirm the previously determined chemical amino acid sequences: sperm whale (Physeter macrocephalus), common minke whale (Balaenoptera acutorostrata) and pantropical spotted dolphin (Stenella attenuata). We found two types of Mb in the skeletal muscle of pantropical spotted dolphin: Mb I with the same amino acid sequence as that deposited in the protein database, and Mb II, which differs at two amino acid residues compared with Mb I. Using an alignment of the amino acid or cDNA sequences of cetacean Mb, we constructed a phylogenetic tree by the NJ method. Clustering of cetacean Mb amino acid and cDNA sequences essentially follows the classical taxonomy of cetaceans, suggesting that Mb sequence data is valid for classification of cetaceans at least to the family level. PMID:16962803

  16. Studies on monotreme proteins. VII. Amino acid sequence of myoglobin from the platypus, Ornithoryhynchus anatinus.

    PubMed

    Fisher, W K; Thompson, E O

    1976-03-01

    Myoglobin isolated from skeletal muscle of the platypus contains 153 amino acid residues. The complete amino acid sequence has been determined following cleavage with cyanogen bromide and further digestion of the four fragments with trypsin, chymotrypsin, pepsin and thermolysin. Sequences of the purified peptides were determined by the dansyl-Edman procedure. The amino acid sequence showed 25 differences from human myoglobin and 24 from kangaroo myoglobin. Amino acid sequences in myoglobins are more conserved than sequences in the alpha- and beta-globin chains, and platypus myoglobin shows a similar number of variations in sequence to kangaroo myoglobin when compared with myoglobin of other species. The date of divergence of the platypus from other mammals was estimated at 102 +/- 31 million years, based on the number of amino acid differences between species and allowing for mutations during the evolutionary period. This estimate differs widely from the estimate given by similar treatment of the alpha- and beta-chain sequences and a constant rate of mutation of globin chains is not supported. PMID:962722

  17. SGP-1: prediction and validation of homologous genes based on sequence alignments.

    PubMed

    Wiehe, T; Gebauer-Jung, S; Mitchell-Olds, T; Guigó, R

    2001-09-01

    Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, the coding potential or the comparison of a genomic sequence with a cDNA, EST, or protein database. Reasons for limited accuracy in many circumstances are species-specific training and the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention. Several analysis tools that are based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of depends little on species-specific properties such as codon usage or the nucleotide distribution. may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful to validate gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors.

  18. DINAMelt web server for nucleic acid melting prediction

    PubMed Central

    Markham, Nicholas R.; Zuker, Michael

    2005-01-01

    The DINAMelt web server simulates the melting of one or two single-stranded nucleic acids in solution. The goal is to predict not just a melting temperature for a hybridized pair of nucleic acids, but entire equilibrium melting profiles as a function of temperature. The two molecules are not required to be complementary, nor must the two strand concentrations be equal. Competition among different molecular species is automatically taken into account. Calculations consider not only the heterodimer, but also the two possible homodimers, as well as the folding of each single-stranded molecule. For each of these five molecular species, free energies are computed by summing Boltzmann factors over every possible hybridized or folded state. For temperatures within a user-specified range, calculations predict species mole fractions together with the free energy, enthalpy, entropy and heat capacity of the ensemble. Ultraviolet (UV) absorbance at 260 nm is simulated using published extinction coefficients and computed base pair probabilities. All results are available as text files and plots are provided for species concentrations, heat capacity and UV absorbance versus temperature. This server is connected to an active research program and should evolve as new theory and software are developed. The server URL is . PMID:15980540

  19. Complete genome sequence of Enterococcus mundtii QU 25, an efficient L-(+)-lactic acid-producing bacterium.

    PubMed

    Shiwa, Yuh; Yanase, Hiroaki; Hirose, Yuu; Satomi, Shohei; Araya-Kojima, Tomoko; Watanabe, Satoru; Zendo, Takeshi; Chibazakura, Taku; Shimizu-Kadota, Mariko; Yoshikawa, Hirofumi; Sonomoto, Kenji

    2014-08-01

    Enterococcus mundtii QU 25, a non-dairy bacterial strain of ovine faecal origin, can ferment both cellobiose and xylose to produce l-lactic acid. The use of this strain is highly desirable for economical l-lactate production from renewable biomass substrates. Genome sequence determination is necessary for the genetic improvement of this strain. We report the complete genome sequence of strain QU 25, primarily determined using Pacific Biosciences sequencing technology. The E. mundtii QU 25 genome comprises a 3 022 186-bp single circular chromosome (GC content, 38.6%) and five circular plasmids: pQY182, pQY082, pQY039, pQY024, and pQY003. In all, 2900 protein-coding sequences, 63 tRNA genes, and 6 rRNA operons were predicted in the QU 25 chromosome. Plasmid pQY024 harbours genes for mundticin production. We found that strain QU 25 produces a bacteriocin, suggesting that mundticin-encoded genes on plasmid pQY024 were functional. For lactic acid fermentation, two gene clusters were identified-one involved in the initial metabolism of xylose and uptake of pentose and the second containing genes for the pentose phosphate pathway and uptake of related sugars. This is the first complete genome sequence of an E. mundtii strain. The data provide insights into lactate production in this bacterium and its evolution among enterococci.

  20. DNApi: A De Novo Adapter Prediction Algorithm for Small RNA Sequencing Data

    PubMed Central

    Tsuji, Junko; Weng, Zhiping

    2016-01-01

    With the rapid accumulation of publicly available small RNA sequencing datasets, third-party meta-analysis across many datasets is becoming increasingly powerful. Although removing the 3´ adapter is an essential step for small RNA sequencing analysis, the adapter sequence information is not always available in the metadata. The information can be also erroneous even when it is available. In this study, we developed DNApi, a lightweight Python software package that predicts the 3´ adapter sequence de novo and provides the user with cleansed small RNA sequences ready for down stream analysis. Tested on 539 publicly available small RNA libraries accompanied with 3´ adapter sequences in their metadata, DNApi shows near-perfect accuracy (98.5%) with fast runtime (~2.85 seconds per library) and efficient memory usage (~43 MB on average). In addition to 3´ adapter prediction, it is also important to classify whether the input small RNA libraries were already processed, i.e. the 3´ adapters were removed. DNApi perfectly judged that given another batch of datasets, 192 publicly available processed libraries were “ready-to-map” small RNA sequence. DNApi is compatible with Python 2 and 3, and is available at https://github.com/jnktsj/DNApi. The 731 small RNA libraries used for DNApi evaluation were from human tissues and were carefully and manually collected. This study also provides readers with the curated datasets that can be integrated into their studies. PMID:27736901

  1. Multiple Genome Sequences of Important Beer-Spoiling Lactic Acid Bacteria

    PubMed Central

    Geissler, Andreas J.; Vogel, Rudi F.

    2016-01-01

    Seven strains of important beer-spoiling lactic acid bacteria were sequenced using single-molecule real-time sequencing. Complete genomes were obtained for strains of Lactobacillus paracollinoides, Lactobacillus lindneri, and Pediococcus claussenii. The analysis of these genomes emphasizes the role of plasmids as the genomic foundation of beer-spoiling ability. PMID:27795248

  2. All-atom 3D structure prediction of transmembrane β-barrel proteins from sequences

    PubMed Central

    Hayat, Sikander; Sander, Chris; Marks, Debora S.

    2015-01-01

    Transmembrane β-barrels (TMBs) carry out major functions in substrate transport and protein biogenesis but experimental determination of their 3D structure is challenging. Encouraged by successful de novo 3D structure prediction of globular and α-helical membrane proteins from sequence alignments alone, we developed an approach to predict the 3D structure of TMBs. The approach combines the maximum-entropy evolutionary coupling method for predicting residue contacts (EVfold) with a machine-learning approach (boctopus2) for predicting β-strands in the barrel. In a blinded test for 19 TMB proteins of known structure that have a sufficient number of diverse homologous sequences available, this combined method (EVfold_bb) predicts hydrogen-bonded residue pairs between adjacent β-strands at an accuracy of ∼70%. This accuracy is sufficient for the generation of all-atom 3D models. In the transmembrane barrel region, the average 3D structure accuracy [template-modeling (TM) score] of top-ranked models is 0.54 (ranging from 0.36 to 0.85), with a higher (44%) number of residue pairs in correct strand–strand registration than in earlier methods (18%). Although the nonbarrel regions are predicted less accurately overall, the evolutionary couplings identify some highly constrained loop residues and, for FecA protein, the barrel including the structure of a plug domain can be accurately modeled (TM score = 0.68). Lower prediction accuracy tends to be associated with insufficient sequence information and we therefore expect increasing numbers of β-barrel families to become accessible to accurate 3D structure prediction as the number of available sequences increases. PMID:25858953

  3. Maps, codes, and sequence elements: can we predict the protein output from an alternatively spliced locus?

    PubMed

    Sharma, Shalini; Black, Douglas L

    2006-11-22

    Alternative splicing choices are governed by splicing regulatory protein interactions with splicing silencer and enhancer elements present in the pre-mRNA. However, the prediction of these choices from genomic sequence is difficult, in part because the regulators can act as either enhancers or silencers. A recent study describes how for a particular neuronal splicing regulatory protein, Nova, the location of its binding sites is highly predictive of the protein's effect on an exon's splicing.

  4. Sequence-specific flexibility organization of splicing flanking sequence and prediction of splice sites in the human genome.

    PubMed

    Zuo, Yongchun; Zhang, Pengfei; Liu, Li; Li, Tao; Peng, Yong; Li, Guangpeng; Li, Qianzhong

    2014-09-01

    More and more reported results of nucleosome positioning and histone modifications showed that DNA structure play a well-established role in splicing. In this study, a set of DNA geometric flexibility parameters originated from molecular dynamics (MD) simulations were introduced to discuss the structure organization around splice sites at the DNA level. The obtained profiles of specific flexibility/stiffness around splice sites indicated that the DNA physical-geometry deformation could be used as an alternative way to describe the splicing junction region. In combination with structural flexibility as discriminatory parameter, we developed a hybrid computational model for predicting potential splicing sites. And the better prediction performance was achieved when the benchmark dataset evaluated. Our results showed that the mechanical deformability character of a splice junction is closely correlated with both the splice site strength and structural information in its flanking sequences.

  5. RPI-Pred: predicting ncRNA-protein interaction using sequence and structural information

    PubMed Central

    Suresh, V.; Liu, Liang; Adjeroh, Donald; Zhou, Xiaobo

    2015-01-01

    RNA-protein complexes are essential in mediating important fundamental cellular processes, such as transport and localization. In particular, ncRNA-protein interactions play an important role in post-transcriptional gene regulation like mRNA localization, mRNA stabilization, poly-adenylation, splicing and translation. The experimental methods to solve RNA-protein interaction prediction problem remain expensive and time-consuming. Here, we present the RPI-Pred (RNA-protein interaction predictor), a new support-vector machine-based method, to predict protein-RNA interaction pairs, based on both the sequences and structures. The results show that RPI-Pred can correctly predict RNA-protein interaction pairs with ∼94% prediction accuracy when using sequence and experimentally determined protein and RNA structures, and with ∼83% when using sequences and predicted protein and RNA structures. Further, our proposed method RPI-Pred was superior to other existing ones by predicting more experimentally validated ncRNA-protein interaction pairs from different organisms. Motivated by the improved performance of RPI-Pred, we further applied our method for reliable construction of ncRNA-protein interaction networks. The RPI-Pred is publicly available at: http://ctsb.is.wfubmc.edu/projects/rpi-pred. PMID:25609700

  6. Genotypic tropism prediction from paired cell and plasma using single and replicate sequences.

    PubMed

    Coelho, Luana Portes Ozório; Ferreira, João Leandro de Paula; Cabral, Gabriela Bastos; Guimarães, Paula Morena de Souza; Brigido, Luis Fernando de Macedo

    2014-07-01

    HIV-1 tropism determination is necessary prior to CCR5 antagonist use as antiretroviral therapy. Genotypic prediction of coreceptor use is a practical alternative to phenotypic tests. Cell DNA and plasma RNA-based prediction has shown discordance in many studies. We evaluate paired cell and plasma either as single or replicate V3 sequences to assess prediction comparability. The HIV-1 partial env region was sequenced and tropism was predicted using geno2pheno and position-specific scoring matrices (PSSM). Nucleotide ambiguities at V3 were quantified and genetic distance (Protdist) was determined using BioEdit. Wilcoxon signed-rank test, t tests, and Spearman correlation were performed with Prism GraphPad5.0. Results are expressed as medians, with a level of significance of p<0.05, two tailed. Single (n=28) or replicate (n=26) paired cell/plasma sequences were obtained from 54 patients. Although the clonalfalse-positive rate (FPR) value from both compartments strongly correlated (r=0.86 p<0.0001), discordance in tropism prediction was observed in both singles and replicates using geno2pheno or PSSM. Applying clonalFPR(10%) 46% (25/54) were X4 tropic, with a plasma/cell discordance of 11% in singles and 23% in replicates. Genetic distance (p<0.0001) and clonalFPR value dispersion (p=0.003) were significantly higher among replicate sequences from cells. Discordance of viral tropism prediction is not uncommon and the use of replicates does not decrease its occurrence, but improves X4 sensitivity. Sequences from provirus had greater genetic distance and dispersion of clonalFPR values. This may suggest that DNA replicate assays may better represent the diversity of HIV-1 variants, but the clinical significance of these findings needs further evaluation.

  7. Amino acid sequence homology between Piv, an essential protein in site-specific DNA inversion in Moraxella lacunata, and transposases of an unusual family of insertion elements.

    PubMed Central

    Lenich, A G; Glasgow, A C

    1994-01-01

    Deletion analysis of the subcloned DNA inversion region of Moraxella lacunata indicates that Piv is the only M. lacunata-encoded factor required for site-specific inversion of the tfpQ/tfpI pilin segment. The predicted amino acid sequence of Piv shows significant homology solely with the transposases/integrases of a family of insertion sequence elements, suggesting that Piv is a novel site-specific recombinase. Images PMID:8021196

  8. Draft Genome Sequences of Two Novel Acidimicrobiaceae Members from an Acid Mine Drainage Biofilm Metagenome

    PubMed Central

    Pinto, Ameet J.; Sharp, Jonathan O.; Yoder, Michael J.

    2016-01-01

    Bacteria belonging to the family Acidimicrobiaceae are frequently encountered in heavy metal-contaminated acidic environments. However, their phylogenetic and metabolic diversity is poorly resolved. We present draft genome sequences of two novel and phylogenetically distinct Acidimicrobiaceae members assembled from an acid mine drainage biofilm metagenome. PMID:26769942

  9. Complete Genome Sequence of Streptomyces clavuligerus F613-1, an Industrial Producer of Clavulanic Acid.

    PubMed

    Cao, Guangxiang; Zhong, Chuanqing; Zong, Gongli; Fu, Jiafang; Liu, Zhong; Zhang, Guimin; Qin, Ronghuo

    2016-01-01

    Streptomyces clavuligerus strain F613-1 is an industrial strain with high-yield clavulanic acid production. In this study, the complete genome sequence of S. clavuligerus strain F613-1 was determined, including one linear chromosome and one linear plasmid, carrying numerous sets of genes involving in the biosynthesis of clavulanic acid.

  10. Complete Genome Sequence of Streptomyces clavuligerus F613-1, an Industrial Producer of Clavulanic Acid.

    PubMed

    Cao, Guangxiang; Zhong, Chuanqing; Zong, Gongli; Fu, Jiafang; Liu, Zhong; Zhang, Guimin; Qin, Ronghuo

    2016-01-01

    Streptomyces clavuligerus strain F613-1 is an industrial strain with high-yield clavulanic acid production. In this study, the complete genome sequence of S. clavuligerus strain F613-1 was determined, including one linear chromosome and one linear plasmid, carrying numerous sets of genes involving in the biosynthesis of clavulanic acid. PMID:27660792

  11. Complete Genome Sequence of Streptomyces clavuligerus F613-1, an Industrial Producer of Clavulanic Acid

    PubMed Central

    Zhong, Chuanqing; Zong, Gongli; Fu, Jiafang; Liu, Zhong; Zhang, Guimin; Qin, Ronghuo

    2016-01-01

    Streptomyces clavuligerus strain F613-1 is an industrial strain with high-yield clavulanic acid production. In this study, the complete genome sequence of S. clavuligerus strain F613-1 was determined, including one linear chromosome and one linear plasmid, carrying numerous sets of genes involving in the biosynthesis of clavulanic acid. PMID:27660792

  12. Parvalbumins from coelacanth muscle. III. Amino acid sequence of the major component.

    PubMed

    Jauregui-Adell, J; Pechere, J F

    1978-09-26

    The primary structure of the major parvalbumin (pI = 4.52) from coelacanth muscle (Latimeria chalumnae) has been determined. Sequence analysis of the tryptic peptides, in some cases obtained with beta-trypsin, accounts for the total amino acid content of the protein. Chymotryptic peptides provide appropriate sequence overlaps, to complete the localization of the tryptic peptides. Examination of the amino acid sequence of this protein shows the typical structure of a beta-parvalbumin. Its position in the dendrogram of related calcium-binding proteins corresponds to that usually accepted for crossopterygians.

  13. Well-characterized sequence features of eukaryote genomes and implications for ab initio gene prediction.

    PubMed

    Huang, Ying; Chen, Shi-Yi; Deng, Feilong

    2016-01-01

    In silico analysis of DNA sequences is an important area of computational biology in the post-genomic era. Over the past two decades, computational approaches for ab initio prediction of gene structure from genome sequence alone have largely facilitated our understanding on a variety of biological questions. Although the computational prediction of protein-coding genes has already been well-established, we are also facing challenges to robustly find the non-coding RNA genes, such as miRNA and lncRNA. Two main aspects of ab initio gene prediction include the computed values for describing sequence features and used algorithm for training the discriminant function, and by which different combinations are employed into various bioinformatic tools. Herein, we briefly review these well-characterized sequence features in eukaryote genomes and applications to ab initio gene prediction. The main purpose of this article is to provide an overview to beginners who aim to develop the related bioinformatic tools. PMID:27536341

  14. Linguistic and Spatial Skills Predict Early Arithmetic Development via Counting Sequence Knowledge

    ERIC Educational Resources Information Center

    Zhang, Xiao; Koponen, Tuire; Räsänen, Pekka; Aunola, Kaisa; Lerkkanen, Marja-Kristiina; Nurmi, Jari-Erik

    2014-01-01

    Utilizing a longitudinal sample of Finnish children (ages 6-10), two studies examined how early linguistic (spoken vs. written) and spatial skills predict later development of arithmetic, and whether counting sequence knowledge mediates these associations. In Study 1 (N = 1,880), letter knowledge and spatial visualization, measured in…

  15. Applying a Predict-Observe-Explain Sequence in Teaching of Buoyant Force

    ERIC Educational Resources Information Center

    Radovanovic, Jelena; Slisko, Josip

    2013-01-01

    An active learning sequence based on the predict-observe-explain teaching strategy is applied to a lesson on buoyant force. The results obtained clearly justify the use of this teaching method and suggest devising a series of activities to enable more effective removal of students' commonly held alternative conceptions regarding floating and…

  16. Predicting the accuracy of multiple sequence alignment algorithms by using computational intelligent techniques

    PubMed Central

    Ortuño, Francisco M.; Valenzuela, Olga; Pomares, Hector; Rojas, Fernando; Florido, Javier P.; Urquiza, Jose M.

    2013-01-01

    Multiple sequence alignments (MSAs) have become one of the most studied approaches in bioinformatics to perform other outstanding tasks such as structure prediction, biological function analysis or next-generation sequencing. However, current MSA algorithms do not always provide consistent solutions, since alignments become increasingly difficult when dealing with low similarity sequences. As widely known, these algorithms directly depend on specific features of the sequences, causing relevant influence on the alignment accuracy. Many MSA tools have been recently designed but it is not possible to know in advance which one is the most suitable for a particular set of sequences. In this work, we analyze some of the most used algorithms presented in the bibliography and their dependences on several features. A novel intelligent algorithm based on least square support vector machine is then developed to predict how accurate each alignment could be, depending on its analyzed features. This algorithm is performed with a dataset of 2180 MSAs. The proposed system first estimates the accuracy of possible alignments. The most promising methodologies are then selected in order to align each set of sequences. Since only one selected algorithm is run, the computational time is not excessively increased. PMID:23066102

  17. Amino acid sequence of anionic peroxidase from the windmill palm tree Trachycarpus fortunei.

    PubMed

    Baker, Margaret R; Zhao, Hongwei; Sakharov, Ivan Yu; Li, Qing X

    2014-12-10

    Palm peroxidases are extremely stable and have uncommon substrate specificity. This study was designed to fill in the knowledge gap about the structures of a peroxidase from the windmill palm tree Trachycarpus fortunei. The complete amino acid sequence and partial glycosylation were determined by MALDI-top-down sequencing of native windmill palm tree peroxidase (WPTP), MALDI-TOF/TOF MS/MS of WPTP tryptic peptides, and cDNA sequencing. The propeptide of WPTP contained N- and C-terminal signal sequences which contained 21 and 17 amino acid residues, respectively. Mature WPTP was 306 amino acids in length, and its carbohydrate content ranged from 21% to 29%. Comparison to closely related royal palm tree peroxidase revealed structural features that may explain differences in their substrate specificity. The results can be used to guide engineering of WPTP and its novel applications.

  18. Prediction of high-risk types of human papillomaviruses using statistical model of protein "sequence space".

    PubMed

    Wang, Cong; Hai, Yabing; Liu, Xiaoqing; Liu, Nanfang; Yao, Yuhua; He, Pingan; Dai, Qi

    2015-01-01

    Discrimination of high-risk types of human papillomaviruses plays an important role in the diagnosis and remedy of cervical cancer. Recently, several computational methods have been proposed based on protein sequence-based and structure-based information, but the information of their related proteins has not been used until now. In this paper, we proposed using protein "sequence space" to explore this information and used it to predict high-risk types of HPVs. The proposed method was tested on 68 samples with known HPV types and 4 samples without HPV types and further compared with the available approaches. The results show that the proposed method achieved the best performance among all the evaluated methods with accuracy 95.59% and F1-score 90.91%, which indicates that protein "sequence space" could potentially be used to improve prediction of high-risk types of HPVs.

  19. TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations.

    PubMed

    Abascal, Federico; Zardoya, Rafael; Telford, Maximilian J

    2010-07-01

    We present TranslatorX, a web server designed to align protein-coding nucleotide sequences based on their corresponding amino acid translations. Many comparisons between biological sequences (nucleic acids and proteins) involve the construction of multiple alignments. Alignments represent a statement regarding the homology between individual nucleotides or amino acids within homologous genes. As protein-coding DNA sequences evolve as triplets of nucleotides (codons) and it is known that sequence similarity degrades more rapidly at the DNA than at the amino acid level, alignments are generally more accurate when based on amino acids than on their corresponding nucleotides. TranslatorX novelties include: (i) use of all documented genetic codes and the possibility of assigning different genetic codes for each sequence; (ii) a battery of different multiple alignment programs; (iii) translation of ambiguous codons when possible; (iv) an innovative criterion to clean nucleotide alignments with GBlocks based on protein information; and (v) a rich output, including Jalview-powered graphical visualization of the alignments, codon-based alignments coloured according to the corresponding amino acids, measures of compositional bias and first, second and third codon position specific alignments. The TranslatorX server is freely available at http://translatorx.co.uk.

  20. Complete amino acid sequence and structure characterization of the taste-modifying protein, miraculin.

    PubMed

    Theerasilp, S; Hitotsuya, H; Nakajo, S; Nakaya, K; Nakamura, Y; Kurihara, Y

    1989-04-25

    The taste-modifying protein, miraculin, has the unusual property of modifying sour taste into sweet taste. The complete amino acid sequence of miraculin purified from miracle fruits by a newly developed method (Theerasilp, S., and Kurihara, Y. (1988) J. Biol. Chem. 263, 11536-11539) was determined by an automatic Edman degradation method. Miraculin was a single polypeptide with 191 amino acid residues. The calculated molecular weight based on the amino acid sequence and the carbohydrate content (13.9%) was 24,600. Asn-42 and Asn-186 were linked N-glycosidically to carbohydrate chains. High homology was found between the amino acid sequences of miraculin and soybean trypsin inhibitor. PMID:2708331

  1. Homology of amino acid sequences of rat liver cathepsins B and H with that of papain.

    PubMed Central

    Takio, K; Towatari, T; Katunuma, N; Teller, D C; Titani, K

    1983-01-01

    The amino acid sequences of rat liver lysosomal thiol endopeptidases, cathepsins B and H, are presented and compared with that of the plant thiol protease papain. The 252-residue sequence of cathepsin B and the 220-residue sequence of cathepsin H were determined largely by automated Edman degradation of their intact polypeptide chains and of the two chains of each enzyme generated by limited proteolysis. Subfragments of the chains were produced by enzymatic digestion and by chemical cleavage of methionyl and tryptophanyl bonds. Comparison of the amino acid sequences of cathepsins B and H with each other and with that of papain demonstrates a striking homology among their primary structures. Sequence identity is extremely high in regions which, according to the three-dimensional structure of papain, constitute the catalytic site. The results not only reveal the first structural features of mammalian thiol endopeptidases but also provide insight into the evolutionary relationships among plant and mammalian thiol proteases. PMID:6574504

  2. Comprehensive red blood cell and platelet antigen prediction from whole genome sequencing: proof of principle

    PubMed Central

    Westhoff, Connie M.; Uy, Jon Michael; Aguad, Maria; Smeland‐Wagman, Robin; Kaufman, Richard M.; Rehm, Heidi L.; Green, Robert C.; Silberstein, Leslie E.

    2015-01-01

    BACKGROUND There are 346 serologically defined red blood cell (RBC) antigens and 33 serologically defined platelet (PLT) antigens, most of which have known genetic changes in 45 RBC or six PLT genes that correlate with antigen expression. Polymorphic sites associated with antigen expression in the primary literature and reference databases are annotated according to nucleotide positions in cDNA. This makes antigen prediction from next‐generation sequencing data challenging, since it uses genomic coordinates. STUDY DESIGN AND METHODS The conventional cDNA reference sequences for all known RBC and PLT genes that correlate with antigen expression were aligned to the human reference genome. The alignments allowed conversion of conventional cDNA nucleotide positions to the corresponding genomic coordinates. RBC and PLT antigen prediction was then performed using the human reference genome and whole genome sequencing (WGS) data with serologic confirmation. RESULTS Some major differences and alignment issues were found when attempting to convert the conventional cDNA to human reference genome sequences for the following genes: ABO, A4GALT, RHD, RHCE, FUT3, ACKR1 (previously DARC), ACHE, FUT2, CR1, GCNT2, and RHAG. However, it was possible to create usable alignments, which facilitated the prediction of all RBC and PLT antigens with a known molecular basis from WGS data. Traditional serologic typing for 18 RBC antigens were in agreement with the WGS‐based antigen predictions, providing proof of principle for this approach. CONCLUSION Detailed mapping of conventional cDNA annotated RBC and PLT alleles can enable accurate prediction of RBC and PLT antigens from whole genomic sequencing data. PMID:26634332

  3. FrameD: a flexible program for quality check and gene prediction in prokaryotic genomes and noisy matured eukaryotic sequences

    PubMed Central

    Schiex, Thomas; Gouzy, Jérôme; Moisan, Annick; de Oliveira, Yannick

    2003-01-01

    We describe FrameD, a program that predicts coding regions in prokaryotic and matured eukaryotic sequences. Initially targeted at gene prediction in bacterial GC rich genomes, the gene model used in FrameD also allows to predict genes in the presence of frameshifts and partially undetermined sequences which makes it also very suitable for gene prediction and frameshift correction in unfinished sequences such as EST and EST cluster sequences. Like recent eukaryotic gene prediction programs, FrameD also includes the ability to take into account protein similarity information both in its prediction and its graphical output. Its performances are evaluated on different bacterial genomes. The web site (http://genopole.toulouse.inra.fr/bioinfo/FrameD/FD) allows direct prediction, sequence correction and translation and the ability to learn new models for new organisms. PMID:12824407

  4. FrameD: A flexible program for quality check and gene prediction in prokaryotic genomes and noisy matured eukaryotic sequences.

    PubMed

    Schiex, Thomas; Gouzy, Jérôme; Moisan, Annick; de Oliveira, Yannick

    2003-07-01

    We describe FrameD, a program that predicts coding regions in prokaryotic and matured eukaryotic sequences. Initially targeted at gene prediction in bacterial GC rich genomes, the gene model used in FrameD also allows to predict genes in the presence of frameshifts and partially undetermined sequences which makes it also very suitable for gene prediction and frameshift correction in unfinished sequences such as EST and EST cluster sequences. Like recent eukaryotic gene prediction programs, FrameD also includes the ability to take into account protein similarity information both in its prediction and its graphical output. Its performances are evaluated on different bacterial genomes. The web site (http://genopole.toulouse.inra.fr/bioinfo/FrameD/FD) allows direct prediction, sequence correction and translation and the ability to learn new models for new organisms. PMID:12824407

  5. Learning to Predict miRNA-mRNA Interactions from AGO CLIP Sequencing and CLASH Data

    PubMed Central

    Lu, Yuheng; Leslie, Christina S.

    2016-01-01

    Recent technologies like AGO CLIP sequencing and CLASH enable direct transcriptome-wide identification of AGO binding and miRNA target sites, but the most widely used miRNA target prediction algorithms do not exploit these data. Here we use discriminative learning on AGO CLIP and CLASH interactions to train a novel miRNA target prediction model. Our method combines two SVM classifiers, one to predict miRNA-mRNA duplexes and a second to learn a binding model of AGO’s local UTR sequence preferences and positional bias in 3’UTR isoforms. The duplex SVM model enables the prediction of non-canonical target sites and more accurately resolves miRNA interactions from AGO CLIP data than previous methods. The binding model is trained using a multi-task strategy to learn context-specific and common AGO sequence preferences. The duplex and common AGO binding models together outperform existing miRNA target prediction algorithms on held-out binding data. Open source code is available at https://bitbucket.org/leslielab/chimiric. PMID:27438777

  6. NEP: web server for epitope prediction based on antibody neutralization of viral strains with diverse sequences.

    PubMed

    Chuang, Gwo-Yu; Liou, David; Kwong, Peter D; Georgiev, Ivelin S

    2014-07-01

    Delineation of the antigenic site, or epitope, recognized by an antibody can provide clues about functional vulnerabilities and resistance mechanisms, and can therefore guide antibody optimization and epitope-based vaccine design. Previously, we developed an algorithm for antibody-epitope prediction based on antibody neutralization of viral strains with diverse sequences and validated the algorithm on a set of broadly neutralizing HIV-1 antibodies. Here we describe the implementation of this algorithm, NEP (Neutralization-based Epitope Prediction), as a web-based server. The users must supply as input: (i) an alignment of antigen sequences of diverse viral strains; (ii) neutralization data for the antibody of interest against the same set of antigen sequences; and (iii) (optional) a structure of the unbound antigen, for enhanced prediction accuracy. The prediction results can be downloaded or viewed interactively on the antigen structure (if supplied) from the web browser using a JSmol applet. Since neutralization experiments are typically performed as one of the first steps in the characterization of an antibody to determine its breadth and potency, the NEP server can be used to predict antibody-epitope information at no additional experimental costs. NEP can be accessed on the internet at http://exon.niaid.nih.gov/nep. PMID:24782517

  7. NEP: web server for epitope prediction based on antibody neutralization of viral strains with diverse sequences.

    PubMed

    Chuang, Gwo-Yu; Liou, David; Kwong, Peter D; Georgiev, Ivelin S

    2014-07-01

    Delineation of the antigenic site, or epitope, recognized by an antibody can provide clues about functional vulnerabilities and resistance mechanisms, and can therefore guide antibody optimization and epitope-based vaccine design. Previously, we developed an algorithm for antibody-epitope prediction based on antibody neutralization of viral strains with diverse sequences and validated the algorithm on a set of broadly neutralizing HIV-1 antibodies. Here we describe the implementation of this algorithm, NEP (Neutralization-based Epitope Prediction), as a web-based server. The users must supply as input: (i) an alignment of antigen sequences of diverse viral strains; (ii) neutralization data for the antibody of interest against the same set of antigen sequences; and (iii) (optional) a structure of the unbound antigen, for enhanced prediction accuracy. The prediction results can be downloaded or viewed interactively on the antigen structure (if supplied) from the web browser using a JSmol applet. Since neutralization experiments are typically performed as one of the first steps in the characterization of an antibody to determine its breadth and potency, the NEP server can be used to predict antibody-epitope information at no additional experimental costs. NEP can be accessed on the internet at http://exon.niaid.nih.gov/nep.

  8. Learning to Predict miRNA-mRNA Interactions from AGO CLIP Sequencing and CLASH Data.

    PubMed

    Lu, Yuheng; Leslie, Christina S

    2016-07-01

    Recent technologies like AGO CLIP sequencing and CLASH enable direct transcriptome-wide identification of AGO binding and miRNA target sites, but the most widely used miRNA target prediction algorithms do not exploit these data. Here we use discriminative learning on AGO CLIP and CLASH interactions to train a novel miRNA target prediction model. Our method combines two SVM classifiers, one to predict miRNA-mRNA duplexes and a second to learn a binding model of AGO's local UTR sequence preferences and positional bias in 3'UTR isoforms. The duplex SVM model enables the prediction of non-canonical target sites and more accurately resolves miRNA interactions from AGO CLIP data than previous methods. The binding model is trained using a multi-task strategy to learn context-specific and common AGO sequence preferences. The duplex and common AGO binding models together outperform existing miRNA target prediction algorithms on held-out binding data. Open source code is available at https://bitbucket.org/leslielab/chimiric. PMID:27438777

  9. Plasma long-chain free fatty acids predict mammalian longevity

    PubMed Central

    Jové, Mariona; Naudí, Alba; Aledo, Juan Carlos; Cabré, Rosanna; Ayala, Victoria; Portero-Otin, Manuel; Barja, Gustavo; Pamplona, Reinald

    2013-01-01

    Membrane lipid composition is an important correlate of the rate of aging of animals and, therefore, the determination of their longevity. In the present work, the use of high-throughput technologies allowed us to determine the plasma lipidomic profile of 11 mammalian species ranging in maximum longevity from 3.5 to 120 years. The non-targeted approach revealed a specie-specific lipidomic profile that accurately predicts the animal longevity. The regression analysis between lipid species and longevity demonstrated that the longer the longevity of a species, the lower is its plasma long-chain free fatty acid (LC-FFA) concentrations, peroxidizability index, and lipid peroxidation-derived products content. The inverse association between longevity and LC-FFA persisted after correction for body mass and phylogenetic interdependence. These results indicate that the lipidomic signature is an optimized feature associated with animal longevity, emerging LC-FFA as a potential biomarker of longevity. PMID:24284984

  10. Plasma long-chain free fatty acids predict mammalian longevity.

    PubMed

    Jové, Mariona; Naudí, Alba; Aledo, Juan Carlos; Cabré, Rosanna; Ayala, Victoria; Portero-Otin, Manuel; Barja, Gustavo; Pamplona, Reinald

    2013-11-28

    Membrane lipid composition is an important correlate of the rate of aging of animals and, therefore, the determination of their longevity. In the present work, the use of high-throughput technologies allowed us to determine the plasma lipidomic profile of 11 mammalian species ranging in maximum longevity from 3.5 to 120 years. The non-targeted approach revealed a specie-specific lipidomic profile that accurately predicts the animal longevity. The regression analysis between lipid species and longevity demonstrated that the longer the longevity of a species, the lower is its plasma long-chain free fatty acid (LC-FFA) concentrations, peroxidizability index, and lipid peroxidation-derived products content. The inverse association between longevity and LC-FFA persisted after correction for body mass and phylogenetic interdependence. These results indicate that the lipidomic signature is an optimized feature associated with animal longevity, emerging LC-FFA as a potential biomarker of longevity.

  11. Complete cDNA and derived amino acid sequence of human factor V.

    PubMed Central

    Jenny, R J; Pittman, D D; Toole, J J; Kriz, R W; Aldape, R A; Hewick, R M; Kaufman, R J; Mann, K G

    1987-01-01

    cDNA clones encoding human factor V have been isolated from an oligo(dT)-primed human fetal liver cDNA library prepared with vector Charon 21A. The cDNA sequence of factor V from three overlapping clones includes a 6672-base-pair (bp) coding region, a 90-bp 5' untranslated region, and a 163-bp 3' untranslated region within which is a poly(A) tail. The deduced amino acid sequence consists of 2224 amino acids inclusive of a 28-amino acid leader peptide. Direct comparison with human factor VIII reveals considerable homology between proteins in amino acid sequence and domain structure: a triplicated A domain and duplicated C domain show approximately equal to 40% identity with the corresponding domains in factor VIII. As in factor VIII, the A domains of factor V share approximately 40% amino acid-sequence homology with the three highly conserved domains in ceruloplasmin. The B domain of factor V contains 35 tandem and approximately 9 additional semiconserved repeats of nine amino acids of the form Asp-Leu-Ser-Gln-Thr-Thr/Asn-Leu-Ser-Pro and 2 additional semiconserved repeats of 17 amino acids. Factor V contains 37 potential N-linked glycosylation sites, 25 of which are in the B domain, and a total of 19 cysteine residues. Images PMID:3110773

  12. Complete cDNA and derived amino acid sequence of human factor V

    SciTech Connect

    Jenny, R.J.; Pittman, D.D.; Toole, J.J.; Kriz, R.W.; Aldape, R.A.; Hewick, R.M.; Kaufman, R.J.; Mann, K.G.

    1987-07-01

    cDNA clones encoding human factor V have been isolated from an oligo(dT)-primed human fetal liver cDNA library prepared with vector Charon 21A. The cDNA sequence of factor V from three overlapping clones includes a 6672-base-pair (bp) coding region, a 90-bp 5' untranslated region, and a 163-bp 3' untranslated region within which is a poly(A)tail. The deduced amino acid sequence consists of 2224 amino acids inclusive of a 28-amino acid leader peptide. Direct comparison with human factor VIII reveals considerable homology between proteins in amino acid sequence and domain structure: a triplicated A domain and duplicated C domain show approx. 40% identity with the corresponding domains in factor VIII. As in factor VIII, the A domains of factor V share approx. 40% amino acid-sequence homology with the three highly conserved domains in ceruloplasmin. The B domain of factor V contains 35 tandem and approx. 9 additional semiconserved repeats of nine amino acids of the form Asp-Leu-Ser-Gln-Thr-Thr/Asn-Leu-Ser-Pro and 2 additional semiconserved repeats of 17 amino acids. Factor V contains 37 potential N-linked glycosylation sites, 25 of which are in the B domain, and a total of 19 cysteine residues.

  13. DFLpred: High-throughput prediction of disordered flexible linker regions in protein sequences

    PubMed Central

    Meng, Fanchi; Kurgan, Lukasz

    2016-01-01

    Motivation: Disordered flexible linkers (DFLs) are disordered regions that serve as flexible linkers/spacers in multi-domain proteins or between structured constituents in domains. They are different from flexible linkers/residues because they are disordered and longer. Availability of experimentally annotated DFLs provides an opportunity to build high-throughput computational predictors of these regions from protein sequences. To date, there are no computational methods that directly predict DFLs and they can be found only indirectly by filtering predicted flexible residues with predictions of disorder. Results: We conceptualized, developed and empirically assessed a first-of-its-kind sequence-based predictor of DFLs, DFLpred. This method outputs propensity to form DFLs for each residue in the input sequence. DFLpred uses a small set of empirically selected features that quantify propensities to form certain secondary structures, disordered regions and structured regions, which are processed by a fast linear model. Our high-throughput predictor can be used on the whole-proteome scale; it needs <1 h to predict entire proteome on a single CPU. When assessed on an independent test dataset with low sequence-identity proteins, it secures area under the receiver operating characteristic curve equal 0.715 and outperforms existing alternatives that include methods for the prediction of flexible linkers, flexible residues, intrinsically disordered residues and various combinations of these methods. Prediction on the complete human proteome reveals that about 10% of proteins have a large content of over 30% DFL residues. We also estimate that about 6000 DFL regions are long with ≥30 consecutive residues. Availability and implementation: http://biomine.ece.ualberta.ca/DFLpred/. Contact: lkurgan@vcu.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307636

  14. Amino acid sequence and domain structure of entactin. Homology with epidermal growth factor precursor and low density lipoprotein receptor

    PubMed Central

    1988-01-01

    Entactin (nidogen), a 150-kD sulfated glycoprotein, is a major component of basement membranes and forms a highly stable noncovalent complex with laminin. The complete amino acid sequence of mouse entactin has been derived from sequencing of cDNA clones. The 5.9-kb cDNA contains a 3,735-bp open reading frame followed by a 3'- untranslated region of 2.2 kb. The open reading frame encodes a 1,245- residue polypeptide with an unglycosylated Mr of 136,500, a 28-residue signal peptide, two Asn-linked glycosylation sites, and two potential Ca2+-binding sites. Analysis of the deduced amino acid sequence predicts that the molecule consists of two globular domains of 70 and 36 kD separated by a cysteine-rich domain of 28 kD. The COOH-terminal globular domain shows homology to the EGF precursor and the low density lipoprotein receptor. Entactin contains six EGF-type cysteine-rich repeat units and one copy of a cysteine-repeat motif found in thyroglobulin. The Arg-Gly-Asp cell recognition sequence is present in one of the EGF-type repeats, and a synthetic peptide from the putative cell-binding site of entactin was found to promote the attachment of mouse mammary tumor cells. PMID:3264556

  15. Multiscale approach to the predictability of earthquakes and of synthetic SOC sequences

    NASA Astrophysics Data System (ADS)

    Peresan, A.; Panza, G. F.

    2003-04-01

    The power-law scaling expressed by the Gutenberg-Richter (GR) law is the main argument in favour of the Self-Organised Criticality (SOC) of seismic phenomena. Nevertheless the limits of validity of the GR law and the phenomenology reproduced by the SOC models, as well as their consequences for earthquake predictability, still remain quite undefined. According to the Multiscale Seismicity (MS) model, the GR law describes adequately only the ensemble of earthquakes that are geometrically small with respect to the dimensions of the analysed region. The MS model and its implications for intermediate-term medium-range earthquake predictions are thus examined, considering both the seismicity observed in the Italian territory and the synthetic sequences of events generated by a SOC model. The predictability of the large events is evaluated by means of the algorithms CN and M8, based on a quantitative analysis of the seismic flow within a delimited region, which allow for the prediction of the earthquakes with magnitude greater than a fixed threshold Mo. Considering the application of CN and M8 to the Italian territory, we show that, in agreement with the MS model, these algorithms make use of the information carried by small and moderate earthquakes, following the GR law, to predict the strong earthquakes, which are infrequent and often arbitrarily considered characteristic events inside the regions delimited for prediction purposes. Similarly, the application of the algorithm CN for the prediction of the largest events in the synthetic SOC sequences, indicates that a certain predictability can be attained, when the MS model is taken into account. These results suggest that the similarity between the seismic flow and the SOC sequences goes beyond the average features of scale-invariance. In fact, while the GR law describes an average feature of seismicity, CN algorithm is checking for the deviations from such trend, which may characterise the sequence of events before the

  16. Amino acid sequence heterogeneity of the chromosomal encoded Borrelia burgdorferi sensu lato major antigen P100.

    PubMed

    Fellinger, W; Farencena, A; Redl, B; Sambri, V; Cevenini, R; Stöffler, G

    1995-04-01

    The entire nucleotide sequence of the chromosomal encoded major antigen p100 of the European Borrelia garinii isolate B29 was determined and the deduced amino acid sequence was compared to the homologous antigen p83 of the North American Borrelia burgdorferi sensu stricto strain B31 and the p100 of the European Borrelia afzelii (group VS461) strain PKo. p100 of strain B29 shows 87% amino acid sequence identity to strain B31 and 79.2% to strain PKo, p100 of strain B31 and PKo shows 62.5% identity to each other. In addition, partial nucleotide sequences of the most heterogeneous region of the p100 gene of two other Borrelia garinii isolates (PBi and VS286) have been determined and the deduced amino acid sequences were compared with all p100 of Borrelia garinii published so far. We found an amino acid sequence identity between 88.6 and 100% within the same genospecies. The N-terminal part of the p100 proteins is highly conserved whereas a striking heterogeneous region within the C-terminal part of the proteins was observed.

  17. Transmembrane helix prediction using amino acid property features and latent semantic analysis

    PubMed Central

    Ganapathiraju, Madhavi; Balakrishnan, N; Reddy, Raj; Klein-Seetharaman, Judith

    2008-01-01

    Background Prediction of transmembrane (TM) helices by statistical methods suffers from lack of sufficient training data. Current best methods use hundreds or even thousands of free parameters in their models which are tuned to fit the little data available for training. Further, they are often restricted to the generally accepted topology "cytoplasmic-transmembrane-extracellular" and cannot adapt to membrane proteins that do not conform to this topology. Recent crystal structures of channel proteins have revealed novel architectures showing that the above topology may not be as universal as previously believed. Thus, there is a need for methods that can better predict TM helices even in novel topologies and families. Results Here, we describe a new method "TMpro" to predict TM helices with high accuracy. To avoid overfitting to existing topologies, we have collapsed cytoplasmic and extracellular labels to a single state, non-TM. TMpro is a binary classifier which predicts TM or non-TM using multiple amino acid properties (charge, polarity, aromaticity, size and electronic properties) as features. The features are extracted from sequence information by applying the framework used for latent semantic analysis of text documents and are input to neural networks that learn the distinction between TM and non-TM segments. The model uses only 25 free parameters. In benchmark analysis TMpro achieves 95% segment F-score corresponding to 50% reduction in error rate compared to the best methods not requiring an evolutionary profile of a protein to be known. Performance is also improved when applied to more recent and larger high resolution datasets PDBTM and MPtopo. TMpro predictions in membrane proteins with unusual or disputed TM structure (K+ channel, aquaporin and HIV envelope glycoprotein) are discussed. Conclusion TMpro uses very few free parameters in modeling TM segments as opposed to the very large number of free parameters used in state-of-the-art membrane

  18. PAIRpred: partner-specific prediction of interacting residues from sequence and structure.

    PubMed

    Minhas, Fayyaz ul Amir Afsar; Geiss, Brian J; Ben-Hur, Asa

    2014-07-01

    We present a novel partner-specific protein-protein interaction site prediction method called PAIRpred. Unlike most existing machine learning binding site prediction methods, PAIRpred uses information from both proteins in a protein complex to predict pairs of interacting residues from the two proteins. PAIRpred captures sequence and structure information about residue pairs through pairwise kernels that are used for training a support vector machine classifier. As a result, PAIRpred presents a more detailed model of protein binding, and offers state of the art accuracy in predicting binding sites at the protein level as well as inter-protein residue contacts at the complex level. We demonstrate PAIRpred's performance on Docking Benchmark 4.0 and recent CAPRI targets. We present a detailed performance analysis outlining the contribution of different sequence and structure features, together with a comparison to a variety of existing interface prediction techniques. We have also studied the impact of binding-associated conformational change on prediction accuracy and found PAIRpred to be more robust to such structural changes than existing schemes. As an illustration of the potential applications of PAIRpred, we provide a case study in which PAIRpred is used to analyze the nature and specificity of the interface in the interaction of human ISG15 protein with NS1 protein from influenza A virus. Python code for PAIRpred is available at http://combi.cs.colostate.edu/supplements/pairpred/. PMID:24243399

  19. Discriminative Prediction of A-To-I RNA Editing Events from DNA Sequence

    PubMed Central

    Sun, Jiangming; Singh, Pratibha; Bagge, Annika; Valtat, Bérengère; Vikman, Petter; Spégel, Peter; Mulder, Hindrik

    2016-01-01

    RNA editing is a post-transcriptional alteration of RNA sequences that, via insertions, deletions or base substitutions, can affect protein structure as well as RNA and protein expression. Recently, it has been suggested that RNA editing may be more frequent than previously thought. A great impediment, however, to a deeper understanding of this process is the paramount sequencing effort that needs to be undertaken to identify RNA editing events. Here, we describe an in silico approach, based on machine learning, that ameliorates this problem. Using 41 nucleotide long DNA sequences, we show that novel A-to-I RNA editing events can be predicted from known A-to-I RNA editing events intra- and interspecies. The validity of the proposed method was verified in an independent experimental dataset. Using our approach, 203 202 putative A-to-I RNA editing events were predicted in the whole human genome. Out of these, 9% were previously reported. The remaining sites require further validation, e.g., by targeted deep sequencing. In conclusion, the approach described here is a useful tool to identify potential A-to-I RNA editing events without the requirement of extensive RNA sequencing. PMID:27764195

  20. PDP-CON: prediction of domain/linker residues in protein sequences using a consensus approach.

    PubMed

    Chatterjee, Piyali; Basu, Subhadip; Zubek, Julian; Kundu, Mahantapas; Nasipuri, Mita; Plewczynski, Dariusz

    2016-04-01

    The prediction of domain/linker residues in protein sequences is a crucial task in the functional classification of proteins, homology-based protein structure prediction, and high-throughput structural genomics. In this work, a novel consensus-based machine-learning technique was applied for residue-level prediction of the domain/linker annotations in protein sequences using ordered/disordered regions along protein chains and a set of physicochemical properties. Six different classifiers-decision tree, Gaussian naïve Bayes, linear discriminant analysis, support vector machine, random forest, and multilayer perceptron-were exhaustively explored for the residue-level prediction of domain/linker regions. The protein sequences from the curated CATH database were used for training and cross-validation experiments. Test results obtained by applying the developed PDP-CON tool to the mutually exclusive, independent proteins of the CASP-8, CASP-9, and CASP-10 databases are reported. An n-star quality consensus approach was used to combine the results yielded by different classifiers. The average PDP-CON accuracy and F-measure values for the CASP targets were found to be 0.86 and 0.91, respectively. The dataset, source code, and all supplementary materials for this work are available at https://cmaterju.org/cmaterbioinfo/ for noncommercial use.

  1. Detection and isolation of nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1997-01-01

    A method for detecting a target nucleic acid sequence in a sample is provided using hybridization probes which competitively hybridize to a target nucleic acid. According to the method, a target nucleic acid sequence is hybridized to first and second hybridization probes which are complementary to overlapping portions of the target nucleic acid sequence, the first hybridization probe including a first complexing agent capable of forming a binding pair with a second complexing agent and the second hybridization probe including a detectable marker. The first complexing agent attached to the first hybridization probe is contacted with a second complexing agent, the second complexing agent being attached to a solid support such that when the first and second complexing agents are attached, target nucleic acid sequences hybridized to the first hybridization probe become immobilized on to the solid support. The immobilized target nucleic acids are then separated and detected by detecting the detectable marker attached to the second hybridization probe. A kit for performing the method is also provided.

  2. Detection and isolation of nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1997-04-01

    A method for detecting a target nucleic acid sequence in a sample is provided using hybridization probes which competitively hybridize to a target nucleic acid. According to the method, a target nucleic acid sequence is hybridized to first and second hybridization probes which are complementary to overlapping portions of the target nucleic acid sequence, the first hybridization probe including a first complexing agent capable of forming a binding pair with a second complexing agent and the second hybridization probe including a detectable marker. The first complexing agent attached to the first hybridization probe is contacted with a second complexing agent, the second complexing agent being attached to a solid support such that when the first and second complexing agents are attached, target nucleic acid sequences hybridized to the first hybridization probe become immobilized on to the solid support. The immobilized target nucleic acids are then separated and detected by detecting the detectable marker attached to the second hybridization probe. A kit for performing the method is also provided. 7 figs.

  3. Amino acid sequence and some properties of phytolacain G, a cysteine protease from growing fruit of pokeweed, Phytolacca americana.

    PubMed

    Uchikoba, T; Arima, K; Yonezawa, H; Shimada, M; Kaneda, M

    2000-10-18

    A protease, phytolacain G, has been found to appear on CM-Sepharose ion-exchange chromatography of greenish small-size fruits of pokeweed, Phytolacca americana L, from ca. 2 weeks after flowering, and increases during fruit enlargement. Reddish ripe fruit of the pokeweed contained both phytolacain G and R. The molecular mass of phytolacain G was estimated to be 25.5 kDa by SDS-PAGE. Its amino acid sequence was reconstructed by automated sequence analysis of the peptides obtained after cleavage with Achromobacter protease I, chymotrypsin, and cyanogen bromide. The enzyme is composed of 216 amino acid residues, of which it shares 152 identical amino acid residues (70%) with phytolacain R, 126 (58%) with melain G, 108 (50%) with papain, 106 (49%) with actinidain, and 96 (44%) with stem bromelain. The amino acid residues forming the substrate binding S(2) pocket of papain, Tyr67, Pro68, Trp69, Val133, and Phe207, were predicted to be replaced by Trp, Met, His, Ala, and Ser in phytolacain G, respectively. As a consequence of these substitutions, the S(2) pocket is expected to be less hydrophobic in phytolacain G than in papain.

  4. The Intolerance of Regulatory Sequence to Genetic Variation Predicts Gene Dosage Sensitivity

    PubMed Central

    Wang, Quanli; Halvorsen, Matt; Han, Yujun; Weir, William H.; Allen, Andrew S.; Goldstein, David B.

    2015-01-01

    Noncoding sequence contains pathogenic mutations. Yet, compared with mutations in protein-coding sequence, pathogenic regulatory mutations are notoriously difficult to recognize. Most fundamentally, we are not yet adept at recognizing the sequence stretches in the human genome that are most important in regulating the expression of genes. For this reason, it is difficult to apply to the regulatory regions the same kinds of analytical paradigms that are being successfully applied to identify mutations among protein-coding regions that influence risk. To determine whether dosage sensitive genes have distinct patterns among their noncoding sequence, we present two primary approaches that focus solely on a gene’s proximal noncoding regulatory sequence. The first approach is a regulatory sequence analogue of the recently introduced residual variation intolerance score (RVIS), termed noncoding RVIS, or ncRVIS. The ncRVIS compares observed and predicted levels of standing variation in the regulatory sequence of human genes. The second approach, termed ncGERP, reflects the phylogenetic conservation of a gene’s regulatory sequence using GERP++. We assess how well these two approaches correlate with four gene lists that use different ways to identify genes known or likely to cause disease through changes in expression: 1) genes that are known to cause disease through haploinsufficiency, 2) genes curated as dosage sensitive in ClinGen’s Genome Dosage Map, 3) genes judged likely to be under purifying selection for mutations that change expression levels because they are statistically depleted of loss-of-function variants in the general population, and 4) genes judged unlikely to cause disease based on the presence of copy number variants in the general population. We find that both noncoding scores are highly predictive of dosage sensitivity using any of these criteria. In a similar way to ncGERP, we assess two ensemble-based predictors of regional noncoding importance

  5. Ligation with nucleic acid sequence-based amplification.

    PubMed

    Ong, Carmichael; Tai, Warren; Sarma, Aartik; Opal, Steven M; Artenstein, Andrew W; Tripathi, Anubhav

    2012-01-01

    This work presents a novel method for detecting nucleic acid targets using a ligation step along with an isothermal, exponential amplification step. We use an engineered ssDNA with two variable regions on the ends, allowing us to design the probe for optimal reaction kinetics and primer binding. This two-part probe is ligated by T4 DNA Ligase only when both parts bind adjacently to the target. The assay demonstrates that the expected 72-nt RNA product appears only when the synthetic target, T4 ligase, and both probe fragments are present during the ligation step. An extraneous 38-nt RNA product also appears due to linear amplification of unligated probe (P3), but its presence does not cause a false-positive result. In addition, 40 mmol/L KCl in the final amplification mix was found to be optimal. It was also found that increasing P5 in excess of P3 helped with ligation and reduced the extraneous 38-nt RNA product. The assay was also tested with a single nucleotide polymorphism target, changing one base at the ligation site. The assay was able to yield a negative signal despite only a single-base change. Finally, using P3 and P5 with longer binding sites results in increased overall sensitivity of the reaction, showing that increasing ligation efficiency can improve the assay overall. We believe that this method can be used effectively for a number of diagnostic assays. PMID:22449695

  6. The amino acid sequence of mitogenic lectin-B from the roots of pokeweed (Phytolacca americana).

    PubMed

    Yamaguchi, K; Yurino, N; Kino, M; Ishiguro, M; Funatsu, G

    1997-04-01

    The complete amino acid sequence of pokeweed lectin-B (PL-B) has been analyzed by first sequencing seven lysylendopeptidase peptides derived from the reduced and S-pyridylethylated PL-B and then connecting them by analyzing the arginylendopeptidase peptides from the reduced and S-carboxymethylated PL-B. PL-B consists of 295 amino acid residues and two oligosaccharides linked to Asn96 and Asn139, and has a molecular mass of 34,493 Da. PL-B is composed of seven repetitive chitin-binding domains having 48-79% sequence homology with each other. Twelve amino acid residues including eight cysteine residues in these domains are absolutely conserved in all other chitin-binding domains of plant lectins and class I chitinases. Also, it was strongly suggested that the extremely high hemagglutinating and mitogenic activities of PL-B may be ascribed to its seven-domain structure.

  7. ConSurf 2010: calculating evolutionary conservation in sequence and structure of proteins and nucleic acids.

    PubMed

    Ashkenazy, Haim; Erez, Elana; Martz, Eric; Pupko, Tal; Ben-Tal, Nir

    2010-07-01

    It is informative to detect highly conserved positions in proteins and nucleic acid sequence/structure since they are often indicative of structural and/or functional importance. ConSurf (http://consurf.tau.ac.il) and ConSeq (http://conseq.tau.ac.il) are two well-established web servers for calculating the evolutionary conservation of amino acid positions in proteins using an empirical Bayesian inference, starting from protein structure and sequence, respectively. Here, we present the new version of the ConSurf web server that combines the two independent servers, providing an easier and more intuitive step-by-step interface, while offering the user more flexibility during the process. In addition, the new version of ConSurf calculates the evolutionary rates for nucleic acid sequences. The new version is freely available at: http://consurf.tau.ac.il/.

  8. Site-directed gene mutation at mixed sequence targets by psoralen-conjugated pseudo-complementary peptide nucleic acids

    PubMed Central

    Kim, Ki-Hyun; Nielsen, Peter E.; Glazer, Peter M.

    2007-01-01

    Sequence-specific DNA-binding molecules such as triple helix-forming oligonucleotides (TFOs) provide a means for inducing site-specific mutagenesis and recombination at chromosomal sites in mammalian cells. However, the utility of TFOs is limited by the requirement for homopurine stretches in the target duplex DNA. Here, we report the use of pseudo-complementary peptide nucleic acids (pcPNAs) for intracellular gene targeting at mixed sequence sites. Due to steric hindrance, pcPNAs are unable to form pcPNA–pcPNA duplexes but can bind to complementary DNA sequences by Watson–Crick pairing via double duplex-invasion complex formation. We show that psoralen-conjugated pcPNAs can deliver site-specific photoadducts and mediate targeted gene modification within both episomal and chromosomal DNA in mammalian cells without detectable off-target effects. Most of the induced psoralen-pcPNA mutations were single-base substitutions and deletions at the predicted pcPNA-binding sites. The pcPNA-directed mutagenesis was found to be dependent on PNA concentration and UVA dose and required matched pairs of pcPNAs. Neither of the individual pcPNAs alone had any effect nor did complementary PNA pairs of the same sequence. These results identify pcPNAs as new tools for site-specific gene modification in mammalian cells without purine sequence restriction, thereby providing a general strategy for designing gene targeting molecules. PMID:17977869

  9. Amino acid repeats cause extraordinary coding sequence variation in the social amoeba Dictyostelium discoideum.

    PubMed

    Scala, Clea; Tian, Xiangjun; Mehdiabadi, Natasha J; Smith, Margaret H; Saxer, Gerda; Stephens, Katie; Buzombo, Prince; Strassmann, Joan E; Queller, David C

    2012-01-01

    Protein sequences are normally the most conserved elements of genomes owing to purifying selection to maintain their functions. We document an extraordinary amount of within-species protein sequence variation in the model eukaryote Dictyostelium discoideum stemming from triplet DNA repeats coding for long strings of single amino acids. D. discoideum has a very large number of such strings, many of which are polyglutamine repeats, the same sequence that causes various human neurological disorders in humans, like Huntington's disease. We show here that D. discoideum coding repeat loci are highly variable among individuals, making D. discoideum a candidate for the most variable proteome. The coding repeat loci are not significantly less variable than similar non-coding triplet repeats. This pattern is consistent with these amino-acid repeats being largely non-functional sequences evolving primarily by mutation and drift. PMID:23029418

  10. Conservation of Shannon's redundancy for proteins. [information theory applied to amino acid sequences

    NASA Technical Reports Server (NTRS)

    Gatlin, L. L.

    1974-01-01

    Concepts of information theory are applied to examine various proteins in terms of their redundancy in natural originators such as animals and plants. The Monte Carlo method is used to derive information parameters for random protein sequences. Real protein sequence parameters are compared with the standard parameters of protein sequences having a specific length. The tendency of a chain to contain some amino acids more frequently than others and the tendency of a chain to contain certain amino acid pairs more frequently than other pairs are used as randomness measures of individual protein sequences. Non-periodic proteins are generally found to have random Shannon redundancies except in cases of constraints due to short chain length and genetic codes. Redundant characteristics of highly periodic proteins are discussed. A degree of periodicity parameter is derived.

  11. Shark myoglobins. II. Isolation, characterization and amino acid sequence of myoglobin from Galeorhinus japonicus.

    PubMed

    Suzuki, T; Suzuki, T; Yata, T

    1985-01-01

    Native oxymyoglobin (MbO2) was isolated from red muscle of G. japonicus by chromatographic separation from metmyoglobin (metMb) on DEAE-cellulose and the amino acid sequence of the major chain was determined with the aid of sequence homology with that of G. australis. It was shown to differ in amino acid sequence from that of G. australis by 10 replacements, to be acetylated at the amino terminus and to contain glutamine at the distal (E7) residue. It was also shown to have a spectrum very similar to that of mammalian MbO2. However, the pH-dependence for the autoxidation of MbO2 was seen to be quite different from that of sperm whale (Physeter catodon) MbO2. Although the sequence homology between sperm whale and G. japonicus myoglobins is about 40%, their hydropathy profiles were very similar, indicating that they have a similar geometry in their globin folding.

  12. PETcofold: predicting conserved interactions and structures of two multiple alignments of RNA sequences

    PubMed Central

    Seemann, Stefan E.; Richter, Andreas S.; Gesell, Tanja; Backofen, Rolf; Gorodkin, Jan

    2011-01-01

    Motivation: Predicting RNA–RNA interactions is essential for determining the function of putative non-coding RNAs. Existing methods for the prediction of interactions are all based on single sequences. Since comparative methods have already been useful in RNA structure determination, we assume that conserved RNA–RNA interactions also imply conserved function. Of these, we further assume that a non-negligible amount of the existing RNA–RNA interactions have also acquired compensating base changes throughout evolution. We implement a method, PETcofold, that can take covariance information in intra-molecular and inter-molecular base pairs into account to predict interactions and secondary structures of two multiple alignments of RNA sequences. Results: PETcofold's ability to predict RNA–RNA interactions was evaluated on a carefully curated dataset of 32 bacterial small RNAs and their targets, which was manually extracted from the literature. For evaluation of both RNA–RNA interaction and structure prediction, we were able to extract only a few high-quality examples: one vertebrate small nucleolar RNA and four bacterial small RNAs. For these we show that the prediction can be improved by our comparative approach. Furthermore, PETcofold was evaluated on controlled data with phylogenetically simulated sequences enriched for covariance patterns at the interaction sites. We observed increased performance with increased amounts of covariance. Availability: The program PETcofold is available as source code and can be downloaded from http://rth.dk/resources/petcofold. Contact: gorodkin@rth.dk; backofen@informatik.uni-freiburg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21088024

  13. Functional Divergence in the Genus Oenococcus as Predicted by Genome Sequencing of the Newly-Described Species, Oenococcus kitaharae

    PubMed Central

    Borneman, Anthony R.; McCarthy, Jane M.; Chambers, Paul J.; Bartowsky, Eveline J.

    2012-01-01

    Oenococcus kitaharae is only the second member of the genus Oenococcus to be identified and is the closest relative of the industrially important wine bacterium Oenococcus oeni. To provide insight into this new species, the genome of the type strain of O. kitaharae, DSM 17330, was sequenced. Comparison of the sequenced genomes of both species show that the genome of O. kitaharae DSM 17330 contains many genes with predicted functions in cellular defence (bacteriocins, antimicrobials, restriction-modification systems and a CRISPR locus) which are lacking in O. oeni. The two genomes also appear to differentially encode several metabolic pathways associated with amino acid biosynthesis and carbohydrate utilization and which have direct phenotypic consequences. This would indicate that the two species have evolved different survival techniques to suit their particular environmental niches. O. oeni has adapted to survive in the harsh, but predictable, environment of wine that provides very few competitive species. However O. kitaharae appears to have adapted to a growth environment in which biological competition provides a significant selective pressure by accumulating biological defence molecules, such as bacteriocins and restriction-modification systems, throughout its genome. PMID:22235313

  14. Conversion of amino-acid sequence in proteins to classical music: search for auditory patterns

    PubMed Central

    2007-01-01

    We have converted genome-encoded protein sequences into musical notes to reveal auditory patterns without compromising musicality. We derived a reduced range of 13 base notes by pairing similar amino acids and distinguishing them using variations of three-note chords and codon distribution to dictate rhythm. The conversion will help make genomic coding sequences more approachable for the general public, young children, and vision-impaired scientists. PMID:17477882

  15. Prediction of protein structural features from sequence data based on Shannon entropy and Kolmogorov complexity.

    PubMed

    Bywater, Robert Paul

    2015-01-01

    While the genome for a given organism stores the information necessary for the organism to function and flourish it is the proteins that are encoded by the genome that perhaps more than anything else characterize the phenotype for that organism. It is therefore not surprising that one of the many approaches to understanding and predicting protein folding and properties has come from genomics and more specifically from multiple sequence alignments. In this work I explore ways in which data derived from sequence alignment data can be used to investigate in a predictive way three different aspects of protein structure: secondary structures, inter-residue contacts and the dynamics of switching between different states of the protein. In particular the use of Kolmogorov complexity has identified a novel pathway towards achieving these goals.

  16. Linguistic and spatial skills predict early arithmetic development via counting sequence knowledge.

    PubMed

    Zhang, Xiao; Koponen, Tuire; Räsänen, Pekka; Aunola, Kaisa; Lerkkanen, Marja-Kristiina; Nurmi, Jari-Erik

    2014-01-01

    Utilizing a longitudinal sample of Finnish children (ages 6-10), two studies examined how early linguistic (spoken vs. written) and spatial skills predict later development of arithmetic, and whether counting sequence knowledge mediates these associations. In Study 1 (N = 1,880), letter knowledge and spatial visualization, measured in kindergarten, predicted the level of arithmetic in first grade, and later growth through third grade. Study 2 (n = 378) further showed that these associations were mediated by counting sequence knowledge measured in first grade. These studies add to the literature by demonstrating the importance of written language for arithmetic development. The findings are consistent with the hypothesis that linguistic and spatial skills can improve arithmetic development by enhancing children's number-related knowledge. PMID:24148144

  17. Visible sensing of nucleic acid sequences using a genetically encodable unmodified mRNA probe.

    PubMed

    Narita, Atsushi; Ogawa, Kazumasa; Sando, Shinsuke; Aoyama, Yasuhiro

    2006-01-01

    We previously reported a molecular beacon-mRNA (MB-mRNA) strategy for nucleic acid detection/sensing in a cell-free translation system using unmodified RNA as a probe. Here in this presentation, we report that a combination with RNase H activity, which induces an additional process of irreversible cleavage of MB-domain, achieves an improved sequence selectivity (one nucleotide selectivity) and an enhanced sensitivity. This improved system finally enabled visible sensing of target nucleic acid sequence at a single nucleotide resolution under isothermal conditions.

  18. Amino acid and cDNA sequences of lysozyme from Hyalophora cecropia

    PubMed Central

    Engström, Å.; Xanthopoulos, K. G.; Boman, H. G.; Bennich, H.

    1985-01-01

    The amino acid and cDNA sequences of lysozyme from the giant silk moth Hyalophora cecropia have been determined. This enzyme is one of several immune proteins produced by the diapausing pupae after injection of bacteria. Cecropia lysozyme is composed of 120 amino acids, has a mol. wt. of 13.8 kd and shows great similarity with vertebrate lysozymes of the chicken type. The amino acid residues responsible for the catalytic activity and for the binding of substrate are essentially conserved. Three allelic variants of the Cecropia enzyme are identified. A comparison of the chicken and the Cecropia lysozymes shows that there is a 40% identity at both the amino acid and the nucleotide level. Some evolutionary aspects of the sequence data are discussed. PMID:16453632

  19. Depositional sequence analysis and sedimentologic modeling for improved prediction of Pennsylvanian reservoirs

    SciTech Connect

    Watney, W.L.

    1992-01-01

    The objectives of this research are to: (1) assist producers in locating and producing petroleum not currently being produced because of technological problems or the inability to identify details of reservoir compartmentalization (2) to decrease risk in field development, (3) accelerate the retrieval and analysis of baseline geoscience information for initial reservoir description. The interdisciplinary data sought in this research will be used to resolve specific problems in correlation of strata and to establish the mechanisms responsible for the Upper Pennsylvanian stratigraphic architecture in the Midcontinent. The data will better constrain ancillary problems related to the validation of depositional sequence and subsequence correlation, subsidence patterns, sedimentation rates, sea-level changes, and the relationship of sedimentary sequences to basement terrains. The geoscientific information, including data from field studies, surface and near-surface reservoir analogues, and regional data base development, will also be used for development of geologic computer process-based simulation models tailored to specific depositional sequences for use in improving prediction of reservoir characteristics. Accomplishments for this quarter are described for the following tasks: depositional sequence characterization; computer modeling; and reservoir development, prediction, and play potential. 11 figs.

  20. Dynamic Bayesian Networks to predict sequences of organ failures in patients admitted to ICU.

    PubMed

    Sandri, Micol; Berchialla, Paola; Baldi, Ileana; Gregori, Dario; De Blasi, Roberto Alberto

    2014-04-01

    Multi Organ Dysfunction Syndrome (MODS) represents a continuum of physiologic derangements and is the major cause of death in the Intensive Care Unit (ICU). Scoring systems for organ failure have become an integral part of critical care practice and play an important role in ICU-based research by tracking disease progression and facilitating patient stratification based on evaluation of illness severity during ICU stay. In this study a Dynamic Bayesian Network (DBN) was applied to model SOFA severity score changes in 79 adult critically ill patients consecutively admitted to the general ICU of the Sant'Andrea University hospital (Rome, Italy) from September 2010 to March 2011, with the aim to identify the most probable sequences of organs failures in the first week after the ICU admission. Approximately 56% of patients were admitted into the ICU with lung failure and about 27% of patients with heart failure. Results suggest that, given the first organ failure at the ICU admission, a sequence of organ failures can be predicted with a certain degree of probability. Sequences involving heart, lung, hematologic system and liver turned out to be the more likely to occur, with slightly different probabilities depending on the day of the week they occur. DBNs could be successfully applied for modeling temporal systems in critical care domain. Capability to predict sequences of likely organ failures makes DBNs a promising prognostic tool, intended to help physicians in undertaking therapeutic decisions in a patient-tailored approach.

  1. dnaMATE: a consensus melting temperature prediction server for short DNA sequences.

    PubMed

    Panjkovich, Alejandro; Norambuena, Tomás; Melo, Francisco

    2005-07-01

    An accurate and robust large-scale melting temperature prediction server for short DNA sequences is dispatched. The server calculates a consensus melting temperature value using the nearest-neighbor model based on three independent thermodynamic data tables. The consensus method gives an accurate prediction of melting temperature, as it has been recently demonstrated in a benchmark performed using all available experimental data for DNA sequences within the length range of 16-30 nt. This constitutes the first web server that has been implemented to perform a large-scale calculation of melting temperatures in real time (up to 5000 DNA sequences can be submitted in a single run). The expected accuracy of calculations carried out by this server in the range of 50-600 mM monovalent salt concentration is that 89% of the melting temperature predictions will have an error or deviation of <5 degrees C from experimental data. The server can be freely accessed at http://dna.bio.puc.cl/tm.html. The standalone executable versions of this software for LINUX, Macintosh and Windows platforms are also freely available at the same web site. Detailed further information supporting this server is available at the same web site referenced above.

  2. Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure.

    PubMed

    Capra, John A; Laskowski, Roman A; Thornton, Janet M; Singh, Mona; Funkhouser, Thomas A

    2009-12-01

    Identifying a protein's functional sites is an important step towards characterizing its molecular function. Numerous structure- and sequence-based methods have been developed for this problem. Here we introduce ConCavity, a small molecule binding site prediction algorithm that integrates evolutionary sequence conservation estimates with structure-based methods for identifying protein surface cavities. In large-scale testing on a diverse set of single- and multi-chain protein structures, we show that ConCavity substantially outperforms existing methods for identifying both 3D ligand binding pockets and individual ligand binding residues. As part of our testing, we perform one of the first direct comparisons of conservation-based and structure-based methods. We find that the two approaches provide largely complementary information, which can be combined to improve upon either approach alone. We also demonstrate that ConCavity has state-of-the-art performance in predicting catalytic sites and drug binding pockets. Overall, the algorithms and analysis presented here significantly improve our ability to identify ligand binding sites and further advance our understanding of the relationship between evolutionary sequence conservation and structural and functional attributes of proteins. Data, source code, and prediction visualizations are available on the ConCavity web site (http://compbio.cs.princeton.edu/concavity/).

  3. In silico comparative analysis of DNA and amino acid sequences for prion protein gene.

    PubMed

    Kim, Y; Lee, J; Lee, C

    2008-01-01

    Genetic variability might contribute to species specificity of prion diseases in various organisms. In this study, structures of the prion protein gene (PRNP) and its amino acids were compared among species of which sequence data were available. Comparisons of PRNP DNA sequences among 12 species including human, chimpanzee, monkey, bovine, ovine, dog, mouse, rat, wallaby, opossum, chicken and zebrafish allowed us to identify candidate regulatory regions in intron 1 and 3'-untranslated region (UTR) in addition to the coding region. Highly conserved putative binding sites for transcription factors, such as heat shock factor 2 (HSF2) and myocite enhancer factor 2 (MEF2), were discovered in the intron 1. In 3'-UTR, the functional sequence (ATTAAA) for nucleus-specific polyadenylation was found in all the analysed species. The functional sequence (TTTTTAT) for maturation-specific polyadenylation was identically observed only in ovine, and one or two nucleotide mismatches in the other species. A comparison of the amino acid sequences in 53 species revealed a large sequence identity. Especially the octapeptide repeat region was observed in all the species but frog and zebrafish. Functional changes and susceptibility to prion diseases with various isoforms of prion protein could be caused by numeric variability and conformational changes discovered in the repeat sequences.

  4. Cloning, sequence analysis and three-dimensional structure prediction of DNA pol I from thermophilic Geobacillus sp. MKK isolated from an Iranian hot spring.

    PubMed

    Khalaj-Kondori, Mohammad; Sadeghizadeh, Majid; Khajeh, Khosro; Naderi-Manesh, Hossein; Ahadi, Ali Mohammad; Emamzadeh, Abdorahman

    2007-08-01

    Molecular phylogenetic analysis of a novel thermophilic eubacterium isolated from an Iranian hot spring using 16S rDNA sequence showed that the new isolate belongs to genera Geobacillus. DNA pol I gene from this isolate was amplified, cloned, sequenced, and the three-dimensional (3D) structure of deduced amino acid sequence was predicted. Sequence analysis revealed the gene is 2,631 bp long, encodes a protein of 876 amino acids with a calculated molecular mass of 99 kDa, and belongs to family A DNA polymerases. Comparison of 3'-5'exonuclease domain of Klenow fragment (KF) with corresponding region of newly identified DNA pol I (MF), the large fragment of Bacillus stearothermophilus DNA pol I (BF) and Klentaq1, revealed not only deletions in three regions compared to KF, but that three of the four critical metal-binding residues in KF (Asp355, Glu357, Asp424, and Asp501) are altered in MF as well. Predicted 3D structure and sequence alignments between MF and BF showed that all critical residues in the polymerase active site are conserved. PMID:18025581

  5. Antibody-specific model of amino acid substitution for immunological inferences from alignments of antibody sequences.

    PubMed

    Mirsky, Alexander; Kazandjian, Linda; Anisimova, Maria

    2015-03-01

    Antibodies are glycoproteins produced by the immune system as a dynamically adaptive line of defense against invading pathogens. Very elegant and specific mutational mechanisms allow B lymphocytes to produce a large and diversified repertoire of antibodies, which is modified and enhanced throughout all adulthood. One of these mechanisms is somatic hypermutation, which stochastically mutates nucleotides in the antibody genes, forming new sequences with different properties and, eventually, higher affinity and selectivity to the pathogenic target. As somatic hypermutation involves fast mutation of antibody sequences, this process can be described using a Markov substitution model of molecular evolution. Here, using large sets of antibody sequences from mice and humans, we infer an empirical amino acid substitution model AB, which is specific to antibody sequences. Compared with existing general amino acid models, we show that the AB model provides significantly better description for the somatic evolution of mice and human antibody sequences, as demonstrated on large next generation sequencing (NGS) antibody data. General amino acid models are reflective of conservation at the protein level due to functional constraints, with most frequent amino acids exchanges taking place between residues with the same or similar physicochemical properties. In contrast, within the variable part of antibody sequences we observed an elevated frequency of exchanges between amino acids with distinct physicochemical properties. This is indicative of a sui generis mutational mechanism, specific to antibody somatic hypermutation. We illustrate this property of antibody sequences by a comparative analysis of the network modularity implied by the AB model and general amino acid substitution models. We recommend using the new model for computational studies of antibody sequence maturation, including inference of alignments and phylogenetic trees describing antibody somatic hypermutation in

  6. Structure- and Sequence-Based Function Prediction for Non-Homologous Proteins

    PubMed Central

    Sael, Lee; Chitale, Meghana; Kihara, Daisuke

    2012-01-01

    The structural genomics projects have been accumulating an increasing number of protein structures, many of which remain functionally unknown. In parallel effort to experimental methods, computational methods are expected to make a significant contribution for functional elucidation of such proteins. However, conventional computational methods that transfer functions from homologous proteins do not help much for these uncharacterized protein structures because they do not have apparent structural or sequence similarity with the known proteins. Here, we briefly review two avenues of computational function prediction methods, i.e. structure-based methods and sequence-based methods. The focus is on our recently developments of local structure-based methods and sequence-based methods, which can effectively extract function information from distantly related proteins. Two structure-based methods, Pocket-Surfer and Patch-Surfer, identify similar known ligand binding sites for pocket regions in a query protein without using global protein fold similarity information. Two sequence-based methods, PFP and ESG, make use of weakly similar sequences that are conventionally discarded in homology based function annotation. Combined together with experimental methods we hope that computational methods will make leading contribution in functional elucidation of the protein structures. PMID:22270458

  7. Complete amino acid sequence of human plasma Zn-. cap alpha. /sub 2/-glycoprotein and its homology to histocompatibility antigens

    SciTech Connect

    Araki, T.; Gejyo, F.; Takagaki, K.; Haupt, H.; Schwick, H.G.; Buergi, W.; Marti, T.; Schaller, J.; Rickli, E.; Brossmer, R.

    1988-02-01

    In the present study the complete amino acid sequence of human plasma Zn-..cap alpha../sub 2/-glycoprotein was determined. This protein whose biological function is unknown consists of a single polypeptide chain of 276 amino acid residues including 8 tryptophan residues and has a pyroglutamyl residue at the amino terminus. The location of the two disulfide bonds in the polypeptide chain was also established. The three glycans, whose structure was elucidated with the aid of 500 MHz /sup 1/H NMR spectroscopy, were sialylated N-biantennas. The molecular weight calculated from the polypeptide and carbohydrate structure is 38,478, which is close to the reported value of approx. = 41,000 based on physicochemical measurements. The predicted secondary structure appeared to comprised of 23% ..cap alpha..-helix, 27% ..beta..-sheet, and 22% ..beta..-turns. The three N-glycans were found to be located in ..beta..-turn regions. An unexpected finding was made by computer analysis of the sequence data; this revealed that Zn-..cap alpha../sub 2/-glycoprotein is closely related to antigens of the major histocompatibility complex in amino acid sequence and in domain structure. There was an unusually high degree of sequence homology with the ..cap alpha.. chains of class I histocompatibility antigens. Moreover, this plasma protein was shown to be a member of the immunoglobulin gene superfamily. Zn-..cap alpha../sub 2/-glycoprotein appears to be truncated secretory major histocompatibility complex-related molecule, and it may have a role in the expression of the immune response.

  8. The four ingredients of single-sequence RNA secondary structure prediction. A unifying perspective

    PubMed Central

    Rivas, Elena

    2013-01-01

    Any method for RNA secondary structure prediction is determined by four ingredients. The architecture is the choice of features implemented by the model (such as stacked basepairs, loop length distributions, etc.). The architecture determines the number of parameters in the model. The scoring scheme is the nature of those parameters (whether thermodynamic, probabilistic, or weights). The parameterization stands for the specific values assigned to the parameters. These three ingredients are referred to as “the model.” The fourth ingredient is the folding algorithms used to predict plausible secondary structures given the model and the sequence of a structural RNA. Here, I make several unifying observations drawn from looking at more than 40 years of methods for RNA secondary structure prediction in the light of this classification. As a final observation, there seems to be a performance ceiling that affects all methods with complex architectures, a ceiling that impacts all scoring schemes with remarkable similarity. This suggests that modeling RNA secondary structure by using intrinsic sequence-based plausible “foldability” will require the incorporation of other forms of information in order to constrain the folding space and to improve prediction accuracy. This could give an advantage to probabilistic scoring systems since a probabilistic framework is a natural platform to incorporate different sources of information into one single inference problem. PMID:23695796

  9. Using machine learning to predict gene expression and discover sequence motifs

    NASA Astrophysics Data System (ADS)

    Li, Xuejing

    Recently, large amounts of experimental data for complex biological systems have become available. We use tools and algorithms from machine learning to build data-driven predictive models. We first present a novel algorithm to discover gene sequence motifs associated with temporal expression patterns of genes. Our algorithm, which is based on partial least squares (PLS) regression, is able to directly model the flow of information, from gene sequence to gene expression, to learn cis regulatory motifs and characterize associated gene expression patterns. Our algorithm outperforms traditional computational methods e.g. clustering in motif discovery. We then present a study of extending a machine learning model for transcriptional regulation predictive of genetic regulatory response to Caenorhabditis elegans. We show meaningful results both in terms of prediction accuracy on the test experiments and biological information extracted from the regulatory program. The model discovers DNA binding sites ab initio. We also present a case study where we detect a signal of lineage-specific regulation. Finally we present a comparative study on learning predictive models for motif discovery, based on different boosting algorithms: Adaptive Boosting (AdaBoost), Linear Programming Boosting (LPBoost) and Totally Corrective Boosting (TotalBoost). We evaluate and compare the performance of the three boosting algorithms via both statistical and biological validation, for hypoxia response in Saccharomyces cerevisiae.

  10. The four ingredients of single-sequence RNA secondary structure prediction. A unifying perspective.

    PubMed

    Rivas, Elena

    2013-07-01

    Any method for RNA secondary structure prediction is determined by four ingredients. The architecture is the choice of features implemented by the model (such as stacked basepairs, loop length distributions, etc.). The architecture determines the number of parameters in the model. The scoring scheme is the nature of those parameters (whether thermodynamic, probabilistic, or weights). The parameterization stands for the specific values assigned to the parameters. These three ingredients are referred to as "the model." The fourth ingredient is the folding algorithms used to predict plausible secondary structures given the model and the sequence of a structural RNA. Here, I make several unifying observations drawn from looking at more than 40 years of methods for RNA secondary structure prediction in the light of this classification. As a final observation, there seems to be a performance ceiling that affects all methods with complex architectures, a ceiling that impacts all scoring schemes with remarkable similarity. This suggests that modeling RNA secondary structure by using intrinsic sequence-based plausible "foldability" will require the incorporation of other forms of information in order to constrain the folding space and to improve prediction accuracy. This could give an advantage to probabilistic scoring systems since a probabilistic framework is a natural platform to incorporate different sources of information into one single inference problem.

  11. Five-descriptor model to predict the chromatographic sequence of natural compounds.

    PubMed

    Hou, Shuying; Wang, Jinhua; Li, Zhangming; Wang, Yang; Wang, Ying; Yang, Songling; Xu, Jia; Zhu, Wenliang

    2016-03-01

    Despite the recent introduction of mass detection techniques, ultraviolet detection is still widely applied in the field of the chromatographic analysis of natural medicines. Here, a neural network cascade model consisting of nine small artificial neural network units was innovatively developed to predict the chromatographic sequence of natural compounds by integrating five molecular descriptors as the input. A total of 117 compounds of known structure were collected for model building. The order of appearance of each compound was determined in gradient chromatography. Strong linear correlation was found between the predicted and actual chromatographic position orders (Spearman's rho = 0.883, p < 0.0001). Application of the model to the external validation set of nine natural compounds was shown to dramatically increase the prediction accuracy of the real chromatographic order of multiple compounds. A case study shows that chromatographic sequence prediction based on a neural network cascade facilitated compound identification in the chromatographic fingerprint of Radix Salvia miltiorrhiza. For natural medicines of known compound composition, our method provides a feasible means for identifying the constituents of interest when only ultraviolet detection is available.

  12. The hygroscopic properties of dicarboxylic and multifunctional acids: measurements and UNIFAC predictions.

    PubMed

    Peng, C; Chan, M N; Chan, C K

    2001-11-15

    = (1-aw)-0.163, was developed to represent the hygroscopicity of these acids. Water activity predictions from calculations using the UNIFAC model were found to agree with the measured water activity data to within 40% for most of the acids but the deviations were as large as about 100% for malic acid and tartaric acid. We modified the functional group interaction parameters of the COOH(-H20, OH-H20, and OH-COOH pairs by fitting the UNIFAC model with the measured data. The modified UNIFAC model improves the agreement of predictions and measurements to within 38% for all the acids studied.

  13. Analyses of mitochondrial amino acid sequence datasets support the proposal that specimens of Hypodontus macropi from three species of macropodid hosts represent distinct species

    PubMed Central

    2013-01-01

    Background Hypodontus macropi is a common intestinal nematode of a range of kangaroos and wallabies (macropodid marsupials). Based on previous multilocus enzyme electrophoresis (MEE) and nuclear ribosomal DNA sequence data sets, H. macropi has been proposed to be complex of species. To test this proposal using independent molecular data, we sequenced the whole mitochondrial (mt) genomes of individuals of H. macropi from three different species of hosts (Macropus robustus robustus, Thylogale billardierii and Macropus [Wallabia] bicolor) as well as that of Macropicola ocydromi (a related nematode), and undertook a comparative analysis of the amino acid sequence datasets derived from these genomes. Results The mt genomes sequenced by next-generation (454) technology from H. macropi from the three host species varied from 13,634 bp to 13,699 bp in size. Pairwise comparisons of the amino acid sequences predicted from these three mt genomes revealed differences of 5.8% to 18%. Phylogenetic analysis of the amino acid sequence data sets using Bayesian Inference (BI) showed that H. macropi from the three different host species formed distinct, well-supported clades. In addition, sliding window analysis of the mt genomes defined variable regions for future population genetic studies of H. macropi in different macropodid hosts and geographical regions around Australia. Conclusions The present analyses of inferred mt protein sequence datasets clearly supported the hypothesis that H. macropi from M. robustus robustus, M. bicolor and T. billardierii represent distinct species. PMID:24261823

  14. Comparison and integration of deleteriousness prediction methods for nonsynonymous SNVs in whole exome sequencing studies

    PubMed Central

    Dong, Chengliang; Wei, Peng; Jian, Xueqiu; Gibbs, Richard; Boerwinkle, Eric; Wang, Kai; Liu, Xiaoming

    2015-01-01

    Accurate deleteriousness prediction for nonsynonymous variants is crucial for distinguishing pathogenic mutations from background polymorphisms in whole exome sequencing (WES) studies. Although many deleteriousness prediction methods have been developed, their prediction results are sometimes inconsistent with each other and their relative merits are still unclear in practical applications. To address these issues, we comprehensively evaluated the predictive performance of 18 current deleteriousness-scoring methods, including 11 function prediction scores (PolyPhen-2, SIFT, MutationTaster, Mutation Assessor, FATHMM, LRT, PANTHER, PhD-SNP, SNAP, SNPs&GO and MutPred), 3 conservation scores (GERP++, SiPhy and PhyloP) and 4 ensemble scores (CADD, PON-P, KGGSeq and CONDEL). We found that FATHMM and KGGSeq had the highest discriminative power among independent scores and ensemble scores, respectively. Moreover, to ensure unbiased performance evaluation of these prediction scores, we manually collected three distinct testing datasets, on which no current prediction scores were tuned. In addition, we developed two new ensemble scores that integrate nine independent scores and allele frequency. Our scores achieved the highest discriminative power compared with all the deleteriousness prediction scores tested and showed low false-positive prediction rate for benign yet rare nonsynonymous variants, which demonstrated the value of combining information from multiple orthologous approaches. Finally, to facilitate variant prioritization in WES studies, we have pre-computed our ensemble scores for 87 347 044 possible variants in the whole-exome and made them publicly available through the ANNOVAR software and the dbNSFP database. PMID:25552646

  15. Comparison and integration of deleteriousness prediction methods for nonsynonymous SNVs in whole exome sequencing studies.

    PubMed

    Dong, Chengliang; Wei, Peng; Jian, Xueqiu; Gibbs, Richard; Boerwinkle, Eric; Wang, Kai; Liu, Xiaoming

    2015-04-15

    Accurate deleteriousness prediction for nonsynonymous variants is crucial for distinguishing pathogenic mutations from background polymorphisms in whole exome sequencing (WES) studies. Although many deleteriousness prediction methods have been developed, their prediction results are sometimes inconsistent with each other and their relative merits are still unclear in practical applications. To address these issues, we comprehensively evaluated the predictive performance of 18 current deleteriousness-scoring methods, including 11 function prediction scores (PolyPhen-2, SIFT, MutationTaster, Mutation Assessor, FATHMM, LRT, PANTHER, PhD-SNP, SNAP, SNPs&GO and MutPred), 3 conservation scores (GERP++, SiPhy and PhyloP) and 4 ensemble scores (CADD, PON-P, KGGSeq and CONDEL). We found that FATHMM and KGGSeq had the highest discriminative power among independent scores and ensemble scores, respectively. Moreover, to ensure unbiased performance evaluation of these prediction scores, we manually collected three distinct testing datasets, on which no current prediction scores were tuned. In addition, we developed two new ensemble scores that integrate nine independent scores and allele frequency. Our scores achieved the highest discriminative power compared with all the deleteriousness prediction scores tested and showed low false-positive prediction rate for benign yet rare nonsynonymous variants, which demonstrated the value of combining information from multiple orthologous approaches. Finally, to facilitate variant prioritization in WES studies, we have pre-computed our ensemble scores for 87 347 044 possible variants in the whole-exome and made them publicly available through the ANNOVAR software and the dbNSFP database. PMID:25552646

  16. Quantitative detection of Aspergillus spp. by real-time nucleic acid sequence-based amplification.

    PubMed

    Zhao, Yanan; Perlin, David S

    2013-01-01

    Rapid and quantitative detection of Aspergillus from clinical samples may facilitate an early diagnosis of invasive pulmonary aspergillosis (IPA). As nucleic acid-based detection is a viable option, we demonstrate that Aspergillus burdens can be rapidly and accurately detected by a novel real-time nucleic acid assay other than qPCR by using the combination of nucleic acid sequence-based amplification (NASBA) and the molecular beacon (MB) technology. Here, we detail a real-time NASBA assay to determine quantitative Aspergillus burdens in lungs and bronchoalveolar lavage (BAL) fluids of rats with experimental IPA.

  17. Predicting and improving the protein sequence alignment quality by support vector regression

    PubMed Central

    Lee, Minho; Jeong, Chan-seok; Kim, Dongsup

    2007-01-01

    Background For successful protein structure prediction by comparative modeling, in addition to identifying a good template protein with known structure, obtaining an accurate sequence alignment between a query protein and a template protein is critical. It has been known that the alignment accuracy can vary significantly depending on our choice of various alignment parameters such as gap opening penalty and gap extension penalty. Because the accuracy of sequence alignment is typically measured by comparing it with its corresponding structure alignment, there is no good way of evaluating alignment accuracy without knowing the structure of a query protein, which is obviously not available at the time of structure prediction. Moreover, there is no universal alignment parameter option that would always yield the optimal alignment. Results In this work, we develop a method to predict the quality of the alignment between a query and a template. We train the support vector regression (SVR) models to predict the MaxSub scores as a measure of alignment quality. The alignment between a query protein and a template of length n is transformed into a (n + 1)-dimensional feature vector, then it is used as an input to predict the alignment quality by the trained SVR model. Performance of our work is evaluated by various measures including Pearson correlation coefficient between the observed and predicted MaxSub scores. Result shows high correlation coefficient of 0.945. For a pair of query and template, 48 alignments are generated by changing alignment options. Trained SVR models are then applied to predict the MaxSub scores of those and to select the best alignment option which is chosen specifically to the query-template pair. This adaptive selection procedure results in 7.4% improvement of MaxSub scores, compared to those when the single best parameter option is used for all query-template pairs. Conclusion The present work demonstrates that the alignment quality can be

  18. Draft Genome Sequence of the Butyric Acid Producer Clostridium tyrobutyricum Strain CIP I-776 (IFP923)

    PubMed Central

    Clément, Benjamin; Lopes Ferreira, Nicolas

    2016-01-01

    Here, we report the draft genome sequence of Clostridium tyrobutyricum CIP I-776 (IFP923), an efficient producer of butyric acid. The genome consists of a single chromosome of 3.19 Mb and provides useful data concerning the metabolic capacities of the strain. PMID:26941139

  19. Amino acid sequence of the encephalitogenic basic protein from human myelin

    PubMed Central

    Carnegie, P. R.

    1971-01-01

    Myelin from the central nervous system contains an unusual basic protein, which can induce experimental autoimmune encephalomyelitis. The basic protein from human brain was digested with trypsin and other enzymes and the sequence of the 170 amino acids was determined. The localization of the encephalitogenic determinants was described. Possible roles for the protein in the structure and function of myelin are discussed. PMID:4108501

  20. A novel baseline hepatitis B virus sequencing-based strategy for predicting adefovir antiviral response.

    PubMed

    Wang, Yu-Wei; Shan, Xuefeng; Huang, Yao; Deng, Haijun; Huang, Wen-Xiang; Zhang, Da-Zhi; Chen, Juan; Tang, Ni; Shan, You-Lan; Guo, Jin-Jun; Huang, Ailong

    2015-07-01

    Adefovir dipivoxil (ADV) is used as first-line monotherapy or rescue therapy in chronic hepatitis B (CHB) patients. In this study, we sought to identify nucleotide changes in the reverse transcriptase (RT) of hepatitis B virus (HBV) at baseline and explore their predictive value for ADV antiviral response. Ultra-deep pyrosequencing (UDPS) was utilized to determine HBV genetic variability within the RT region at baseline and during a 48-week ADV therapy. According to the viral load at the end of ADV treatment, all patients were classified into responders (HBV DNA level reduction of ⩾ 3 log 10 IU/mL) and suboptimal responders (HBV DNA level reduction of <3 log 10 IU/mL). Based on UDPS data at baseline, we identified 11 nucleotide substitutions whose combination frequency was significantly associated with the antiviral response among 36 CHB patients in the study group. However, the baseline distribution and frequency of rt181 and rt236 substitutions known to confer ADV resistance was a poor predictor for the antiviral response. Compared with baseline serum HBeAg, HBV-DNA and ALT levels, the baseline HBV sequence-based model showed higher predictive accuracy for ADV response. In an independent cohort of 31 validation patients with CHB, the sequence-based model provided greater predictive potency than the HBeAg/HBV-DNA/ALT and the HBeAg/HBV-DNA/ALT/sequence combinations. Taken together, we confirm the presence of ADV resistance variants in treatment-naïve patients and firstly unravel the predictive value of the baseline mutations in the HBV RT region for ADV antiviral response.

  1. Many amino acid substitution variants identified in DNA repair genes during human population screenings are predicted to impact protein function

    SciTech Connect

    Xi, T; Jones, I M; Mohrenweiser, H W

    2003-11-03

    Over 520 different amino acid substitution variants have been previously identified in the systematic screening of 91 human DNA repair genes for sequence variation. Two algorithms were employed to predict the impact of these amino acid substitutions on protein activity. Sorting Intolerant From Tolerant (SIFT) classified 226 of 508 variants (44%) as ''Intolerant''. Polymorphism Phenotyping (PolyPhen) classed 165 of 489 amino acid substitutions (34%) as ''Probably or Possibly Damaging''. Another 9-15% of the variants were classed as ''Potentially Intolerant or Damaging''. The results from the two algorithms are highly associated, with concordance in predicted impact observed for {approx}62% of the variants. Twenty one to thirty one percent of the variant proteins are predicted to exhibit reduced activity by both algorithms. These variants occur at slightly lower individual allele frequency than do the variants classified as ''Tolerant'' or ''Benign''. Both algorithms correctly predicted the impact of 26 functionally characterized amino acid substitutions in the APE1 protein on biochemical activity, with one exception. It is concluded that a substantial fraction of the missense variants observed in the general human population are functionally relevant. These variants are expected to be the molecular genetic and biochemical basis for the associations of reduced DNA repair capacity phenotypes with elevated cancer risk.

  2. Sequence-specific formation of d-amino acids in a monoclonal antibody during light exposure.

    PubMed

    Mozziconacci, Olivier; Schöneich, Christian

    2014-11-01

    The photoirradiation of a monoclonal antibody 1 (mAb1) at λ = 254 nm and λmax = 305 nm resulted in the sequence-specific generation of d-Val, d-Tyr, and potentially d-Ala and d-Arg, in the heavy chain sequence [95-101] YCARVVY. d-Amino acid formation is most likely the product of reversible intermediary carbon-centered radical formation at the (α)C-positions of the respective amino acids ((α)C(•) radicals) through the action of Cys thiyl radicals (CysS(•)). The latter can be generated photochemically either through direct homolysis of cystine or through photoinduced electron transfer from Trp and/or Tyr residues. The potential of mAb1 sequences to undergo epimerization was first evaluated through covalent H/D exchange during photoirradiation in D2O, and proteolytic peptides exhibiting deuterium incorporation were monitored by HPLC-MS/MS analysis. Subsequently, mAb1 was photoirradiated in H2O, and peptides, for which deuterium incorporation in D2O had been documented, were purified by HPLC and subjected to hydrolysis and amino acid analysis. Importantly, not all peptide sequences which incorporated deuterium during photoirradiation in D2O also exhibited photoinduced d-amino acid formation. For example, the heavy chain sequence [12-18] VQPGGSL showed significant deuterium incorporation during photoirradiation in D2O, but no photoinduced formation of d-amino acids was detected. Instead this sequence contained ca. 22% d-Val in both a photoirradiated and a control sample. This observation could indicate that d-Val may have been generated either during production and/or storage or during sample preparation. While sample preparation did not lead to the formation of d-Val or other d-amino acids in the control sample for the heavy chain sequence [95-101] YCARVVY, we may have to consider that during hydrolysis N-terminal residues (such as in VQPGGSL) may be more prone to epimerization. We conclude that the photoinduced, radical-dependent formation of d-amino acids

  3. The complete amino acid sequence of chitinase-B from the leaves of pokeweed (Phytolacca americana).

    PubMed

    Tanigawa, M; Yamagami, T; Funatsu, G

    1995-05-01

    The complete amino acid sequence of pokeweed leaf chitinase-B (PLC-B) has been determined by first sequencing all 19 tryptic peptides derived from the reduced and S-carboxymethylated (RCm-) PLC-B and then connecting them by analyzing the chymotryptic peptides from three fragments produced by cyanogen bromide cleavage of RCm-PLC-B. PLC-B consists of 274 amino acid residues and has a molecular mass of 29,473 Da. Six cysteine residues are linked by disulfide bonds between Cys20 and Cys67, Cys50 and Cys57, and Cys159 and Cys188. From 58-68% sequence homology of PLC-B with five class III chitinases, it was concluded that PLC-B is a basic class III chitinase.

  4. Pyruvate decarboxylase from Pisum sativum. Properties, nucleotide and amino acid sequences.

    PubMed

    Mücke, U; Wohlfarth, T; Fiedler, U; Bäumlein, H; Rücknagel, K P; König, S

    1996-04-15

    To study the molecular structure and function of pyruvate decarboxylase (PDC) from plants the protein was isolated from pea seeds and partially characterised. The active enzyme which occurs in the form of higher oligomers consists of two different subunits appearing in SDS/PAGE and mass spectroscopy experiments. For further experiments, like X-ray crystallography, it was necessary to elucidate the protein sequence. Partial cDNA clones encoding pyruvate decarboxylase from seeds of Pisum sativum cv. Miko have been obtained by means of polymerase chain reaction techniques. The first sequences were found using degenerate oligonucleotide primers designated according to conserved amino acid sequences of known pyruvate decarboxylases. The missing parts of one cDNA were amplified applying the 3'- and 5'-rapid amplification of cDNA ends systems. The amino acid sequence deduced from the entire cDNA sequence displays strong similarity to pyruvate decarboxylases from other organisms, especially from plants. A molecular mass of 64 kDa was calculated for this protein correlating with estimations for the smaller subunit of the oligomeric enzyme. The PCR experiments led to at least three different clones representing the middle part of the PDC cDNA indicating the existence of three isozymes. Two of these isoforms could be confirmed on the protein level by sequencing tryptic peptides. Only anaerobically treated roots showed a positive signal for PDC mRNA in Northern analysis although the cDNA from imbibed seeds was successfully used for PCR.

  5. Sleep spindles predict neural and behavioral changes in motor sequence consolidation.

    PubMed

    Barakat, Marc; Carrier, Julie; Debas, Karen; Lungu, Ovidiu; Fogel, Stuart; Vandewalle, Gilles; Hoge, Richard D; Bellec, Pierre; Karni, Avi; Ungerleider, Leslie G; Benali, Habib; Doyon, Julien

    2013-11-01

    The purpose of this study was to investigate the predictive function of sleep spindles in motor sequence consolidation. BOLD responses were acquired in 10 young healthy subjects who were trained on an explicitly known 5-item sequence using their left nondominant hand, scanned at 9:00 pm while performing that same task and then were retested and scanned 12 h later after a night of sleep during which polysomnographic measures were recorded. An automatic algorithm was used to detect sleep spindles and to quantify their characteristics (i.e., density, amplitude, and duration). Analyses revealed significant positive correlations between gains in performance and the amplitude of spindles. Moreover, significant increases in BOLD signal were observed in several motor-related areas, most of which were localized in the right hemisphere, particularly in the right cortico-striatal system. Such increases in BOLD signal also correlated positively with the amplitude of spindles at several derivations. Taken together, our results show that sleep spindles predict neural and behavioral changes in overnight motor sequence consolidation.

  6. Allelic polymorphism in arabian camel ribonuclease and the amino acid sequence of bactrian camel ribonuclease.

    PubMed

    Welling, G W; Mulder, H; Beintema, J J

    1976-04-01

    Pancreatic ribonucleases from several species (whitetail deer, roe deer, guinea pig, and arabian camel) exhibit more than one amino acid at particular positions in their amino acid sequences. Since these enzymes were isolated from pooled pancreas, the origin of this heterogeneity is not clear. The pancreatic ribonucleases from 11 individual arabian camels (Camelus dromedarius) have been investigated with respect to the lysine-glutamine heterogeneity at position 103 (Welling et al., 1975). Six ribonucleases showed only one basic band and five showed two bands after polyacrylamide gel electrophoresis, suggesting a gene frequency of about 0.75 for the Lys gene and about 0.25 for the Gln gene. The amino acid sequence of bactrian camel (Camelus bactrianus) ribonuclease isolated from individual pancreatic tissue was determined and compared with that of arabian camel ribonuclease. The only difference was observed at position 103. In the ribonucleases from two unrelated bactrian camels, only glutamine was observed at that position. PMID:962846

  7. The Recipe for Protein Sequence-Based Function Prediction and Its Implementation in the ANNOTATOR Software Environment.

    PubMed

    Eisenhaber, Birgit; Kuchibhatla, Durga; Sherman, Westley; Sirota, Fernanda L; Berezovsky, Igor N; Wong, Wing-Cheong; Eisenhaber, Frank

    2016-01-01

    As biomolecular sequencing is becoming the main technique in life sciences, functional interpretation of sequences in terms of biomolecular mechanisms with in silico approaches is getting increasingly significant. Function prediction tools are most powerful for protein-coding sequences; yet, the concepts and technologies used for this purpose are not well reflected in bioinformatics textbooks. Notably, protein sequences typically consist of globular domains and non-globular segments. The two types of regions require cardinally different approaches for function prediction. Whereas the former are classic targets for homology-inspired function transfer based on remnant, yet statistically significant sequence similarity to other, characterized sequences, the latter type of regions are characterized by compositional bias or simple, repetitive patterns and require lexical analysis and/or empirical sequence pattern-function correlations. The recipe for function prediction recommends first to find all types of non-globular segments and, then, to subject the remaining query sequence to sequence similarity searches. We provide an updated description of the ANNOTATOR software environment as an advanced example of a software platform that facilitates protein sequence-based function prediction. PMID:27115649

  8. The Recipe for Protein Sequence-Based Function Prediction and Its Implementation in the ANNOTATOR Software Environment.

    PubMed

    Eisenhaber, Birgit; Kuchibhatla, Durga; Sherman, Westley; Sirota, Fernanda L; Berezovsky, Igor N; Wong, Wing-Cheong; Eisenhaber, Frank

    2016-01-01

    As biomolecular sequencing is becoming the main technique in life sciences, functional interpretation of sequences in terms of biomolecular mechanisms with in silico approaches is getting increasingly significant. Function prediction tools are most powerful for protein-coding sequences; yet, the concepts and technologies used for this purpose are not well reflected in bioinformatics textbooks. Notably, protein sequences typically consist of globular domains and non-globular segments. The two types of regions require cardinally different approaches for function prediction. Whereas the former are classic targets for homology-inspired function transfer based on remnant, yet statistically significant sequence similarity to other, characterized sequences, the latter type of regions are characterized by compositional bias or simple, repetitive patterns and require lexical analysis and/or empirical sequence pattern-function correlations. The recipe for function prediction recommends first to find all types of non-globular segments and, then, to subject the remaining query sequence to sequence similarity searches. We provide an updated description of the ANNOTATOR software environment as an advanced example of a software platform that facilitates protein sequence-based function prediction.

  9. Integrating sequence stratigraphy and rock-physics to interpret seismic amplitudes and predict reservoir quality

    NASA Astrophysics Data System (ADS)

    Dutta, Tanima

    This dissertation focuses on the link between seismic amplitudes and reservoir properties. Prediction of reservoir properties, such as sorting, sand/shale ratio, and cement-volume from seismic amplitudes improves by integrating knowledge from multiple disciplines. The key contribution of this dissertation is to improve the prediction of reservoir properties by integrating sequence stratigraphy and rock physics. Sequence stratigraphy has been successfully used for qualitative interpretation of seismic amplitudes to predict reservoir properties. Rock physics modeling allows quantitative interpretation of seismic amplitudes. However, often there is uncertainty about selecting geologically appropriate rock physics model and its input parameters, away from the wells. In the present dissertation, we exploit the predictive power of sequence stratigraphy to extract the spatial trends of sedimentological parameters that control seismic amplitudes. These spatial trends of sedimentological parameters can serve as valuable constraints in rock physics modeling, especially away from the wells. Consequently, rock physics modeling, integrated with the trends from sequence stratigraphy, become useful for interpreting observed seismic amplitudes away from the wells in terms of underlying sedimentological parameters. We illustrate this methodology using a comprehensive dataset from channelized turbidite systems, deposited in minibasin settings in the offshore Equatorial Guinea, West Africa. First, we present a practical recipe for using closed-form expressions of effective medium models to predict seismic velocities in unconsolidated sandstones. We use an effective medium model that combines perfectly rough and smooth grains (the extended Walton model), and use that model to derive coordination number, porosity, and pressure relations for P and S wave velocities from experimental data. Our recipe provides reasonable fits to other experimental and borehole data, and specifically

  10. Application of sequence stratigraphy to carbonate reservoir prediction, Early Palaeozoic eastern Warburton basin, South Australia

    SciTech Connect

    Xiaowen S.; Stuart, W.J.

    1996-01-01

    The Early Palaeozoic Warburton Basin underlies the gas and oil producing Cooper and Eromanga Basins. Postdepositional tectonism created high potential fracture porosities, complicating the stratigraphy and making reservoir prediction difficult. Sequence stratigraphy integrating core, cuttings, well-log, seismic and biostratigraphic data has recognized a carbonate-dominated to mixed carbonate/siliciclastic supersequence comprising several depositional sequences. Biostratigraphy based on trilobites and conodonts ensures reliable well and seismic correlations across structurally complex areas. Lithofacies interpretation indicates sedimentary environments ranging from carbonate inner shelf, peritidal, shelf edge, deep outer shelf and slope to basin. Log facies show gradually upward shallowing trends or abrupt changes indicating possible sequence boundaries. With essential depositional models and sequence analysis from well data, seismic facies suggest general reflection configurations including parallel-continuous layered patterns indicating uniform neuritic shelf, and mounded structures suggesting carbonate build-ups and pre-existing volcanic relief. Seismic stratigraphy also reveals inclined slope and onlapping margins of a possibly isolated platform geometry. The potential reservoirs are dolomitized carbonates containing oomoldic, vuggy, intercrystalline and fracture porosities in lowstand systems tracts either on carbonate mounds and shelf crests or below shelf edge. The source rock is a deep basinal argillaceous mudstone, and the seal is fine-grained siltstone/shale of the transgressive system tract.

  11. Application of sequence stratigraphy to carbonate reservoir prediction, Early Palaeozoic eastern Warburton basin, South Australia

    SciTech Connect

    Xiaowen S.; Stuart, W.J.

    1996-12-31

    The Early Palaeozoic Warburton Basin underlies the gas and oil producing Cooper and Eromanga Basins. Postdepositional tectonism created high potential fracture porosities, complicating the stratigraphy and making reservoir prediction difficult. Sequence stratigraphy integrating core, cuttings, well-log, seismic and biostratigraphic data has recognized a carbonate-dominated to mixed carbonate/siliciclastic supersequence comprising several depositional sequences. Biostratigraphy based on trilobites and conodonts ensures reliable well and seismic correlations across structurally complex areas. Lithofacies interpretation indicates sedimentary environments ranging from carbonate inner shelf, peritidal, shelf edge, deep outer shelf and slope to basin. Log facies show gradually upward shallowing trends or abrupt changes indicating possible sequence boundaries. With essential depositional models and sequence analysis from well data, seismic facies suggest general reflection configurations including parallel-continuous layered patterns indicating uniform neuritic shelf, and mounded structures suggesting carbonate build-ups and pre-existing volcanic relief. Seismic stratigraphy also reveals inclined slope and onlapping margins of a possibly isolated platform geometry. The potential reservoirs are dolomitized carbonates containing oomoldic, vuggy, intercrystalline and fracture porosities in lowstand systems tracts either on carbonate mounds and shelf crests or below shelf edge. The source rock is a deep basinal argillaceous mudstone, and the seal is fine-grained siltstone/shale of the transgressive system tract.

  12. A Novel Method for Accurate Operon Predictions in All SequencedProkaryotes

    SciTech Connect

    Price, Morgan N.; Huang, Katherine H.; Alm, Eric J.; Arkin, Adam P.

    2004-12-01

    We combine comparative genomic measures and the distance separating adjacent genes to predict operons in 124 completely sequenced prokaryotic genomes. Our method automatically tailors itself to each genome using sequence information alone, and thus can be applied to any prokaryote. For Escherichia coli K12 and Bacillus subtilis, our method is 85 and 83% accurate, respectively, which is similar to the accuracy of methods that use the same features but are trained on experimentally characterized transcripts. In Halobacterium NRC-1 and in Helicobacterpylori, our method correctly infers that genes in operons are separated by shorter distances than they are in E.coli, and its predictions using distance alone are more accurate than distance-only predictions trained on a database of E.coli transcripts. We use microarray data from sixphylogenetically diverse prokaryotes to show that combining intergenic distance with comparative genomic measures further improves accuracy and that our method is broadly effective. Finally, we survey operon structure across 124 genomes, and find several surprises: H.pylori has many operons, contrary to previous reports; Bacillus anthracis has an unusual number of pseudogenes within conserved operons; and Synechocystis PCC6803 has many operons even though it has unusually wide spacings between conserved adjacent genes.

  13. Accurate single-sequence prediction of solvent accessible surface area using local and global features.

    PubMed

    Faraggi, Eshel; Zhou, Yaoqi; Kloczkowski, Andrzej

    2014-11-01

    We present a new approach for predicting the Accessible Surface Area (ASA) using a General Neural Network (GENN). The novelty of the new approach lies in not using residue mutation profiles generated by multiple sequence alignments as descriptive inputs. Instead we use solely sequential window information and global features such as single-residue and two-residue compositions of the chain. The resulting predictor is both highly more efficient than sequence alignment-based predictors and of comparable accuracy to them. Introduction of the global inputs significantly helps achieve this comparable accuracy. The predictor, termed ASAquick, is tested on predicting the ASA of globular proteins and found to perform similarly well for so-called easy and hard cases indicating generalizability and possible usability for de-novo protein structure prediction. The source code and a Linux executables for GENN and ASAquick are available from Research and Information Systems at http://mamiris.com, from the SPARKS Lab at http://sparks-lab.org, and from the Battelle Center for Mathematical Medicine at http://mathmed.org. PMID:25204636

  14. Development of a protein-ligand-binding site prediction method based on interaction energy and sequence conservation.

    PubMed

    Tsujikawa, Hiroto; Sato, Kenta; Wei, Cao; Saad, Gul; Sumikoshi, Kazuya; Nakamura, Shugo; Terada, Tohru; Shimizu, Kentaro

    2016-09-01

    We present a new method for predicting protein-ligand-binding sites based on protein three-dimensional structure and amino acid conservation. This method involves calculation of the van der Waals interaction energy between a protein and many probes placed on the protein surface and subsequent clustering of the probes with low interaction energies to identify the most energetically favorable locus. In addition, it uses amino acid conservation among homologous proteins. Ligand-binding sites were predicted by combining the interaction energy and the amino acid conservation score. The performance of our prediction method was evaluated using a non-redundant dataset of 348 ligand-bound and ligand-unbound protein structure pairs, constructed by filtering entries in a ligand-binding site structure database, LigASite. Ligand-bound structure prediction (bound prediction) indicated that 74.0 % of predicted ligand-binding sites overlapped with real ligand-binding sites by over 25 % of their volume. Ligand-unbound structure prediction (unbound prediction) indicated that 73.9 % of predicted ligand-binding residues overlapped with real ligand-binding residues. The amino acid conservation score improved the average prediction accuracy by 17.0 and 17.6 points for the bound and unbound predictions, respectively. These results demonstrate the effectiveness of the combined use of the interaction energy and amino acid conservation in the ligand-binding site prediction. PMID:27400687

  15. Nucleotide sequence of Crithidia fasciculata cytosol 5S ribosomal ribonucleic acid.

    PubMed

    MacKay, R M; Gray, M W; Doolittle, W F

    1980-11-11

    The complete nucleotide sequence of the cytosol 5S ribosomal ribonucleic acid of the trypanosomatid protozoan Crithidia fasciculata has been determined by a combination of T1-oligonucleotide catalog and gel sequencing techniques. The sequence is: GAGUACGACCAUACUUGAGUGAAAACACCAUAUCCCGUCCGAUUUGUGAAGUUAAGCACC CACAGGCUUAGUUAGUACUGAGGUCAGUGAUGACUCGGGAACCCUGAGUGCCGUACUCCCOH. This 5S ribosomal RNA is unique in having GAUU in place of the GAAC or GAUC found in all other prokaryotic and eukaryotic 5S RNAs, and thought to be involved in interactions with tRNAs. Comparisons to other eukaryotic cytosol 5S ribosomal RNA sequences indicate that the four major eukaryotic kingdoms (animals, plants, fungi, and protists) are about equally remote from each other, and that the latter kingdom may be the most internally diverse.

  16. Pattern recognition in nucleic acid sequences. II. An efficient method for finding locally stable secondary structures.

    PubMed Central

    Kanehisa, M I; Goad, W B

    1982-01-01

    We present a method for calculating all possible single hairpin loop secondary structures in a nucleic acid sequence by the order of N2 operations where N is the total number of bases. Each structure may contain any number of bulges and internal loops. Most natural sequences are found to be indistinguishable from random sequences in the potential of forming secondary structures, which is defined by the frequency of possible secondary structures calculated by the method. There is a strong correlation between the higher G+C content and the higher structure forming potential. Interestingly, the removal of intervening sequences in mRNAs is almost always accompanied by an increase in the G+C content, which may suggest an involvement of structural stabilization in the mRNA maturation. PMID:6174936

  17. Depositional sequence analysis and sedimentologic modeling for improved prediction of Pennsylvanian reservoirs (Annex 1)

    SciTech Connect

    Watney, W.L.

    1992-01-01

    Interdisciplinary studies of the Upper Pennsylvanian Lansing and Kansas City groups have been undertaken in order to improve the geologic characterization of petroleum reservoirs and to develop a quantitative understanding of the processes responsible for formation of associated depositional sequences. To this end, concepts and methods of sequence stratigraphy are being used to define and interpret the three-dimensional depositional framework of the Kansas City Group. The investigation includes characterization of reservoir rocks in oil fields in western Kansas, description of analog equivalents in near-surface and surface sites in southeastern Kansas, and construction of regional structural and stratigraphic framework to link the site specific studies. Geologic inverse and simulation models are being developed to integrate quantitative estimates of controls on sedimentation to produce reconstructions of reservoir-bearing strata in an attempt to enhance our ability to predict reservoir characteristics.

  18. TFpredict and SABINE: sequence-based prediction of structural and functional characteristics of transcription factors.

    PubMed

    Eichner, Johannes; Topf, Florian; Dräger, Andreas; Wrzodek, Clemens; Wanke, Dierk; Zell, Andreas

    2013-01-01

    One of the key mechanisms of transcriptional control are the specific connections between transcription factors (TF) and cis-regulatory elements in gene promoters. The elucidation of these specific protein-DNA interactions is crucial to gain insights into the complex regulatory mechanisms and networks underlying the adaptation of organisms to dynamically changing environmental conditions. As experimental techniques for determining TF binding sites are expensive and mostly performed for selected TFs only, accurate computational approaches are needed to analyze transcriptional regulation in eukaryotes on a genome-wide level. We implemented a four-step classification workflow which for a given protein sequence (1) discriminates TFs from other proteins, (2) determines the structural superclass of TFs, (3) identifies the DNA-binding domains of TFs and (4) predicts their cis-acting DNA motif. While existing tools were extended and adapted for performing the latter two prediction steps, the first two steps are based on a novel numeric sequence representation which allows for combining existing knowledge from a BLAST scan with robust machine learning-based classification. By evaluation on a set of experimentally confirmed TFs and non-TFs, we demonstrate that our new protein sequence representation facilitates more reliable identification and structural classification of TFs than previously proposed sequence-derived features. The algorithms underlying our proposed methodology are implemented in the two complementary tools TFpredict and SABINE. The online and stand-alone versions of TFpredict and SABINE are freely available to academics at http://www.cogsys.cs.uni-tuebingen.de/software/TFpredict/ and http://www.cogsys.cs.uni-tuebingen.de/software/SABINE/.

  19. Efficient Nucleic Acid Extraction and 16S rRNA Gene Sequencing for Bacterial Community Characterization.

    PubMed

    Anahtar, Melis N; Bowman, Brittany A; Kwon, Douglas S

    2016-01-01

    There is a growing appreciation for the role of microbial communities as critical modulators of human health and disease. High throughput sequencing technologies have allowed for the rapid and efficient characterization of bacterial communities using 16S rRNA gene sequencing from a variety of sources. Although readily available tools for 16S rRNA sequence analysis have standardized computational workflows, sample processing for DNA extraction remains a continued source of variability across studies. Here we describe an efficient, robust, and cost effective method for extracting nucleic acid from swabs. We also delineate downstream methods for 16S rRNA gene sequencing, including generation of sequencing libraries, data quality control, and sequence analysis. The workflow can accommodate multiple samples types, including stool and swabs collected from a variety of anatomical locations and host species. Additionally, recovered DNA and RNA can be separated and used for other applications, including whole genome sequencing or RNA-seq. The method described allows for a common processing approach for multiple sample types and accommodates downstream analysis of genomic, metagenomic and transcriptional information. PMID:27168460

  20. Efficient Nucleic Acid Extraction and 16S rRNA Gene Sequencing for Bacterial Community Characterization

    PubMed Central

    Anahtar, Melis N.; Bowman, Brittany A.; Kwon, Douglas S.

    2016-01-01

    There is a growing appreciation for the role of microbial communities as critical modulators of human health and disease. High throughput sequencing technologies have allowed for the rapid and efficient characterization of bacterial communities using 16S rRNA gene sequencing from a variety of sources. Although readily available tools for 16S rRNA sequence analysis have standardized computational workflows, sample processing for DNA extraction remains a continued source of variability across studies. Here we describe an efficient, robust, and cost effective method for extracting nucleic acid from swabs. We also delineate downstream methods for 16S rRNA gene sequencing, including generation of sequencing libraries, data quality control, and sequence analysis. The workflow can accommodate multiple samples types, including stool and swabs collected from a variety of anatomical locations and host species. Additionally, recovered DNA and RNA can be separated and used for other applications, including whole genome sequencing or RNA-seq. The method described allows for a common processing approach for multiple sample types and accommodates downstream analysis of genomic, metagenomic and transcriptional information. PMID:27168460

  1. Design of nucleic acid sequences for DNA computing based on a thermodynamic approach.

    PubMed

    Tanaka, Fumiaki; Kameda, Atsushi; Yamamoto, Masahito; Ohuchi, Azuma

    2005-01-01

    We have developed an algorithm for designing multiple sequences of nucleic acids that have a uniform melting temperature between the sequence and its complement and that do not hybridize non-specifically with each other based on the minimum free energy (DeltaG (min)). Sequences that satisfy these constraints can be utilized in computations, various engineering applications such as microarrays, and nano-fabrications. Our algorithm is a random generate-and-test algorithm: it generates a candidate sequence randomly and tests whether the sequence satisfies the constraints. The novelty of our algorithm is that the filtering method uses a greedy search to calculate DeltaG (min). This effectively excludes inappropriate sequences before DeltaG (min) is calculated, thereby reducing computation time drastically when compared with an algorithm without the filtering. Experimental results in silico showed the superiority of the greedy search over the traditional approach based on the hamming distance. In addition, experimental results in vitro demonstrated that the experimental free energy (DeltaG (exp)) of 126 sequences correlated well with DeltaG (min) (|R| = 0.90) than with the hamming distance (|R| = 0.80). These results validate the rationality of a thermodynamic approach. We implemented our algorithm in a graphic user interface-based program written in Java.

  2. Chemical genomic profiling via barcode sequencing to predict compound mode of action

    PubMed Central

    Piotrowski, Jeff S.; Simpkins, Scott W.; Li, Sheena C.; Deshpande, Raamesh; McIlwain, Sean; Ong, Irene; Myers, Chad L.; Boone, Charlie; Andersen, Raymond J.

    2015-01-01

    Summary Chemical genomics is an unbiased, whole-cell approach to characterizing novel compounds to determine mode of action and cellular target. Our version of this technique is built upon barcoded deletion mutants of Saccharomyces cerevisiae and has been adapted to a high-throughput methodology using next-generation sequencing. Here we describe the steps to generate a chemical genomic profile from a compound of interest, and how to use this information to predict molecular mechanism and targets of bioactive compounds. PMID:25618354

  3. Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Patel, Kamlesh D [Ken; SNL,

    2016-07-12

    Kamlesh (Ken) Patel from Sandia National Laboratories (Livermore, California) presents "Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology " at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  4. Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    SciTech Connect

    Patel, Kamlesh D; SNL,

    2012-06-01

    Kamlesh (Ken) Patel from Sandia National Laboratories (Livermore, California) presents "Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology " at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  5. Lower serum uric acid level predicts mortality in dialysis patients

    PubMed Central

    Bae, Eunjin; Cho, Hyun-Jeong; Shin, Nara; Kim, Sun Moon; Yang, Seung Hee; Kim, Dong Ki; Kim, Yong-Lim; Kang, Shin-Wook; Yang, Chul Woo; Kim, Nam Ho; Kim, Yon Su; Lee, Hajeong

    2016-01-01

    Abstract We evaluated the impact of serum uric acid (SUA) on mortality in patients with chronic dialysis. A total of 4132 adult patients on dialysis were enrolled prospectively between August 2008 and September 2014. Among them, we included 1738 patients who maintained dialysis for at least 3 months and had available SUA in the database. We categorized the time averaged-SUA (TA-SUA) into 5 groups: <5.5, 5.5–6.4, 6.5–7.4, 7.5–8.4, and ≥8.5 mg/dL. Cox regression analysis was used to calculate the hazard ratio (HR) of all-cause mortality according to SUA group. The mean TA-SUA level was slightly higher in men than in women. Patients with lower TA-SUA level tended to have lower body mass index (BMI), phosphorus, serum albumin level, higher proportion of diabetes mellitus (DM), and higher proportion of malnourishment on the subjective global assessment (SGA). During a median follow-up of 43.9 months, 206 patients died. Patients with the highest SUA had a similar risk to the middle 3 TA-SUA groups, but the lowest TA-SUA group had a significantly elevated HR for mortality. The lowest TA-SUA group was significantly associated with increased all-cause mortality (adjusted HR, 1.720; 95% confidence interval, 1.007–2.937; P = 0.047) even after adjusting for demographic, comorbid, nutritional covariables, and medication use that could affect SUA levels. This association was prominent in patients with well nourishment on the SGA, a preserved serum albumin level, a higher BMI, and concomitant DM although these parameters had no significant interaction in the TA-SUA-mortality relationship except DM. In conclusion, a lower TA-SUA level <5.5 mg/dL predicted all-cause mortality in patients with chronic dialysis. PMID:27310949

  6. Studies on adenosine triphosphate transphosphorylases. Amino acid sequence of rabbit muscle ATP-AMP transphosphorylase.

    PubMed

    Kuby, S A; Palmieri, R H; Frischat, A; Fischer, A H; Wu, L H; Maland, L; Manship, M

    1984-05-22

    The total amino acid sequence of rabbit muscle adenylate kinase has been determined, and the single polypeptide chain of 194 amino acid residues starts with N-acetylmethionine and ends with leucyllysine at its carboxyl terminus, in agreement with the earlier data on its amino acid composition [Mahowald, T. A., Noltmann, E. A., & Kuby, S. A. (1962) J. Biol. Chem. 237, 1138-1145] and its carboxyl-terminus sequence [Olson, O. E., & Kuby, S. A. (1964) J. Biol. Chem. 239, 460-467]. Elucidation of the primary structure was based on tryptic and chymotryptic cleavages of the performic acid oxidized protein, cyanogen bromide cleavages of the 14C-labeled S-carboxymethylated protein at its five methionine sites (followed by maleylation of peptide fragments), and tryptic cleavages at its 12 arginine sites of the maleylated 14C-labeled S-carboxymethylated protein. Calf muscle myokinase, whose sequence has also been established, differs primarily from the rabbit muscle myokinase's sequence in the following: His-30 is replaced by Gln-30; Lys-56 is replaced by Met-56; Ala-84 and Asp 85 are replaced by Val-84 and Asn-85. A comparison of the four muscle-type adenylate kinases, whose covalent structures have now been determined, viz., rabbit, calf, porcine, and human [for the latter two sequences see Heil, A., Müller, G., Noda, L., Pinder, T., Schirmer, H., Schirmer, I., & Von Zabern, I. (1974) Eur. J. Biochem. 43, 131-144, and Von Zabern, I., Wittmann-Liebold, B., Untucht-Grau, R., Schirmer, R. H., & Pai, E. F. (1976) Eur. J. Biochem. 68, 281-290], demonstrates an extraordinary degree of homology.(ABSTRACT TRUNCATED AT 250 WORDS)

  7. Complete amino acid sequence of a human monocyte chemoattractant, a putative mediator of cellular immune reactions.

    PubMed Central

    Robinson, E A; Yoshimura, T; Leonard, E J; Tanaka, S; Griffin, P R; Shabanowitz, J; Hunt, D F; Appella, E

    1989-01-01

    In a study of the structural basis for leukocyte specificity of chemoattractants, we determined the complete amino acid sequence of human glioma-derived monocyte chemotactic factor (GDCF-2), a peptide that attracts human monocytes but not neutrophils. The choice of a tumor cell product for analysis was dictated by its relative abundance and an amino acid composition indistinguishable from that of lymphocyte-derived chemotactic factor (LDCF), the agonist thought to account for monocyte accumulation in cellular immune reactions. By a combination of Edman degradation and mass spectrometry, it was established that GDCF-2 comprises 76 amino acid residues, commencing at the N terminus with pyroglutamic acid. The peptide contains four half-cystines, at positions 11, 12, 36, and 52, which create a pair of loops, clustered at the disulfide bridges. The relative positions of the half-cystines are almost identical to those of monocyte-derived neutrophil chemotactic factor (MDNCF), a peptide of similar mass but with only 24% sequence identity to GDCF. Thus, GDCF and MDNCF have a similar gross secondary structure because of the loops formed by the clustered disulfides, and their different leukocyte specificities are most likely determined by the large differences in primary sequence. PMID:2648385

  8. Amino acid sequences of lower vertebrate parvalbumins and their evolution: parvalbumins of boa, turtle, and salamander.

    PubMed

    Maeda, N; Zhu, D X; Fitch, W M

    1984-11-01

    One major parvalbumin each was isolated from the skeletal muscle of two reptiles, a boa snake, Boa constrictor, and a map turtle, Graptemys geographica, while two parvalbumins were isolated from an amphibian, the salamander Amphiuma means. The amino acid sequences of all four parvalbumins were determined from the sequences of their tryptic peptides, which were ordered partially by homology to other parvalbumins. Phylogenetic study of these and 16 other parvalbumin sequences revealed that the turtle parvalbumin belongs to beta lineage, while the salamander sequences belong, one each, to the alpha and beta lineages defined by Goodman and Pechère (1977). Boa parvalbumin, however, while belonging to the beta lineage, clusters within the fish in all reasonably parsimonious trees. The most parsimonious trees show many parallel or back mutations in the evolution of many parvalbumin residues, although the residues responsible for Ca2+ binding are very well conserved. These most parsimonious trees show an actinopterygian rather than a crossoptyrigian origin of the tetrapods in both the alpha and beta groups. One of two electric eel parvalbumins is evolving more than 10 times faster than its paralogous partner, suggesting it may be on its way to becoming a pseudogene. It is concluded that varying rates of amino acid replacement, much homoplasy, considerable gene duplication, plus complicated lineages make the set of parvalbumin sequences unsuitable for systematic study of the origin of the tetrapods and other higher-taxa divergence, although it may be suitable within a genus or family.

  9. Lateralized human hippocampal activity predicts navigation based on sequence or place memory

    PubMed Central

    Iglói, Kinga; Doeller, Christian F.; Berthoz, Alain; Rondi-Reig, Laure; Burgess, Neil

    2010-01-01

    The hippocampus is crucial for both spatial navigation and episodic memory, suggesting that it provides a common function to both. Here we adapt a spatial paradigm, developed for rodents, for use with functional MRI in humans to show that activation of the right hippocampus predicts the use of an allocentric spatial representation, and activation of the left hippocampus predicts the use of a sequential egocentric representation. Both representations can be identified in hippocampal activity before their effect on behavior at subsequent choice-points. Our results suggest that, rather than providing a single common function, the two hippocampi provide complementary representations for navigation, concerning places on the right and temporal sequences on the left, both of which likely contribute to different aspects of episodic memory. PMID:20660746

  10. PiRaNhA: a server for the computational prediction of RNA-binding residues in protein sequences

    PubMed Central

    Murakami, Yoichi; Spriggs, Ruth V.; Nakamura, Haruki; Jones, Susan

    2010-01-01

    The PiRaNhA web server is a publicly available online resource that automatically predicts the location of RNA-binding residues (RBRs) in protein sequences. The goal of functional annotation of sequences in the field of RNA binding is to provide predictions of high accuracy that require only small numbers of targeted mutations for verification. The PiRaNhA server uses a support vector machine (SVM), with position-specific scoring matrices, residue interface propensity, predicted residue accessibility and residue hydrophobicity as features. The server allows the submission of up to 10 protein sequences, and the predictions for each sequence are provided on a web page and via email. The prediction results are provided in sequence format with predicted RBRs highlighted, in text format with the SVM threshold score indicated and as a graph which enables users to quickly identify those residues above any specific SVM threshold. The graph effectively enables the increase or decrease of the false positive rate. When tested on a non-redundant data set of 42 protein sequences not used in training, the PiRaNhA server achieved an accuracy of 85%, specificity of 90% and a Matthews correlation coefficient of 0.41 and outperformed other publicly available servers. The PiRaNhA prediction server is freely available at http://www.bioinformatics.sussex.ac.uk/PIRANHA. PMID:20507911

  11. Foreshock Sequences and Short-Term Earthquake Predictability on East Pacific Rise Transform Faults

    NASA Astrophysics Data System (ADS)

    McGuire, J. J.; Boettcher, M. S.; Jordan, T. H.

    2004-12-01

    A predominant view of continental seismicity postulates that all earthquakes initiate in a similar manner regardless of their eventual size and that earthquake triggering can be described by an Endemic Type Aftershock Sequence (ETAS) model [e.g. Ogata, 1988, Helmstetter and Sorenette 2002]. These null hypotheses cannot be rejected as an explanation for the relative abundances of foreshocks and aftershocks to large earthquakes in California [Helmstetter et al., 2003]. An alternative location for testing this hypothesis is mid-ocean ridge transform faults (RTFs), which have many properties that are distinct from continental transform faults: most plate motion is accommodated aseismically, many large earthquakes are slow events enriched in low-frequency radiation, and the seismicity shows depleted aftershock sequences and high foreshock activity. Here we use the 1996-2001 NOAA-PMEL hydroacoustic seismicity catalog for equatorial East Pacific Rise transform faults to show that the foreshock/aftershock ratio is two orders of magnitude greater than the ETAS prediction based on global RTF aftershock abundances. We can thus reject the null hypothesis that there is no fundamental distinction between foreshocks, mainshocks, and aftershocks on RTFs. We further demonstrate (retrospectively) that foreshock sequences on East Pacific Rise transform faults can be used to achieve statistically significant short-term prediction of large earthquakes (magnitude ≥ 5.4) with good spatial (15-km) and temporal (1-hr) resolution using the NOAA-PMEL catalogs. Our very simplistic approach produces a large number of false alarms, but it successfully predicts the majority (70%) of M≥5.4 earthquakes while covering only a tiny fraction (0.15%) of the total potential space-time volume with alarms. Therefore, it achieves a large probability gain (about a factor of 500) over random guessing, despite not using any near field data. The predictability of large EPR transform earthquakes suggests

  12. DNA Cloning of Plasmodium falciparum Circumsporozoite Gene: Amino Acid Sequence of Repetitive Epitope

    NASA Astrophysics Data System (ADS)

    Enea, Vincenzo; Ellis, Joan; Zavala, Fidel; Arnot, David E.; Asavanich, Achara; Masuda, Aoi; Quakyi, Isabella; Nussenzweig, Ruth S.

    1984-08-01

    A clone of complementary DNA encoding the circumsporozoite (CS) protein of the human malaria parasite Plasmodium falciparum has been isolated by screening an Escherichia coli complementary DNA library with a monoclonal antibody to the CS protein. The DNA sequence of the complementary DNA insert encodes a four-amino acid sequence: proline-asparagine-alanine-asparagine, tandemly repeated 23 times. The CS β -lactamase fusion protein specifically binds monoclonal antibodies to the CS protein and inhibits the binding of these antibodies to native Plasmodium falciparum CS protein. These findings provide a basis for the development of a vaccine against Plasmodium falciparum malaria.

  13. Nucleotide and amino acid sequences of human intestinal alkaline phosphatase: close homology to placental alkaline phosphatase

    SciTech Connect

    Henthorn, P.S.; Raducha, M.; Edwards, Y.H.; Weiss, M.J.; Slaughter, C.; Lafferty, M.A.; Harris, H.

    1987-03-01

    A cDNA clone for human adult intestinal alkaline phosphatase (ALP) (orthophosphoric-monoester phosphohydrolase (alkaline optimum); EC 3.1.3.1) was isolated from a lambdagt11 expression library. The cDNA insert of this clone is 2513 base pairs in length and contains an open reading frame that encodes a 528-amino acid polypeptide. This deduced polypeptide contains the first 40 amino acids of human intestinal ALP, as determined by direct protein sequencing. Intestinal ALP shows 86.5% amino acid identity to placental (type 1) ALP and 56.6% amino acid identity to liver/bone/kidney ALP. In the 3'-untranslated regions, intestinal and placental ALP cDNAs are 73.5% identical (excluding gaps). The evolution of this multigene enzyme family is discussed.

  14. Ligand Similarity Complements Sequence, Physical Interaction, and Co-Expression for Gene Function Prediction.

    PubMed

    O'Meara, Matthew J; Ballouz, Sara; Shoichet, Brian K; Gillis, Jesse

    2016-01-01

    The expansion of protein-ligand annotation databases has enabled large-scale networking of proteins by ligand similarity. These ligand-based protein networks, which implicitly predict the ability of neighboring proteins to bind related ligands, may complement biologically-oriented gene networks, which are used to predict functional or disease relevance. To quantify the degree to which such ligand-based protein associations might complement functional genomic associations, including sequence similarity, physical protein-protein interactions, co-expression, and disease gene annotations, we calculated a network based on the Similarity Ensemble Approach (SEA: sea.docking.org), where protein neighbors reflect the similarity of their ligands. We also measured the similarity with functional genomic networks over a common set of 1,131 genes, and found that the networks had only small overlaps, which were significant only due to the large scale of the data. Consistent with the view that the networks contain different information, combining them substantially improved Molecular Function prediction within GO (from AUROC~0.63-0.75 for the individual data modalities to AUROC~0.8 in the aggregate). We investigated the boost in guilt-by-association gene function prediction when the networks are combined and describe underlying properties that can be further exploited. PMID:27467773

  15. Ligand Similarity Complements Sequence, Physical Interaction, and Co-Expression for Gene Function Prediction

    PubMed Central

    Shoichet, Brian K.; Gillis, Jesse

    2016-01-01

    The expansion of protein-ligand annotation databases has enabled large-scale networking of proteins by ligand similarity. These ligand-based protein networks, which implicitly predict the ability of neighboring proteins to bind related ligands, may complement biologically-oriented gene networks, which are used to predict functional or disease relevance. To quantify the degree to which such ligand-based protein associations might complement functional genomic associations, including sequence similarity, physical protein-protein interactions, co-expression, and disease gene annotations, we calculated a network based on the Similarity Ensemble Approach (SEA: sea.docking.org), where protein neighbors reflect the similarity of their ligands. We also measured the similarity with functional genomic networks over a common set of 1,131 genes, and found that the networks had only small overlaps, which were significant only due to the large scale of the data. Consistent with the view that the networks contain different information, combining them substantially improved Molecular Function prediction within GO (from AUROC~0.63–0.75 for the individual data modalities to AUROC~0.8 in the aggregate). We investigated the boost in guilt-by-association gene function prediction when the networks are combined and describe underlying properties that can be further exploited. PMID:27467773

  16. A probabilistic model to predict clinical phenotypic traits from genome sequencing.

    PubMed

    Chen, Yun-Ching; Douville, Christopher; Wang, Cheng; Niknafs, Noushin; Yeo, Grace; Beleva-Guthrie, Violeta; Carter, Hannah; Stenson, Peter D; Cooper, David N; Li, Biao; Mooney, Sean; Karchin, Rachel

    2014-09-01

    Genetic screening is becoming possible on an unprecedented scale. However, its utility remains controversial. Although most variant genotypes cannot be easily interpreted, many individuals nevertheless attempt to interpret their genetic information. Initiatives such as the Personal Genome Project (PGP) and Illumina's Understand Your Genome are sequencing thousands of adults, collecting phenotypic information and developing computational pipelines to identify the most important variant genotypes harbored by each individual. These pipelines consider database and allele frequency annotations and bioinformatics classifications. We propose that the next step will be to integrate these different sources of information to estimate the probability that a given individual has specific phenotypes of clinical interest. To this end, we have designed a Bayesian probabilistic model to predict the probability of dichotomous phenotypes. When applied to a cohort from PGP, predictions of Gilbert syndrome, Graves' disease, non-Hodgkin lymphoma, and various blood groups were accurate, as individuals manifesting the phenotype in question exhibited the highest, or among the highest, predicted probabilities. Thirty-eight PGP phenotypes (26%) were predicted with area-under-the-ROC curve (AUC)>0.7, and 23 (15.8%) of these were statistically significant, based on permutation tests. Moreover, in a Critical Assessment of Genome Interpretation (CAGI) blinded prediction experiment, the models were used to match 77 PGP genomes to phenotypic profiles, generating the most accurate prediction of 16 submissions, according to an independent assessor. Although the models are currently insufficiently accurate for diagnostic utility, we expect their performance to improve with growth of publicly available genomics data and model refinement by domain experts.

  17. Method for high-volume sequencing of nucleic acids: random and directed priming with libraries of oligonucleotides

    DOEpatents

    Studier, F.W.

    1995-04-18

    Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient. 2 figs.

  18. Method for high-volume sequencing of nucleic acids: random and directed priming with libraries of oligonucleotides

    DOEpatents

    Studier, F. William

    1995-04-18

    Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient.

  19. Predicting the functional consequences of cancer-associated amino acid substitutions

    PubMed Central

    Shihab, Hashem A.; Gough, Julian; Cooper, David N.; Day, Ian N. M.; Gaunt, Tom R.

    2013-01-01

    Motivation: The number of missense mutations being identified in cancer genomes has greatly increased as a consequence of technological advances and the reduced cost of whole-genome/whole-exome sequencing methods. However, a high proportion of the amino acid substitutions detected in cancer genomes have little or no effect on tumour progression (passenger mutations). Therefore, accurate automated methods capable of discriminating between driver (cancer-promoting) and passenger mutations are becoming increasingly important. In our previous work, we developed the Functional Analysis through Hidden Markov Models (FATHMM) software and, using a model weighted for inherited disease mutations, observed improved performances over alternative computational prediction algorithms. Here, we describe an adaptation of our original algorithm that incorporates a cancer-specific model to potentiate the functional analysis of driver mutations. Results: The performance of our algorithm was evaluated using two separate benchmarks. In our analysis, we observed improved performances when distinguishing between driver mutations and other germ line variants (both disease-causing and putatively neutral mutations). In addition, when discriminating between somatic driver and passenger mutations, we observed performances comparable with the leading computational prediction algorithms: SPF-Cancer and TransFIC. Availability and implementation: A web-based implementation of our cancer-specific model, including a downloadable stand-alone package, is available at http://fathmm.biocompute.org.uk. Contact: fathmm@biocompute.org.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23620363

  20. Predicting most probable conformations of a given peptide sequence in the random coil state.

    PubMed

    Bayrak, Cigdem Sevim; Erman, Burak

    2012-11-01

    In this work, we present a computational scheme for finding high probability conformations of peptides. The scheme calculates the probability of a given conformation of the given peptide sequence using the probability distribution of torsion states. Dependence of the states of a residue on the states of its first neighbors along the chain is considered. Prior probabilities of torsion states are obtained from a coil library. Posterior probabilities are calculated by the matrix multiplication Rotational Isomeric States Model of polymer theory. The conformation of a peptide with highest probability is determined by using a hidden Markov model Viterbi algorithm. First, the probability distribution of the torsion states of the residues is obtained. Using the highest probability torsion state, one can generate, step by step, states with lower probabilities. To validate the method, the highest probability state of residues in a given sequence is calculated and compared with probabilities obtained from the Coil Databank. Predictions based on the method are 32% better than predictions based on the most probable states of residues. The ensemble of "n" high probability conformations of a given protein is also determined using the Viterbi algorithm with multistep backtracking. PMID:22955874

  1. The complete amino acid sequence of lectin-C from the roots of pokeweed (Phytolacca americana).

    PubMed

    Yamaguchi, K; Mori, A; Funatsu, G

    1995-07-01

    The complete amino acid sequence of pokeweed lectin-C (PL-C) consisting of 126 residues has been determined. PL-C is an acidic simple protein with molecular mass of 13,747 Da and consists of three cysteine-rich domains with 51-63% homology. PL-C shows homology to chitin-binding proteins such as wheat germ agglutinin, and all eight cysteine residues in the three domains of PL-C are completely conserved in all other chitin-binding domains.

  2. Amino-acid sequence of a cooperative, dimeric myoglobin from the gastropod mollusc, Buccinum undatum L.

    PubMed

    Wen, D; Laursen, R A

    1994-10-19

    The complete amino-acid sequence of a dimeric myoglobin from the radular mussel of the gastropod mollusc, Buccinum undatum L. has been determined. The globin, which shows cooperative binding of oxygen, contains 146 amino acids, is N-terminal aminoacetylated, and has histidine residues at position 65 and 97, corresponding to the heme-binding histidines seen in mammalian myoglobins. It shows about 75% and 50% homology, respectively, with the dimeric molluscan myoglobins from Busycon canaliculatum and Cerithidea rhizophorarum, the former of which also shows weak cooperatively, but much less similarity to other species of myoglobin and hemoglobin.

  3. Depositional sequence analysis and sedimentologic modeling for improved prediction of Pennsylvanian reservoirs

    SciTech Connect

    Watney, W.L.

    1991-01-01

    The objectives of this research are to: (1) assist producers in locating and producing petroleum not currently produced because of technological problems or the inability to identify details of reservoir compartmentalization, (2) to decrease risk in field development, and (3) accelerate the retrieval and analysis of baseline geoscience information for initial reservoir description. The interdisciplinary data sought in this research will be used to resolve specific problems in correlation of strata and to establish the mechanisms responsible for the Upper Pennsylvanian stratigraphic architecture in the Midcontinent. The data will better constrain ancillary problems related to the validation of depositional sequence and subsequence correlation, subsidence patterns, sedimentation rates, sea-level changes, and the relationship of sedimentary sequences to basement terrains. The geoscientific information, including data from field studies, surface and near-surface reservoir analogues, and regional database development, will also be used for development of geologic computer process-based simulation models tailored to specific depositional sequences for use in improving prediction of reservoir characteristics. 4 refs.

  4. Weighted sequence motifs as an improved seeding step in microRNA target prediction algorithms.

    PubMed

    Saetrom, Ola; Snøve, Ola; Saetrom, Pål

    2005-07-01

    We present a new microRNA target prediction algorithm called TargetBoost, and show that the algorithm is stable and identifies more true targets than do existing algorithms. TargetBoost uses machine learning on a set of validated microRNA targets in lower organisms to create weighted sequence motifs that capture the binding characteristics between microRNAs and their targets. Existing algorithms require candidates to have (1) near-perfect complementarity between microRNAs' 5' end and their targets; (2) relatively high thermodynamic duplex stability; (3) multiple target sites in the target's 3' UTR; and (4) evolutionary conservation of the target between species. Most algorithms use one of the two first requirements in a seeding step, and use the three others as filters to improve the method's specificity. The initial seeding step determines an algorithm's sensitivity and also influences its specificity. As all algorithms may add filters to increase the specificity, we propose that methods should be compared before such filtering. We show that TargetBoost's weighted sequence motif approach is favorable to using both the duplex stability and the sequence complementarity steps. (TargetBoost is available as a Web tool from http://www.interagon.com/demo/.).

  5. The Complete Genome Sequence of the Lactic Acid Bacterium Lactococcus lactis ssp. lactis IL1403

    PubMed Central

    Bolotin, Alexander; Wincker, Patrick; Mauger, Stéphane; Jaillon, Olivier; Malarme, Karine; Weissenbach, Jean; Ehrlich, S. Dusko; Sorokin, Alexei

    2001-01-01

    Lactococcus lactis is a nonpathogenic AT-rich gram-positive bacterium closely related to the genus Streptococcus and is the most commonly used cheese starter. It is also the best-characterized lactic acid bacterium. We sequenced the genome of the laboratory strain IL1403, using a novel two-step strategy that comprises diagnostic sequencing of the entire genome and a shotgun polishing step. The genome contains 2,365,589 base pairs and encodes 2310 proteins, including 293 protein-coding genes belonging to six prophages and 43 insertion sequence (IS) elements. Nonrandom distribution of IS elements indicates that the chromosome of the sequenced strain may be a product of recent recombination between two closely related genomes. A complete set of late competence genes is present, indicating the ability of L. lactis to undergo DNA transformation. Genomic sequence revealed new possibilities for fermentation pathways and for aerobic respiration. It also indicated a horizontal transfer of genetic information from Lactococcus to gram-negative enteric bacteria of Salmonella-Escherichia group. [The sequence data described in this paper has been submitted to the GenBank data library under accession no. AE005176.] PMID:11337471

  6. Folic acid intake from fortification in United States exceeds predictions.

    PubMed

    Choumenkovitch, Silvina F; Selhub, Jacob; Wilson, Peter W F; Rader, Jeanne I; Rosenberg, Irwin H; Jacques, Paul F

    2002-09-01

    In 1996, the U.S. Food and Drug Administration issued a regulation requiring that all enriched cereal-grain products be fortified with folic acid by January 1998. An average increase in folic acid intake of 100 micro g/d was projected as a result of this fortification. The objective of the present study was to estimate the effect of this fortification on the intake of folic acid and total folate, and on the prevalence of individuals with inadequate folate intake and with high folic acid intake. We used data on food and nutrient intake from 1480 individuals who participated in the 5th and 6th examinations of the Framingham Offspring Cohort Study. Fortification was instituted during the 6th examination so that 931 participants were examined before its implementation (nonexposed) and 549 after implementation (exposed). Published data on total folate in enriched cereal-grain products were used to correct folate content in these foods to reflect fortification. Among nonsupplement users, folic acid intake increased by a mean of 190 [95% confidence interval (CI): 176, 204] micro g/d (P < 0.001) and total folate intake increased by a mean of 323 (95% CI: 296-350) micro g dietary folate equivalents (DFE)/d (P < 0.001) in the exposed participants. Similar increases were seen among supplement users exposed to fortification. The prevalence of exposed individuals with total folate intake below the estimated average requirement (320 micro g DFE/d) decreased from 48.6% (95% CI: 44.2-53.1%) before fortification to 7.0% (95% CI: 3.1-10.9%) after fortification in individuals who did not use folic acid supplements. This prevalence was approximately 1% or less for users of supplements both before and after fortification. Prevalence of individuals with folic acid intake above the upper tolerable intake level (1000 micro g folic acid/d) increased only among supplement users exposed to fortification (from 1.3 to 11.3%, P < 0.001). No changes in folic acid intake were observed over time

  7. Amino acid sequence of human cholinesterase. Annual report, 30 September 1984-30 September 1985

    SciTech Connect

    Lockridge, O.

    1985-10-01

    The active-site serine residue is located 198 amino acids from the N-terminal. The active-site peptide was isolated from three different genetic types of human serum cholinesterase: from usual, atypical, and atypical-silent genotypes. It was found that the amino acid sequence of the active-site peptide was identical in all three genotypes. Comparison of the complete sequences of cholinesterase from human serum and acetylcholinesterase from the electric organ of Torpedo californica shows an identity of 53%. Cholinesterase is of interest to the Department of Defense because cholinesterase protects against organophosphate poisons of the type used in chemical warfare. The structural results presented here will serve as the basis for cloning the gene for cholinesterase. The potential uses of large amounts of cholinesterase would be for cleaning up spills of organophosphates and possibly for detoxifying exposed personnel.

  8. Amino acid sequence differences in pancreatic ribonucleases from water buffalo breeds from Indonesia and Italy.

    PubMed

    Sidik, A; Martena, B; Beintema, J J

    1979-12-01

    The amino acid sequences of the pancreatic ribonucleases from river-breed water buffaloes from Italy and swamp-breed water buffaloes from Indonesia differ at three positions. One of the differences involves a replacement of asparagine-34, with covalently attached carbohydrate on all molecules, in the river-breed enzyme by serine in the swamp-breed enzyme. The ribonuclease content of the pancreas differs considerably between breeds and is lower in river buffaloes. A ribonuclease preparation from two swamp buffaloes contained a minor glycosylated component. Preliminary evidence was obtained that the amino acid sequence of this component has factors in common with the main component of the swamp-breed ribonuclease and with the river-breed enzyme.

  9. On human disease-causing amino acid variants: statistical study of sequence and structural patterns

    PubMed Central

    Alexov, Emil

    2015-01-01

    Statistical analysis was carried out on large set of naturally occurring human amino acid variations and it was demonstrated that there is a preference for some amino acid substitutions to be associated with diseases. At an amino acid sequence level, it was shown that the disease-causing variants frequently involve drastic changes of amino acid physico-chemical properties of proteins such as charge, hydrophobicity and geometry. Structural analysis of variants involved in diseases and being frequently observed in human population showed similar trends: disease-causing variants tend to cause more changes of hydrogen bond network and salt bridges as compared with harmless amino acid mutations. Analysis of thermodynamics data reported in literature, both experimental and computational, indicated that disease-causing variants tend to destabilize proteins and their interactions, which prompted us to investigate the effects of amino acid mutations on large databases of experimentally measured energy changes in unrelated proteins. Although the experimental datasets were linked neither to diseases nor exclusory to human proteins, the observed trends were the same: amino acid mutations tend to destabilize proteins and their interactions. Having in mind that structural and thermodynamics properties are interrelated, it is pointed out that any large change of any of them is anticipated to cause a disease. PMID:25689729

  10. Medical target prediction from genome sequence: combining different sequence analysis algorithms with expert knowledge and input from artificial intelligence approaches.

    PubMed

    Dandekar, T; Du, F; Schirmer, R H; Schmidt, S

    2001-12-01

    By exploiting the rapid increase in available sequence data, the definition of medically relevant protein targets has been improved by a combination of: (i) differential genome analysis (target list): and (ii) analysis of individual proteins (target analysis). Fast sequence comparisons, data mining, and genetic algorithms further promote these procedures. Mycobacterium tuberculosis proteins were chosen as applied examples.

  11. Self-sequencing of amino acids and origins of polyfunctional protocells

    NASA Technical Reports Server (NTRS)

    Fox, S. W.

    1984-01-01

    The role of proteins in the origin of living things is discussed. It has been experimentally established that amino acids can sequence themselves under simulated geological conditions with highly nonrandom products which accordingly contain diverse information. Multiple copies of each type of macromolecule are formed, resulting in greater power for any protoenzymic molecule than would accrue from a single copy of each type. Thermal proteins are readily incorporated into laboratory protocells. The experimental evidence for original polyfunctional protocells is discussed.

  12. Comparisons of the Distribution of Nucleotides and Common Sequences in Deoxyribonucleic Acid from Selected Bacteriophages

    PubMed Central

    Skalka, A.; Hanson, P.

    1972-01-01

    Results from comparisons of deoxyribonucleic acid (DNA) from several classes of bacteriophages suggest that most phage chromosomes contain either a homogeneous distribution of nucleotides or are made up of a few, rather large segments of different quanine plus cytosine (G + C) contents which are internally homogeneous. Among those temperate phages tested, most contained segmented DNA. Comparisons of sequence similarities among segments from lambdoid phage DNA species revealed the following order in relatedness to λ: 82 (and 434) > 21 > 424 > φ80. Most common sequences are found in the highest G + C segments, which in λ contain head and tail genes. Hybridization tests with λ and 186 or P2 DNA species verified that the lambdoids and 186 and P2 belong to two distinct groups. There are fewer homologous sequences between the DNA species of coliphages λ and P2 or 186 than there are between the DNA species of coliphage λ and salmonella phage P22. PMID:4553679

  13. Structure of the fully modified left-handed cyclohexene nucleic acid sequence GTGTACAC.

    PubMed

    Robeyns, Koen; Herdewijn, Piet; Van Meervelt, Luc

    2008-02-13

    CeNA oligonucleotides consist of a phosphorylated backbone where the deoxyribose sugars are replaced by cyclohexene moieties. The X-ray structure determination and analysis of a fully modified octamer sequence GTGTACAC, which is the first crystal structure of a carbocyclic-based nucleic acid, is presented. This particular sequence was built with left-handed building blocks and crystallizes as a left-handed double helix. The helix can be characterized as belonging to the (mirrored) A-type family. Crystallographic data were processed up to 1.53 A, and the octamer sequence crystallizes in the space group R32. The sugar puckering is found to adopt the 3H2 half-chair conformation which mimics the C3'-endo conformation of the ribose sugar. The double helices stack on top of each other to form continuous helices, and static disorder is observed due to this end-to-end stacking.

  14. Amino acid sequence of a protease inhibitor isolated from Sarcophaga bullata determined by mass spectrometry.

    PubMed

    Papayannopoulos, I A; Biemann, K

    1992-02-01

    The amino acid sequence of a protease inhibitor isolated from the hemolymph of Sarcophaga bullata larvae was determined by tandem mass spectrometry. Homology considerations with respect to other protease inhibitors with known primary structures assisted in the choice of the procedure followed in the sequence determination and in the alignment of the various peptides obtained from specific chemical cleavage at cysteines and enzyme digests of the S. bullata protease inhibitor. The resulting sequence of 57 residues is as follows: Val Asp Lys Ser Ala Cys Leu Gln Pro Lys Glu Val Gly Pro Cys Arg Lys Ser Asp Phe Val Phe Phe Tyr Asn Ala Asp Thr Lys Ala Cys Glu Glu Phe Leu Tyr Gly Gly Cys Arg Gly Asn Asp Asn Arg Phe Asn Thr Lys Glu Glu Cys Glu Lys Leu Cys Leu.

  15. Fatty Acid Profile and Unigene-Derived Simple Sequence Repeat Markers in Tung Tree (Vernicia fordii)

    PubMed Central

    Zhang, Lin; Jia, Baoguang; Tan, Xiaofeng; Thammina, Chandra S.; Long, Hongxu; Liu, Min; Wen, Shanna; Song, Xianliang; Cao, Heping

    2014-01-01

    Tung tree (Vernicia fordii) provides the sole source of tung oil widely used in industry. Lack of fatty acid composition and molecular markers hinders biochemical, genetic and breeding research. The objectives of this study were to determine fatty acid profiles and develop unigene-derived simple sequence repeat (SSR) markers in tung tree. Fatty acid profiles of 41 accessions showed that the ratio of α-eleostearic acid was increasing continuously with a parallel trend to the amount of tung oil accumulation while the ratios of other fatty acids were decreasing in different stages of the seeds and that α-eleostearic acid (18∶3) consisted of 77% of the total fatty acids in tung oil. Transcriptome sequencing identified 81,805 unigenes from tung cDNA library constructed using seed mRNA and discovered 6,366 SSRs in 5,404 unigenes. The di- and tri-nucleotide microsatellites accounted for 92% of the SSRs with AG/CT and AAG/CTT being the most abundant SSR motifs. Fifteen polymorphic genic-SSR markers were developed from 98 unigene loci tested in 41 cultivated tung accessions by agarose gel and capillary electrophoresis. Genbank database search identified 10 of them putatively coding for functional proteins. Quantitative PCR demonstrated that all 15 polymorphic SSR-associated unigenes were expressed in tung seeds and some of them were highly correlated with oil composition in the seeds. Dendrogram revealed that most of the 41 accessions were clustered according to the geographic region. These new polymorphic genic-SSR markers will facilitate future studies on genetic diversity, molecular fingerprinting, comparative genomics and genetic mapping in tung tree. The lipid profiles in the seeds of 41 tung accessions will be valuable for biochemical and breeding studies. PMID:25167054

  16. LncDisease: a sequence based bioinformatics tool for predicting lncRNA-disease associations

    PubMed Central

    Wang, Junyi; Ma, Ruixia; Ma, Wei; Chen, Ji; Yang, Jichun; Xi, Yaguang; Cui, Qinghua

    2016-01-01

    LncRNAs represent a large class of noncoding RNA molecules that have important functions and play key roles in a variety of human diseases. There is an urgent need to develop bioinformatics tools as to gain insight into lncRNAs. This study developed a sequence-based bioinformatics method, LncDisease, to predict the lncRNA-disease associations based on the crosstalk between lncRNAs and miRNAs. Using LncDisease, we predicted the lncRNAs associated with breast cancer and hypertension. The breast-cancer-associated lncRNAs were studied in two breast tumor cell lines, MCF-7 and MDA-MB-231. The qRT-PCR results showed that 11 (91.7%) of the 12 predicted lncRNAs could be validated in both breast cancer cell lines. The hypertension-associated lncRNAs were further evaluated in human vascular smooth muscle cells (VSMCs) stimulated with angiotensin II (Ang II). The qRT-PCR results showed that 3 (75.0%) of the 4 predicted lncRNAs could be validated in Ang II-treated human VSMCs. In addition, we predicted 6 diseases associated with the lncRNA GAS5 and validated 4 (66.7%) of them by literature mining. These results greatly support the specificity and efficacy of LncDisease in the study of lncRNAs in human diseases. The LncDisease software is freely available on the Software Page: http://www.cuilab.cn/. PMID:26887819

  17. Some properties and amino acid sequence of plastocyanin from a green alga, Ulva arasakii.

    PubMed

    Yoshizaki, F; Fukazawa, T; Mishina, Y; Sugimura, Y

    1989-08-01

    Plastocyanin was purified from a multicellular, marine green alga, Ulva arasakii, by conventional methods to homogeneity. The oxidized plastocyanin showed absorption maxima at 252, 276.8, 460, 595.3, and 775 nm, and shoulders at 259, 265, 269, and 282.5 nm; the ratio A276.8/A595.3 was 1.5. The midpoint redox potential was determined to be 0.356 V at pH 7.0 with a ferri- and ferrocyanide system. The molecular weight was estimated to be 10,200 and 11,000 by SDS-PAGE and by gel filtration, respectively. U. arasakii also has a small amount of cytochrome c6, like Enteromorpha prolifera. The amino acid sequence of U. arasakii plastocyanin was determined by Edman degradation and by carboxypeptidase digestion of the plastocyanin, six tryptic peptides, and five staphylococcal protease peptides. The plastocyanin contained 98 amino acid residues, giving a molecular weight of 10,236 including one copper atom. The complete sequence is as follows: AQIVKLGGDDGALAFVPSKISVAAGEAIEFVNNAGFPHNIVFDEDAVPAGVDADAISYDDYLNSKGETV VRKLSTPGVY G VYCEPHAGAGMKMTITVQ. The sequence of U. arasakii plastocyanin is closet to that of the E. prolifera protein (85% homology). A phylogenetic tree of five algal and two higher plant plastocyanins was constructed by comparing the amino acid differences. The branching order is considered to be as follows: a blue-green alga, unicellular green algae, multicellular green algae, and higher plants. PMID:2509442

  18. Characterization of the microbial acid mine drainage microbial community using culturing and direct sequencing techniques.

    PubMed

    Auld, Ryan R; Myre, Maxine; Mykytczuk, Nadia C S; Leduc, Leo G; Merritt, Thomas J S

    2013-05-01

    We characterized the bacterial community from an AMD tailings pond using both classical culturing and modern direct sequencing techniques and compared the two methods. Acid mine drainage (AMD) is produced by the environmental and microbial oxidation of minerals dissolved from mining waste. Surprisingly, we know little about the microbial communities associated with AMD, despite the fundamental ecological roles of these organisms and large-scale economic impact of these waste sites. AMD microbial communities have classically been characterized by laboratory culturing-based techniques and more recently by direct sequencing of marker gene sequences, primarily the 16S rRNA gene. In our comparison of the techniques, we find that their results are complementary, overall indicating very similar community structure with similar dominant species, but with each method identifying some species that were missed by the other. We were able to culture the majority of species that our direct sequencing results indicated were present, primarily species within the Acidithiobacillus and Acidiphilium genera, although estimates of relative species abundance were only obtained from direct sequencing. Interestingly, our culture-based methods recovered four species that had been overlooked from our sequencing results because of the rarity of the marker gene sequences, likely members of the rare biosphere. Further, direct sequencing indicated that a single genus, completely missed in our culture-based study, Legionella, was a dominant member of the microbial community. Our results suggest that while either method does a reasonable job of identifying the dominant members of the AMD microbial community, together the methods combine to give a more complete picture of the true diversity of this environment. PMID:23485423

  19. Complete amino acid sequence of chitinase-A from leaves of pokeweed (Phytolacca americana).

    PubMed

    Yamagami, T; Tanigawa, M; Ishiguro, M; Funatsu, G

    1998-04-01

    The complete amino acid sequence of pokeweed leaf chitinase-A was determined. First all 11 tryptic peptides from the reduced and S-carboxymethylated form of the enzyme were sequenced. Then the same form of the enzyme was cleaved with cyanogen bromide, giving three fragments. The fragments were digested with chymotrypsin or Staphylococcus aureus V8 protease. Last, the 11 tryptic peptides were put in order. Of seven cysteine residues, six were linked by disulfide bonds (between Cys25 and Cys74, Cys89 and Cys98, and Cys195 and Cys208); Cys176 was free. The enzyme consisted of 208 amino acid residues and had a molecular weight of 22,391. It consisted of only one polypeptide chain without a chitin-binding domain. The length of the chain was almost the same as that of the catalytic domains of class IL chitinases. These findings suggested that this enzyme is a new kind of class IIL chitinase, although its sequence resembles that of catalytic domains of class IL chitinases more than that of the class IIL chitinases reported so far. Discussion on the involvement of specific tryptophan residue in the active site of PLC-A is also given based on the sequence similarity with rye seed chitinase-c.

  20. Metazoan remaining genes for essential amino acid biosynthesis: sequence conservation and evolutionary analyses.

    PubMed

    Costa, Igor R; Thompson, Julie D; Ortega, José Miguel; Prosdocimi, Francisco

    2014-12-24

    Essential amino acids (EAA) consist of a group of nine amino acids that animals are unable to synthesize via de novo pathways. Recently, it has been found that most metazoans lack the same set of enzymes responsible for the de novo EAA biosynthesis. Here we investigate the sequence conservation and evolution of all the metazoan remaining genes for EAA pathways. Initially, the set of all 49 enzymes responsible for the EAA de novo biosynthesis in yeast was retrieved. These enzymes were used as BLAST queries to search for similar sequences in a database containing 10 complete metazoan genomes. Eight enzymes typically attributed to EAA pathways were found to be ubiquitous in metazoan genomes, suggesting a conserved functional role. In this study, we address the question of how these genes evolved after losing their pathway partners. To do this, we compared metazoan genes with their fungal and plant orthologs. Using phylogenetic analysis with maximum likelihood, we found that acetolactate synthase (ALS) and betaine-homocysteine S-methyltransferase (BHMT) diverged from the expected Tree of Life (ToL) relationships. High sequence conservation in the paraphyletic group Plant-Fungi was identified for these two genes using a newly developed Python algorithm. Selective pressure analysis of ALS and BHMT protein sequences showed higher non-synonymous mutation ratios in comparisons between metazoans/fungi and metazoans/plants, supporting the hypothesis that these two genes have undergone non-ToL evolution in animals.

  1. The amino acid sequence of the aspartate aminotransferase from baker's yeast (Saccharomyces cerevisiae).

    PubMed Central

    Cronin, V B; Maras, B; Barra, D; Doonan, S

    1991-01-01

    1. The single (cytosolic) aspartate aminotransferase was purified in high yield from baker's yeast (Saccharomyces cerevisiae). 2. Amino-acid-sequence analysis was carried out by digestion of the protein with trypsin and with CNBr; some of the peptides produced were further subdigested with Staphylococcus aureus V8 proteinase or with pepsin. Peptides were sequenced by the dansyl-Edman method and/or by automated gas-phase methods. The amino acid sequence obtained was complete except for a probable gap of two residues as indicated by comparison with the structures of counterpart proteins in other species. 3. The N-terminus of the enzyme is blocked. Fast-atom-bombardment m.s. was used to identify the blocking group as an acetyl one. 4. Alignment of the sequence of the enzyme with those of vertebrate cytosolic and mitochondrial aspartate aminotransferases and with the enzyme from Escherichia coli showed that about 25% of residues are conserved between these distantly related forms. 5. Experimental details and confirmatory data for the results presented here are given in a Supplementary Publication (SUP 50164, 25 pages) that has been deposited at the British Library Document Supply Centre, Boston Spa. Wetherby, West Yorkshire LS23 7 BQ, U.K., from whom copies can be obtained on the terms indicated in Biochem. J. (1991) 273, 5. PMID:1859361

  2. [MOLECULAR EVOLUTION OF ION CHANNELS: AMINO ACID SEQUENCES AND 3D STRUCTURES].

    PubMed

    Korkosh, V S; Zhorov, B S; Tikhonov, D B

    2016-01-01

    An integral part of modern evolutionary biology is comparative analysis of structure and function of macromolecules such as proteins. The first and critical step to understand evolution of homologous proteins is their amino acid sequence alignment. However, standard algorithms fop not provide unambiguous sequence alignments for proteins of poor homology. More reliable results can be obtained by comparing experimental 3D structures obtained at atomic resolution, for instance, with the aid of X-ray structural analysis. If such structures are lacking, homology modeling is used, which may take into account indirect experimental data on functional roles of individual amino-acid residues. An important problem is that the sequence alignment, which reflects genetic modifications, does not necessarily correspond to the functional homology. The latter depends on three-dimensional structures which are critical for natural selection. Since alignment techniques relying only on the analysis of primary structures carry no information on the functional properties of proteins, including 3D structures into consideration is very important. Here we consider several examples involving ion channels and demonstrate that alignment of their three-dimensional structures can significantly improve sequence alignments obtained by traditional methods.

  3. Accurate ab initio prediction of NMR chemical shifts of nucleic acids and nucleic acids/protein complexes

    PubMed Central

    Victora, Andrea; Möller, Heiko M.; Exner, Thomas E.

    2014-01-01

    NMR chemical shift predictions based on empirical methods are nowadays indispensable tools during resonance assignment and 3D structure calculation of proteins. However, owing to the very limited statistical data basis, such methods are still in their infancy in the field of nucleic acids, especially when non-canonical structures and nucleic acid complexes are considered. Here, we present an ab initio approach for predicting proton chemical shifts of arbitrary nucleic acid structures based on state-of-the-art fragment-based quantum chemical calculations. We tested our prediction method on a diverse set of nucleic acid structures including double-stranded DNA, hairpins, DNA/protein complexes and chemically-modified DNA. Overall, our quantum chemical calculations yield highly/very accurate predictions with mean absolute deviations of 0.3–0.6 ppm and correlation coefficients (r2) usually above 0.9. This will allow for identifying misassignments and validating 3D structures. Furthermore, our calculations reveal that chemical shifts of protons involved in hydrogen bonding are predicted significantly less accurately. This is in part caused by insufficient inclusion of solvation effects. However, it also points toward shortcomings of current force fields used for structure determination of nucleic acids. Our quantum chemical calculations could therefore provide input for force field optimization. PMID:25404135

  4. A Novel Data Assimilation Methodology for Predicting Lithology Based on Sequence Labeling Algorithms

    NASA Astrophysics Data System (ADS)

    Park, E.; Jeong, J.; Han, W. S.; Kim, K. Y.

    2014-12-01

    A hidden Markov model (HMM) and a conditional random fields (CRFs) model for lithological predictions based on multiple geophysical well-logging data are derived for dealing with directional non-stationarity through bi-directional training and conditioning. The developed models were benchmarked against their conventional counterparts, and hypothetical boreholes with the corresponding synthetic geophysical data including artificial errors were employed. In the three test scenarios devised, the average fitness and unfitness values of the developed CRFs model and HMM are 0.84 and 0.071, and 0.81 and 0.084, respectively, while those of the conventional CRFs model and HMM are 0.78 and 0.091, and 0.77 and 0.099, respectively. Comparisons of their predictabilities show that the models designed for directional non-stationarity clearly perform better than the conventional models for all tested examples. Among them, the developed linear-chain CRFs model showed the best or close to the best performance with high predictability and a low training data requirement. Keywords: one-dimensional lithological characterization, sequence labeling algorithm, conditional random fields, hidden Markov model, borehole, geophysical well-logging data.

  5. Geptop: A Gene Essentiality Prediction Tool for Sequenced Bacterial Genomes Based on Orthology and Phylogeny

    PubMed Central

    Wei, Wen; Ning, Lu-Wen; Ye, Yuan-Nong; Guo, Feng-Biao

    2013-01-01

    Integrative genomics predictors, which score highly in predicting bacterial essential genes, would be unfeasible in most species because the data sources are limited. We developed a universal approach and tool designated Geptop, based on orthology and phylogeny, to offer gene essentiality annotations. In a series of tests, our Geptop method yielded higher area under curve (AUC) scores in the receiver operating curves than the integrative approaches. In the ten-fold cross-validations among randomly upset samples, Geptop yielded an AUC of 0.918, and in the cross-organism predictions for 19 organisms Geptop yielded AUC scores between 0.569 and 0.959. A test applied to the very recently determined essential gene dataset from the Porphyromonas gingivalis, which belongs to a phylum different with all of the above 19 bacterial genomes, gave an AUC of 0.77. Therefore, Geptop can be applied to any bacterial species whose genome has been sequenced. Compared with the essential genes uniquely identified by the lethal screening, the essential genes predicted only by Gepop are associated with more protein-protein interactions, especially in the three bacteria with lower AUC scores (<0.7). This may further illustrate the reliability and feasibility of our method in some sense. The web server and standalone version of Geptop are available at http://cefg.uestc.edu.cn/geptop/ free of charge. The tool has been run on 968 bacterial genomes and the results are accessible at the website. PMID:23977285

  6. A sequence-based computational approach to predicting PDZ domain-peptide interactions.

    PubMed

    Nakariyakul, Songyot; Liu, Zhi-Ping; Chen, Luonan

    2014-01-01

    The PDZ domain is one of the most ubiquitous protein domains that is involved in coordinating signaling complex formation and protein networking by reversibly interacting with multiple binding partners. It has been linked to many devastating diseases such as avian influenza, Fraser syndrome, Usher syndrome and Dejerine-Sottas neuropathy. Understanding the selectivity of PDZ domains can help elucidate how defects in PDZ proteins and their binding partners lead to human diseases. Since experimental methods to determine the interaction specificity of the PDZ domains are expensive and labor intensive, an accurate computational method is thus needed. Our developed support vector machine-based predictor using dipeptide composition is shown to qualitatively predict PDZ domain-peptide interaction with a high accuracy rate. Furthermore, since most of the dipeptide compositions are redundant and irrelevant, we propose a new hybrid feature selection technique to select only a subset of these compositions for interaction prediction. The experimental results show that only approximately 25% of dipeptide features are needed and that our method improves the prediction results significantly. The selected dipeptide features are also analyzed and shown to play important roles in specificity patterns of PDZ domains. Our method is based only on primary sequence information, and it can be used for the research of drug target and drug design in identifying PDZ domain-ligand interactions. This article is part of a Special Issue entitled: Computational Proteomics, Systems Biology & Clinical Implications. Guest Editor: Yudong Cai. PMID:23608946

  7. Geptop: a gene essentiality prediction tool for sequenced bacterial genomes based on orthology and phylogeny.

    PubMed

    Wei, Wen; Ning, Lu-Wen; Ye, Yuan-Nong; Guo, Feng-Biao

    2013-01-01

    Integrative genomics predictors, which score highly in predicting bacterial essential genes, would be unfeasible in most species because the data sources are limited. We developed a universal approach and tool designated Geptop, based on orthology and phylogeny, to offer gene essentiality annotations. In a series of tests, our Geptop method yielded higher area under curve (AUC) scores in the receiver operating curves than the integrative approaches. In the ten-fold cross-validations among randomly upset samples, Geptop yielded an AUC of 0.918, and in the cross-organism predictions for 19 organisms Geptop yielded AUC scores between 0.569 and 0.959. A test applied to the very recently determined essential gene dataset from the Porphyromonas gingivalis, which belongs to a phylum different with all of the above 19 bacterial genomes, gave an AUC of 0.77. Therefore, Geptop can be applied to any bacterial species whose genome has been sequenced. Compared with the essential genes uniquely identified by the lethal screening, the essential genes predicted only by Gepop are associated with more protein-protein interactions, especially in the three bacteria with lower AUC scores (<0.7). This may further illustrate the reliability and feasibility of our method in some sense. The web server and standalone version of Geptop are available at http://cefg.uestc.edu.cn/geptop/ free of charge. The tool has been run on 968 bacterial genomes and the results are accessible at the website.

  8. High nucleotide and amino acid sequence similarities in tumour necrosis factor-alpha amongst Indian buffalo (Bubalus bubalis), Indian cattle (Bos indicus) and other ruminants.

    PubMed

    Gupta, P K; Bind, R B; Walunj, S S; Saini, M

    2004-08-01

    Tumour necrosis factor-alpha (TNF-alpha) mRNA from Indian water buffalo (Bubalus bubalis) and Indian cattle (Bos indicus) was reverse transcribed and amplified using reverse transcriptase-polymerase chain reaction (RT-PCR). The nucleotide sequences of cDNAs were determined after cloning into pGEM-T-Easy vector (Promega, Madison, WI) and compared with reported nucleotide sequences of TNF-alpha cDNA from other species. The nucleotide sequences of TNF-alpha from Indian cattle revealed significantly high similarities at nucleotide (99.2%) and amino acid (100%) levels with those of cattle (Bos taurus; Zebu). The sequences from buffalo had 98.4% nucleotide and 99.1% amino acid similarities with Indian cattle, indicating functional cross-reactivity. One amino acid deletion at position 63 and one substitution (A-->P) at position 64 were observed in buffalo compared with Indian cattle. The amino acid deletion at position 63 was predicted due to differences in pre-mRNA splicing.

  9. Extremely acidophilic protists from acid mine drainage host Rickettsiales-lineage endosymbionts that have intervening sequences in their 16S rRNA genes.

    PubMed

    Baker, Brett J; Hugenholtz, Philip; Dawson, Scott C; Banfield, Jillian F

    2003-09-01

    During a molecular phylogenetic survey of extremely acidic (pH < 1), metal-rich acid mine drainage habitats in the Richmond Mine at Iron Mountain, Calif., we detected 16S rRNA gene sequences of a novel bacterial group belonging to the order Rickettsiales in the Alphaproteobacteria. The closest known relatives of this group (92% 16S rRNA gene sequence identity) are endosymbionts of the protist Acanthamoeba. Oligonucleotide 16S rRNA probes were designed and used to observe members of this group within acidophilic protists. To improve visualization of eukaryotic populations in the acid mine drainage samples, broad-specificity probes for eukaryotes were redesigned and combined to highlight this component of the acid mine drainage community. Approximately 4% of protists in the acid mine drainage samples contained endosymbionts. Measurements of internal pH of the protists showed that their cytosol is close to neutral, indicating that the endosymbionts may be neutrophilic. The endosymbionts had a conserved 273-nucleotide intervening sequence (IVS) in variable region V1 of their 16S rRNA genes. The IVS does not match any sequence in current databases, but the predicted secondary structure forms well-defined stem loops. IVSs are uncommon in rRNA genes and appear to be confined to bacteria living in close association with eukaryotes. Based on the phylogenetic novelty of the endosymbiont sequences and initial culture-independent characterization, we propose the name "Candidatus Captivus acidiprotistae." To our knowledge, this is the first report of an endosymbiotic relationship in an extremely acidic habitat.

  10. Multistrand Structure Prediction of Nucleic Acid Assemblies and Design of RNA Switches.

    PubMed

    Bindewald, Eckart; Afonin, Kirill A; Viard, Mathias; Zakrevsky, Paul; Kim, Taejin; Shapiro, Bruce A

    2016-03-01

    RNA is an attractive material for the creation of molecular logic gates that release programmed functionalities only in the presence of specific molecular interaction partners. Here we present HyperFold, a multistrand RNA/DNA structure prediction approach for predicting nucleic acid complexes that can contain pseudoknots. We show that HyperFold also performs competitively compared to other published folding algorithms. We performed a large variety of RNA/DNA hybrid reassociation experiments for different concentrations, DNA toehold lengths, and G+C content and find that the observed tendencies for reassociation correspond well to computational predictions. Importantly, we apply this method to the design and experimental verification of a two-stranded RNA molecular switch that upon binding to a single-stranded RNA toehold disease-marker trigger mRNA changes its conformation releasing an shRNA-like Dicer substrate structure. To demonstrate the concept, connective tissue growth factor (CTGF) mRNA and enhanced green fluorescent protein (eGFP) mRNA were chosen as trigger and target sequences, respectively. In vitro experiments confirm the formation of an RNA switch and demonstrate that the functional unit is being released when the trigger RNA interacts with the switch toehold. The designed RNA switch is shown to be functional in MDA-MB-231 breast cancer cells. Several other switches were also designed and tested. We conclude that this approach has considerable potential because, in principle, it allows the release of an siRNA designed against a gene that differs from the gene that is utilized as a biomarker for a disease state. PMID:26926528

  11. Structure prediction and evolution of a halo-acid dehalogenase of Burkholderia mallei.

    PubMed

    Rai, Alok R; Singh, Raghvendra Pratap; Srivastava, Alok Kumar; Dubey, Ramesh Chandra

    2012-01-01

    Environmental pollutants containing halogenated organic compounds e.g. haloacid, can cause a plethora of health problems. The structural and functional analyses of the gene responsible of their degradation are an important aspect for environmental studies and are important to human well-being. It has been shown that some haloacids are toxic and mutagenic. Microorganisms capable of degrading these haloacids can be found in the natural environment. One of these, a soil-borne Burkholderia mallei posses the ability to grow on monobromoacetate (MBA). This bacterium produces a haloacid dehalogenase that allows the cell to grow on MBA, a highly toxic and mutagenic environmental pollutant. For the structural and functional analysis, a 346 amino acid encoding protein sequence of haloacid dehalogenase is retrieve from NCBI data base. Primary and secondary structure analysis suggested that the high percentage of helices in the structure makes the protein more flexible for folding, which might increase protein interactions. The consensus protein sub-cellular localization predictions suggest that dehalogenase protein is a periplasmic protein 3D2GO server, suggesting that it is mainly employed in metabolic process followed by hydrolase activity and catalytic activity. The tertiary structure of protein was predicted by homology modeling. The result suggests that the protein is an unstable protein which is also an important characteristic of active enzyme enabling them to bind various cofactors and substrate for proper functioning. Validation of 3D structure was done using Ramachandran plot ProsA-web and RMSD score. This predicted information will help in better understanding of mechanism underlying haloacid dehalogenase encoding protein and its evolutionary relationship.

  12. Genomic-scale comparison of sequence- and structure-based methods of function prediction: Does structure provide additional insight?

    PubMed Central

    Fetrow, Jacquelyn S.; Siew, Naomi; Di Gennaro, Jeannine A.; Martinez-Yamout, Maria; Dyson, H. Jane; Skolnick, Jeffrey

    2001-01-01

    A function annotation method using the sequence-to-structure-to-function paradigm is applied to the identification of all disulfide oxidoreductases in the Saccharomyces cerevisiae genome. The method identifies 27 sequences as potential disulfide oxidoreductases. All previously known thioredoxins, glutaredoxins, and disulfide isomerases are correctly identified. Three of the 27 predictions are probable false-positives. Three novel predictions, which subsequently have been experimentally validated, are presented. Two additional novel predictions suggest a disulfide oxidoreductase regulatory mechanism for two subunits (OST3 and OST6) of the yeast oligosaccharyltransferase complex. Based on homology, this prediction can be extended to a potential tumor suppressor gene, N33, in humans, whose biochemical function was not previously known. Attempts to obtain a folded, active N33 construct to test the prediction were unsuccessful. The results show that structure prediction coupled with biochemically relevant structural motifs is a powerful method for the function annotation of genome sequences and can provide more detailed, robust predictions than function prediction methods that rely on sequence comparison alone. PMID:11316881

  13. Complete Genome Sequence of a thermotolerant sporogenic lactic acid bacterium, Bacillus coagulans strain 36D1

    PubMed Central

    Rhee, Mun Su; Moritz, Brélan E.; Xie, Gary; Glavina del Rio, T.; Dalin, E.; Tice, H.; Bruce, D.; Goodwin, L.; Chertkov, O.; Brettin, T.; Han, C.; Detter, C.; Pitluck, S.; Land, Miriam L.; Patel, Milind; Ou, Mark; Harbrucker, Roberta; Ingram, Lonnie O.; Shanmugam, K. T.

    2011-01-01

    Bacillus coagulans is a ubiquitous soil bacterium that grows at 50-55 °C and pH 5.0 and ferments various sugars that constitute plant biomass to L (+)-lactic acid. The ability of this sporogenic lactic acid bacterium to grow at 50-55 °C and pH 5.0 makes this organism an attractive microbial biocatalyst for production of optically pure lactic acid at industrial scale not only from glucose derived from cellulose but also from xylose, a major constituent of hemicellulose. This bacterium is also considered as a potential probiotic. Complete genome sequence of a representative strain, B. coagulans strain 36D1, is presented and discussed. PMID:22675583

  14. BeadCons: detection of nucleic acid sequences by flow cytometry.

    PubMed

    Horejsh, Douglas; Martini, Federico; Capobianchi, Maria Rosaria

    2005-11-01

    Molecular beacons are single-stranded nucleic acid structures with a terminal fluorophore and a distal, terminal quencher. These molecules are typically used in real-time PCR assays, but have also been conjugated with solid matrices. This unit describes protocols related to molecular beacon-conjugated beads (BeadCons), whose specific hybridization with complementary target sequences can be resolved by cytometry. Assay sensitivity is achieved through the concentration of fluorescence signal on discrete particles. By using molecular beacons with different fluorophores and microspheres of different sizes, it is possible to construct a fluid array system with each bead corresponding to a specific target nucleic acid. Methods are presented for the design, construction, and use of BeadCons for the specific, multiplexed detection of unlabeled nucleic acids in solution. The use of bead-based detection methods will likely lead to the design of new multiplex molecular diagnostic tools.

  15. Measuring nanometer distances in nucleic acids using a sequence-independent nitroxide probe

    PubMed Central

    Qin, Peter Z; Haworth, Ian S; Cai, Qi; Kusnetzow, Ana K; Grant, Gian Paola G; Price, Eric A; Sowa, Glenna Z; Popova, Anna; Herreros, Bruno; He, Honghang

    2008-01-01

    This protocol describes the procedures for measuring nanometer distances in nucleic acids using a nitroxide probe that can be attached to any nucleotide within a given sequence. Two nitroxides are attached to phosphorothioates that are chemically substituted at specific sites of DNA or RNA. Inter-nitroxide distances are measured using a four-pulse double electron–electron resonance technique, and the measured distances are correlated to the parent structures using a Web-accessible computer program. Four to five days are needed for sample labeling, purification and distance measurement. The procedures described herein provide a method for probing global structures and studying conformational changes of nucleic acids and protein/nucleic acid complexes. PMID:17947978

  16. [Partial sequence homology of FtsZ in phylogenetics analysis of lactic acid bacteria].

    PubMed

    Zhang, Bin; Dong, Xiu-zhu

    2005-10-01

    FtsZ is a structurally conserved protein, which is universal among the prokaryotes. It plays a key role in prokaryote cell division. A partial fragment of the ftsZ gene about 800bp in length was amplified and sequenced and a partial FtsZ protein phylogenetic tree for the lactic acid bacteria was constructed. By comparing the FtsZ phylogenetic tree with the 16S rDNA tree, it was shown that the two trees were similar in topology. Both trees revealed that Pediococcus spp. were closely related with L. casei group of Lactobacillus spp. , but less related with other lactic acid cocci such as Enterococcus and Streptococcus. The results also showed that the discriminative power of FtsZ was higher than that of 16S rDNA for either inter-species or inter-genus and could be a very useful tool in species identification of lactic acid bacteria. PMID:16342751

  17. Prediction of multi-drug resistance transporters using a novel sequence analysis method [version 2; referees: 2 approved

    DOE PAGES

    McDermott, Jason E.; Bruillard, Paul; Overall, Christopher C.; Gosink, Luke; Lindemann, Stephen R.

    2015-03-09

    There are many examples of groups of proteins that have similar function, but the determinants of functional specificity may be hidden by lack of sequencesimilarity, or by large groups of similar sequences with different functions. Transporters are one such protein group in that the general function, transport, can be easily inferred from the sequence, but the substrate specificity can be impossible to predict from sequence with current methods. In this paper we describe a linguistic-based approach to identify functional patterns from groups of unaligned protein sequences and its application to predict multi-drug resistance transporters (MDRs) from bacteria. We first showmore » that our method can recreate known patterns from PROSITE for several motifs from unaligned sequences. We then show that the method, MDRpred, can predict MDRs with greater accuracy and positive predictive value than a collection of currently available family-based models from the Pfam database. Finally, we apply MDRpred to a large collection of protein sequences from an environmental microbiome study to make novel predictions about drug resistance in a potential environmental reservoir.« less

  18. Prediction of multi-drug resistance transporters using a novel sequence analysis method [version 2; referees: 2 approved

    SciTech Connect

    McDermott, Jason E.; Bruillard, Paul; Overall, Christopher C.; Gosink, Luke; Lindemann, Stephen R.

    2015-03-09

    There are many examples of groups of proteins that have similar function, but the determinants of functional specificity may be hidden by lack of sequencesimilarity, or by large groups of similar sequences with different functions. Transporters are one such protein group in that the general function, transport, can be easily inferred from the sequence, but the substrate specificity can be impossible to predict from sequence with current methods. In this paper we describe a linguistic-based approach to identify functional patterns from groups of unaligned protein sequences and its application to predict multi-drug resistance transporters (MDRs) from bacteria. We first show that our method can recreate known patterns from PROSITE for several motifs from unaligned sequences. We then show that the method, MDRpred, can predict MDRs with greater accuracy and positive predictive value than a collection of currently available family-based models from the Pfam database. Finally, we apply MDRpred to a large collection of protein sequences from an environmental microbiome study to make novel predictions about drug resistance in a potential environmental reservoir.

  19. The amino acid sequence of Lady Amherst's pheasant (Chrysolophus amherstiae) and golden pheasant (Chrysolophus pictus) egg-white lysozymes.

    PubMed

    Araki, T; Kuramoto, M; Torikata, T

    1990-09-01

    The amino acids of Lady Amherst's pheasant and golden pheasant egg-white lysozymes have been sequenced. The carboxymethylated lysozymes were digested with trypsin followed by sequencing of the tryptic peptides. Lady Amherst's pheasant lysozyme proved to consist of 129 amino acid residues, and a relative molecular mass of 14,423 Da was calculated. This lysozyme had 6 amino acids substitutions when compared with hen egg-white lysozyme: Phe3 to Tyr, His15 to Leu, Gln41 to His, Asn77 to His, Gln 121 to Asn, and a newly found substitution of Ile124 to Thr. The amino acid sequence of golden pheasant lysozyme was identical to that of Lady Amherst's phesant lysozyme. The phylogenetic tree constructured by the comparison of amino acid sequences of phasianoid birds lysozymes revealed a minimum genetic distance between these pheasants and the turkey-peafowl group.

  20. The amino acid sequence of Lady Amherst's pheasant (Chrysolophus amherstiae) and golden pheasant (Chrysolophus pictus) egg-white lysozymes.

    PubMed

    Araki, T; Kuramoto, M; Torikata, T

    1990-09-01

    The amino acids of Lady Amherst's pheasant and golden pheasant egg-white lysozymes have been sequenced. The carboxymethylated lysozymes were digested with trypsin followed by sequencing of the tryptic peptides. Lady Amherst's pheasant lysozyme proved to consist of 129 amino acid residues, and a relative molecular mass of 14,423 Da was calculated. This lysozyme had 6 amino acids substitutions when compared with hen egg-white lysozyme: Phe3 to Tyr, His15 to Leu, Gln41 to His, Asn77 to His, Gln 121 to Asn, and a newly found substitution of Ile124 to Thr. The amino acid sequence of golden pheasant lysozyme was identical to that of Lady Amherst's phesant lysozyme. The phylogenetic tree constructured by the comparison of amino acid sequences of phasianoid birds lysozymes revealed a minimum genetic distance between these pheasants and the turkey-peafowl group. PMID:1368578

  1. The thermostability of two kinds of recombinant ∆6-fatty acid desaturase with different N-terminal sequence lengths in low temperature.

    PubMed

    Lu, He; Zhu, Yu

    2013-09-01

    Two recombinant Rhizopus stolonifer ∆6-fatty acid desaturase enzymes with different-length N-termini were cloned and expressed in Saccharomyces cerevisiae strain INVScl: LRsD6D begins with the sequence of the N-terminal of the R. stolonifer ∆6-fatty acid desaturase native, encoding a deduced polypeptide of 459 amino acids (M-S-T-L-D-R-Q-S-I-F-T-I-K-E-L-E-S-I-S-Q-R-I-H-D-G-D-E-E-A-M-K-F), whereas SRsD6D begins with the amino acid sequence of the predicted ORF, encoding a deduced polypeptide of 430 amino acids (M-K-F) and LRsD6D is longer than SRsD6D by 29 amino acids (M-S-T-L-D-R-Q-S-I-F-T-I-K-E-L-E-S-I-S-Q-R-I-H-D-G-D-E-E-A). Bioinformatic analysis characterized the two recombinant ∆6-fatty acid desaturase enzymes with different-length N-termini, including three conserved histidine-rich motifs, hydropathy profile, and a cytochrome b5-like domain in the N-terminus. When the coding sequence was expressed in S. cerevisiae strain INVScl, the coding produced ∆6-fatty acid desaturase activity exhibited by RsD6D, leading to a novel peak corresponding to γ-linolenic acid methyl ester standards, which was detected with the same retention time. The residual activity of LRsD6D was 74 % at 15 °C for 4 h and that of SRsD6D was 43 %. Purified recombinant LRsD6D was more stable than SRsD6D, indicating that the N-terminal extension, containing mostly hydrophobic residues, affected the overall stability of recombinant LRsD6D.

  2. A sequence-based approach for prediction of CsrA/RsmA targets in bacteria with experimental validation in Pseudomonas aeruginosa

    PubMed Central

    Kulkarni, Prajna R.; Jia, Tao; Kuehne, Sarah A.; Kerkering, Thomas M.; Morris, Elizabeth R.; Searle, Mark S.; Heeb, Stephan; Rao, Jayasimha; Kulkarni, Rahul V.

    2014-01-01

    CsrA/RsmA homologs are an extensive family of ribonucleic acid (RNA)-binding proteins that function as global post-transcriptional regulators controlling important cellular processes such as secondary metabolism, motility, biofilm formation and the production and secretion of virulence factors in diverse bacterial species. While direct messenger RNA binding by CsrA/RsmA has been studied in detail for some genes, it is anticipated that there are numerous additional, as yet undiscovered, direct targets that mediate its global regulation. To assist in the discovery of these targets, we propose a sequence-based approach to predict genes directly regulated by these regulators. In this work, we develop a computer code (CSRA_TARGET) implementing this approach, which leads to predictions for several novel targets in Escherichia coli and Pseudomonas aeruginosa. The predicted targets in other bacteria, specifically Salmonella enterica serovar Typhimurium, Pectobacterium carotovorum and Legionella pneumophila, also include global regulators that control virulence in these pathogens, unraveling intricate indirect regulatory roles for CsrA/RsmA. We have experimentally validated four predicted RsmA targets in P. aeruginosa. The sequence-based approach developed in this work can thus lead to several testable predictions for direct targets of CsrA homologs, thereby complementing and accelerating efforts to unravel global regulation by this important family of proteins. PMID:24782516

  3. Electromyographic Patterns during Golf Swing: Activation Sequence Profiling and Prediction of Shot Effectiveness.

    PubMed

    Verikas, Antanas; Vaiciukynas, Evaldas; Gelzinis, Adas; Parker, James; Olsson, M Charlotte

    2016-01-01

    This study analyzes muscle activity, recorded in an eight-channel electromyographic (EMG) signal stream, during the golf swing using a 7-iron club and exploits information extracted from EMG dynamics to predict the success of the resulting shot. Muscles of the arm and shoulder on both the left and right sides, namely flexor carpi radialis, extensor digitorum communis, rhomboideus and trapezius, are considered for 15 golf players (∼5 shots each). The method using Gaussian filtering is outlined for EMG onset time estimation in each channel and activation sequence profiling. Shots of each player revealed a persistent pattern of muscle activation. Profiles were plotted and insights with respect to player effectiveness were provided. Inspection of EMG dynamics revealed a pair of highest peaks in each channel as the hallmark of golf swing, and a custom application of peak detection for automatic extraction of swing segment was introduced. Various EMG features, encompassing 22 feature sets, were constructed. Feature sets were used individually and also in decision-level fusion for the prediction of shot effectiveness. The prediction of the target attribute, such as club head speed or ball carry distance, was investigated using random forest as the learner in detection and regression tasks. Detection evaluates the personal effectiveness of a shot with respect to the player-specific average, whereas regression estimates the value of target attribute, using EMG features as predictors. Fusion after decision optimization provided the best results: the equal error rate in detection was 24.3% for the speed and 31.7% for the distance; the mean absolute percentage error in regression was 3.2% for the speed and 6.4% for the distance. Proposed EMG feature sets were found to be useful, especially when used in combination. Rankings of feature sets indicated statistics for muscle activity in both the left and right body sides, correlation-based analysis of EMG dynamics and features

  4. Electromyographic Patterns during Golf Swing: Activation Sequence Profiling and Prediction of Shot Effectiveness.

    PubMed

    Verikas, Antanas; Vaiciukynas, Evaldas; Gelzinis, Adas; Parker, James; Olsson, M Charlotte

    2016-01-01

    This study analyzes muscle activity, recorded in an eight-channel electromyographic (EMG) signal stream, during the golf swing using a 7-iron club and exploits information extracted from EMG dynamics to predict the success of the resulting shot. Muscles of the arm and shoulder on both the left and right sides, namely flexor carpi radialis, extensor digitorum communis, rhomboideus and trapezius, are considered for 15 golf players (∼5 shots each). The method using Gaussian filtering is outlined for EMG onset time estimation in each channel and activation sequence profiling. Shots of each player revealed a persistent pattern of muscle activation. Profiles were plotted and insights with respect to player effectiveness were provided. Inspection of EMG dynamics revealed a pair of highest peaks in each channel as the hallmark of golf swing, and a custom application of peak detection for automatic extraction of swing segment was introduced. Various EMG features, encompassing 22 feature sets, were constructed. Feature sets were used individually and also in decision-level fusion for the prediction of shot effectiveness. The prediction of the target attribute, such as club head speed or ball carry distance, was investigated using random forest as the learner in detection and regression tasks. Detection evaluates the personal effectiveness of a shot with respect to the player-specific average, whereas regression estimates the value of target attribute, using EMG features as predictors. Fusion after decision optimization provided the best results: the equal error rate in detection was 24.3% for the speed and 31.7% for the distance; the mean absolute percentage error in regression was 3.2% for the speed and 6.4% for the distance. Proposed EMG feature sets were found to be useful, especially when used in combination. Rankings of feature sets indicated statistics for muscle activity in both the left and right body sides, correlation-based analysis of EMG dynamics and features

  5. Electromyographic Patterns during Golf Swing: Activation Sequence Profiling and Prediction of Shot Effectiveness

    PubMed Central

    Verikas, Antanas; Vaiciukynas, Evaldas; Gelzinis, Adas; Parker, James; Olsson, M. Charlotte

    2016-01-01

    This study analyzes muscle activity, recorded in an eight-channel electromyographic (EMG) signal stream, during the golf swing using a 7-iron club and exploits information extracted from EMG dynamics to predict the success of the resulting shot. Muscles of the arm and shoulder on both the left and right sides, namely flexor carpi radialis, extensor digitorum communis, rhomboideus and trapezius, are considered for 15 golf players (∼5 shots each). The method using Gaussian filtering is outlined for EMG onset time estimation in each channel and activation sequence profiling. Shots of each player revealed a persistent pattern of muscle activation. Profiles were plotted and insights with respect to player effectiveness were provided. Inspection of EMG dynamics revealed a pair of highest peaks in each channel as the hallmark of golf swing, and a custom application of peak detection for automatic extraction of swing segment was introduced. Various EMG features, encompassing 22 feature sets, were constructed. Feature sets were used individually and also in decision-level fusion for the prediction of shot effectiveness. The prediction of the target attribute, such as club head speed or ball carry distance, was investigated using random forest as the learner in detection and regression tasks. Detection evaluates the personal effectiveness of a shot with respect to the player-specific average, whereas regression estimates the value of target attribute, using EMG features as predictors. Fusion after decision optimization provided the best results: the equal error rate in detection was 24.3% for the speed and 31.7% for the distance; the mean absolute percentage error in regression was 3.2% for the speed and 6.4% for the distance. Proposed EMG feature sets were found to be useful, especially when used in combination. Rankings of feature sets indicated statistics for muscle activity in both the left and right body sides, correlation-based analysis of EMG dynamics and features

  6. N-terminal amino acid sequences and some characteristics of fibrinolytic/hemorrhagic metalloproteinases purified from Bothrops jararaca venom.

    PubMed

    Maruyama, Masugi; Sugiki, Masahiko; Anai, Keita; Yoshida, Etsuo

    2002-08-01

    We determined the N-terminal amino acid sequences of the fibrinolytic/hemorrhagic metalloproteinases (jararafibrases I, III and IV) purified from Bothrops jararaca venom. The N-terminal amino acid sequences of jararafibrase I and its degradation products were identical to those of jararhagin, another hemorrhagic metalloproteinase purified from the same snake venom. Together with enzymatic and immunological properties, we concluded that those two enzymes are identical. The N-terminal amino acid sequence of jararafibrase III was quite similar to C-type lectin isolated from Crotalus atrox, and the protein had a hemagglutinating activity on intact rat red blood cells. PMID:12165326

  7. Prediction of nucleating sequences from amyloidogenic propensities of tau-related peptides.

    PubMed

    Rojas Quijano, Federico A; Morrow, Dana; Wise, Barry M; Brancia, Francesco L; Goux, Warren J

    2006-04-11

    Physical properties, including amyloid morphology, FTIR and CD spectra, enhancement of Congo red absorbance, polymerization rate, critical monomer concentration, free energy of stabilization, hydrophobicity, and the partition coefficient between soluble and amyloid states, were measured for the tau-related peptide Ac-VQIVYK amide (AcPHF6) and its single site mutants Ac-VQIVXK amide (X not equal Cys). Transmission electron microscopy showed that 15 out of the 19 peptides formed amyloid in buffer, with morphologies ranging from straight and twisted filaments to sheets and rolled sheets. Using principal component analysis (PCA), measured properties were treated in a comprehensive manner, and scores along the most significant principal components were used to define individual amino acid amyloidogenic propensities. Quantitative structure-activity modeling (QSAM) showed that residues with greater size and hydrophobicity made the largest contributions to the propensity of peptides to form amyloid. Using individual amino acid propensities, sequences within tau with high amyloid-forming potential were estimated and found to include 226VAVVR230 in the proline-rich region, 275VQIINK280 (PHF6) and 306VQIVYK311 (PHF6) within the microtubule binding region, and 392IVYK395 in the C-tail region of the protein. The results suggest that regions outside the microtubule-binding region may play important roles in tau aggregation kinetics or paired helical filament structure.

  8. Protein sequence analysis by incorporating modified chaos game and physicochemical properties into Chou's general pseudo amino acid composition.

    PubMed

    Xu, Chunrui; Sun, Dandan; Liu, Shenghui; Zhang, Yusen

    2016-10-01

    In this contribution we introduced a novel graphical method to compare protein sequences. By mapping a protein sequence into 3D space based on codons and physicochemical properties of 20 amino acids, we are able to get a unique P-vector from the 3D curve. This approach is consistent with wobble theory of amino acids. We compute the distance between sequences by their P-vectors to measure similarities/dissimilarities among protein sequences. Finally, we use our method to analyze four datasets and get better results compared with previous approaches. PMID:27375218

  9. Applications of the predictability of the Coherent Noise Model to aftershock sequences

    NASA Astrophysics Data System (ADS)

    Christopoulos, Stavros-Richard; Sarlis, Nicholas

    2014-05-01

    A study [1] of the coherent noise model [2-4] in natural time [5-7] has shown that it exhibits predictability. Interestingly, one of the predictors suggested [1] for the coherent noise model can be generalized and applied to the case of (real) aftershock sequences. The results obtained [8] so far are beyond chance. Here, we apply this approach to several aftershock sequences of strong earthquakes with magnitudes Mw ≥6.9 in Indonesia, California and Greece, including the Mw9.2 earthquake that occurred on 26 December 2004 in Sumatra. References. [1] N. V. Sarlis and S.-R. G. Christopoulos, Predictability of the coherent-noise model and its applications, Physical Review E, 85, 051136, 2012. [2] M.E.J. Newman, Self-organized criticality, evolution and the fossil extinction record, Proc. R. Soc. London B, 263, 1605-1610, 1996. [3] M. E. J. Newman and K. Sneppen, Avalanches, scaling, and coherent noise, Phys. Rev. E, 54, 6226-6231, 1996. [4] K. Sneppen and M. Newman, Coherent noise, scale invariance and intermittency in large systems, Physica D, 110, 209 - 222. [5] P. Varotsos, N. Sarlis, and E. Skordas, Spatiotemporal complexity aspects on the interrelation between Seismic Electric Signals and seismicity, Practica of Athens Academy, 76, 294-321, 2001. [6] P.A. Varotsos, N.V. Sarlis, and E.S. Skordas, Long-range correlations in the electric signals that precede rupture, Phys. Rev. E, 66, 011902, 2002. [7] Varotsos P. A., Sarlis N. V. and Skordas E. S., Natural Time Analysis: The new view of time. Precursory Seismic Electric Signals, Earthquakes and other Complex Time-Series (Springer-Verlag, Berlin Heidelberg) 2011. [8] N. V. Sarlis and S.-R. G. Christopoulos, "Visualization of the significance of Receiver Operating Characteristics based on confidence ellipses", Computer Physics Communications, http://dx.doi.org/10.1016/j.cpc.2013.12.009

  10. Rapid antibiotic-resistance predictions from genome sequence data for Staphylococcus aureus and Mycobacterium tuberculosis

    PubMed Central

    Bradley, Phelim; Gordon, N. Claire; Walker, Timothy M.; Dunn, Laura; Heys, Simon; Huang, Bill; Earle, Sarah; Pankhurst, Louise J.; Anson, Luke; de Cesare, Mariateresa; Piazza, Paolo; Votintseva, Antonina A.; Golubchik, Tanya; Wilson, Daniel J.; Wyllie, David H.; Diel, Roland; Niemann, Stefan; Feuerriegel, Silke; Kohl, Thomas A.; Ismail, Nazir; Omar, Shaheed V.; Smith, E. Grace; Buck, David; McVean, Gil; Walker, A. Sarah; Peto, Tim E. A.; Crook, Derrick W.; Iqbal, Zamin

    2015-01-01

    The rise of antibiotic-resistant bacteria has led to an urgent need for rapid detection of drug resistance in clinical samples, and improvements in global surveillance. Here we show how de Bruijn graph representation of bacterial diversity can be used to identify species and resistance profiles of clinical isolates. We implement this method for Staphylococcus aureus and Mycobacterium tuberculosis in a software package (‘Mykrobe predictor') that takes raw sequence data as input, and generates a clinician-friendly report within 3 minutes on a laptop. For S. aureus, the error rates of our method are comparable to gold-standard phenotypic methods, with sensitivity/specificity of 99.1%/99.6% across 12 antibiotics (using an independent validation set, n=470). For M. tuberculosis, our method predicts resistance with sensitivity/specificity of 82.6%/98.5% (independent validation set, n=1,609); sensitivity is lower here, probably because of limited understanding of the underlying genetic mechanisms. We give evidence that minor alleles improve detection of extremely drug-resistant strains, and demonstrate feasibility of the use of emerging single-molecule nanopore sequencing techniques for these purposes. PMID:26686880

  11. Rapid antibiotic-resistance predictions from genome sequence data for Staphylococcus aureus and Mycobacterium tuberculosis.

    PubMed

    Bradley, Phelim; Gordon, N Claire; Walker, Timothy M; Dunn, Laura; Heys, Simon; Huang, Bill; Earle, Sarah; Pankhurst, Louise J; Anson, Luke; de Cesare, Mariateresa; Piazza, Paolo; Votintseva, Antonina A; Golubchik, Tanya; Wilson, Daniel J; Wyllie, David H; Diel, Roland; Niemann, Stefan; Feuerriegel, Silke; Kohl, Thomas A; Ismail, Nazir; Omar, Shaheed V; Smith, E Grace; Buck, David; McVean, Gil; Walker, A Sarah; Peto, Tim E A; Crook, Derrick W; Iqbal, Zamin

    2015-01-01

    The rise of antibiotic-resistant bacteria has led to an urgent need for rapid detection of drug resistance in clinical samples, and improvements in global surveillance. Here we show how de Bruijn graph representation of bacterial diversity can be used to identify species and resistance profiles of clinical isolates. We implement this method for Staphylococcus aureus and Mycobacterium tuberculosis in a software package ('Mykrobe predictor') that takes raw sequence data as input, and generates a clinician-friendly report within 3 minutes on a laptop. For S. aureus, the error rates of our method are comparable to gold-standard phenotypic methods, with sensitivity/specificity of 99.1%/99.6% across 12 antibiotics (using an independent validation set, n=470). For M. tuberculosis, our method predicts resistance with sensitivity/specificity of 82.6%/98.5% (independent validation set, n=1,609); sensitivity is lower here, probably because of limited understanding of the underlying genetic mechanisms. We give evidence that minor alleles improve detection of extremely drug-resistant strains, and demonstrate feasibility of the use of emerging single-molecule nanopore sequencing techniques for these purposes. PMID:26686880

  12. Deep-sequence profiling of miRNAs and their target prediction in Monotropa hypopitys.

    PubMed

    Shchennikova, Anna V; Beletsky, Alexey V; Shulga, Olga A; Mazur, Alexander M; Prokhortchouk, Egor B; Kochieva, Elena Z; Ravin, Nikolay V; Skryabin, Konstantin G

    2016-07-01

    Myco-heterotroph Monotropa hypopitys is a widely spread perennial herb used to study symbiotic interactions and physiological mechanisms underlying the development of non-photosynthetic plant. Here, we performed, for the first time, transcriptome-wide characterization of M. hypopitys miRNA profile using high throughput Illumina sequencing. As a result of small RNA library sequencing and bioinformatic analysis, we identified 55 members belonging to 40 families of known miRNAs and 17 putative novel miRNAs unique for M. hypopitys. Computational screening revealed 206 potential mRNA targets for known miRNAs and 31 potential mRNA targets for novel miRNAs. The predicted target genes were described in Gene Ontology terms and were found to be involved in a broad range of metabolic and regulatory pathways. The identification of novel M. hypopitys-specific miRNAs, some with few target genes and low abundances, suggests their recent evolutionary origin and participation in highly specialized regulatory mechanisms fundamental for non-photosynthetic biology of M. hypopitys. This global analysis of miRNAs and their potential targets in M. hypopitys provides a framework for further investigation of miRNA role in the evolution and establishment of non-photosynthetic myco-heterotrophs. PMID:27097902

  13. Rapid antibiotic-resistance predictions from genome sequence data for Staphylococcus aureus and Mycobacterium tuberculosis.

    PubMed

    Bradley, Phelim; Gordon, N Claire; Walker, Timothy M; Dunn, Laura; Heys, Simon; Huang, Bill; Earle, Sarah; Pankhurst, Louise J; Anson, Luke; de Cesare, Mariateresa; Piazza, Paolo; Votintseva, Antonina A; Golubchik, Tanya; Wilson, Daniel J; Wyllie, David H; Diel, Roland; Niemann, Stefan; Feuerriegel, Silke; Kohl, Thomas A; Ismail, Nazir; Omar, Shaheed V; Smith, E Grace; Buck, David; McVean, Gil; Walker, A Sarah; Peto, Tim E A; Crook, Derrick W; Iqbal, Zamin

    2015-01-01

    The rise of antibiotic-resistant bacteria has led to an urgent need for rapid detection of drug resistance in clinical samples, and improvements in global surveillance. Here we show how de Bruijn graph representation of bacterial diversity can be used to identify species and resistance profiles of clinical isolates. We implement this method for Staphylococcus aureus and Mycobacterium tuberculosis in a software package ('Mykrobe predictor') that takes raw sequence data as input, and generates a clinician-friendly report within 3 minutes on a laptop. For S. aureus, the error rates of our method are comparable to gold-standard phenotypic methods, with sensitivity/specificity of 99.1%/99.6% across 12 antibiotics (using an independent validation set, n=470). For M. tuberculosis, our method predicts resistance with sensitivity/specificity of 82.6%/98.5% (independent validation set, n=1,609); sensitivity is lower here, probably because of limited understanding of the underlying genetic mechanisms. We give evidence that minor alleles improve detection of extremely drug-resistant strains, and demonstrate feasibility of the use of emerging single-molecule nanopore sequencing techniques for these purposes.

  14. Genome Sequence Variability Predicts Drug Precautions and Withdrawals from the Market

    PubMed Central

    Baik, Su Youn; Lee, Soo Youn; Park, Chan Hee; Park, Paul J.; Kim, Ju Han

    2016-01-01

    Despite substantial premarket efforts, a significant portion of approved drugs has been withdrawn from the market for safety reasons. The deleterious impact of nonsynonymous substitutions predicted by the SIFT algorithm on structure and function of drug-related proteins was evaluated for 2504 personal genomes. Both withdrawn (n = 154) and precautionary (Beers criteria (n = 90), and US FDA pharmacogenomic biomarkers (n = 96)) drugs showed significantly lower genomic deleteriousness scores (P < 0.001) compared to others (n = 752). Furthermore, the rates of drug withdrawals and precautions correlated significantly with the deleteriousness scores of the drugs (P < 0.01); this trend was confirmed for all drugs included in the withdrawal and precaution lists by the United Nations, European Medicines Agency, DrugBank, Beers criteria, and US FDA. Our findings suggest that the person-to-person genome sequence variability is a strong independent predictor of drug withdrawals and precautions. We propose novel measures of drug safety based on personal genome sequence analysis. PMID:27690231

  15. Temporal and Spatial Predictability of an Irrelevant Event Differently Affect Detection and Memory of Items in a Visual Sequence

    PubMed Central

    Ohyama, Junji; Watanabe, Katsumi

    2016-01-01

    We examined how the temporal and spatial predictability of a task-irrelevant visual event affects the detection and memory of a visual item embedded in a continuously changing sequence. Participants observed 11 sequentially presented letters, during which a task-irrelevant visual event was either present or absent. Predictabilities of spatial location and temporal position of the event were controlled in 2 × 2 conditions. In the spatially predictable conditions, the event occurred at the same location within the stimulus sequence or at another location, while, in the spatially unpredictable conditions, it occurred at random locations. In the temporally predictable conditions, the event timing was fixed relative to the order of the letters, while in the temporally unpredictable condition; it could not be predicted from the letter order. Participants performed a working memory task and a target detection reaction time (RT) task. Memory accuracy was higher for a letter simultaneously presented at the same location as the event in the temporally unpredictable conditions, irrespective of the spatial predictability of the event. On the other hand, the detection RTs were only faster for a letter simultaneously presented at the same location as the event when the event was both temporally and spatially predictable. Thus, to facilitate ongoing detection processes, an event must be predictable both in space and time, while memory processes are enhanced by temporally unpredictable (i.e., surprising) events. Evidently, temporal predictability has differential effects on detection and memory of a visual item embedded in a sequence of images. PMID:26869966

  16. JRC GMO-Amplicons: a collection of nucleic acid sequences related to genetically modified organisms.

    PubMed

    Petrillo, Mauro; Angers-Loustau, Alexandre; Henriksson, Peter; Bonfini, Laura; Patak, Alex; Kreysa, Joachim

    2015-01-01

    The DNA target sequence is the key element in designing detection methods for genetically modified organisms (GMOs). Unfortunately this information is frequently lacking, especially for unauthorized GMOs. In addition, patent sequences are generally poorly annotated, buried in complex and extensive documentation and hard to link to the corresponding GM event. Here, we present the JRC GMO-Amplicons, a database of amplicons collected by screening public nucleotide sequence databanks by in silico determination of PCR amplification with reference methods for GMO analysis. The European Union Reference Laboratory for Genetically Modified Food and Feed (EU-RL GMFF) provides these methods in the GMOMETHODS database to support enforcement of EU legislation and GM food/feed control. The JRC GMO-Amplicons database is composed of more than 240 000 amplicons, which can be easily accessed and screened through a web interface. To our knowledge, this is the first attempt at pooling and collecting publicly available sequences related to GMOs in food and feed. The JRC GMO-Amplicons supports control laboratories in the design and assessment of GMO methods, providing inter-alia in silico prediction of primers specificity and GM targets coverage. The new tool can assist the laboratories in the analysis of complex issues, such as the detection and identification of unauthorized GMOs. Notably, the JRC GMO-Amplicons database allows the retrieval and characterization of GMO-related sequences included in patents documentation. Finally, it can help annotating poorly described GM sequences and identifying new relevant GMO-related sequences in public databases. The JRC GMO-Amplicons is freely accessible through a web-based portal that is hosted on the EU-RL GMFF website. Database URL: http://gmo-crl.jrc.ec.europa.eu/jrcgmoamplicons/. PMID:26424080

  17. JRC GMO-Amplicons: a collection of nucleic acid sequences related to genetically modified organisms

    PubMed Central

    Petrillo, Mauro; Angers-Loustau, Alexandre; Henriksson, Peter; Bonfini, Laura; Patak, Alex; Kreysa, Joachim

    2015-01-01

    The DNA target sequence is the key element in designing detection methods for genetically modified organisms (GMOs). Unfortunately this information is frequently lacking, especially for unauthorized GMOs. In addition, patent sequences are generally poorly annotated, buried in complex and extensive documentation and hard to link to the corresponding GM event. Here, we present the JRC GMO-Amplicons, a database of amplicons collected by screening public nucleotide sequence databanks by in silico determination of PCR amplification with reference methods for GMO analysis. The European Union Reference Laboratory for Genetically Modified Food and Feed (EU-RL GMFF) provides these methods in the GMOMETHODS database to support enforcement of EU legislation and GM food/feed control. The JRC GMO-Amplicons database is composed of more than 240 000 amplicons, which can be easily accessed and screened through a web interface. To our knowledge, this is the first attempt at pooling and collecting publicly available sequences related to GMOs in food and feed. The JRC GMO-Amplicons supports control laboratories in the design and assessment of GMO methods, providing inter-alia in silico prediction of primers specificity and GM targets coverage. The new tool can assist the laboratories in the analysis of complex issues, such as the detection and identification of unauthorized GMOs. Notably, the JRC GMO-Amplicons database allows the retrieval and characterization of GMO-related sequences included in patents documentation. Finally, it can help annotating poorly described GM sequences and identifying new relevant GMO-related sequences in public databases. The JRC GMO-Amplicons is freely accessible through a web-based portal that is hosted on the EU-RL GMFF website. Database URL: http://gmo-crl.jrc.ec.europa.eu/jrcgmoamplicons/ PMID:26424080

  18. JRC GMO-Amplicons: a collection of nucleic acid sequences related to genetically modified organisms.

    PubMed

    Petrillo, Mauro; Angers-Loustau, Alexandre; Henriksson, Peter; Bonfini, Laura; Patak, Alex; Kreysa, Joachim

    2015-01-01

    The DNA target sequence is the key element in designing detection methods for genetically modified organisms (GMOs). Unfortunately this information is frequently lacking, especially for unauthorized GMOs. In addition, patent sequences are generally poorly annotated, buried in complex and extensive documentation and hard to link to the corresponding GM event. Here, we present the JRC GMO-Amplicons, a database of amplicons collected by screening public nucleotide sequence databanks by in silico determination of PCR amplification with reference methods for GMO analysis. The European Union Reference Laboratory for Genetically Modified Food and Feed (EU-RL GMFF) provides these methods in the GMOMETHODS database to support enforcement of EU legislation and GM food/feed control. The JRC GMO-Amplicons database is composed of more than 240 000 amplicons, which can be easily accessed and screened through a web interface. To our knowledge, this is the first attempt at pooling and collecting publicly available sequences related to GMOs in food and feed. The JRC GMO-Amplicons supports control laboratories in the design and assessment of GMO methods, providing inter-alia in silico prediction of primers specificity and GM targets coverage. The new tool can assist the laboratories in the analysis of complex issues, such as the detection and identification of unauthorized GMOs. Notably, the JRC GMO-Amplicons database allows the retrieval and characterization of GMO-related sequences included in patents documentation. Finally, it can help annotating poorly described GM sequences and identifying new relevant GMO-related sequences in public databases. The JRC GMO-Amplicons is freely accessible through a web-based portal that is hosted on the EU-RL GMFF website. Database URL: http://gmo-crl.jrc.ec.europa.eu/jrcgmoamplicons/.

  19. Purification to homogeneity and amino acid sequence analysis of two anionic species of human interleukin 1

    PubMed Central

    1986-01-01

    Two anionic species of human IL-1 have been purified to homogeneity. These molecules were characterized as having pI of 5.4 and 5.2 and molecular weights identical to IL-1/6.8 (17,500). The specific activities of IL-1/5.4 and IL-1/5.2, as measured in the mouse thymocyte co-mitogenic assay, were identical to that of IL-1/6.8, namely 1.2 X 10(7) U/mg, with half-maximal stimulation observed at 2 X 10(-11) M. IL- 1/5.4 and IL-1/5.2 were found to be antigenically distinct from IL- 1/6.8 in an ELISA. IL-1/5.4 was structurally distinct from IL-1/6.8 based on reverse-phase HPLC or CNBr peptides. Intact IL-1/5.2 and three intact CNBr peptides of IL-1/5.4 were sequenced, with the identification of 74 amino acid residues. These sequences were found to correspond exactly with the amino acid sequence deduced from the IL-1- alpha cDNA reported by March et al. PMID:3487613

  20. Computational sequence analysis of predicted long dsRNA transcriptomes of major crops reveals sequence complementarity with human genes.

    PubMed

    Jensen, Peter D; Zhang, Yuanji; Wiggins, B Elizabeth; Petrick, Jay S; Zhu, Jin; Kerstetter, Randall A; Heck, Gregory R; Ivashuta, Sergey I

    2013-01-01

    Long double-stranded RNAs (long dsRNAs) are precursors for the effector molecules of sequence-specific RNA-based gene silencing in eukaryotes. Plant cells can contain numerous endogenous long dsRNAs. This study demonstrates that such endogenous long dsRNAs in plants have sequence complementarity to human genes. Many of these complementary long dsRNAs have perfect sequence complementarity of at least 21 nucleotides to human genes; enough complementarity to potentially trigger gene silencing in targeted human cells if delivered in functional form. However, the number and diversity of long dsRNA molecules in plant tissue from crops such as lettuce, tomato, corn, soy and rice with complementarity to human genes that have a long history of safe consumption supports a conclusion that long dsRNAs do not present a significant dietary risk.

  1. Integration of Expressed Sequence Tag Data Flanking Predicted RNA Secondary Structures Facilitates Novel Non-Coding RNA Discovery

    PubMed Central

    Krzyzanowski, Paul M.; Price, Feodor D.; Muro, Enrique M.; Rudnicki, Michael A.; Andrade-Navarro, Miguel A.

    2011-01-01

    Many computational methods have been used to predict novel non-coding RNAs (ncRNAs), but none, to our knowledge, have explicitly investigated the impact of integrating existing cDNA-based Expressed Sequence Tag (EST) data that flank structural RNA predictions. To determine whether flanking EST data can assist in microRNA (miRNA) prediction, we identified genomic sites encoding putative miRNAs by combining functional RNA predictions with flanking ESTs data in a model consistent with miRNAs undergoing cleavage during maturation. In both human and mouse genomes, we observed that the inclusion of flanking ESTs adjacent to and not overlapping predicted miRNAs significantly improved the performance of various methods of miRNA prediction, including direct high-throughput sequencing of small RNA libraries. We analyzed the expression of hundreds of miRNAs predicted to be expressed during myogenic differentiation using a customized microarray and identified several known and predicted myogenic miRNA hairpins. Our results indicate that integrating ESTs flanking structural RNA predictions improves the quality of cleaved miRNA predictions and suggest that this strategy can be used to predict other non-coding RNAs undergoing cleavage during maturation. PMID:21698286

  2. Prediction of phase equilibrium and hydration free energy of carboxylic acids by Monte Carlo simulations.

    PubMed

    Ferrando, Nicolas; Gedik, Ibrahim; Lachet, Véronique; Pigeon, Laurent; Lugo, Rafael

    2013-06-13

    In this work, a new transferable united-atom force field has been developed to predict phase equilibrium and hydration free energy of carboxylic acids. To take advantage of the transferability of the AUA4 force field, all Lennard-Jones parameters of groups involved in the carboxylic acid chemical function are reused from previous parametrizations of this force field. Only a unique set of partial electrostatic charges is proposed to reproduce the experimental gas phase dipole moment, saturated liquid densities and vapor pressures. Phase equilibrium properties of various pure carboxylic acids (acetic acid, propanoic acid, butanoic acid, pentanoic acid, hexanoic acid) and one diacid (1,5-pentanedioic) are studied through Monte Carlo simulations in the Gibbs ensemble. A good accuracy is obtained for pure compound saturated liquid densities and vapor pressures (average deviation of 2% and 6%, respectively), as well as for critical points. The vaporization enthalpy is, however, poorly predicted for short acids, probably due to a limitation of the force field to correctly describe the significant dimerization in the vapor phase. Pressure-composition diagrams for two binary mixtures (acetic acid + n-butane and propanoic acid + pentanoic acid) are also computed with a good accuracy, showing the transferability of the proposed force field to mixtures. Hydration free energies are calculated for three carboxylic acids using thermodynamic integration. A systematic overestimation of around 10 kJ/mol is observed compared to experimental data. This new force field parametrized only on saturated equilibrium properties appears insufficient to reach an acceptable precision for this property, and only relative hydration free energies between two carboxylic acids can be correctly predicted. This highlights the limitation of the transferability feature of force fields to properties not included in the parametrization database.

  3. Prediction of phase equilibrium and hydration free energy of carboxylic acids by Monte Carlo simulations.

    PubMed

    Ferrando, Nicolas; Gedik, Ibrahim; Lachet, Véronique; Pigeon, Laurent; Lugo, Rafael

    2013-06-13

    In this work, a new transferable united-atom force field has been developed to predict phase equilibrium and hydration free energy of carboxylic acids. To take advantage of the transferability of the AUA4 force field, all Lennard-Jones parameters of groups involved in the carboxylic acid chemical function are reused from previous parametrizations of this force field. Only a unique set of partial electrostatic charges is proposed to reproduce the experimental gas phase dipole moment, saturated liquid densities and vapor pressures. Phase equilibrium properties of various pure carboxylic acids (acetic acid, propanoic acid, butanoic acid, pentanoic acid, hexanoic acid) and one diacid (1,5-pentanedioic) are studied through Monte Carlo simulations in the Gibbs ensemble. A good accuracy is obtained for pure compound saturated liquid densities and vapor pressures (average deviation of 2% and 6%, respectively), as well as for critical points. The vaporization enthalpy is, however, poorly predicted for short acids, probably due to a limitation of the force field to correctly describe the significant dimerization in the vapor phase. Pressure-composition diagrams for two binary mixtures (acetic acid + n-butane and propanoic acid + pentanoic acid) are also computed with a good accuracy, showing the transferability of the proposed force field to mixtures. Hydration free energies are calculated for three carboxylic acids using thermodynamic integration. A systematic overestimation of around 10 kJ/mol is observed compared to experimental data. This new force field parametrized only on saturated equilibrium properties appears insufficient to reach an acceptable precision for this property, and only relative hydration free energies between two carboxylic acids can be correctly predicted. This highlights the limitation of the transferability feature of force fields to properties not included in the parametrization database. PMID:23697338

  4. Bacteria obtained from a sequencing batch reactor that are capable of growth on dehydroabietic acid.

    PubMed

    Mohn, W W

    1995-06-01

    Eleven isolates capable of growth on the resin acid dehydroabietic acid (DhA) were obtained from a sequencing batch reactor designed to treat a high-strength process stream from a paper mill. The isolates belonged to two groups, represented by strains DhA-33 and DhA-35, which were characterized. In the bioreactor, bacteria like DhA-35 were more abundant than those like DhA-33. The population in the bioreactor of organisms capable of growth on DhA was estimated to be 1.1 x 10(6) propagules per ml, based on a most-probable-number determination. Analysis of small-subunit rRNA partial sequences indicated that DhA-33 was most closely related to Sphingomonas yanoikuyae (Sab = 0.875) and that DhA-35 was most closely related to Zoogloea ramigera (Sab = 0.849). Both isolates additionally grew on other abietanes, i.e., abietic and palustric acids, but not on the pimaranes, pimaric and isopimaric acids. For DhA-33 and DhA-35 with DhA as the sole organic substrate, doubling times were 2.7 and 2.2 h, respectively, and growth yields were 0.30 and 0.25 g of protein per g of DhA, respectively. Glucose as a cosubstrate stimulated growth of DhA-33 on DhA and stimulated DhA degradation by the culture. Pyruvate as a cosubstrate did not stimulate growth of DhA-35 on DhA and reduced the specific rate of DhA degradation of the culture. DhA induced DhA and abietic acid degradation activities in both strains, and these activities were heat labile. Cell suspensions of both strains consumed DhA at a rate of 6 mumol mg of protein-1 h-1.(ABSTRACT TRUNCATED AT 250 WORDS)

  5. Foreshock sequences and short-term earthquake predictability on East Pacific Rise transform faults.

    PubMed

    McGuire, Jeffrey J; Boettcher, Margaret S; Jordan, Thomas H

    2005-03-24

    East Pacific Rise transform faults are characterized by high slip rates (more than ten centimetres a year), predominantly aseismic slip and maximum earthquake magnitudes of about 6.5. Using recordings from a hydroacoustic array deployed by the National Oceanic and Atmospheric Administration, we show here that East Pacific Rise transform faults also have a low number of aftershocks and high foreshock rates compared to continental strike-slip faults. The high ratio of foreshocks to aftershocks implies that such transform-fault seismicity cannot be explained by seismic triggering models in which there is no fundamental distinction between foreshocks, mainshocks and aftershocks. The foreshock sequences on East Pacific Rise transform faults can be used to predict (retrospectively) earthquakes of magnitude 5.4 or greater, in narrow spatial and temporal windows and with a high probability gain. The predictability of such transform earthquakes is consistent with a model in which slow slip transients trigger earthquakes, enrich their low-frequency radiation and accommodate much of the aseismic plate motion. PMID:15791246

  6. Position-specific prediction of methylation sites from sequence conservation based on information theory.

    PubMed

    Shi, Yinan; Guo, Yanzhi; Hu, Yayun; Li, Menglong

    2015-07-23

    Protein methylation plays vital roles in many biological processes and has been implicated in various human diseases. To fully understand the mechanisms underlying methylation for use in drug design and work in methylation-related diseases, an initial but crucial step is to identify methylation sites. The use of high-throughput bioinformatics methods has become imperative to predict methylation sites. In this study, we developed a novel method that is based only on sequence conservation to predict protein methylation sites. Conservation difference profiles between methylated and non-methylated peptides were constructed by the information entropy (IE) in a wider neighbor interval around the methylation sites that fully incorporated all of the environmental information. Then, the distinctive neighbor residues were identified by the importance scores of information gain (IG). The most representative model was constructed by support vector machine (SVM) for Arginine and Lysine methylation, respectively. This model yielded a promising result on both the benchmark dataset and independent test set. The model was used to screen the entire human proteome, and many unknown substrates were identified. These results indicate that our method can serve as a useful supplement to elucidate the mechanism of protein methylation and facilitate hypothesis-driven experimental design and validation.

  7. Predicting lipase types by improved Chou's pseudo-amino acid composition.

    PubMed

    Zhang, Guang-Ya; Li, Hong-Chun; Gao, Jia-Qiang; Fang, Bai-Shan

    2008-01-01

    By proposing a improved Chou's pseudo amino acid composition approach to extract the features of the sequences, a powerful predictor based on k-nearest neighbor was introduced to identify the types of lipases according to their sequences. To avoid redundancy and bias, demonstrations were performed on a dataset where none of the proteins has > or =25% sequence identity to any other. The overall success rate thus obtained by the 10-fold cross-validation test was over 90%, indicating that the improved Chou's pseudo amino acid composition might be a useful tool for extracting the features of protein sequences, or at lease can play a complementary role to many of the other existing approaches. PMID:19075826

  8. Development of a SCAR (sequence-characterised amplified region) marker for acid resistance-related gene in Lactobacillus plantarum.

    PubMed

    Liu, Shu-Wen; Li, Kai; Yang, Shi-Ling; Tian, Shu-Fen; He, Ling

    2015-03-01

    A sequence characterised amplified region marker was developed to determine an acid resistance-related gene in Lactobacillus plantarum. A random amplified polymorphic DNA marker named S116-680 was reported to be closely related to the acid resistance of the strains. The DNA band corresponding to this marker was cloned and sequenced with the induction of specific designed PCR primers. The results of PCR test helped to amplify a clear specific band of 680 bp in the tested acid-resistant strains. S116-680 marker would be useful to explore the acid-resistant mechanism of L. plantarum and to screen desirable malolactic fermentation strains.

  9. Nucleic and amino acid sequences relating to a novel transketolase, and methods for the expression thereof

    DOEpatents

    Croteau, Rodney Bruce; Wildung, Mark Raymond; Lange, Bernd Markus; McCaskill, David G.

    2001-01-01

    cDNAs encoding 1-deoxyxylulose-5-phosphate synthase from peppermint (Mentha piperita) have been isolated and sequenced, and the corresponding amino acid sequences have been determined. Accordingly, isolated DNA sequences (SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7) are provided which code for the expression of 1-deoxyxylulose-5-phosphate synthase from plants. In another aspect the present invention provides for isolated, recombinant DXPS proteins, such as the proteins having the sequences set forth in SEQ ID NO:4, SEQ ID NO:6 and SEQ ID NO:8. In other aspects, replicable recombinant cloning vehicles are provided which code for plant 1-deoxyxylulose-5-phosphate synthases, or for a base sequence sufficiently complementary to at least a portion of 1-deoxyxylulose-5-phosphate synthase DNA or RNA to enable hybridization therewith. In yet other aspects, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding a plant 1-deoxyxylulose-5-phosphate synthase. Thus, systems and methods are provided for the recombinant expression of the aforementioned recombinant 1-deoxyxylulose-5-phosphate synthase that may be used to facilitate its production, isolation and purification in significant amounts. Recombinant 1-deoxyxylulose-5-phosphate synthase may be used to obtain expression or enhanced expression of 1-deoxyxylulose-5-phosphate synthase in plants in order to enhance the production of 1-deoxyxylulose-5-phosphate, or its derivatives such as isopentenyl diphosphate (BP), or may be otherwise employed for the regulation or expression of 1-deoxyxylulose-5-phosphate synthase, or the production of its products.

  10. Quantitative trait loci markers derived from whole genome sequence data increases the reliability of genomic prediction.

    PubMed

    Brøndum, R F; Su, G; Janss, L; Sahana, G; Guldbrandtsen, B; Boichard, D; Lund, M S

    2015-06-01

    This study investigated the effect on the reliability of genomic prediction when a small number of significant variants from single marker analysis based on whole genome sequence data were added to the regular 54k single nucleotide polymorphism (SNP) array data. The extra markers were selected with the aim of augmenting the custom low-density Illumina BovineLD SNP chip (San Diego, CA) used in the Nordic countries. The single-marker analysis was done breed-wise on all 16 index traits included in the breeding goals for Nordic Holstein, Danish Jersey, and Nordic Red cattle plus the total merit index itself. Depending on the trait's economic weight, 15, 10, or 5 quantitative trait loci (QTL) were selected per trait per breed and 3 to 5 markers were selected to tag each QTL. After removing duplicate markers (same marker selected for more than one trait or breed) and filtering for high pairwise linkage disequilibrium and assaying performance on the array, a total of 1,623 QTL markers were selected for inclusion on the custom chip. Genomic prediction analyses were performed for Nordic and French Holstein and Nordic Red animals using either a genomic BLUP or a Bayesian variable selection model. When using the genomic BLUP model including the QTL markers in the analysis, reliability was increased by up to 4 percentage points for production traits in Nordic Holstein animals, up to 3 percentage points for Nordic Reds, and up to 5 percentage points for French Holstein. Smaller gains of up to 1 percentage point was observed for mastitis, but only a 0.5 percentage point increase was seen for fertility. When using a Bayesian model accuracies were generally higher with only 54k data compared with the genomic BLUP approach, but increases in reliability were relatively smaller when QTL markers were included. Results from this study indicate that the reliability of genomic prediction can be increased by including markers significant in genome-wide association studies on whole genome

  11. Prediction of Scylla olivacea (Crustacea; Brachyura) peptide hormones using publicly accessible transcriptome shotgun assembly (TSA) sequences.

    PubMed

    Christie, Andrew E

    2016-05-01

    The aquaculture of crabs from the genus Scylla is of increasing economic importance for many Southeast Asian countries. Expansion of Scylla farming has led to increased efforts to understand the physiology and behavior of these crabs, and as such, there are growing molecular resources for them. Here, publicly accessible Scylla olivacea transcriptomic data were mined for putative peptide-encoding transcripts; the proteins deduced from the identified sequences were then used to predict the structures of mature peptide hormones. Forty-nine pre/preprohormone-encoding transcripts were identified, allowing for the prediction of 187 distinct mature peptides. The identified peptides included isoforms of adipokinetic hormone-corazonin-like peptide, allatostatin A, allatostatin B, allatostatin C, bursicon β, CCHamide, corazonin, crustacean cardioactive peptide, crustacean hyperglycemic hormone/molt-inhibiting hormone, diuretic hormone 31, eclosion hormone, FMRFamide-like peptide, HIGSLYRamide, insulin-like peptide, intocin, leucokinin, myosuppressin, neuroparsin, neuropeptide F, orcokinin, pigment dispersing hormone, pyrokinin, red pigment concentrating hormone, RYamide, short neuropeptide F, SIFamide and tachykinin-related peptide, all well-known neuropeptide families. Surprisingly, the tissue used to generate the transcriptome mined here is reported to be testis. Whether or not the testis samples had neural contamination is unknown. However, if the peptides are truly produced by this reproductive organ, it could have far reaching consequences for the study of crustacean endocrinology, particularly in the area of reproductive control. Regardless, this peptidome is the largest thus far predicted for any brachyuran (true crab) species, and will serve as a foundation for future studies of peptidergic control in members of the commercially important genus Scylla.

  12. Deep sequencing reveals microRNAs predictive of antiangiogenic drug response

    PubMed Central

    García-Donas, Jesús; Beuselinck, Benoit; Inglada-Pérez, Lucía; Graña, Osvaldo; Schöffski, Patrick; Wozniak, Agnieszka; Bechter, Oliver; Apellániz-Ruiz, Maria; Leandro-García, Luis Javier; Esteban, Emilio; Castellano, Daniel E.; González del Alba, Aranzazu; Climent, Miguel Angel; Hernando, Susana; Arranz, José Angel; Morente, Manuel; Pisano, David G.; Robledo, Mercedes

    2016-01-01

    The majority of metastatic renal cell carcinoma (RCC) patients are treated with tyrosine kinase inhibitors (TKI) in first-line treatment; however, a fraction are refractory to these antiangiogenic drugs. MicroRNAs (miRNAs) are regulatory molecules proven to be accurate biomarkers in cancer. Here, we identified miRNAs predictive of progressive disease under TKI treatment through deep sequencing of 74 metastatic clear cell RCC cases uniformly treated with these drugs. Twenty-nine miRNAs were differentially expressed in the tumors of patients who progressed under TKI therapy (P values from 6 × 10–9 to 3 × 10–3). Among 6 miRNAs selected for validation in an independent series, the most relevant associations corresponded to miR–1307-3p, miR–155-5p, and miR–221-3p (P = 4.6 × 10–3, 6.5 × 10–3, and 3.4 × 10–2, respectively). Furthermore, a 2 miRNA–based classifier discriminated individuals with progressive disease upon TKI treatment (AUC = 0.75, 95% CI, 0.64–0.85; P = 1.3 × 10–4) with better predictive value than clinicopathological risk factors commonly used. We also identified miRNAs significantly associated with progression-free survival and overall survival (P = 6.8 × 10–8 and 7.8 × 10–7 for top hits, respectively), and 7 overlapped with early progressive disease. In conclusion, this is the first miRNome comprehensive study, to our knowledge, that demonstrates a predictive value of miRNAs for TKI response and provides a new set of relevant markers that can help rationalize metastatic RCC treatment. PMID:27699216

  13. Deep sequencing reveals microRNAs predictive of antiangiogenic drug response

    PubMed Central

    García-Donas, Jesús; Beuselinck, Benoit; Inglada-Pérez, Lucía; Graña, Osvaldo; Schöffski, Patrick; Wozniak, Agnieszka; Bechter, Oliver; Apellániz-Ruiz, Maria; Leandro-García, Luis Javier; Esteban, Emilio; Castellano, Daniel E.; González del Alba, Aranzazu; Climent, Miguel Angel; Hernando, Susana; Arranz, José Angel; Morente, Manuel; Pisano, David G.; Robledo, Mercedes

    2016-01-01

    The majority of metastatic renal cell carcinoma (RCC) patients are treated with tyrosine kinase inhibitors (TKI) in first-line treatment; however, a fraction are refractory to these antiangiogenic drugs. MicroRNAs (miRNAs) are regulatory molecules proven to be accurate biomarkers in cancer. Here, we identified miRNAs predictive of progressive disease under TKI treatment through deep sequencing of 74 metastatic clear cell RCC cases uniformly treated with these drugs. Twenty-nine miRNAs were differentially expressed in the tumors of patients who progressed under TKI therapy (P values from 6 × 10–9 to 3 × 10–3). Among 6 miRNAs selected for validation in an independent series, the most relevant associations corresponded to miR–1307-3p, miR–155-5p, and miR–221-3p (P = 4.6 × 10–3, 6.5 × 10–3, and 3.4 × 10–2, respectively). Furthermore, a 2 miRNA–based classifier discriminated individuals with progressive disease upon TKI treatment (AUC = 0.75, 95% CI, 0.64–0.85; P = 1.3 × 10–4) with better predictive value than clinicopathological risk factors commonly used. We also identified miRNAs significantly associated with progression-free survival and overall survival (P = 6.8 × 10–8 and 7.8 × 10–7 for top hits, respectively), and 7 overlapped with early progressive disease. In conclusion, this is the first miRNome comprehensive study, to our knowledge, that demonstrates a predictive value of miRNAs for TKI response and provides a new set of relevant markers that can help rationalize metastatic RCC treatment.

  14. Repeat sequence chromosome specific nucleic acid probes and methods of preparing and using

    DOEpatents

    Weier, Heinz-Ulrich G.; Gray, Joe W.

    1995-01-01

    A primer directed DNA amplification method to isolate efficiently chromosome-specific repeated DNA wherein degenerate oligonucleotide primers are used is disclosed. The probes produced are a heterogeneous mixture that can be used with blocking DNA as a chromosome-specific staining reagent, and/or the elements of the mixture can be screened for high specificity, size and/or high degree of repetition among other parameters. The degenerate primers are sets of primers that vary in sequence but are substantially complementary to highly repeated nucleic acid sequences, preferably clustered within the template DNA, for example, pericentromeric alpha satellite repeat sequences. The template DNA is preferably chromosome-specific. Exemplary primers ard probes are disclosed. The probes of this invention can be used to determine the number of chromosomes of a specific type in metaphase spreads, in germ line and/or somatic cell interphase nuclei, micronuclei and/or in tissue sections. Also provided is a method to select arbitrarily repeat sequence probes that can be screened for chromosome-specificity.

  15. Repeat sequence chromosome specific nucleic acid probes and methods of preparing and using

    DOEpatents

    Weier, H.U.G.; Gray, J.W.

    1995-06-27

    A primer directed DNA amplification method to isolate efficiently chromosome-specific repeated DNA wherein degenerate oligonucleotide primers are used is disclosed. The probes produced are a heterogeneous mixture that can be used with blocking DNA as a chromosome-specific staining reagent, and/or the elements of the mixture can be screened for high specificity, size and/or high degree of repetition among other parameters. The degenerate primers are sets of primers that vary in sequence but are substantially complementary to highly repeated nucleic acid sequences, preferably clustered within the template DNA, for example, pericentromeric alpha satellite repeat sequences. The template DNA is preferably chromosome-specific. Exemplary primers and probes are disclosed. The probes of this invention can be used to determine the number of chromosomes of a specific type in metaphase spreads, in germ line and/or somatic cell interphase nuclei, micronuclei and/or in tissue sections. Also provided is a method to select arbitrarily repeat sequence probes that can be screened for chromosome-specificity. 18 figs.

  16. Unconventional amino acid sequence of the sun anemone (Stoichactis helianthus) polypeptide neurotoxin

    SciTech Connect

    Kem, W.; Dunn, B.; Parten, B.; Pennington, M.; Price, D.

    1986-05-01

    A 5000 dalton polypeptide neurotoxin (Sh-NI) purified by G50 Sephadex, P-cellulose, and SP-Sephadex chromatography was homogeneous by isoelectric focusing. Sh-NI was highly toxic to crayfish (LD/sub 50/ 0.6 ..mu..g/kg) but without effect upon mice at 15,000 ..mu..g/kg (i.p. injection). The reduced, /sup 3/H-carboxymethylated toxin and its fragments were subjected to automatic Edman degradation and the resulting PTH-amino acids were identified by HPLC, back hydrolysis, and scintillation counting. Peptides resulting from proteolytic (clostripain, staphylococcal protease) and chemical (tryptophan) cleavage were sequenced. The sequence is: AACKCDDEGPDIRTAPLTGTVDLGSCNAGWEKCASYYTIIADCCRKKK. This sequence differs considerably from the homologous Anemonia and Anthopleura toxins; many of the identical residues (6 half-cystines, G9, P10, R13, G19, G29, W30) are probably critical for folding rather than receptor recognition. However, the Sh-NI sequence closely resembles Radioanthus macrodactylus neurotoxin III and r. paumotensis II. The authors propose that Sh-NI and related Radioanthus toxins act upon a different site on the sodium channel.

  17. Sequence-defined bioactive macrocycles via an acid-catalysed cascade reaction

    NASA Astrophysics Data System (ADS)

    Porel, Mintu; Thornlow, Dana N.; Phan, Ngoc N.; Alabi, Christopher A.

    2016-06-01

    Synthetic macrocycles derived from sequence-defined oligomers are a unique structural class whose ring size, sequence and structure can be tuned via precise organization of the primary sequence. Similar to peptides and other peptidomimetics, these well-defined synthetic macromolecules become pharmacologically relevant when bioactive side chains are incorporated into their primary sequence. In this article, we report the synthesis of oligothioetheramide (oligoTEA) macrocycles via a one-pot acid-catalysed cascade reaction. The versatility of the cyclization chemistry and modularity of the assembly process was demonstrated via the synthesis of >20 diverse oligoTEA macrocycles. Structural characterization via NMR spectroscopy revealed the presence of conformational isomers, which enabled the determination of local chain dynamics within the macromolecular structure. Finally, we demonstrate the biological activity of oligoTEA macrocycles designed to mimic facially amphiphilic antimicrobial peptides. The preliminary results indicate that macrocyclic oligoTEAs with just two-to-three cationic charge centres can elicit potent antibacterial activity against Gram-positive and Gram-negative bacteria.

  18. A new antifungal peptide from the seeds of Phytolacca americana: characterization, amino acid sequence and cDNA cloning.

    PubMed

    Shao, F; Hu, Z; Xiong, Y M; Huang, Q Z; WangCG; Zhu, R H; Wang, D C

    1999-03-19

    An antifungal peptide from seeds of Phytolacca americana, designated PAFP-s, has been isolated. The peptide is highly basic and consists of 38 residues with three disulfide bridges. Its molecular mass of 3929.0 was determined by mass spectrometry. The complete amino acid sequence was obtained from automated Edman degradation, and cDNA cloning was successfully performed by 3'-RACE. The deduced amino acid sequence of a partial cDNA corresponded to the amino acid sequence from chemical sequencing. PAFP-s exhibited a broad spectrum of antifungal activity, and its activities differed among various fungi. PAFP-s displayed no inhibitory activity towards Escherichia coli. PAFP-s shows significant sequence similarities and the same cysteine motif with Mj-AMPs, antimicrobial peptides from seeds of Mirabilis jalapa belonging to the knottin-type antimicrobial peptide.

  19. Prediction of Thermostability from Amino Acid Attributes by Combination of Clustering with Attribute Weighting: A New Vista in Engineering Enzymes

    PubMed Central

    Ebrahimi, Mansour; Lakizadeh, Amir; Agha-Golzadeh, Parisa; Ebrahimie, Esmaeil; Ebrahimi, Mahdi

    2011-01-01

    The engineering of thermostable enzymes is receiving increased attention. The paper, detergent, and biofuel industries, in particular, seek to use environmentally friendly enzymes instead of toxic chlorine chemicals. Enzymes typically function at temperatures below 60°C and denature if exposed to higher temperatures. In contrast, a small portion of enzymes can withstand higher temperatures as a result of various structural adaptations. Understanding the protein attributes that are involved in this adaptation is the first step toward engineering thermostable enzymes. We employed various supervised and unsupervised machine learning algorithms as well as attribute weighting approaches to find amino acid composition attributes that contribute to enzyme thermostability. Specifically, we compared two groups of enzymes: mesostable and thermostable enzymes. Furthermore, a combination of attribute weighting with supervised and unsupervised clustering algorithms was used for prediction and modelling of protein thermostability from amino acid composition properties. Mining a large number of protein sequences (2090) through a variety of machine learning algorithms, which were based on the analysis of more than 800 amino acid attributes, increased the accuracy of this study. Moreover, these models were successful in predicting thermostability from the primary structure of proteins. The results showed that expectation maximization clustering in combination with uncertainly and correlation attribute weighting algorithms can effectively (100%) classify thermostable and mesostable proteins. Seventy per cent of the weighting methods selected Gln content and frequency of hydrophilic residues as the most important protein attributes. On the dipeptide level, the frequency of Asn-Glu was the key factor in distinguishing mesostable from thermostable enzymes. This study demonstrates the feasibility of predicting thermostability irrespective of sequence similarity and will serve as a

  20. Amino acid sequence and variant forms of favin, a lectin from Vicia faba.

    PubMed

    Hopp, T P; Hemperly, J J; Cunningham, B A

    1982-04-25

    We have determined the complete amino acid sequence (182 residues) of the beta chain of favin, the glucose-binding lectin from fava beans (Vicia faba), and have established that the carbohydrate moiety is attached to Asn 168. Together with the sequence of the alpha chain previously reported (Hemperly, J. J., Hopp, T. P., Becker, J. W., and Cunningham, B. A. (1979) J. Biol. Chem. 254, 6803-6810), these data complete the analysis of the primary structure of the lectin. We have also examined minor polypeptides that appear in all preparations of favin. Two lower molecular weight species (Mr = 9,500-11,600) appear to be fragments of the beta chain resulting from cleavage following Asn 76, whereas six high molecular weight forms (Mr = 25,000 or greater) appear to include aggregates of the beta chain and possibly some alternative products of chain processing. PMID:7068646

  1. Pyrosequencing on templates generated by asymmetric nucleic acid sequence-based amplification (asymmetric-NASBA).

    PubMed

    Jia, Huning; Chen, Zhiyao; Wu, Haiping; Ye, Hui; Yan, Zhengyu; Zhou, Guohua

    2011-12-21

    Pyrosequencing is an ideal tool for verifying the sequence of amplicons. To enable pyrosequencing on amplicons from nucleic acid sequence-based amplification (NASBA), asymmetric NASBA with unequal concentrations of T7 promoter primer and reverse transcription primer was proposed. By optimizing the ratio of two primers and the concentration of dNTPs and NTPs, the amount of single-stranded cDNA in the amplicons from asymmetric NASBA was found increased 12 times more than the conventional NASBA through the real-time detection of a molecular beacon specific to cDNA of interest. More than 20 bases have been successfully detected by pyrosequencing on amplicons from asymmetric NASBA using Human parainfluenza virus (HPIV) as an amplification template. The primary results indicate that the combination of NASBA with a pyrosequencing system is practical, and should open a new field in clinical diagnosis.

  2. Morphological tranformation of calcite crystal growth by prismatic "acidic" polypeptide sequences.

    SciTech Connect

    Kim, I; Giocondi, J L; Orme, C A; Collino, J; Evans, J S

    2007-02-13

    Many of the interesting mechanical and materials properties of the mollusk shell are thought to stem from the prismatic calcite crystal assemblies within this composite structure. It is now evident that proteins play a major role in the formation of these assemblies. Recently, a superfamily of 7 conserved prismatic layer-specific mollusk shell proteins, Asprich, were sequenced, and the 42 AA C-terminal sequence region of this protein superfamily was found to introduce surface voids or porosities on calcite crystals in vitro. Using AFM imaging techniques, we further investigate the effect that this 42 AA domain (Fragment-2) and its constituent subdomains, DEAD-17 and Acidic-2, have on the morphology and growth kinetics of calcite dislocation hillocks. We find that Fragment-2 adsorbs on terrace surfaces and pins acute steps, accelerates then decelerates the growth of obtuse steps, forms clusters and voids on terrace surfaces, and transforms calcite hillock morphology from a rhombohedral form to a rounded one. These results mirror yet are distinct from some of the earlier findings obtained for nacreous polypeptides. The subdomains Acidic-2 and DEAD-17 were found to accelerate then decelerate obtuse steps and induce oval rather than rounded hillock morphologies. Unlike DEAD-17, Acidic-2 does form clusters on terrace surfaces and exhibits stronger obtuse velocity inhibition effects than either DEAD-17 or Fragment-2. Interestingly, a 1:1 mixture of both subdomains induces an irregular polygonal morphology to hillocks, and exhibits the highest degree of acute step pinning and obtuse step velocity inhibition. This suggests that there is some interplay between subdomains within an intra (Fragment-2) or intermolecular (1:1 mixture) context, and sequence interplay phenomena may be employed by biomineralization proteins to exert net effects on crystal growth and morphology.

  3. Amino acid sequence similarity of the VP7 protein of human rotavirus HCR3 to that of canine and feline rotaviruses.

    PubMed

    Li, B; Clark, H F; Gouvea, V

    1994-01-01

    The sequence of the VP7 gene of human rotavirus strain HCR3 was determined and its predicted amino acid sequence was compared with that of other rotavirus strains. The VP7 gene is 1062 nucleotides long and contains a single open reading frame of 981 nucleotides capable of encoding a protein of 326 amino acids. The VP7 amino acid sequence similarity of strain HCR3 to those of various human and animal G3 serotypes ranged from 88.7 to 99.4%, and from 60.4 to 88.3% to strains representing each of the other 13 G serotypes. Alignment of four variable regions [VR4, VR5(A), VR7(B) and VR8(C)] of HCR3 with those of G3 strains of different host species showed that HCR3 possesses a sequence almost identical to that of canine rotaviruses and feline rotavirus strain CAT97 in all four regions. A considerable divergence in regions VR4, VR7(B) and VR8(C) was found with strains of human, mouse, monkey, horse and rabbit rotaviruses. This observation together with results of our previous study on VP4 indicated that human rotavirus HCR3 is genetically more closely related to animal rotaviruses than to other human rotaviruses. PMID:8113730

  4. The amino-acid sequences of sculpin islet somatostatin-28 and peptide YY.

    PubMed

    Cutfield, S M; Carne, A; Cutfield, J F

    1987-04-01

    Two pancreatic peptides, somatostatin-28 and peptide YY, have been isolated from the Brockmann bodies of the teleost fish Cottus scorpius (daddy sculpin). Following purification by reverse-phase HPLC, each peptide was sequenced completely through to the carboxyl-terminus by gas-phase Edman degradation. Somatostatin-28 was the major form of somatostatin detected and is similar to the gene II product from anglerfish. Peptide YY (36 amino acids) more closely resembles porcine neuropeptide YY and intestinal peptide YY than it does the pancreatic polypeptides. PMID:2883025

  5. Sequence selective recognition of double-stranded RNA using triple helix-forming peptide nucleic acids.

    PubMed

    Zengeya, Thomas; Gupta, Pankaj; Rozners, Eriks

    2014-01-01

    Noncoding RNAs are attractive targets for molecular recognition because of the central role they play in gene expression. Since most noncoding RNAs are in a double-helical conformation, recognition of such structures is a formidable problem. Herein, we describe a method for sequence-selective recognition of biologically relevant double-helical RNA (illustrated on ribosomal A-site RNA) using peptide nucleic acids (PNA) that form a triple helix in the major grove of RNA under physiologically relevant conditions. Protocols for PNA preparation and binding studies using isothermal titration calorimetry are described in detail.

  6. Sequence selective double strand DNA cleavage by peptide nucleic acid (PNA) targeting using nuclease S1.

    PubMed Central

    Demidov, V; Frank-Kamenetskii, M D; Egholm, M; Buchardt, O; Nielsen, P E

    1993-01-01

    A novel method for sequence specific double strand DNA cleavage using PNA (peptide nucleic acid) targeting is described. Nuclease S1 digestion of double stranded DNA gives rise to double strand cleavage at an occupied PNA strand displacement binding site, and under optimized conditions complete cleavage can be obtained. The efficiency of this cleavage is more than 10 fold enhanced when a tandem PNA site is targeted, and additionally enhanced if this site is in trans rather than in cis orientation. Thus in effect, the PNA targeting makes the single strand specific nuclease S1 behave like a pseudo restriction endonuclease. Images PMID:8502550

  7. WinGene/WinPep: user-friendly software for the analysis of amino acid sequences.

    PubMed

    Hennig, L

    1999-06-01

    WinGene1.0/WinPep1.2 is a pair of Microsoft Windows programs designed to read nucleotide or amino acid sequence data. These versatile programs have the following capabilities: (i) searches for open reading frames and their translation, (ii) assisting the design of primers for PCR and (iii) calculation of molecular weight, isoelectric point and molar absorbtion coefficients of polypeptides. Furthermore, hydropathic plots and helical wheel displays are easily produced. The programs run with an intuitive Windows interface, contain a comprehensive help file and enable data exchange with other applications by means of the Copy&Paste command. The software is free for academic and noncommercial users.

  8. Complete genome sequence of Lactococcus lactis IO-1, a lactic acid bacterium that utilizes xylose and produces high levels of L-lactic acid.

    PubMed

    Kato, Hiroaki; Shiwa, Yuh; Oshima, Kenshiro; Machii, Miki; Araya-Kojima, Tomoko; Zendo, Takeshi; Shimizu-Kadota, Mariko; Hattori, Masahira; Sonomoto, Kenji; Yoshikawa, Hirofumi

    2012-04-01

    We report the complete genome sequence of Lactococcus lactis IO-1 (= JCM7638). It is a nondairy lactic acid bacterium, produces nisin Z, ferments xylose, and produces predominantly L-lactic acid at high xylose concentrations. From ortholog analysis with other five L. lactis strains, IO-1 was identified as L. lactis subsp. lactis.

  9. Purification and amino acid sequence of aminopeptidase P from pig kidney.

    PubMed

    Vergas Romero, C; Neudorfer, I; Mann, K; Schäfer, W

    1995-04-01

    Aminopeptidase P from kidney cortex was purified in high yield (recovery greater than or equal to 20%) by a series of column chromatographic steps after solubilization of the membrane-bound glycoprotein with n-butanol. A coupled enzymic assay, using Gly-Pro-Pro-NH-Nap as substrate and dipeptidyl-peptidase IV as auxilliary enzyme, was used to monitor the purification. The purification procedure yielded two forms of aminopeptidase P differing in their carbohydrate composition (glycoforms). Both enzyme preparations were homogeneous as assessed by SDS/PAGE silver staining, and isoelectric focusing. Both forms possessed the same substrate specificity, catalysed the same reaction, and consisted of identical protein chains. The amino acid sequence determined by Edman degradation and mass spectrometry consisted of 623 amino acids. Six N-glycosylation sites, all contained in the N-terminal half of the protein, were characterized. PMID:7744038

  10. Mass spectrometric detection of the amino acid sequence polymorphism of the hepatitis C virus antigen.

    PubMed

    Kaysheva, A L; Ivanov, Yu D; Frantsuzov, P A; Krohin, N V; Pavlova, T I; Uchaikin, V F; Konev, V А; Kovalev, O B; Ziborov, V S; Archakov, A I

    2016-03-01

    A method for detection and identification of the hepatitis C virus antigen (HCVcoreAg) in human serum with consideration for possible amino acid substitutions is proposed. The method is based on a combination of biospecific capturing and concentrating of the target protein on the surface of the chip for atomic force microscope (AFM chip) with subsequent protein identification by tandem mass spectrometric (MS/MS) analysis. Biospecific AFM-capturing of viral particles containing HCVcoreAg from serum samples was performed by use of AFM chips with monoclonal antibodies (anti-HCVcore) covalently immobilized on the surface. Biospecific complexes were registered and counted by AFM. Further MS/MS analysis allowed to reliably identify the HCVcoreAg in the complexes formed on the AFM chip surface. Analysis of MS/MS spectra, with the account taken of the possible polymorphisms in the amino acid sequence of the HCVcoreAg, enabled us to increase the number of identified peptides.

  11. High Dietary Acid Load Predicts ESRD among Adults with CKD.

    PubMed

    Banerjee, Tanushree; Crews, Deidra C; Wesson, Donald E; Tilea, Anca M; Saran, Rajiv; Ríos-Burrows, Nilka; Williams, Desmond E; Powe, Neil R

    2015-07-01

    Small clinical trials have shown that a reduction in dietary acid load (DAL) improves kidney injury and slows kidney function decline; however, the relationship between DAL and risk of ESRD in a population-based cohort with CKD remains unexamined. We examined the association between DAL, quantified by net acid excretion (NAEes), and progression to ESRD in a nationally representative sample of adults in the United States. Among 1486 adults with CKD age≥20 years enrolled in the National Health and Nutrition Examination Survey III, DAL was determined by 24-h dietary recall questionnaire. The development of ESRD was ascertained over a median 14.2 years of follow-up through linkage with the Medicare ESRD Registry. We used the Fine-Gray competing risks method to estimate the association of high, medium, and low DAL with ESRD after adjusting for demographics, nutritional factors, clinical factors, and kidney function/damage markers and accounting for intervening mortality events. In total, 311 (20.9%) participants developed ESRD. Higher levels of DAL were associated with increased risk of ESRD; relative hazards (95% confidence interval) were 3.04 (1.58 to 5.86) for the highest tertile and 1.81 (0.89 to 3.68) for the middle tertile compared with the lowest tertile in the fully adjusted model. The risk of ESRD associated with DAL tertiles increased as eGFR decreased (P trend=0.001). Among participants with albuminuria, high DAL was strongly associated with ESRD risk (P trend=0.03). In conclusion, high DAL in persons with CKD is independently associated with increased risk of ESRD in a nationally representative population.

  12. High Dietary Acid Load Predicts ESRD among Adults with CKD

    PubMed Central

    Crews, Deidra C.; Wesson, Donald E.; Tilea, Anca M.; Saran, Rajiv; Ríos-Burrows, Nilka; Williams, Desmond E.; Powe, Neil R.

    2015-01-01

    Small clinical trials have shown that a reduction in dietary acid load (DAL) improves kidney injury and slows kidney function decline; however, the relationship between DAL and risk of ESRD in a population-based cohort with CKD remains unexamined. We examined the association between DAL, quantified by net acid excretion (NAEes), and progression to ESRD in a nationally representative sample of adults in the United States. Among 1486 adults with CKD age≥20 years enrolled in the National Health and Nutrition Examination Survey III, DAL was determined by 24-h dietary recall questionnaire. The development of ESRD was ascertained over a median 14.2 years of follow-up through linkage with the Medicare ESRD Registry. We used the Fine–Gray competing risks method to estimate the association of high, medium, and low DAL with ESRD after adjusting for demographics, nutritional factors, clinical factors, and kidney function/damage markers and accounting for intervening mortality events. In total, 311 (20.9%) participants developed ESRD. Higher levels of DAL were associated with increased risk of ESRD; relative hazards (95% confidence interval) were 3.04 (1.58 to 5.86) for the highest tertile and 1.81 (0.89 to 3.68) for the middle tertile compared with the lowest tertile in the fully adjusted model. The risk of ESRD associated with DAL tertiles increased as eGFR decreased (P trend=0.001). Among participants with albuminuria, high DAL was strongly associated with ESRD risk (P trend=0.03). In conclusion, high DAL in persons with CKD is independently associated with increased risk of ESRD in a nationally representative population. PMID:25677388

  13. Integrating bioinformatic resources to predict transcription factors interacting with cis-sequences conserved in co-regulated genes

    PubMed Central

    2014-01-01

    Background Using motif detection programs it is fairly straightforward to identify conserved cis-sequences in promoters of co-regulated genes. In contrast, the identification of the transcription factors (TFs) interacting with these cis-sequences is much more elaborate. To facilitate this, we explore the possibility of using several bioinformatic and experimental approaches for TF identification. This starts with the selection of co-regulated gene sets and leads first to the prediction and then to the experimental validation of TFs interacting with cis-sequences conserved in the promoters of these co-regulated genes. Results Using the PathoPlant database, 32 up-regulated gene groups were identified with microarray data for drought-responsive gene expression from Arabidopsis thaliana. Application of the binding site estimation suite of tools (BEST) discovered 179 conserved sequence motifs within the corresponding promoters. Using the STAMP web-server, 49 sequence motifs were classified into 7 motif families for which similarities with known cis-regulatory sequences were identified. All motifs were subjected to a footprintDB analysis to predict interacting DNA binding domains from plant TF families. Predictions were confirmed by using a yeast-one-hybrid approach to select interacting TFs belonging to the predicted TF families. TF-DNA interactions were further experimentally validated in yeast and with a Physcomitrella patens transient expression system, leading to the discovery of several novel TF-DNA interactions. Conclusions The present work demonstrates the successful integration of several bioinformatic resources with experimental approaches to predict and validate TFs interacting with conserved sequence motifs in co-regulated genes. PMID:24773781

  14. Accurate Prediction of the Statistics of Repetitions in Random Sequences: A Case Study in Archaea Genomes

    PubMed Central

    Régnier, Mireille; Chassignet, Philippe

    2016-01-01

    Repetitive patterns in genomic sequences have a great biological significance and also algorithmic implications. Analytic combinatorics allow to derive formula for the expected length of repetitions in a random sequence. Asymptotic results, which generalize previous works on a binary alphabet, are easily computable. Simulations on random sequences show their accuracy. As an application, the sample case of Archaea genomes illustrates how biological sequences may differ from random sequences. PMID:27376057

  15. Accurate Prediction of the Statistics of Repetitions in Random Sequences: A Case Study in Archaea Genomes.

    PubMed

    Régnier, Mireille; Chassignet, Philippe

    2016-01-01

    Repetitive patterns in genomic sequences have a great biological significance and also algorithmic implications. Analytic combinatorics allow to derive formula for the expected length of repetitions in a random sequence. Asymptotic results, which generalize previous works on a binary alphabet, are easily computable. Simulations on random sequences show their accuracy. As an application, the sample case of Archaea genomes illustrates how biological sequences may differ from random sequences.

  16. Accurate Prediction of the Statistics of Repetitions in Random Sequences: A Case Study in Archaea Genomes.

    PubMed

    Régnier, Mireille; Chassignet, Philippe

    2016-01-01

    Repetitive patterns in genomic sequences have a great biological significance and also algorithmic implications. Analytic combinatorics allow to derive formula for the expected length of repetitions in a random sequence. Asymptotic results, which generalize previous works on a binary alphabet, are easily computable. Simulations on random sequences show their accuracy. As an application, the sample case of Archaea genomes illustrates how biological sequences may differ from random sequences. PMID:27376057

  17. Whole-Genome Sequencing Analysis Accurately Predicts Antimicrobial Resistance Phenotypes in Campylobacter spp.

    PubMed Central

    Tyson, G. H.; Chen, Y.; Li, C.; Mukherjee, S.; Young, S.; Lam, C.; Folster, J. P.; Whichard, J. M.; McDermott, P. F.

    2015-01-01

    The objectives of this study were to identify antimicrobial resistance genotypes for Campylobacter and to evaluate the correlation between resistance phenotypes and genotypes using in vitro antimicrobial susceptibility testing and whole-genome sequencing (WGS). A total of 114 Campylobacter species isolates (82 C. coli and 32 C. jejuni) obtained from 2000 to 2013 from humans, retail meats, and cecal samples from food production animals in the United States as part of the National Antimicrobial Resistance Monitoring System were selected for study. Resistance phenotypes were determined using broth microdilution of nine antimicrobials. Genomic DNA was sequenced using the Illumina MiSeq platform, and resistance genotypes were identified using assembled WGS sequences through blastx analysis. Eighteen resistance genes, including tet(O), blaOXA-61, catA, lnu(C), aph(2″)-Ib, aph(2″)-Ic, aph(2′)-If, aph(2″)-Ig, aph(2″)-Ih, aac(6′)-Ie-aph(2″)-Ia, aac(6′)-Ie-aph(2″)-If, aac(6′)-Im, aadE, sat4, ant(6′), aad9, aph(3′)-Ic, and aph(3′)-IIIa, and mutations in two housekeeping genes (gyrA and 23S rRNA) were identified. There was a high degree of correlation between phenotypic resistance to a given drug and the presence of one or more corresponding resistance genes. Phenotypic and genotypic correlation was 100% for tetracycline, ciprofloxacin/nalidixic acid, and erythromycin, and correlations ranged from 95.4% to 98.7% for gentamicin, azithromycin, clindamycin, and telithromycin. All isolates were susceptible to florfenicol, and no genes associated with florfenicol resistance were detected. There was a strong correlation (99.2%) between resistance genotypes and phenotypes, suggesting that WGS is a reliable indicator of resistance to the nine antimicrobial agents assayed in this study. WGS has the potential to be a powerful tool for antimicrobial resistance surveillance programs. PMID:26519386

  18. Exome Sequencing and Prediction of Long-Term Kidney Allograft Function

    PubMed Central

    Mesnard, Laurent; Muthukumar, Thangamani; Burbach, Maren; Li, Carol; Shang, Huimin; Dadhania, Darshana; Lee, John R.; Xiang, Jenny; Suberbielle, Caroline; Carmagnat, Maryvonnick; Ouali, Nacera; Rondeau, Eric; Abecassis, Michael M.; Suthanthiran, Manikkam

    2016-01-01

    Current strategies to improve graft outcome following kidney transplantation consider information at the human leukocyte antigen (HLA) loci. Cell surface antigens, in addition to HLA, may serve as the stimuli as well as the targets for the anti-allograft immune response and influence long-term graft outcomes. We therefore performed exome sequencing of DNA from kidney graft recipients and their living donors and estimated all possible cell surface antigens mismatches for a given donor/recipient pair by computing the number of amino acid mismatches in trans-membrane proteins. We designated this tally as the allogenomics mismatch score (AMS). We examined the association between the AMS and post-transplant estimated glomerular filtration rate (eGFR) using mixed models, considering transplants from three independent cohorts (a total of 53 donor-recipient pairs, 106 exomes, and 239 eGFR measurements). We found that the AMS has a significant effect on eGFR (mixed model, effect size across the entire range of the score: -19.4 [-37.7, -1.1], P = 0.0042, χ2 = 8.1919, d.f. = 1) that is independent of the HLA-A, B, DR matching, donor age, and time post-transplantation. The AMS effect is consistent across the three independent cohorts studied and similar to the strong effect size of donor age. Taken together, these results show that the AMS, a novel tool to quantify amino acid mismatches in trans-membrane proteins in individual donor/recipient pair, is a strong, robust predictor of long-term graft function in kidney transplant recipients. PMID:27684477

  19. Draft Genome Sequence of Bacillus subtilis subsp. natto Strain CGMCC 2108, a High Producer of Poly-γ-Glutamic Acid

    PubMed Central

    Tan, Siyuan; Su, Anping; Zhang, Chen; Ren, Yuanyuan

    2016-01-01

    Here, we report the 4.1-Mb draft genome sequence of Bacillus subtilis subsp. natto strain CGMCC 2108, a high producer of poly-γ-glutamic acid (γ-PGA). This sequence will provide further help for the biosynthesis of γ-PGA and will greatly facilitate research efforts in metabolic engineering of B. subtilis subsp. natto strain CGMCC 2108. PMID:27231363

  20. WAViS server for handling, visualization and presentation of multiple alignments of nucleotide or amino acids sequences.

    PubMed

    Zika, Radek; Paces, Jan; Pavlícek, Adam; Paces, Václav

    2004-07-01

    Web Alignment Visualization Server contains a set of web-tools designed for quick generation of publication-quality color figures of multiple alignments of nucleotide or amino acids sequences. It can be used for identification of conserved regions and gaps within many sequences using only common web browsers. The server is accessible at http://wavis.img.cas.cz.

  1. ANTICALIgN: visualizing, editing and analyzing combined nucleotide and amino acid sequence alignments for combinatorial protein engineering.

    PubMed

    Jarasch, Alexander; Kopp, Melanie; Eggenstein, Evelyn; Richter, Antonia; Gebauer, Michaela; Skerra, Arne

    2016-07-01

    ANTIC ALIGN: is an interactive software developed to simultaneously visualize, analyze and modify alignments of DNA and/or protein sequences that arise during combinatorial protein engineering, design and selection. ANTIC ALIGN: combines powerful functions known from currently available sequence analysis tools with unique features for protein engineering, in particular the possibility to display and manipulate nucleotide sequences and their translated amino acid sequences at the same time. ANTIC ALIGN: offers both template-based multiple sequence alignment (MSA), using the unmutated protein as reference, and conventional global alignment, to compare sequences that share an evolutionary relationship. The application of similarity-based clustering algorithms facilitates the identification of duplicates or of conserved sequence features among a set of selected clones. Imported nucleotide sequences from DNA sequence analysis are automatically translated into the corresponding amino acid sequences and displayed, offering numerous options for selecting reading frames, highlighting of sequence features and graphical layout of the MSA. The MSA complexity can be reduced by hiding the conserved nucleotide and/or amino acid residues, thus putting emphasis on the relevant mutated positions. ANTIC ALIGN: is also able to handle suppressed stop codons or even to incorporate non-natural amino acids into a coding sequence. We demonstrate crucial functions of ANTIC ALIGN: in an example of Anticalins selected from a lipocalin random library against the fibronectin extradomain B (ED-B), an established marker of tumor vasculature. Apart from engineered protein scaffolds, ANTIC ALIGN: provides a powerful tool in the area of antibody engineering and for directed enzyme evolution.

  2. 3-d structure-based amino acid sequence alignment of esterases, lipases and related proteins

    SciTech Connect

    Gentry, M.K.; Doctor, B.P.; Cygler, M.; Schrag, J.D.; Sussman, J.L.

    1993-05-13

    Acetylcholinesterase and butyrylcholinesterase, enzymes with potential as pretreatment drugs for organophosphate toxicity, are members of a larger family of homologous proteins that includes carboxylesterases, cholesterol esterases, lipases, and several nonhydrolytic proteins. A computer-generated alignment of 18 of the proteins, the acetylcholinesases, butyrylcholinesterases, carboxylesterases, some esterases, and the nonenzymatic proteins has been previously presented. More recently, the three-dimensional structures of two enzymes enzymes in this group, acetylcholinesterase from Torpedo californica and lipase from Geotrichum candidum, have been determined. Based on the x-ray structures and the superposition of these two enzymes, it was possible to obtain an improved amino acid sequence alignment of 32 members of this family of proteins. Examination of this alignment reveals that 24 amino acids are invariant in all of the hydrolytic proteins, and an additional 49 are well conserved. Conserved amino acids include those of the active site, the disulfide bridges, the salt bridges, in the core of the proteins, and at the edges of secondary structural elements. Comparison of the three-dimensional structures makes it possible to find a well-defined structural basis for the conservation of many of these amino acids.

  3. Prediction of Antimicrobial Peptides Based on Sequence Alignment and Support Vector Machine-Pairwise Algorithm Utilizing LZ-Complexity

    PubMed Central

    Shahrudin, Shahriza

    2015-01-01

    This study concerns an attempt to establish a new method for predicting antimicrobial peptides (AMPs) which are important to the immune system. Recently, researchers are interested in designing alternative drugs based on AMPs because they have found that a large number of bacterial strains have become resistant to available antibiotics. However, researchers have encountered obstacles in the AMPs designing process as experiments to extract AMPs from protein sequences are costly and require a long set-up time. Therefore, a computational tool for AMPs prediction is needed to resolve this problem. In this study, an integrated algorithm is newly introduced to predict AMPs by integrating sequence alignment and support vector machine- (SVM-) LZ complexity pairwise algorithm. It was observed that, when all sequences in the training set are used, the sensitivity of the proposed algorithm is 95.28% in jackknife test and 87.59% in independent test, while the sensitivity obtained for jackknife test and independent test is 88.74% and 78.70%, respectively, when only the sequences that has less than 70% similarity are used. Applying the proposed algorithm may allow researchers to effectively predict AMPs from unknown protein peptide sequences with higher sensitivity. PMID:25802839

  4. Sequence comparison, molecular modeling, and network analysis predict structural diversity in cysteine proteases from the Cape sundew, Drosera capensis.

    PubMed

    Butts, Carter T; Zhang, Xuhong; Kelly, John E; Roskamp, Kyle W; Unhelkar, Megha H; Freites, J Alfredo; Tahir, Seemal; Martin, Rachel W

    2016-01-01

    Carnivorous plants represent a so far underexploited reservoir of novel proteases with potentially useful activities. Here we investigate 44 cysteine proteases from the Cape sundew, Drosera capensis, predicted from genomic DNA sequences. D. capensis has a large number of cysteine protease genes; analysis of their sequences reveals homologs of known plant proteases, some of which are predicted to have novel properties. Many functionally significant sequence and structural features are observed, including targeting signals and occluding loops. Several of the proteases contain a new type of granulin domain. Although active site residues are conserved, the sequence identity of these proteases to known proteins is moderate to low; therefore, comparative modeling with all-atom refinement and subsequent atomistic MD-simulation is used to predict their 3D structures. The structure prediction data, as well as analysis of protein structure networks, suggest multifarious variations on the papain-like cysteine protease structural theme. This in silico methodology provides a general framework for investigating a large pool of sequences that are potentially useful for biotechnology applications, enabling informed choices about which proteins to investigate in the laboratory. PMID:27471585

  5. Multiple Amino Acid Sequence Alignment Nitrogenase Component 1: Insights into Phylogenetics and Structure-Function Relationships

    PubMed Central

    Howard, James B.; Kechris, Katerina J.; Rees, Douglas C.; Glazer, Alexander N.

    2013-01-01

    Amino acid residues critical for a protein's structure-function are retained by natural selection and these residues are identified by the level of variance in co-aligned homologous protein sequences. The relevant residues in the nitrogen fixation Component 1 α- and β-subunits were identified by the alignment of 95 protein sequences. Proteins were included from species encompassing multiple microbial phyla and diverse ecological niches as well as the nitrogen fixation genotypes, anf, nif, and vnf, which encode proteins associated with cofactors differing at one metal site. After adjusting for differences in sequence length, insertions, and deletions, the remaining >85% of the sequence co-aligned the subunits from the three genotypes. Six Groups, designated Anf, Vnf , and Nif I-IV, were assigned based upon genetic origin, sequence adjustments, and conserved residues. Both subunits subdivided into the same groups. Invariant and single variant residues were identified and were defined as “core” for nitrogenase function. Three species in Group Nif-III, Candidatus Desulforudis audaxviator, Desulfotomaculum kuznetsovii, and Thermodesulfatator indicus, were found to have a seleno-cysteine that replaces one cysteinyl ligand of the 8Fe:7S, P-cluster. Subsets of invariant residues, limited to individual groups, were identified; these unique residues help identify the gene of origin (anf, nif, or vnf) yet should not be considered diagnostic of the metal content of associated cofactors. Fourteen of the 19 residues that compose the cofactor pocket are invariant or single variant; the other five residues are highly variable but do not correlate with the putative metal content of the cofactor. The variable residues are clustered on one side of the cofactor, away from other functional centers in the three dimensional structure. Many of the invariant and single variant residues were not previously recognized as potentially critical and their identification provides the bases

  6. Probability of coding of a DNA sequence: an algorithm to predict translated reading frames from their thermodynamic characteristics.

    PubMed Central

    Tramontano, A; Macchiato, M F

    1986-01-01

    An algorithm to determine the probability that a reading frame codifies for a protein is presented. It is based on the results of our previous studies on the thermodynamic characteristics of a translated reading frame. We also develop a prediction procedure to distinguish between coding and non-coding reading frames. The procedure is based on the characteristics of the putative product of the DNA sequence and not on periodicity characteristics of the sequence, so the prediction is not biased by the presence of overlapping translated reading frames or by the presence of translated reading frames on the complementary DNA strand. PMID:3753761

  7. Inter-Protein Sequence Co-Evolution Predicts Known Physical Interactions in Bacterial Ribosomes and the Trp Operon

    PubMed Central

    Feinauer, Christoph; Szurmant, Hendrik; Weigt, Martin; Pagnani, Andrea

    2016-01-01

    Interaction between proteins is a fundamental mechanism that underlies virtually all biological processes. Many important interactions are conserved across a large variety of species. The need to maintain interaction leads to a high degree of co-evolution between residues in the interface between partner proteins. The inference of protein-protein interaction networks from the rapidly growing sequence databases is one of the most formidable tasks in systems biology today. We propose here a novel approach based on the Direct-Coupling Analysis of the co-evolution between inter-protein residue pairs. We use ribosomal and trp operon proteins as test cases: For the small resp. large ribosomal subunit our approach predicts protein-interaction partners at a true-positive rate of 70% resp. 90% within the first 10 predictions, with areas of 0.69 resp. 0.81 under the ROC curves for all predictions. In the trp operon, it assigns the two largest interaction scores to the only two interactions experimentally known. On the level of residue interactions we show that for both the small and the large ribosomal subunit our approach predicts interacting residues in the system with a true positive rate of 60% and 85% in the first 20 predictions. We use artificial data to show that the performance of our approach depends crucially on the size of the joint multiple sequence alignments and analyze how many sequences would be necessary for a perfect prediction if the sequences were sampled from the same model that we use for prediction. Given the performance of our approach on the test data we speculate that it can be used to detect new interactions, especially in the light of the rapid growth of available sequence data. PMID:26882169

  8. Inter-Protein Sequence Co-Evolution Predicts Known Physical Interactions in Bacterial Ribosomes and the Trp Operon.

    PubMed

    Feinauer, Christoph; Szurmant, Hendrik; Weigt, Martin; Pagnani, Andrea

    2016-01-01

    Interaction between proteins is a fundamental mechanism that underlies virtually all biological processes. Many important interactions are conserved across a large variety of species. The need to maintain interaction leads to a high degree of co-evolution between residues in the interface between partner proteins. The inference of protein-protein interaction networks from the rapidly growing sequence databases is one of the most formidable tasks in systems biology today. We propose here a novel approach based on the Direct-Coupling Analysis of the co-evolution between inter-protein residue pairs. We use ribosomal and trp operon proteins as test cases: For the small resp. large ribosomal subunit our approach predicts protein-interaction partners at a true-positive rate of 70% resp. 90% within the first 10 predictions, with areas of 0.69 resp. 0.81 under the ROC curves for all predictions. In the trp operon, it assigns the two largest interaction scores to the only two interactions experimentally known. On the level of residue interactions we show that for both the small and the large ribosomal subunit our approach predicts interacting residues in the system with a true positive rate of 60% and 85% in the first 20 predictions. We use artificial data to show that the performance of our approach depends crucially on the size of the joint multiple sequence alignments and analyze how many sequences would be necessary for a perfect prediction if the sequences were sampled from the same model that we use for prediction. Given the performance of our approach on the test data we speculate that it can be used to detect new interactions, especially in the light of the rapid growth of available sequence data.

  9. Inter-Protein Sequence Co-Evolution Predicts Known Physical Interactions in Bacterial Ribosomes and the Trp Operon.

    PubMed

    Feinauer, Christoph; Szurmant, Hendrik; Weigt, Martin; Pagnani, Andrea

    2016-01-01

    Interaction between proteins is a fundamental mechanism that underlies virtually all biological processes. Many important interactions are conserved across a large variety of species. The need to maintain interaction leads to a high degree of co-evolution between residues in the interface between partner proteins. The inference of protein-protein interaction networks from the rapidly growing sequence databases is one of the most formidable tasks in systems biology today. We propose here a novel approach based on the Direct-Coupling Analysis of the co-evolution between inter-protein residue pairs. We use ribosomal and trp operon proteins as test cases: For the small resp. large ribosomal subunit our approach predicts protein-interaction partners at a true-positive rate of 70% resp. 90% within the first 10 predictions, with areas of 0.69 resp. 0.81 under the ROC curves for all predictions. In the trp operon, it assigns the two largest interaction scores to the only two interactions experimentally known. On the level of residue interactions we show that for both the small and the large ribosomal subunit our approach predicts interacting residues in the system with a true positive rate of 60% and 85% in the first 20 predictions. We use artificial data to show that the performance of our approach depends crucially on the size of the joint multiple sequence alignments and analyze how many sequences would be necessary for a perfect prediction if the sequences were sampled from the same model that we use for prediction. Given the performance of our approach on the test data we speculate that it can be used to detect new interactions, especially in the light of the rapid growth of available sequence data. PMID:26882169

  10. Incorporating predicted functions of nonsynonymous variants into gene-based analysis of exome sequencing data: a comparative study

    PubMed Central

    2011-01-01

    Next-generation sequencing has opened up new avenues for the genetic study of complex traits. However, because of the small number of observations for any given rare allele and high sequencing error, it is a challenge to identify functional rare variants associated with the phenotype of interest. Recent research shows that grouping variants by gene and incorporating computationally predicted functions of variants may provide higher statistical power. On the other hand, many algorithms are available for predicting the damaging effects of nonsynonymous variants. Here, we use the simulated mini-exome data of Genetic Analysis Workshop 17 to study and compare the effects of incorporating the functional predictions of single-nucleotide polymorphisms using two popular algorithms, SIFT and PolyPhen-2, into a gene-based association test. We also propose a simple mixture model that can effectively combine test results based on different functional prediction algorithms. PMID:22373178

  11. Draft Genome Sequences of Gluconobacter cerinus CECT 9110 and Gluconobacter japonicus CECT 8443, Acetic Acid Bacteria Isolated from Grape Must

    PubMed Central

    Sainz, Florencia

    2016-01-01

    We report here the draft genome sequences of Gluconobacter cerinus strain CECT9110 and Gluconobacter japonicus CECT8443, acetic acid bacteria isolated from grape must. Gluconobacter species are well known for their ability to oxidize sugar alcohols into the corresponding acids. Our objective was to select strains to oxidize effectively d-glucose. PMID:27365351

  12. Molecular cloning, encoding sequence, and expression of vaccinia virus nucleic acid-dependent nucleoside triphosphatase gene.

    PubMed Central

    Rodriguez, J F; Kahn, J S; Esteban, M

    1986-01-01

    A rabbit poxvirus genomic library contained within the expression vector lambda gt11 was screened with polyclonal antiserum prepared against vaccinia virus nucleic acid-dependent nucleoside triphosphatase (NTPase)-I enzyme. Five positive phage clones containing from 0.72- to 2.5-kilobase-pair (kbp) inserts expressed a beta-galactosidase fusion protein that was reactive by immunoblotting with the NTPase-I antibody. Hybridization analysis allowed the location of this gene within the vaccinia HindIIID restriction fragment. From the known nucleotide sequence of the 16-kbp vaccinia HindIIID fragment, we identified a region that contains a 1896-base open reading frame coding for a 631-amino acid protein. Analysis of the complete sequence revealed a highly basic protein, with hydrophilic COOH and NH2 termini, various hydrophobic domains, and no significant homology to other known proteins. Translational studies demonstrate that NTPase-I belongs to a late class of viral genes. This protein is highly conserved among Orthopoxviruses. Images PMID:3025846

  13. Partial amino acid sequences around sulfhydryl groups of soybean beta-amylase.

    PubMed

    Nomura, K; Mikami, B; Morita, Y

    1987-08-01

    Sulfhydryl (SH) groups of soybean beta-amylase were modified with 5-(iodoaceto-amidoethyl)aminonaphthalene-1-sulfonate (IAEDANS) and the SH-containing peptides exhibiting fluorescence were purified after chymotryptic digestion of the modified enzyme. The sequence analysis of the peptides derived from the modification of all SH groups in the denatured enzyme revealed the existence of six SH groups, in contrast to five reported previously. One of them was found to have extremely low reactivity toward SH-reagents without reduction. In the native state, IAEDANS reacted with 2 mol of SH groups per mol of the enzyme (SH1 and SH2) accompanied with inactivation of the enzyme owing to the modification of SH2 located near the active site of this enzyme. The selective modification of SH2 with IAEDANS was attained after the blocking of SH1 with 5,5'-dithiobis-(2-nitrobenzoic acid). The amino acid sequences of the peptides containing SH1 and SH2 were determined to be Cys-Ala-Asn-Pro-Gln and His-Gln-Cys-Gly-Gly-Asn-Val-Gly-Asp-Ile-Val-Asn-Ile-Pro-Ile-Pro-Gln-Trp, respectively.

  14. Detection of piscine nodaviruses by real-time nucleic acid sequence based amplification (NASBA).

    PubMed

    Starkey, William G; Millar, Rose Mary; Jenkins, Mary E; Ireland, Jacqueline H; Muir, K Fiona; Richards, Randolph H

    2004-05-01

    Nucleic acid sequence based amplification (NASBA) is an isothermal nucleic acid amplification procedure based on target-specific primers and probes, and the co-ordinated activity of 3 enzymes: AMV reverse transcriptase, RNase H, and T7 RNA polymerase. We have developed a real-time NASBA procedure for detection of piscine nodaviruses, which have emerged as major pathogens of marine fish. Viral RNA was isolated by guanidine thiocyanate lysis followed by purification on silica particles. Primers were designed to target sequences in the nodavirus capsid protein gene, yielding an amplification product of 120 nucleotides. Amplification products were detected in real-time with a molecular beacon (FAM labelled/methyl-red quenched) that recognised an internal region of the target amplicon. Amplification and detection were performed at 41 degrees C for 90 min in a Corbett Research Rotorgene. Based on the detection of cell culture-derived nodavirus, and a synthetic RNA target, the real-time NASBA procedure was approximately 100-fold more sensitive than single-tube RT-PCR. When used to test a panel of 37 clinical samples (negative, n = 18; positive, n = 19), the real-time NASBA assay correctly identified all 18 negative and 19 positive samples. In comparison, the RT-PCR procedure identified all 18 negative samples, but only 16 of the positive samples. These results suggest that real-time NASBA may represent a sensitive and specific diagnostic procedure for piscine nodaviruses.

  15. From amino acid sequence to bioactivity: The biomedical potential of antitumor peptides.

    PubMed

    Blanco-Míguez, Aitor; Gutiérrez-Jácome, Alberto; Pérez-Pérez, Martín; Pérez-Rodríguez, Gael; Catalán-García, Sandra; Fdez-Riverola, Florentino; Lourenço, Anália; Sánchez, Borja

    2016-06-01

    Chemoprevention is the use of natural and/or synthetic substances to block, reverse, or retard the process of carcinogenesis. In this field, the use of antitumor peptides is of interest as, (i) these molecules are small in size, (ii) they show good cell diffusion and permeability, (iii) they affect one or more specific molecular pathways involved in carcinogenesis, and (iv) they are not usually genotoxic. We have checked the Web of Science Database (23/11/2015) in order to collect papers reporting on bioactive peptide (1691 registers), which was further filtered searching terms such as "antiproliferative," "antitumoral," or "apoptosis" among others. Works reporting the amino acid sequence of an antiproliferative peptide were kept (60 registers), and this was complemented with the peptides included in CancerPPD, an extensive resource for antiproliferative peptides and proteins. Peptides were grouped according to one of the following mechanism of action: inhibition of cell migration, inhibition of tumor angiogenesis, antioxidative mechanisms, inhibition of gene transcription/cell proliferation, induction of apoptosis, disorganization of tubulin structure, cytotoxicity, or unknown mechanisms. The main mechanisms of action of those antiproliferative peptides with known amino acid sequences are presented and finally, their potential clinical usefulness and future challenges on their application is discussed.

  16. Molecular cloning, encoding sequence, and expression of vaccinia virus nucleic acid-dependent nucleoside triphosphatase gene.

    PubMed

    Rodriguez, J F; Kahn, J S; Esteban, M

    1986-12-01

    A rabbit poxvirus genomic library contained within the expression vector lambda gt11 was screened with polyclonal antiserum prepared against vaccinia virus nucleic acid-dependent nucleoside triphosphatase (NTPase)-I enzyme. Five positive phage clones containing from 0.72- to 2.5-kilobase-pair (kbp) inserts expressed a beta-galactosidase fusion protein that was reactive by immunoblotting with the NTPase-I antibody. Hybridization analysis allowed the location of this gene within the vaccinia HindIIID restriction fragment. From the known nucleotide sequence of the 16-kbp vaccinia HindIIID fragment, we identified a region that contains a 1896-base open reading frame coding for a 631-amino acid protein. Analysis of the complete sequence revealed a highly basic protein, with hydrophilic COOH and NH2 termini, various hydrophobic domains, and no significant homology to other known proteins. Translational studies demonstrate that NTPase-I belongs to a late class of viral genes. This protein is highly conserved among Orthopoxviruses.

  17. gDNA-Prot: Predict DNA-binding proteins by employing support vector machine and a novel numerical characterization of protein sequence.

    PubMed

    Zhang, Yan-Ping; Wuyunqiqige; Zheng, Wei; Liu, Shuyi; Zhao, Chunguang

    2016-10-01

    DNA-binding proteins are the functional proteins in cells, which play an important role in various essential biological activities. An effective and fast computational method gDNA-Prot is proposed to predict DNA-binding proteins in this paper, which is a DNA-binding predictor that combines the support vector machine classifier and a novel kind of feature called graphical representation. The DNA-binding protein sequence information was described with the 20 probabilities of amino acids and the 23 new numerical graphical representation features of a protein sequence, based on 23 physicochemical properties of 20 amino acids. The Principal Components Analysis (PCA) was employed as feature selection method for removing the irrelevant features and reducing redundant features. The Sigmod function and Min-max normalization methods for PCA were applied to accelerate the training speed and obtain higher accuracy. Experiments demonstrated that the Principal Components Analysis with Sigmod function generated the best performance. The gDNA-Prot method was also compared with the DNAbinder, iDNA-Prot and DNA-Prot. The results suggested that gDNA-Prot outperformed the DNAbinder and iDNA-Prot. Although the DNA-Prot outperformed gDNA-Prot, gDNA-Prot was faster and convenient to predict the DNA-binding proteins. Additionally, the proposed gNDA-Prot method is available at http://sourceforge.net/projects/gdnaprot.

  18. Predicting G-protein-coupled receptors families using different physiochemical properties and pseudo amino acid composition.

    PubMed

    Rehman, Zia-Ur; Mirza, Muhammad Tayyeb; Khan, Asifullah; Xhaard, Henri

    2013-01-01

    G-protein-coupled receptors (GPCRs) initiate signaling pathways via trimetric guanine nucleotide-binding proteins. GPCRs are classified based on their ligand-binding properties and molecular phylogenetic analyses. Nonetheless, these later analyses are in most case dependent on multiple sequence alignments, themselves dependent on human intervention and expertise. Alignment-free classifications of GPCR sequences, in addition to being unbiased, present many applications uncovering hidden physicochemical parameters shared among specific groups of receptors, to being used in automated workflows for large-scale molecular modeling applications. Current alignment-free classification methods, however, do not reach a full accuracy. This chapter discusses how GPCRs amino acid sequences can be classified using pseudo amino acid composition and multiscale energy representation of different physiochemical properties of amino acids. A hybrid feature extraction strategy is shown to be suitable to represent GPCRs and to be able to exploit GPCR amino acid sequence discrimination capability in spatial as well as transform domain. Classification strategies such as support vector machine and probabilistic neural network are then discussed in regards to GPCRs classification. The work of GPCR-Hybrid web predictor is also discussed.

  19. Limited proteolysis and sequence analysis of the 2-oxo acid dehydrogenase complexes from Escherichia coli. Cleavage sites and domains in the dihydrolipoamide acyltransferase components.

    PubMed Central

    Packman, L C; Perham, R N

    1987-01-01

    The structures of the dihydrolipoamide acyltransferase (E2) components of the 2-oxo acid dehydrogenase complexes from Escherichia coli were investigated by limited proteolysis. Trypsin and Staphylococcus aureus V8 proteinase were used to excise the three lipoyl domains from the E2p component of the pyruvate dehydrogenase complex and the single lipoyl domain from the E2o component of the 2-oxoglutarate dehydrogenase complex. The principal sites of action of these enzymes on each E2 chain were determined by sequence analysis of the isolated lipoyl fragments and of the truncated E2p and E2o chains. Each of the numerous cleavage sites (12 in E2p, six in E2o) fell within similar segments of the E2 chains, namely stretches of polypeptide rich in alanine, proline and/or charged amino acids. These regions are clearly accessible to proteinases of Mr 24,000-28,000 and, on the basis of n.m.r. spectroscopy, some of them have previously been implicated in facilitating domain movements by virtue of their conformational flexibility. The limited proteolysis data suggest that E2p and E2o possess closer architectural similarities than would be predicted from inspection of their amino acid sequences. As a result of this work, an error was detected in the sequence of E2o inferred from the previously published sequence of the encoding gene, sucB. The relevant peptides from E2o were purified and sequenced by direct means; an amended sequence is presented. Images Fig. 1. Fig. 2. PMID:3297046

  20. Sequence-based prediction of protein-protein interaction sites with L1-logreg classifier.

    PubMed

    Dhole, Kaustubh; Singh, Gurdeep; Pai, Priyadarshini P; Mondal, Sukanta

    2014-05-01

    Protein-protein interactions are of central importance for virtually every process in a living cell. Information about the interaction sites in proteins improves our understanding of disease mechanisms and can provide the basis for new therapeutic approaches. Since a multitude of unique residue-residue contacts facilitate the interactions, protein-protein interaction sites prediction has become one of the most important and challenging problems of computational biology. Although much progress in this field has been reported, this problem is yet to be satisfactorily solved. Here, a novel method (LORIS: L1-regularized LOgistic Regression based protein-protein Interaction Sites predictor) is proposed, that identifies interaction residues, using sequence features and is implemented via the L1-logreg classifier. Results show that LORIS is not only quite effective, but also, performs better than existing state-of-the art methods. LORIS, available as standalone package, can be useful for facilitating drug-design and targeted mutation related studies, which require a deeper knowledge of protein interactions sites. PMID:24486250

  1. Equilibrium model prediction for the scatter in the star-forming main sequence

    NASA Astrophysics Data System (ADS)

    Mitra, Sourav; Davé, Romeel; Simha, Vimal; Finlator, Kristian

    2016-10-01

    The analytic "equilibrium model" for galaxy evolution using a mass balance equation is able to reproduce mean observed galaxy scaling relations between stellar mass, halo mass, star formation rate (SFR) and metallicity across the majority of cosmic time with a small number of parameters related to feedback. Here we aim to test this data-constrained model to quantify deviations from the mean relation between stellar mass and SFR, i.e. the star-forming galaxy main sequence (MS). We implement fluctuation in halo accretion rates parameterised from merger-based simulations, and quantify the intrinsic scatter introduced into the MS under the assumption that fluctuations in star formation follow baryonic inflow fluctuations. We predict the 1-σ MS scatter to be ˜0.2 - 0.25 dex over the stellar mass range 108M⊙ to 1011M⊙ and a redshift range 0.5⪉ z⪉ 3 for SFRs averaged over 100 Myr. The scatter increases modestly at z⪆ 3, as well as by averaging over shorter timescales. The contribution from merger-induced star formation is generally small, around 5% today and 10 - 15% during the peak epoch of cosmic star formation. These results are generally consistent with available observations, suggesting that deviations from the MS primarily reflect stochasticity in the inflow rate owing to halo mergers.

  2. Soft x-ray laser gain prediction at high density for the nitrogen isoelectronic sequence

    SciTech Connect

    Klapisch, M. ); Cohen, M.; Goldstein, W.H. ); Feldman, U. . E.O. Hulburt Center for Space Research)

    1991-01-11

    In model calculation, population inversions for 2s{sup 2}2p{sup 2}3s-2s{sup 2}2p{sup 2}3p and for 2s{sup 2}2p{sup 2}d3p-2s{sup 2}2p{sup 2}3d are predicted for ions of the N-like isoelectronic sequence (Kr{sup 29+}, Zr{sup 33+}, Mo{sup 35+}, Pd{sup 39+}, Sn{sup 43+} and Xe{sup 47+}), at electron densities higher than 10{sup 22} cm{sup {minus}3}. Maximum gains of 46 cm{sup {minus}1} for Kr{sup 29+} at 245{angstrom}, and 231 cm{sup {minus}1} at 126{angstrom} for Xe{sup 47+} are obtained. The electron temperature T{sub e} is fixed at half the ionization potential for all ions. A detailed analysis shows that the mechanisms responsible for the population inversion enhancement are collisional de-excitation and radiation trapping. 16 refs., 6 figs.

  3. Complete amino acid sequence of the myoglobin from the Pacific spotted dolphin, Stenella attenuata graffmani.

    PubMed

    Jones, B N; Wang, C C; Dwulet, F E; Lehman, L D; Meuth, J L; Bogardt, R A; Gurd, F R

    1979-04-25

    The complete amino acid sequence of the major component myoglobin from the Pacific spotted dolphin, Stenella attenuata graffmani, was determined by the automated Edman degradation of several large peptides obtained by specific cleavage of the protein. The acetimidated apomyoglobin was selectively cleaved at its two methionyl residues with cyanogen bromide and at its three arginyl residues by trypsin. By subjecting four of these peptides and the apomyoglobin to automated Edman degradation, over 80% of the primary structure of the protein was obtained. The remainder of the covalent structure was determined by the sequence analysis of peptides that resulted from further digestion of the central cyanogen bromide fragment. This fragment was cleaved at its glutamyl residues with staphylococcal protease and its lysyl residues with trypsin. The action of trypsin was restricted to the lysyl residues by chemical modification of the single arginyl residue of the fragment with 1,2-cyclohexanedione. The primary structure of this myoglobin proved to be identical with that from the Atlantic bottlenosed dolphin and Pacific common dolphin but differs from the myoglobins of the killer whale and pilot whale at two positions. The above sequence identities and differences reflect the close taxonomic relationship of these five species of Cetacea. PMID:454657

  4. Isolation and amino acid sequences of squirrel monkey (Saimiri sciurea) insulin and glucagon

    SciTech Connect

    Yu, Jinghua ); Eng, J.; Yalow, R.S. City Univ. of New York, NY )

    1990-12-01

    It was reported two decades ago that insulin was not detectable in the glucose-stimulated state in Saimiri sciurea, the New World squirrel monkey, by a radioimmunoassay system developed with guinea pig anti-pork insulin antibody and labeled park insulin. With the same system, reasonable levels were observed in rhesus monkeys and chimpanzees. This suggested that New World monkeys, like the New World hystricomorph rodents such as the guinea pig and the coypu, might have insulins whose sequences differ markedly from those of Old World mammals. In this report the authors describe the purification and amino acid sequences of squirrel monkey insulin and glucagon. They demonstrate that the substitutions at B29, B27, A2, A4, and A17 of squirrel monkey insulin are identical with those previously found in another New World primate, the owl monkey (Aotus trivirgatus). The immunologic cross-reactivity of this insulin in their immunoassay system is only a few percent of that of human insulin. It appears that the peptides of the New World monkeys have diverged less from those of the Old World mammals than have those of the New World hystricomorph rodents. The striking improvements in peptide purification and sequencing have the potential for adding new information concerning the evolutionary divergence of species.

  5. Purification, amino acid sequence and characterisation of kangaroo IGF-I.

    PubMed

    Yandell, C A; Francis, G L; Wheldrake, J F; Upton, Z

    1998-01-01

    Insulin-like growth factor-I (IGF-I) and IGF-II have been purified to homogeneity from kangaroo (Macropus fuliginosus) serum, thus this represents the first report of the purification, sequencing and characterisation of marsupial IGFs. N-Terminal protein sequencing reveals that there are six amino acid differences between kangaroo and human IGF-I. Kangaroo IGF-II has been partially sequenced and no differences were found between human and kangaroo IGF-II in the 53 residues identified. Thus the IGFs appear to be remarkably structurally conserved during mammalian radiation. In addition, in vitro characterisation of kangaroo IGF-I demonstrated that the functional properties of human, kangaroo and chicken IGF-I are very similar. In an assay measuring the ability of the proteins to stimulate protein synthesis in rat L6 myoblasts, all IGF-I proteins were found to be equally potent. The ability of all three proteins to compete for binding with radiolabelled human IGF-I to type-1 IGF receptors in L6 myoblasts and in Sminthopsis crassicaudata transformed lung fibroblasts, a marsupial cell line, was comparable. Furthermore, kangaroo and human IGF-I react equally in a human IGF-I RIA using a human reference standard, radiolabelled human IGF-I and a polyclonal antibody raised against recombinant human IGF-I. This study indicates that not only is the primary structure of eutherian and metatherian IGF-I conserved, but also the proteins appear to be functionally similar.

  6. Prediction and Analysis of Quorum Sensing Peptides Based on Sequence Features

    PubMed Central

    Rajput, Akanksha; Gupta, Amit Kumar; Kumar, Manoj

    2015-01-01

    Quorum sensing peptides (QSPs) are the signaling molecules used by the Gram-positive bacteria in orchestrating cell-to-cell communication. In spite of their enormous importance in signaling process, their detailed bioinformatics analysis is lacking. In this study, QSPs and non-QSPs were examined according to their amino acid composition, residues position, motifs and physicochemical properties. Compositional analysis concludes that QSPs are enriched with aromatic residues like Trp, Tyr and Phe. At the N-terminal, Ser was a dominant residue at maximum positions, namely, first, second, third and fifth while Phe was a preferred residue at first, third and fifth positions from the C-terminal. A few motifs from QSPs were also extracted. Physicochemical properties like aromaticity, molecular weight and secondary structure were found to be distinguishing features of QSPs. Exploiting above properties, we have developed a Support Vector Machine (SVM) based predictive model. During 10-fold cross-validation, SVM achieves maximum accuracy of 93.00%, Mathew’s correlation coefficient (MCC) of 0.86 and Receiver operating characteristic (ROC) of 0.98 on the training/testing dataset (T200p+200n). Developed models performed equally well on the validation dataset (V20p+20n). The server also integrates several useful analysis tools like “QSMotifScan”, “ProtFrag”, “MutGen” and “PhysicoProp”. Our analysis reveals important characteristics of QSPs and on the basis of these unique features, we have developed a prediction algorithm “QSPpred” (freely available at: http://crdd.osdd.net/servers/qsppred). PMID:25781990

  7. PredPPCrys: Accurate Prediction of Sequence Cloning, Protein Production, Purification and Crystallization Propensity from Protein Sequences Using Multi-Step Heterogeneous Feature Fusion and Selection

    PubMed Central

    Wang, Huilin; Wang, Mingjun; Tan, Hao; Li, Yuan; Zhang, Ziding; Song, Jiangning

    2014-01-01

    X-ray crystallography is the primary approach to solve the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of multi-step experimental procedures to yield diffraction-quality crystals, including sequence cloning, protein material production, purification, crystallization and ultimately, structural determination. Accordingly, prediction of the propensity of a protein to successfully undergo these experimental procedures based on the protein sequence may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge on the important determinants of propensity for a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most of the existing methods display poorer performance when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as the benchmark, and subsequently developed a new approach termed ‘PredPPCrys’ using the support vector machine (SVM). Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance of five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I). Next, prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II), which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets. In addition, the predicted crystallization targets of

  8. Implicit learning of predictable sound sequences modulates human brain responses at different levels of the auditory hierarchy

    PubMed Central

    Lecaignard, Françoise; Bertrand, Olivier; Gimenez, Gérard; Mattout, Jérémie; Caclin, Anne

    2015-01-01

    Deviant stimuli, violating regularities in a sensory environment, elicit the mismatch negativity (MMN), largely described in the Event-Related Potential literature. While it is widely accepted that the MMN reflects more than basic change detection, a comprehensive description of mental processes modulating this response is still lacking. Within the framework of predictive coding, deviance processing is part of an inference process where prediction errors (the mismatch between incoming sensations and predictions established through experience) are minimized. In this view, the MMN is a measure of prediction error, which yields specific expectations regarding its modulations by various experimental factors. In particular, it predicts that the MMN should decrease as the occurrence of a deviance becomes more predictable. We conducted a passive oddball EEG study and manipulated the predictability of sound sequences by means of different temporal structures. Importantly, our design allows comparing mismatch responses elicited by predictable and unpredictable violations of a simple repetition rule and therefore departs from previous studies that investigate violations of different time-scale regularities. We observed a decrease of the MMN with predictability and interestingly, a similar effect at earlier latencies, within 70 ms after deviance onset. Following these pre-attentive responses, a reduced P3a was measured in the case of predictable deviants. We conclude that early and late deviance responses reflect prediction errors, triggering belief updating within the auditory hierarchy. Beside, in this passive study, such perceptual inference appears to be modulated by higher-level implicit learning of sequence statistical structures. Our findings argue for a hierarchical model of auditory processing where predictive coding enables implicit extraction of environmental regularities. PMID:26441602

  9. Prediction of virological response by pretreatment hepatitis B virus reverse transcriptase quasispecies heterogeneity: the advantage of using next-generation sequencing.

    PubMed

    Han, Y; Gong, L; Sheng, J; Liu, F; Li, X-H; Chen, L; Yu, D-M; Gong, Q-M; Hao, P; Zhang, X-X

    2015-08-01

    Prediction of antiviral efficacy prior to treatment remains largely unavailable. We have previously demonstrated the clinical value of on-treatment hepatitis B virus (HBV) reverse transcriptase (RT) quasispecies (QS) evolution patterns. In this study, we aimed to elucidate the relevance for prediction of pretreatment HBV RT QS characteristics by comparing the performance of next-generation sequencing (NGS) and clone-based Sanger sequencing (CBS). Thirty-six lamivudine-treated patients were retrospectively studied, including 18 responders and 18 non-responders. CBS and NGS data of pretreatment serum HBV were used to generate RT QS genetic complexity and diversity scores, according to our previous studies. The ability of both methods to predict responsiveness was evaluated with receiver operating characteristic (ROC) curves. A cut-off value was generated on the basis of prediction ability. Responders had significantly higher pretreatment RT QS genetic complexity and diversity (in the first two parts, which overlapped with the S gene, at both the nucleotide and amino acid levels) than non-responders by NGS-based testing. NGS-based algorithms predicted response better than CBS in the ROC curve analysis. The mean distance of the second contig had the highest area under the curve (AUC) value. When the cut-off value was set to 0.007186, the difference between survival curves was significant (p 0.0090). Pretreatment HBV RT QS heterogeneity in the overlapping region of the RT and S genes could be a predictor of antiviral efficacy. NGS improves the predictions of virological outcomes relative to CBS algorithms. This may have important implications for the clinical management of subjects chronically infected with HBV.

  10. Sequence Design for a Test Tube of Interacting Nucleic Acid Strands.

    PubMed

    Wolfe, Brian R; Pierce, Niles A

    2015-10-16

    We describe an algorithm for designing the equilibrium base-pairing properties of a test tube of interacting nucleic acid strands. A target test tube is specified as a set of desired "on-target" complexes, each with a target secondary structure and target concentration, and a set of undesired "off-target" complexes, each with vanishing target concentration. Sequence design is performed by optimizing the test tube ensemble defect, corresponding to the concentration of incorrectly paired nucleotides at equilibrium evaluated over the ensemble of the test tube. To reduce the computational cost of accepting or rejecting mutations to a random initial sequence, the structural ensemble of each on-target complex is hierarchically decomposed into a tree of conditional subensembles, yielding a forest of decomposition trees. Candidate sequences are evaluated efficiently at the leaf level of the decomposition forest by estimating the test tube ensemble defect from conditional physical properties calculated over the leaf subensembles. As optimized subsequences are merged toward the root level of the forest, any emergent defects are eliminated via ensemble redecomposition and sequence reoptimization. After successfully merging subsequences to the root level, the exact test tube ensemble defect is calculated for the first time, explicitly checking for the effect of the previously neglected off-target complexes. Any off-target complexes that form at appreciable concentration are hierarchically decomposed, added to the decomposition forest, and actively destabilized during subsequent forest reoptimization. For target test tubes representative of design challenges in the molecular programming and synthetic biology communities, our test tube design algorithm typically succeeds in achieving a normalized test tube ensemble defect ≤1% at a design cost within an order of magnitude of the cost of test tube analysis.

  11. Sequence-Specific Electrical Purification of Nucleic Acids with Nanoporous Gold Electrodes.

    PubMed

    Daggumati, Pallavi; Appelt, Sandra; Matharu, Zimple; Marco, Maria L; Seker, Erkin

    2016-06-22

    Nucleic-acid-based biosensors have enabled rapid and sensitive detection of pathogenic targets; however, these devices often require purified nucleic acids for analysis since the constituents of complex biological fluids adversely affect sensor performance. This purification step is typically performed outside the device, thereby increasing sample-to-answer time and introducing contaminants. We report a novel approach using a multifunctional matrix, nanoporous gold (np-Au), which enables both detection of specific target sequences in a complex biological sample and their subsequent purification. The np-Au electrodes modified with 26-mer DNA probes (via thiol-gold chemistry) enabled sensitive detection and capture of complementary DNA targets in the presence of complex media (fetal bovine serum) and other interfering DNA fragments in the range of 50-1500 base pairs. Upon capture, the noncomplementary DNA fragments and serum constituents of varying sizes were washed away. Finally, the surface-bound DNA-DNA hybrids were released by electrochemically cleaving the thiol-gold linkage, and the hybrids were iontophoretically eluted from the nanoporous matrix. The optical and electrophoretic characterization of the analytes before and after the detection-purification process revealed that low target DNA concentrations (80 pg/μL) can be successfully detected in complex biological fluids and subsequently released to yield pure hybrids free of polydisperse digested DNA fragments and serum biomolecules. Taken together, this multifunctional platform is expected to enable seamless integration of detection and purification of nucleic acid biomarkers of pathogens and diseases in miniaturized diagnostic devices.

  12. Recognition of protein phosphorylation site based on amino acids sequence features

    NASA Astrophysics Data System (ADS)

    Zhang, Ying; Ding, Changjiang; Lu, Jun

    2012-09-01

    Protein phosphorylation is one of the most important reversible post-translational modifications (PTMs), and the theoretical recognition of the phosphorylation site is one of the important content of the computational biology. In the paper, we use the amino acid component, the position-dependent residue statistics and the non-adjacent residue pair frequency as the recognition parameters, and use Jensen-Shannon Divergence with Quadratic Discriminant analysis(JSDQD) as the method for predicting the phosphorylation sites. The 7-fold cross-validation test accuracies for the CK2, PKA and PKC kinase families are 90%, 90% and 86%, respectively.

  13. Prediction of liquid-liquid equilibrium for systems of vegetable oils, fatty acids, and ethanol

    SciTech Connect

    Batista, E.; Monnerat, S.; Stragevitch, L.; Pina, C.G.; Goncalves, C.B.; Meirelles, A.J.A.

    1999-12-01

    Group interaction parameters for the UNIFAC and ASOG models were specially adjusted for predicting liquid-liquid equilibrium (LLE) for systems of vegetable oils, fatty acids, and ethanol at temperatures ranging from 20 to 45 C. Experimental liquid-liquid equilibrium data for systems of triolein, oleic acid, and ethanol and of triolein, stearic acid, and ethanol were measured and utilized in the adjustment. The average percent deviation between experimental and calculated compositions was 0.79% and 0.52% for the UNIFAC and ASOG models, respectively. The prediction of liquid-liquid equilibrium for systems of vegetable oils, fatty acids, and ethanol was quite successful, with an average deviation of 1.31% and 1.32% for the UNIFAC and ASOG models, respectively.

  14. A framework for establishing predictive relationships between specific bacterial 16S rRNA sequence abundances and biotransformation rates.

    PubMed

    Helbling, Damian E; Johnson, David R; Lee, Tae Kwon; Scheidegger, Andreas; Fenner, Kathrin

    2015-03-01

    The rates at which wastewater treatment plant (WWTP) microbial communities biotransform specific substrates can differ by orders of magnitude among WWTP communities. Differences in taxonomic compositions among WWTP communities may predict differences in the rates of some types of biotransformations. In this work, we present a novel framework for establishing predictive relationships between specific bacterial 16S rRNA sequence abundances and biotransformation rates. We selected ten WWTPs with substantial variation in their environmental and operational metrics and measured the in situ ammonia biotransformation rate constants in nine of them. We isolated total RNA from samples from each WWTP and analyzed 16S rRNA sequence reads. We then developed multivariate models between the measured abundances of specific bacterial 16S rRNA sequence reads and the ammonia biotransformation rate constants. We constructed model scenarios that systematically explored the effects of model regularization, model linearity and non-linearity, and aggregation of 16S rRNA sequences into operational taxonomic units (OTUs) as a function of sequence dissimilarity threshold (SDT). A large percentage (greater than 80%) of model scenarios resulted in well-performing and significant models at intermediate SDTs of 0.13-0.14 and 0.26. The 16S rRNA sequences consistently selected into the well-performing and significant models at those SDTs were classified as Nitrosomonas and Nitrospira groups. We then extend the framework by applying it to the biotransformation rate constants of ten micropollutants measured in batch reactors seeded with the ten WWTP communities. We identified phylogenetic groups that were robustly selected into all well-performing and significant models constructed with biotransformation rates of isoproturon, propachlor, ranitidine, and venlafaxine. These phylogenetic groups can be used as predictive biomarkers of WWTP microbial community activity towards these specific

  15. Complete amino acid sequence of the variable domains of two human IgM anti-gamma globulins (Lay/Pom) with shared idiotypic specificities.

    PubMed

    Capra, J D; Klapper, D G

    1976-01-01

    On the basis of extensive shared idiotypic specificities, two human IgM anti-gamma-globulins (Lay/Pom) were selected for complete amino acid sequence analysis of their variable domains. Previous studies on the variable regions of the heavy chains of these proteins had shown but eight amino acid differences, only one of which was within a complementarity-determining hypervariable region. The complete amino acid sequence of the variable regions of the light chains of these two proteins is the subject of this report. Protein Lay is a typical VchiI protein with only five 'framework' differences when compared with protein Roy. Protein Pom is best classified as a VchiII, but in the 'framework' there are 16 differences between it and protein Ti. Although there are extensive differences in the first hypervariable region, the second and third light-chain hypervariable regions have an identical sequence. The finding of two identical light-chain and two identical heavy-chain hypervariable regions in these two proteins, which were selected on the basis of their combining specificities and their idiotypic cross-reactions, strongly implicates hypervariable regions in the constitution of the idiotypic determinants and the antibody combining site. Additionally, the finding of identical hypervariable regions in light chains of different V-region subgroups fulfills a prediction of the gene-interaction concept of antibody variability. PMID:824717

  16. Modeling and prediction of retardance in citric acid coated ferrofluid using artificial neural network

    NASA Astrophysics Data System (ADS)

    Lin, Jing-Fung; Sheu, Jer-Jia

    2016-06-01

    Citric acid coated (citrate-stabilized) magnetite (Fe3O4) magnetic nanoparticles have been conducted and applied in the biomedical fields. Using Taguchi-based measured retardances as the training data, an artificial neural network (ANN) model was developed for the prediction of retardance in citric acid (CA) coated ferrofluid (FF). According to the ANN simulation results in the training stage, the correlation coefficient between predicted retardances and measured retardances was found to be as high as 0.9999998. Based on the well-trained ANN model, the predicted retardance at excellent program from Taguchi method showed less error of 2.17% compared with a multiple regression (MR) analysis of statistical significance. Meanwhile, the parameter analysis at excellent program by the ANN model had the guiding significance to find out a possible program for the maximum retardance. It was concluded that the proposed ANN model had high ability for the prediction of retardance in CA coated FF.

  17. The Genome Sequence of the Highly Acetic Acid-Tolerant Zygosaccharomyces bailii-Derived Interspecies Hybrid Strain ISA1307, Isolated From a Sparkling Wine Plant

    PubMed Central

    Mira, Nuno P.; Münsterkötter, Martin; Dias-Valada, Filipa; Santos, Júlia; Palma, Margarida; Roque, Filipa C.; Guerreiro, Joana F.; Rodrigues, Fernando; Sousa, Maria João; Leão, Cecília; Güldener, Ulrich; Sá-Correia, Isabel

    2014-01-01

    In this work, it is described the sequencing and annotation of the genome of the yeast strain ISA1307, isolated from a sparkling wine continuous production plant. This strain, formerly considered of the Zygosaccharomyces bailii species, has been used to study Z. bailii physiology, in particular, its extreme tolerance to acetic acid stress at low pH. The analysis of the genome sequence described in this work indicates that strain ISA1307 is an interspecies hybrid between Z. bailii and a closely related species. The genome sequence of ISA1307 is distributed through 154 scaffolds and has a size of around 21.2 Mb, corresponding to 96% of the genome size estimated by flow cytometry. Annotation of ISA1307 genome includes 4385 duplicated genes (∼90% of the total number of predicted genes) and 1155 predicted single-copy genes. The functional categories including a higher number of genes are ‘Metabolism and generation of energy’, ‘Protein folding, modification and targeting’ and ‘Biogenesis of cellular components’. The knowledge of the genome sequence of the ISA1307 strain is expected to contribute to accelerate systems-level understanding of stress resistance mechanisms in Z. bailii and to inspire and guide novel biotechnological applications of this yeast species/strain in fermentation processes, given its high resilience to acidic stress. The availability of the ISA1307 genome sequence also paves the way to a better understanding of the genetic mechanisms underlying the generation and selection of more robust hybrid yeast strains in the stressful environment of wine fermentations. PMID:24453040

  18. Amino acid substitutions in genetic variants of human serum albumin and in sequences inferred from molecular cloning

    SciTech Connect

    Takahashi, N.; Takahashi, Y.; Blumberg, B.S.; Putnam, F.W.

    1987-07-01

    The structural changes in four genetic variants of human serum albumin were analyzed by tandem high-pressure liquid chromatography (HPLC) of the tryptic peptides, HPLC mapping and isoelectric focusing of the CNBr fragments, and amino acid sequence analysis of the purified peptides. Lysine-372 of normal (common) albumin A was changed to glutamic acid both in albumin Naskapi, a widespread polymorphic variant of North American Indians, and in albumin Mersin found in Eti Turks. The two variants also exhibited anomalous migration in NaDodSO/sub 4//PAGE, which is attributed to a conformational change. The identity of albumins Naskapi and Mersin may have originated through descent from a common mid-Asiatic founder of the two migrating ethnic groups, or it may represent identical but independent mutations of the albumin gene. In albumin Adana, from Eti Turks, the substitution site was not identified but was localized to the region from positions 447 through 548. The substitution of aspartic acid-550 by glycine was found in albumin Mexico-2 from four individuals of the Pima tribe. Although only single-point substitutions have been found in these and in certain other genetic variants of human albumin, five differences exist in the amino acid sequences inferred from cDNA sequences by workers in three other laboratories. However, our results on albumin A and on 14 different genetic variants accord with the amino acid sequence of albumin deduced from the genomic sequence. The apparent amino acid substitutions inferred from comparison of individual cDNA sequences probably reflect artifacts in cloning or in cDNA sequence analysis rather than polymorphism of the coding sections of the albumin gene.

  19. Amino acid substitutions in genetic variants of human serum albumin and in sequences inferred from molecular cloning.

    PubMed

    Takahashi, N; Takahashi, Y; Blumberg, B S; Putnam, F W

    1987-07-01

    The structural changes in four genetic variants of human serum albumin were analyzed by tandem high-pressure liquid chromatography (HPLC) of the tryptic peptides, HPLC mapping and isoelectric focusing of the CNBr fragments, and amino acid sequence analysis of the purified peptides. Lysine-372 of normal (common) albumin A was changed to glutamic acid both in albumin Naskapi, a widespread polymorphic variant of North American Indians, and in albumin Mersin found in Eti Turks. The two variants also exhibited anomalous migration in NaDodSO4/PAGE, which is attributed to a conformational change. The identity of albumins Naskapi and Mersin may have originated through descent from a common mid-Asiatic founder of the two migrating ethnic groups, or it may represent identical but independent mutations of the albumin gene. In albumin Adana, from Eti Turks, the substitution site was not identified but was localized to the region from positions 447 through 548. The substitution of aspartic acid-550 by glycine was found in albumin Mexico-2 from four individuals of the Pima tribe. Although only single-point substitutions have been found in these and in certain other genetic variants of human albumin, five differences exist in the amino acid sequences inferred from cDNA sequences by workers in three other laboratories. However, our results on albumin A and on 14 different genetic variants accord with the amino acid sequence of albumin deduced from the genomic sequence. The apparent amino acid substitutions inferred from comparison of individual cDNA sequences probably reflect artifacts in cloning or in cDNA sequence analysis rather than polymorphism of the coding sections of the albumin gene.

  20. Amino acid sequences of neuropeptides in the sinus gland of the land crab Cardisoma carnifex: a novel neuropeptide proteolysis site.

    PubMed

    Newcomb, R W

    1987-08-01

    The sinus gland is a major neurosecretory structure in Crustacea. Five peptides, labeled C, D, E, F, and I, isolated from the sinus gland of the land crab have been hypothesized to arise from the incomplete proteolysis at two internal sites on a single biosynthetic intermediate peptide "H", based on amino acid composition additivities and pulse-chase radiolabeling studies. The presence of only a single major precursor for the sinus gland peptides implies that peptide H may be synthesized on a common precursor with crustacean hyperglycemic hormone forms, "J" and "L," and a peptide, "K," similar to peptides with molt inhibiting activity. Here I report amino acid sequences of these peptides. The amino terminal sequence of the parent peptide, H, (and the homologous fragments) proved refractory to Edman degradation. Data from amino acid analysis and carboxypeptidase digestion of the naturally occurring fragments and of fragments produced by endopeptidase digestion were used together with Edman degradation to obtain the sequences. Amino acid analysis of fragments of the naturally occurring "overlap" peptides (those produced by internal cleavage at one site on H) was used to obtain the sequences across the cleavage sites. The amino acid sequence of the land crab peptide H is Arg-Ser-Ala-Asp-Gly-Phe-Gly-Arg-Met-Glu-Ser-Leu-Leu-Thr-Ser-Leu-Arg-Gly- Ser-Ala-Glu- Ser-Pro-Ala-Ala-Leu-Gly-Glu-Ala-Ser-Ala-Ala-His-Pro-Leu-Glu. In vivo cleavage at one site involves excision of arginine from the sequence Leu-Arg-Gly, whereas cleavage at the other site involves excision of serine from the sequence Glu-Ser-Leu. Proteolysis at the latter sequence has not been previously reported in intact secretory granules. The aspartate at position 4 is possibly covalently modified.

  1. Plasma fatty acids in chronic kidney disease: nervonic acid predicts mortality.

    PubMed

    Shearer, Gregory C; Carrero, Juan J; Heimbürger, Olof; Barany, Peter; Stenvinkel, Peter

    2012-03-01

    Although the value of red blood cell fatty acids (FAs) in estimating risk for acute coronary syndrome in the general population is evident, the value of FAs in chronic kidney disease (CKD) is unknown. Here, we provide a pilot analysis in a spectrum of CKD patients. Plasma samples were obtained from 20 incident dialysis patients (CKD stage 5), matched with samples from 10 CKD stage 3-4 patients, and 10 control subjects. Whole plasma FAs were measured using gas chromatography. Whereas neither linoleic acid nor arachidonate acid were altered in CKD, metabolic intermediates of arachidonate synthesis (γ-linolenate and dihomo γ-linolenate) were reduced in CKD. Demming (orthogonal) correlation of FA abundance with estimated GFR identified several saturated and unsaturated FAs in addition to the intermediates; again, neither linoleate nor arachidonate were related. Follow-up data within the CKD stage 5 patients revealed that nervonic acid, a component of membrane sphingolipids and phosphatidylethanolamines, was a significant predictor of all-cause mortality; the age-adjusted relative risk for a 0.15% change is 2.1 (1.4, 3.7; 95% CI; P = .0008). These findings support the exploration of FAs in larger studies for validation of the role FAs in cardiovascular risk and mortality in CKD.

  2. Knowledge discovery and sequence-based prediction of pandemic influenza using an integrated classification and association rule mining (CBA) algorithm.

    PubMed

    Kargarfard, Fatemeh; Sami, Ashkan; Ebrahimie, Esmaeil

    2015-10-01

    Pandemic influenza is a major concern worldwide. Availability of advanced technologies and the nucleotide sequences of a large number of pandemic and non-pandemic influenza viruses in 2009 provide a great opportunity to investigate the underlying rules of pandemic induction through data mining tools. Here, for the first time, an integrated classification and association rule mining algorithm (CBA) was used to discover the rules underpinning alteration of non-pandemic sequences to pandemic ones. We hypothesized that the extracted rules can lead to the development of an efficient expert system for prediction of influenza pandemics. To this end, we used a large dataset containing 5373 HA (hemagglutinin) segments of the 2009 H1N1 pandemic and non-pandemic influenza sequences. The analysis was carried out for both nucleotide and protein sequences. We found a number of new rules which potentially present the undiscovered antigenic sites at influenza structure. At the nucleotide level, alteration of thymine (T) at position 260 was the key discriminating feature in distinguishing non-pandemic from pandemic sequences. At the protein level, rules including I233K, M334L were the differentiating features. CBA efficiently classifies pandemic and non-pandemic sequences with high accuracy at both the nucleotide and protein level. Finding hotspots in influenza sequences is a significant finding as they represent the regions with low antibody reactivity. We argue that the virus breaks host immunity response by mutation at these spots. Based on the discovered rules, we developed the software, "Prediction of Pandemic Influenza" for discrimination of pandemic from non-pandemic sequences. This study opens a new vista in discovery of association rules between mutation points during evolution of pandemic influenza.

  3. Prediction of Coal ash leaching behavior in acid mine water, comparison of laboratory and field studies

    SciTech Connect

    ANNA, KNOX

    2005-01-10

    Strongly alkaline fluidized bed combustion ash is commonly used to control acid mine drainage in West Virginia coal mines. Objectives include acid neutralization and immobilization of the primary AMD pollutants: iron, aluminum and manganese. The process has been successful in controlling AMD though doubts remain regarding mobilization of other toxic elements present in the ash. In addition, AMD contains many toxic elements in low concentrations. And, each mine produces AMD of widely varying quality. So, predicting the effect of a particular ash on a given coal mine's drainage quality is of particular interest. In this chapter we compare the results of a site-specific ash leaching procedure with two large-scale field applications of FBC ash. The results suggested a high degree of predictability for roughly half of the 25 chemical parameters and poor predictability for the remainder. Of these, seven parameters were successfully predicted on both sites: acidity, Al, B, Ba, Fe, Ni and Zn while electrical conductivity, Ca, Cd, SO4, Pb and Sb were not successfully predicted on either site. Trends for the remaining elements: As, Ag, Be, Cu, Cr, Hg, Mg, Mn, pH, Se Tl and V were successfully predicted on one but not both mine sites.

  4. Enzyme-free translation of DNA into sequence-defined synthetic polymers structurally unrelated to nucleic acids

    NASA Astrophysics Data System (ADS)

    Niu, Jia; Hili, Ryan; Liu, David R.

    2013-04-01

    The translation of DNA sequences into corresponding biopolymers enables the production, function and evolution of the macromolecules of life. In contrast, methods to generate sequence-defined synthetic polymers with similar levels of control have remained elusive. Here, we report the development of a DNA-templated translation system that enables the enzyme-free translation of DNA templates into sequence-defined synthetic polymers that have no necessary structural relationship with nucleic acids. We demonstrate the efficiency, sequence-specificity and generality of this translation system by oligomerizing building blocks including polyethylene glycol, α-(D)-peptides, and β-peptides in a DNA-programmed manner. Sequence-defined synthetic polymers with molecular weights of 26 kDa containing 16 consecutively coupled building blocks and 90 densely functionalized β-amino acid residues were translated from DNA templates using this strategy. We integrated the DNA-templated translation system developed here into a complete cycle of translation, coding sequence replication, template regeneration and re-translation suitable for the iterated in vitro selection of functional sequence-defined synthetic polymers unrelated in structure to nucleic acids.

  5. Boronic acid functionalized peptidyl synthetic lectins: Combinatorial library design, peptide sequencing, and selective glycoprotein recognition

    PubMed Central

    Bicker, Kevin L.; Sun, Jing; Lavigne, John J.; Thompson, Paul R.

    2011-01-01

    Aberrant glycosylation of cell membrane and secreted glycoproteins is a hallmark of various disease states, including cancer. The natural lectins currently used in the recognition of these glycoproteins are costly, difficult to produce, and unstable towards rigorous use. Herein we describe the design and synthesis of several boronic acid functionalized peptide-based synthetic lectin (SL) libraries, as well as the optimized methodology for obtaining peptide sequences of these SLs. SL libraries were subsequently used to identify SLs with as high as 5-fold selectivity for various glycoproteins. SLs will inevitably find a role in cancer diagnositics, given that they do not suffer from the drawbacks of natural lectins and that the combinatorial nature of these libraries allows for the identification of an SL for nearly any glycosylated biomolecule. PMID:21405093

  6. Kinetics of amyloid aggregation of mammal apomyoglobins and correlation with their amino acid sequences.

    PubMed

    Vilasi, Silvia; Dosi, Roberta; Iannuzzi, Clara; Malmo, Clorinda; Parente, Augusto; Irace, Gaetano; Sirangelo, Ivana

    2006-03-01

    In protein deposition disorders, a normally soluble protein is deposited as insoluble aggregates, referred to as amyloid. The intrinsic effects of specific mutations on the rates of protein aggregation and amyloid formation of unfolded polypeptide chains can be correlated with changes in hydrophobicity, propensity to convert alpha-helical to beta sheet conformation and charge. In this paper, we report the aggregation rates of buffalo, horse and bovine apomyoglobins. The experimental values were compared with the theoretical ones evaluated considering the amino acid differences among the sequences. Our results show that the mutations which play critical roles in the rate-determining step of apomyoglobin aggregation are those located within the N-terminal region of the molecule.

  7. GAWK, a novel human pituitary polypeptide: isolation, immunocytochemical localization and complete amino acid sequence.

    PubMed

    Benjannet, S; Leduc, R; Lazure, C; Seidah, N G; Marcinkiewicz, M; Chrétien, M

    1985-01-16

    During the course of reverse-phase high pressure liquid chromatography (RP-HPLC) purification of a postulated big ACTH (1) from human pituitary gland extracts, a highly purified peptide bearing no resemblance to any known polypeptide was isolated. The complete sequence of this 74 amino acid polypeptide, called GAWK, has been determined. Search on a computer data bank on the possible homology to any known protein or fragment, using a mutation data matrix, failed to reveal any homology greater than 30%. An antibody produced against a synthetic fragment allowed us to detect several immunoreactive forms. The antisera also enabled us to localize the polypeptide, by immunocytochemistry, in the anterior lobe of the pituitary gland.

  8. Evolutionary connections of biological kingdoms based on protein and nucleic acid sequence evidence

    NASA Technical Reports Server (NTRS)

    Dayhoff, M. O.

    1983-01-01

    Prokaryotic and eukaryotic evolutionary trees are developed from protein and nucleic-acid sequences by the methods of numerical taxonomy. Trees are presented for bacterial ferredoxins, 5S ribosomal RNA, c-type cytochromes , cytochromes c2 and c', and 5.8S ribosomal RNA; the implications for early evolution are discussed; and a composite tree showing the branching of the anaerobes, aerobes, archaebacteria, and eukaryotes is shown. Single lines are found for all oxygen-evolving photosynthetic forms and for the salt-loving and high-temperature forms of archaebacteria. It is argued that the eukaryote mitochondria, chloroplasts, and cytoplasmic host material are descended from free-living prokaryotes that formed symbiotic associations, with more than one symbiotic event involved in the evolution of each organelle.

  9. Identification of amino acid sequences in the polyomavirus capsid proteins that serve as nuclear localization signals

    NASA Technical Reports Server (NTRS)

    Chang, D.; Haynes, J. I. Jr; Brady, J. N.; Consigli, R. A.; Spooner, B. S. (Principal Investigator)

    1993-01-01

    The molecular mechanism participating in the transport of newly synthesized proteins from the cytoplasm to the nucleus in mammalian cells is poorly understood. Recently, the nuclear localization signal sequences (NLS) of many nuclear proteins have been identified, and most have been found to be composed of a highly basic amino acid stretch. A genetic "subtractive" and a biochemical "additive" approach were used in our studies to identify the NLS's of the polyomavirus structural capsid proteins. An NLS was identified at the N-terminus (Ala1-Pro-Lys-Arg-Lys-Ser-Gly-Val-Ser-Lys-Cys11) of the major capsid protein VP1 and at the C-terminus (Glu307 -Glu-Asp-Gly-Pro-Glu-Lys-Lys-Lys-Arg-Arg-Leu318) of the VP2/VP3 minor capsid proteins.

  10. Purification, properties and complete amino acid sequence of the ferredoxin from a green alga, Chlamydomonas reinhardtii.

    PubMed

    Schmitter, J M; Jacquot, J P; de Lamotte-Guéry, F; Beauvallet, C; Dutka, S; Gadal, P; Decottignies, P

    1988-03-01

    The ferredoxin was purified from the green alga, Chlamydomonas reinhardtii. The protein showed typical absorption and circular dichroism spectra of a [2Fe-2S] ferredoxin. When compared with spinach ferredoxin, the C. reinhardtii protein was less effective in the catalysis of NADP+ photoreduction, but its activity was higher in the light activation of C. reinhardtii malate dehydrogenase (NADP). The complete amino acid sequence was determined by automated Edman degradation of the whole protein and of peptides obtained by trypsin and chymotrypsin digestions and by CNBr cleavage. The protein consists of 94 residues, with Tyr at both NH2 and COOH termini. The positions of the four cysteines binding the two iron atoms are similar to those found in other [2Fe-2S] ferredoxins. The primary structure of C. reinhardtii ferredoxin showed a great homology (about 80%) with ferredoxins from two other green algae.

  11. Real-time nucleic acid sequence-based amplification in nanoliter volumes.

    PubMed

    Gulliksen, Anja; Solli, Lars; Karlsen, Frank; Rogne, Henrik; Hovig, Eivind; Nordstrøm, Trine; Sirevåg, Reidun

    2004-01-01

    Real-time nucleic acid sequence-based amplification (NASBA) is an isothermal method specifically designed for amplification of RNA. Fluorescent molecular beacon probes enable real-time monitoring of the amplification process. Successful identification, utilizing the real-time NASBA technology, was performed on a microchip with oligonucleotides at a concentration of 1.0 and 0.1 microM, in 10- and 50-nL reaction chambers, respectively. The microchip was developed in a silicon-glass structure. An instrument providing thermal control and an optical detection system was built for amplification readout. Experimental results demonstrate distinct amplification processes. Miniaturized real-time NASBA in microchips makes high-throughput diagnostics of bacteria, viruses, and cancer markers possible, at reduced cost and without contamination.

  12. Real-time nucleic acid sequence-based amplification assay for detection of hepatitis A virus.

    PubMed

    Abd el-Galil, Khaled H; el-Sokkary, M A; Kheira, S M; Salazar, Andre M; Yates, Marylynn V; Chen, Wilfred; Mulchandani, Ashok

    2005-11-01

    A nucleic acid sequence-based amplification (NASBA) assay in combination with a molecular beacon was developed for the real-time detection and quantification of hepatitis A virus (HAV). A 202-bp, highly conserved 5' noncoding region of HAV was targeted. The sensitivity of the real-time NASBA assay was tested with 10-fold dilutions of viral RNA, and a detection limit of 1 PFU was obtained. The specificity of the assay was demonstrated by testing with other environmental pathogens and indicator microorganisms, with only HAV positively identified. When combined with immunomagnetic separation, the NASBA assay successfully detected as few as 10 PFU from seeded lake water samples. Due to its isothermal nature, its speed, and its similar sensitivity compared to the real-time RT-PCR assay, this newly reported real-time NASBA method will have broad applications for the rapid detection of HAV in contaminated food or water.

  13. Detection of infectious salmon anaemia virus by real-time nucleic acid sequence based amplification.

    PubMed

    Starkey, William G; Smail, David A; Bleie, Hogne; Muir, K Fiona; Ireland, Jacqueline H; Richards, Randolph H

    2006-10-17

    We have developed a real-time nucleic acid sequence based amplification (NASBA) procedure for detection of infectious salmon anaemia virus (ISAV). Primers were designed to target a 124 nucleotide region of ISAV genome segment 8. Amplification products were detected in real-time with a molecular beacon (carboxyfluorescin [FAM]-labelled and methyl-red quenched) that recognised an internal region of the target amplicon. Amplification and detection were performed at 41 degrees C for 90 min in a Corbett Research Rotorgene. The real-time NASBA assay was compared to a conventional RT-PCR for ISAV detection. From a panel of 45 clinical samples, both assays detected ISAV in the same 19 samples. Based on the detection of a synthetic RNA target, the real-time NASBA procedure was approximately 100x more sensitive than conventional RT-PCR. These results suggest that real-time NASBA may represent a useful diagnostic procedure for ISAV.

  14. Sequence-defined shuttles for targeted nucleic acid and protein delivery.

    PubMed

    Röder, Ruth; Wagner, Ernst

    2014-01-01

    Molecular medicine opens into a space of novel specific therapeutic agents: intracellularly active drugs such as peptides, proteins or nucleic acids, which are not able to cross cell membranes and enter the intracellular space on their own. Through the development of cell-targeted shuttles for specific delivery, this restriction in delivery has the potential to be converted into an advantage. On the one hand, due to the multiple extra- and intracellular barriers, such carrier systems need to be multifunctional. On the other hand, they must be precise and reproducibly manufactured due to pharmaceutical reasons. Here we review the design of precise sequence-defined delivery carriers, including solid-phase synthesized peptides and nonpeptidic oligomers, or nucleotide-based carriers such as aptamers and origami nanoboxes.

  15. Weighted-support vector machines for predicting membrane protein types based on pseudo-amino acid composition.

    PubMed

    Wang, Meng; Yang, Jie; Liu, Guo-Ping; Xu, Zhi-Jie; Chou, Kuo-Chen

    2004-06-01

    Membrane proteins are generally classified into the following five types: (1) type I membrane proteins, (2) type II membrane proteins, (3) multipass transmembrane proteins, (4) lipid chain-anchored membrane proteins and (5) GPI-anchored membrane proteins. Prediction of membrane protein types has become one of the growing hot topics in bioinformatics. Currently, we are facing two critical challenges in this area: first, how to take into account the extremely complicated sequence-order effects, and second, how to deal with the highly uneven sizes of the subsets in a training dataset. In this paper, stimulated by the concept of using the pseudo-amino acid composition to incorporate the sequence-order effects, the spectral analysis technique is introduced to represent the statistical sample of a protein. Based on such a framework, the weighted support vector machine (SVM) algorithm is applied. The new approach has remarkable power in dealing with the bias caused by the situation when one subset in the training dataset contains many more samples than the other. The new method is particularly useful when our focus is aimed at proteins belonging to small subsets. The results obtained by the self-consistency test, jackknife test and independent dataset test are encouraging, indicating that the current approach may serve as a powerful complementary tool to other existing methods for predicting the types of membrane proteins.

  16. Parameters of proteome evolution from histograms of amino-acid sequence identities of paralogous proteins

    PubMed Central

    Axelsen, Jacob Bock; Yan, Koon-Kiu; Maslov, Sergei

    2007-01-01

    Background The evolution of the full repertoire of proteins encoded in a given genome is mostly driven by gene duplications, deletions, and sequence modifications of existing proteins. Indirect information about relative rates and other intrinsic parameters of these three basic processes is contained in the proteome-wide distribution of sequence identities of pairs of paralogous proteins. Results We introduce a simple mathematical framework based on a stochastic birth-and-death model that allows one to extract some of this information and apply it to the set of all pairs of paralogous proteins in H. pylori, E. coli, S. cerevisiae, C. elegans, D. melanogaster, and H. sapiens. It was found that the histogram of sequence identities p generated by an all-to-all alignment of all protein sequences encoded in a genome is well fitted with a power-law form ~ p-γ with the value of the exponent γ around 4 for the majority of organisms used in this study. This implies that the intra-protein variability of substitution rates is best described by the Gamma-distribution with the exponent α ≈ 0.33. Different features of the shape of such histograms allow us to quantify the ratio between the genome-wide average deletion/duplication rates and the amino-acid substitution rate. Conclusion We separately measure the short-term ("raw") duplication and deletion rates rdup∗, rdel∗ which include gene copies that will be removed soon after the duplication event and their dramatically reduced long-term counterparts rdup, rdel. High deletion rate among recently duplicated proteins is consistent with a scenario in which they didn't have enough time to significantly change their functional roles and thus are to a large degree disposable. Systematic trends of each of the four duplication/deletion rates with the total number of genes in the genome were analyzed. All but the deletion rate of recent duplicates rdel∗ were shown to systematically increase with Ngenes. Abnormally flat shapes

  17. Prediction algorithm for amino acid types with their secondary structure in proteins (PLATON) using chemical shifts.

    PubMed

    Labudde, D; Leitner, D; Krüger, M; Oschkinat, H

    2003-01-01

    The algorithm PLATON is able to assign sets of chemical shifts derived from a single residue to amino acid types with its secondary structure (amino acid species). A subsequent ranking procedure using optionally two different penalty functions yields predictions for possible amino acid species for the given set of chemical shifts. This was demonstrated in the case of the alpha-spectrin SH3 domain and applied to 9 further protein data sets taken from the BioMagRes database. A database consisting of reference chemical shift patterns (reference CSPs) was generated from assigned chemical shifts of proteins with known 3D-structure. This reference CSP database is used in our approach for extracting distributions of amino acid types with their most likely secondary structure elements (namely alpha-helix, beta-sheet, and coil) for single amino acids by comparison with query CSPs. Results obtained for the 10 investigated proteins indicates that the percentage of correct amino acid species in the first three positions in the ranking list, ranges from 71.4% to 93.2% for the more favorable penalty function. Where only the top result of the ranking list for these 10 proteins is considered, 36.5% to 83.1% of the amino acid species are correctly predicted. The main advantage of our approach, over other methods that rely on average chemical shift values is the ability to increase database content by incorporating newly derived CSPs, and therefore to improve PLATON's performance over time.

  18. Evaluating and predicting the oxidative stability of vegetable oils with different fatty acid compositions.

    PubMed

    Li, Hongyan; Fan, Ya-wei; Li, Jing; Tang, Liang; Hu, Jiang-ning; Deng, Ze-yuan

    2013-04-01

    The aim of this research was to evaluate the oxidative stabilities and qualities of different vegetable oils (almond, blend 1-8, camellia, corn, palm, peanut, rapeseed, sesame, soybean, sunflower, and zanthoxylum oil) based on peroxide value (PV), vitamin E content, free fatty acid, and fatty acid composition. The vegetable oils with different initial fatty acid compositions were studied under accelerated oxidation condition. It showed that PV and n-3 polyunsaturated fatty acid (PUFA) changed significantly during 21 d accelerated oxidation storage. Based on the changes of PV and fatty acid composition during the oxidation process, mathematical models were hypothesized and the models were simulated by Matlab to generate the proposed equations. These equations were established on the basis of the different PUFA contents as 10% to 28%, 28% to 46%, and 46% to 64%, respectively. The simulated models were proven to be validated and valuable for assessing the degree of oxidation and predicting the shelf life of vegetable oils.

  19. Trypsin inhibitors from ridged gourd (Luffa acutangula Linn.) seeds: purification, properties, and amino acid sequences.

    PubMed

    Haldar, U C; Saha, S K; Beavis, R C; Sinha, N K

    1996-02-01

    Two trypsin inhibitors, LA-1 and LA-2, have been isolated from ridged gourd (Luffa acutangula Linn.) seeds and purified to homogeneity by gel filtration followed by ion-exchange chromatography. The isoelectric point is at pH 4.55 for LA-1 and at pH 5.85 for LA-2. The Stokes radius of each inhibitor is 11.4 A. The fluorescence emission spectrum of each inhibitor is similar to that of the free tyrosine. The biomolecular rate constant of acrylamide quenching is 1.0 x 10(9) M-1 sec-1 for LA-1 and 0.8 x 10(9) M-1 sec-1 for LA-2 and that of K2HPO4 quenching is 1.6 x 10(11) M-1 sec-1 for LA-1 and 1.2 x 10(11) M-1 sec-1 for LA-2. Analysis of the circular dichroic spectra yields 40% alpha-helix and 60% beta-turn for La-1 and 45% alpha-helix and 55% beta-turn for LA-2. Inhibitors LA-1 and LA-2 consist of 28 and 29 amino acid residues, respectively. They lack threonine, alanine, valine, and tryptophan. Both inhibitors strongly inhibit trypsin by forming enzyme-inhibitor complexes at a molar ratio of unity. A chemical modification study suggests the involvement of arginine of LA-1 and lysine of LA-2 in their reactive sites. The inhibitors are very similar in their amino acid sequences, and show sequence homology with other squash family inhibitors. PMID:8924202

  20. Microfluidic platform for isolating nucleic acid targets using sequence specific hybridization

    PubMed Central

    Wang, Jingjing; Morabito, Kenneth; Tang, Jay X.; Tripathi, Anubhav

    2013-01-01

    The separation of target nucleic acid sequences from biological samples has emerged as a significant process in today's diagnostics and detection strategies. In addition to the possible clinical applications, the fundamental understanding of target and sequence specific hybridization on surface modified magnetic beads is of high value. In this paper, we describe a novel microfluidic platform that utilizes a mobile magnetic field in static microfluidic channels, where single stranded DNA (ssDNA) molecules are isolated via nucleic acid hybridization. We first established efficient isolation of biotinylated capture probe (BP) using streptavidin-coated magnetic beads. Subsequently, we investigated the hybridization of target ssDNA with BP bound to beads and explained these hybridization kinetics using a dual-species kinetic model. The number of hybridized target ssDNA molecules was determined to be about 6.5 times less than that of BP on the bead surface, due to steric hindrance effects. The hybridization of target ssDNA with non-complementary BP bound to bead was also examined, and non-specific hybridization was found to be insignificant. Finally, we demonstrated highly efficient capture and isolation of target ssDNA in the presence of non-target ssDNA, where as low as 1% target ssDNA can be detected from mixture. The microfluidic method described in this paper is significantly relevant and is broadly applicable, especially towards point-of-care biological diagnostic platforms that require binding and separation of known target biomolecules, such as RNA, ssDNA, or protein. PMID:24404041

  1. Detection of Vibrio cholerae by real-time nucleic acid sequence-based amplification.

    PubMed

    Fykse, Else M; Skogan, Gunnar; Davies, William; Olsen, Jaran Strand; Blatny, Janet M

    2007-03-01

    A multitarget molecular beacon-based real-time nucleic acid sequence-based amplification (NASBA) assay for the specific detection of Vibrio cholerae has been developed. The genes encoding the cholera toxin (ctxA), the toxin-coregulated pilus (tcpA; colonization factor), the ctxA toxin regulator (toxR), hemolysin (hlyA), and the 60-kDa chaperonin product (groEL) were selected as target sequences for detection. The beacons for the five different genetic targets were evaluated by serial dilution of RNA from V. cholerae cells. RNase treatment of the nucleic acids eliminated all NASBA, whereas DNase treatment had no effect, showing that RNA and not DNA was amplified. The specificity of the assay was investigated by testing several isolates of V. cholerae, other Vibrio species, and Bacillus cereus, Salmonella enterica, and Escherichia coli strains. The toxR, groEL, and hlyA beacons identified all V. cholerae isolates, whereas the ctxA and tcpA beacons identified the O1 toxigenic clinical isolates. The NASBA assay detected V. cholerae at 50 CFU/ml by using the general marker groEL and tcpA that specifically indicates toxigenic strains. A correlation between cell viability and NASBA was demonstrated for the ctxA, toxR, and hlyA targets. RNA isolated from different environmental water samples spiked with V. cholerae was specifically detected by NASBA. These results indicate that NASBA can be used in the rapid detection of V. cholerae from various environmental water samples. This method has a strong potential for detecting toxigenic strains by using the tcpA and ctxA markers. The entire assay including RNA extraction and NASBA was completed within 3 h.

  2. Phylogenetic analysis of beta-papillomaviruses as inferred from nucleotide and amino acid sequence data.

    PubMed

    Gottschling, Marc; Köhler, Anja; Stockfleth, Eggert; Nindl, Ingo

    2007-01-01

    Human papillomaviruses (HPV) of the beta-group seem to be involved in the pathogenesis of non-melanoma skin cancer. Papillomaviruses are host specific and are considered closely co-evolving with their hosts. Evolutionary incongruence between early genes and late genes has been reported among oncogenic genital alpha-papillomaviruses and considerably challenge phylogenetic reconstructions. We investigated the relationships of 29 beta-HPV (25 types plus four putative new types, subtypes, or variants) as inferred from codon aligned and amino acid sequence data of the genes E1, E2, E6, E7, L1, and L2 using likelihood, distance, and parsimony approaches. An analysis of a L1 fragment included additional nucleotide and amino acid sequences from seven non-human beta-papillomaviruses. Early genes and late genes evolution did not conflict significantly in beta-papillomaviruses based on partition homogeneity tests (p > or = 0.001). As inferred from the complete genome analyses, beta-papillomaviruses were monophyletic and segregated into four highly supported monophyletic assemblages corresponding to the species 1, 2, 3, and fused 4/5. They basically split into the species 1 and the remainder of beta-papillomaviruses, whose species 3, 4, and 5 constituted the sistergroup of species 2. beta-Papillomaviruses have been isolated from humans, apes, and monkeys, and phylogenetic analyses of the L1 fragment showed non-human papillomaviruses highly polyphyletic nesting within the HPV species. Thus, host and virus phylogenies were not congruent in beta-papillomaviruses, and multiple invasions across species borders may contribute (additionally to host-linked evolution) to their diversification.

  3. Identification of Protein-Protein Interactions via a Novel Matrix-Based Sequence Representation Model with Amino Acid Contact Information.

    PubMed

    Ding, Yijie; Tang, Jijun; Guo, Fei

    2016-09-24

    Identification of protein-protein interactions (PPIs) is a difficult and important problem in biology. Since experimental methods for predicting PPIs are both expensive and time-consuming, many computational methods have been developed to predict PPIs and interaction networks, which can be used to complement experimental approaches. However, these methods have limitations to overcome. They need a large number of homology proteins or literature to be applied in their method. In this paper, we propose a novel matrix-based protein sequence representation approach to predict PPIs, using an ensemble learning method for classification. We construct the matrix of Amino Acid Contact (AAC), based on the statistical analysis of residue-pairing frequencies in a database of 6323 protein-protein complexes. We first represent the protein sequence as a Substitution Matrix Representation (SMR) matrix. Then, the feature vector is extracted by applying algorithms of Histogram of Oriented