Science.gov

Sample records for acid sequences predicted

  1. Prediction of flexible/rigid regions from protein sequences using k-spaced amino acid pairs

    PubMed Central

    Chen, Ke; Kurgan, Lukasz A; Ruan, Jishou

    2007-01-01

    Background Traditionally, it is believed that the native structure of a protein corresponds to a global minimum of its free energy. However, with the growing number of known tertiary (3D) protein structures, researchers have discovered that some proteins can alter their structures in response to a change in their surroundings or with the help of other proteins or ligands. Such structural shifts play a crucial role with respect to the protein function. To this end, we propose a machine learning method for the prediction of the flexible/rigid regions of proteins (referred to as FlexRP); the method is based on a novel sequence representation and feature selection. Knowledge of the flexible/rigid regions may provide insights into the protein folding process and the 3D structure prediction. Results The flexible/rigid regions were defined based on a dataset, which includes protein sequences that have multiple experimental structures, and which was previously used to study the structural conservation of proteins. Sequences drawn from this dataset were represented based on feature sets that were proposed in prior research, such as PSI-BLAST profiles, composition vector and binary sequence encoding, and a newly proposed representation based on frequencies of k-spaced amino acid pairs. These representations were processed by feature selection to reduce the dimensionality. Several machine learning methods for the prediction of flexible/rigid regions and two recently proposed methods for the prediction of conformational changes and unstructured regions were compared with the proposed method. The FlexRP method, which applies Logistic Regression and collocation-based representation with 95 features, obtained 79.5% accuracy. The two runner-up methods, which apply the same sequence representation and Support Vector Machines (SVM) and Naïve Bayes classifiers, obtained 79.2% and 78.4% accuracy, respectively. 
    The remaining considered methods are characterized by accuracies below 70%
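
    The k-spaced amino acid pair representation described above can be illustrated with a short Python sketch. It assumes the 20 standard residues, counts each ordered pair (a, b) where b lies k + 1 positions after a, and normalizes by the number of counted pairs. The function names and the normalization are illustrative assumptions, not the FlexRP implementation.

```python
from itertools import product

# The 20 standard amino acids (assumption: non-standard letters are ignored).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def k_spaced_pair_frequencies(seq, k):
    """Normalized frequencies of residue pairs (seq[i], seq[i + k + 1]).

    k = 0 gives adjacent pairs; k = 1 skips one residue between the pair, etc.
    """
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(seq) - k - 1):
        pair = seq[i] + seq[i + k + 1]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    return {p: (c / total if total else 0.0) for p, c in counts.items()}


def k_spaced_features(seq, max_k):
    """Concatenate the 400-long vectors for k = 0..max_k (hypothetical helper)."""
    features = []
    for k in range(max_k + 1):
        freqs = k_spaced_pair_frequencies(seq, k)
        features.extend(freqs[p] for p in PAIRS)
    return features
```

    Concatenating the vectors for k = 0..K yields 400(K + 1) features, which illustrates why a feature-selection step is needed before classification.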

  2. Fast computational methods for predicting protein structure from primary amino acid sequence

    DOEpatents

    Agarwal, Pratul Kumar

    2011-07-19

    The present invention provides a method utilizing primary amino acid sequence of a protein, energy minimization, molecular dynamics and protein vibrational modes to predict three-dimensional structure of a protein. The present invention also determines possible intermediates in the protein folding pathway. The present invention has important applications to the design of novel drugs as well as protein engineering. The present invention predicts the three-dimensional structure of a protein independent of size of the protein, overcoming a significant limitation in the prior art.

  3. Analysis of protein function and its prediction from amino acid sequence.

    PubMed

    Clark, Wyatt T; Radivojac, Predrag

    2011-07-01

    Understanding protein function is one of the keys to understanding life at the molecular level. It is also important in the context of human disease because many conditions arise as a consequence of alterations of protein function. The recent availability of relatively inexpensive sequencing technology has resulted in thousands of complete or partially sequenced genomes with millions of functionally uncharacterized proteins. Such a large volume of data, combined with the lack of high-throughput experimental assays to functionally annotate proteins, contributes to the growing importance of automated function prediction. Here, we study proteins annotated by Gene Ontology (GO) terms and estimate the accuracy of functional transfer from protein sequence only. We find that the transfer of GO terms by pairwise sequence alignments is only moderately accurate, showing a surprisingly small influence of sequence identity (SID) over a broad range (30-100%). We developed and evaluated a new predictor of protein function from amino acid sequence, the functional annotator (FANN). The predictor exploits a multioutput neural network framework, which is well suited to simultaneously modeling dependencies between functional terms. Experiments provide evidence that FANN-GO (predictor of GO terms; available from http://www.informatics.indiana.edu/predrag) outperforms standard methods such as transfer by global or local SID as well as GOtcha, a method that incorporates the structure of GO.

  4. Gene sequence and predicted amino acid sequence of the motA protein, a membrane-associated protein required for flagellar rotation in Escherichia coli.

    PubMed Central

    Dean, G E; Macnab, R M; Stader, J; Matsumura, P; Burks, C

    1984-01-01

    The motA and motB gene products of Escherichia coli are integral membrane proteins necessary for flagellar rotation. We determined the DNA sequence of the region containing the motA gene and its promoter. Within this sequence, there is an open reading frame of 885 nucleotides, which with high probability (98% confidence level) meets criteria for a coding sequence. The 295-residue amino acid translation product had a molecular weight of 31,974, in good agreement with the value determined experimentally by gel electrophoresis. The amino acid sequence, which was quite hydrophobic, was subjected to a theoretical analysis designed to predict membrane-spanning alpha-helical segments of integral membrane proteins; four such hydrophobic helices were predicted by this treatment. Additional amphipathic helices may also be present. A remarkable feature of the sequence is the existence of two segments of high uncompensated charge density, one positive and the other negative. Possible organization of the protein in the membrane is discussed. Asymmetry in the amino acid composition of translated DNA sequences was used to distinguish between two possible initiation codons. The use of this method as a criterion for authentication of coding regions is described briefly in an Appendix. PMID:6090403
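
    The abstract does not name the membrane-helix predictor that was used. As an illustration only, a Kyte-Doolittle hydropathy window (a swapped-in, commonly used technique, not necessarily the one applied to motA) flags stretches whose average hydropathy exceeds a threshold. The 19-residue window, the 1.6 threshold, and the function names below are assumptions.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of each full window; entry i covers seq[i:i + window]."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """0-based start indices of windows hydrophobic enough to be membrane-spanning candidates."""
    return [i for i, h in enumerate(hydropathy_profile(seq, window)) if h > threshold]
```

    Overlapping flagged windows would then be merged into candidate helices; amphipathic helices, which the abstract also mentions, are not detected by a plain hydropathy average.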

  5. Predicting subcellular localization of proteins based on their N-terminal amino acid sequence.

    PubMed

    Emanuelsson, O; Nielsen, H; Brunak, S; von Heijne, G

    2000-07-21

    A neural network-based tool, TargetP, for large-scale subcellular location prediction of newly identified proteins has been developed. Using N-terminal sequence information only, it discriminates between proteins destined for the mitochondrion, the chloroplast, the secretory pathway, and "other" localizations with a success rate of 85% (plant) or 90% (non-plant) on redundancy-reduced test sets. From a TargetP analysis of the recently sequenced Arabidopsis thaliana chromosomes 2 and 4 and the Ensembl Homo sapiens protein set, we estimate that 10% of all plant proteins are mitochondrial and 14% chloroplastic, and that the abundance of secretory proteins, in both Arabidopsis and Homo, is around 10%. TargetP also predicts cleavage sites with levels of correctly predicted sites ranging from approximately 40% to 50% (chloroplastic and mitochondrial presequences) to above 70% (secretory signal peptides). TargetP is available as a web-server at http://www.cbs.dtu.dk/services/TargetP/.

  6. Predicting Protein–Protein Interaction Sites Using Sequence Descriptors and Site Propensity of Neighboring Amino Acids

    PubMed Central

    Kuo, Tzu-Hao; Li, Kuo-Bin

    2016-01-01

    Information about the interface sites of Protein–Protein Interactions (PPIs) is useful in many areas of biological research. However, despite advances in experimental techniques, identifying PPI sites remains a challenging task. Using a statistical learning technique, we proposed a computational tool for predicting PPI sites. Unlike similar approaches that require structural information, the proposed method takes all of its input from protein sequences. In addition to typical sequence features, our method takes into account that interaction sites are not randomly distributed over the protein sequence. We characterized this positional preference using protein complexes with known structures, proposed a numerical index to estimate the propensity, and then incorporated the index into a learning system. The resulting predictor, without using structural information, yields an area under the ROC curve (AUC) of 0.675, recall of 0.597, precision of 0.311 and accuracy of 0.583 in a ten-fold cross-validation experiment. This performance is comparable to that of the previous approach in which structural information was used. Upon introducing B-factor data into our predictor, we demonstrated that the AUC can be further improved to 0.750. The tool is accessible at http://bsaltools.ym.edu.tw/predppis. PMID:27792167
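
    The four figures of merit quoted above can be computed from per-residue labels and predictions as in the following sketch. The function names are hypothetical; the AUC here uses the rank (Mann-Whitney) interpretation, i.e. the probability that a randomly chosen positive scores above a randomly chosen negative, with ties counted as one half.

```python
def confusion_metrics(y_true, y_pred):
    """Precision, recall and accuracy for binary labels (1 = interface site)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / len(y_true),
    }


def roc_auc(y_true, scores):
    """AUC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

    Because precision and recall depend on the decision threshold while AUC does not, a predictor can have modest precision (0.311 above) alongside a higher recall.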

  7. Prediction of posttranslational modification sites from amino acid sequences with kernel methods.

    PubMed

    Xu, Yan; Wang, Xiaobo; Wang, Yongcui; Tian, Yingjie; Shao, Xiaojian; Wu, Ling-Yun; Deng, Naiyang

    2014-03-07

    Post-translational modification (PTM) is the chemical modification of a protein after its translation and, for many proteins, one of the later steps in protein biosynthesis. It plays an important role in modifying the end product of gene expression and contributes to biological processes and disease conditions. However, experimental methods for identifying PTM sites are both costly and time-consuming, so computational methods are highly desirable. In this work, a novel encoding method, PSPM (position-specific propensity matrices), is developed. A support vector machine (SVM) with a kernel matrix computed from PSPM is then applied to predict PTM sites. The experimental results indicate that the new method performs better than or comparably to existing methods, making it a useful computational resource for the identification of PTM sites. A unified standalone software package, PTMPred, has been developed. It can be used to predict all types of PTM sites if the user provides the training datasets. The software can be freely downloaded from http://www.aporc.org/doc/wiki/PTMPred.
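
    The abstract does not define PSPM precisely. The sketch below shows one plausible construction, assumed for illustration: per-position amino acid frequencies in positive (modified) windows minus those in negative windows, with each window encoded by looking up its residues and a linear kernel taken as the dot product of the encodings. All names are hypothetical, and PTMPred's actual matrix and kernel may differ.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_frequencies(windows):
    """Per-position amino acid frequencies over equal-length sequence windows."""
    length = len(windows[0])
    freqs = [dict.fromkeys(AMINO_ACIDS, 0.0) for _ in range(length)]
    for w in windows:
        for pos, aa in enumerate(w):
            if aa in freqs[pos]:
                freqs[pos][aa] += 1 / len(windows)
    return freqs


def propensity_matrix(pos_windows, neg_windows):
    """Assumed PSPM: positive-window frequency minus negative-window frequency."""
    fp = position_frequencies(pos_windows)
    fn = position_frequencies(neg_windows)
    return [{aa: fp[i][aa] - fn[i][aa] for aa in AMINO_ACIDS} for i in range(len(fp))]


def encode(window, pspm):
    """Replace each residue by its propensity at that window position."""
    return [pspm[i].get(aa, 0.0) for i, aa in enumerate(window)]


def linear_kernel(a, b, pspm):
    """Kernel value between two windows: dot product of their PSPM encodings."""
    return sum(x * y for x, y in zip(encode(a, pspm), encode(b, pspm)))
```

    Filling a matrix with linear_kernel values for all training windows gives a precomputed kernel that an SVM can consume.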

  8. Hybridization properties of long nucleic acid probes for detection of variable target sequences, and development of a hybridization prediction algorithm.

    PubMed

    Ohrmalm, Christina; Jobs, Magnus; Eriksson, Ronnie; Golbob, Sultan; Elfaitouri, Amal; Benachenhou, Farid; Strømme, Maria; Blomberg, Jonas

    2010-11-01

    One of the main problems in nucleic acid-based techniques for detection of infectious agents, such as influenza viruses, is that of nucleic acid sequence variation. DNA probes, 70-nt long, some including the nucleotide analog deoxyribose-Inosine (dInosine), were analyzed for hybridization tolerance to different amounts and distributions of mismatching bases, e.g. synonymous mutations, in target DNA. Microsphere-linked 70-mer probes were hybridized in 3 M TMAC buffer to biotinylated single-stranded (ss) DNA for subsequent analysis in a Luminex® system. When mismatches interrupted contiguous matching stretches of 6 nt or longer, this had a strong impact on hybridization. Contiguous matching stretches are more important than the same number of matching nucleotides separated by mismatches into several regions. dInosine, but not 5-nitroindole, substitutions at mismatching positions stabilized hybridization remarkably well, comparable to N (4-fold) wobbles in the same positions. In contrast to shorter probes, 70-nt probes with judiciously placed dInosine substitutions and/or wobble positions were remarkably mismatch tolerant, with preserved specificity. An algorithm, NucZip, was constructed to model the nucleation and zipping phases of hybridization, integrating both local and distant binding contributions. It predicted hybridization more exactly than previous algorithms, and has the potential to guide the design of variation-tolerant yet specific probes.
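
    The observation that contiguous matching stretches matter more than the total number of matches can be made concrete by listing the maximal matching runs between a probe and its intended target. This is not NucZip; it is a hypothetical helper that assumes both sequences are already aligned position by position and treats the listed wildcard letters in the probe (here N, standing in for a wobble) as matching any base.

```python
def matching_stretches(probe, target, wildcards="N"):
    """Lengths of maximal runs where probe and target agree position by position."""
    if len(probe) != len(target):
        raise ValueError("probe and target must be aligned to equal length")
    runs, current = [], 0
    for p, t in zip(probe, target):
        if p == t or p in wildcards:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def longest_stretch(probe, target, wildcards="N"):
    """Longest contiguous matching stretch (0 if nothing matches)."""
    return max(matching_stretches(probe, target, wildcards), default=0)
```

    Two probes with the same number of matches can therefore differ sharply in their longest stretch, which is the quantity the experiments above found to be influential.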

  9. Immunoreactivity of polyclonal antibodies generated against the carboxy terminus of the predicted amino acid sequence of the Huntington disease gene

    SciTech Connect

    Alkatib, G.; Graham, R.; Pelmear-Telenius, A.

    1994-09-01

    A cDNA fragment spanning the 3′ end of the Huntington disease gene (from 8052 to 9252) was cloned into a prokaryotic expression vector containing the E. coli lac promoter and a portion of the coding sequence for β-galactosidase. The truncated β-galactosidase gene was cleaved with BamHI and fused in frame to the BamHI fragment of the Huntington disease gene 3′ end. Expression analysis of proteins made in E. coli revealed that the β-galactosidase-huntingtin fusion protein accounted for 20-30% of total cellular protein. The identity of the Huntington disease protein amino acid sequences was confirmed by protein sequence analysis. Affinity chromatography was used to purify large quantities of the fusion protein from bacterial cell lysates. Affinity-purified proteins were used to immunize New Zealand white rabbits for antibody production. The generated polyclonal antibodies were used to immunoprecipitate the Huntington disease gene product expressed in a neuroblastoma cell line. In this cell line the antibodies precipitated two protein bands with apparent gel mobilities of 200 and 150 kd, which together correspond to the calculated molecular weight of the Huntington disease gene product (350 kd). Immunoblotting experiments revealed the presence of a large precursor protein in the range of 350-750 kd, which is in agreement with the predicted molecular weight of the protein without post-translational modifications. These results indicate that the huntingtin protein is cleaved into two subunits in this neuroblastoma cell line and suggest that cleavage of a large precursor protein may contribute to its biological activity. Experiments are ongoing to determine the precursor-product relationship, to examine the synthesis of the huntingtin protein in freshly isolated rat brains, and to determine the cellular and subcellular distribution of the gene product.

  10. Nucleotide sequence of the Klebsiella pneumoniae nifD gene and predicted amino acid sequence of the alpha-subunit of nitrogenase MoFe protein.

    PubMed Central

    Ioannidis, I; Buck, M

    1987-01-01

    The nucleotide sequence of the Klebsiella pneumoniae nifD gene is presented and together with the accompanying paper [Holland, Zilberstein, Zamir & Sussman (1987) Biochem. J. 247, 277-285] completes the sequence of the nifHDK genes encoding the nitrogenase polypeptides. The K. pneumoniae nifD gene encodes the 483-amino acid-residue nitrogenase alpha-subunit polypeptide of Mr 54156. The alpha-subunit has five strongly conserved cysteine residues at positions 63, 89, 155, 184 and 275, some occurring in a region showing both primary sequence and potential structural homology to the K. pneumoniae nitrogenase beta-subunit. A comparison with six other alpha-subunit amino acid sequences has been made, which indicates a number of potentially important domains within alpha-subunits. PMID:3322262

  11. Coronavirus genome: prediction of putative functional domains in the non-structural polyprotein by comparative amino acid sequence analysis.

    PubMed Central

    Gorbalenya, A E; Koonin, E V; Donchenko, A P; Blinov, V M

    1989-01-01

    Amino acid sequences of 2 giant non-structural polyproteins (F1 and F2) of infectious bronchitis virus (IBV), a member of Coronaviridae, were compared, by computer-assisted methods, to sequences of a number of other positive strand RNA viral and cellular proteins. By this approach, juxtaposed putative RNA-dependent RNA polymerase, nucleic acid binding ("finger"-like) and RNA helicase domains were identified in F2. Together, these domains might constitute the core of the protein complex involved in the primer-dependent transcription, replication and recombination of coronaviruses. In F1, two cysteine protease-like domains and a growth factor-like one were revealed. One of the putative proteases of IBV is similar to 3C proteases of picornaviruses and related enzymes of como-, nepo-, and potyviruses. A search of the IBV F1 and F2 sequences for sites similar to those cleaved by the latter proteases, and intercomparison of the surrounding sequence stretches, revealed 13 dipeptides Q/S(G) which are probably cleaved by the coronavirus 3C-like protease. Based on these observations, a partial tentative scheme for the functional organization and expression strategy of the non-structural polyproteins of IBV was proposed. It implies that, despite the general similarity to other positive strand RNA viruses, and particularly to potyviruses, coronaviruses possess a number of unique structural and functional features. PMID:2526320
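
    The first step of the site search, locating Q/S and Q/G dipeptides, can be sketched with a regular expression. The function name is hypothetical, and this only finds candidate dipeptides; the comparison of surrounding sequence stretches that the authors used to narrow the list is not modeled.

```python
import re


def candidate_3c_like_sites(seq):
    """1-based positions of Q residues followed by S or G (putative cleavage after Q)."""
    # The lookahead keeps the following residue unconsumed, so overlapping sites are found.
    return [m.start() + 1 for m in re.finditer(r"Q(?=[SG])", seq)]
```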

  12. Sequence Alignment to Predict Across Species Susceptibility ...

    EPA Pesticide Factsheets

    Conservation of a molecular target across species can be used as a line-of-evidence to predict the likelihood of chemical susceptibility. The web-based Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool was developed to simplify, streamline, and quantitatively assess protein sequence/structural similarity across taxonomic groups as a means to predict relative intrinsic susceptibility. The intent of the tool is to allow for evaluation of any potential protein target, so it is amenable to variable degrees of protein characterization, depending on available information about the chemical/protein interaction and the molecular target itself. To allow for flexibility in the analysis, a layered strategy was adopted for the tool. The first level of the SeqAPASS analysis compares primary amino acid sequences to a query sequence, calculating a metric for sequence similarity (including detection of candidate orthologs), the second level evaluates sequence similarity within selected domains (e.g., ligand-binding domain, DNA binding domain), and the third level of analysis compares individual amino acid residue positions identified as being of importance for protein conformation and/or ligand binding upon chemical perturbation. Each level of the SeqAPASS analysis provides increasing evidence to apply toward rapid, screening-level assessments of probable cross species susceptibility. Such analyses can support prioritization of chemicals for further ev

  13. Composition for nucleic acid sequencing

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2008-08-26

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e. the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase on the target nucleic acid molecule complex is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  14. Prediction, sequences and the hippocampus

    PubMed Central

    Lisman, John; Redish, A.D.

    2009-01-01

    Recordings of rat hippocampal place cells have provided information about how the hippocampus retrieves memory sequences. One line of evidence has to do with phase precession, a process organized by theta and gamma oscillations. This precession can be interpreted as the cued prediction of the sequence of upcoming positions. In support of this interpretation, experiments in two-dimensional environments and on a cue-rich linear track demonstrate that many cells represent a position ahead of the animal and that this position is the same irrespective of which direction the rat is coming from. Other lines of investigation have demonstrated that such predictive processes also occur in the non-spatial domain and that retrieval can be internally or externally cued. The mechanism of sequence retrieval and the usefulness of this retrieval to guide behaviour are discussed. PMID:19528000

  15. High speed nucleic acid sequencing

    SciTech Connect

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid. Each type of labeled nucleotide comprises an acceptor fluorophore attached to a phosphate portion of the nucleotide such that the fluorophore is removed upon incorporation into a growing strand. Fluorescent signal is emitted via fluorescence resonance energy transfer between the donor fluorophore and the acceptor fluorophore as each nucleotide is incorporated into the growing strand. The sequence is deduced by identifying which base is being incorporated into the growing strand.

  16. Predict protein structural class for low-similarity sequences by evolutionary difference information into the general form of Chou's pseudo amino acid composition.

    PubMed

    Zhang, Lichao; Zhao, Xiqiang; Kong, Liang

    2014-08-21

    Knowledge of protein structural class plays an important role in characterizing the overall folding type of a given protein. At present, predicting the structural class of low-similarity sequences using information extracted solely from the protein sequence remains a challenge in computational biology. In this study, a novel sequence representation method based on the position-specific scoring matrix is proposed for protein structural class prediction. Using a defined evolutionary difference formula, proteins of varying length are expressed as vectors of uniform dimension, which represent the evolutionary difference information between adjacent residues of a given protein. To implement and evaluate the proposed method, a support vector machine and jackknife tests are employed on three widely used datasets, 25PDB, 1189 and 640, with sequence similarity lower than 25%, 40% and 25%, respectively. Comparison of our results with previous methods shows that our method may be a promising approach to predicting protein structural class, especially for low-similarity sequences.
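
    The abstract does not reproduce the evolutionary difference formula. A plausible form, assumed here for illustration, averages the squared difference between PSSM column i at residue k and column j at the adjacent residue k + 1:

    ED(i, j) = 1/(L - 1) * sum over k = 1..L-1 of (P[k][i] - P[k+1][j])^2

    This yields 20 x 20 = 400 values regardless of the sequence length L, which is how proteins of varying length become uniform vectors. The function name is hypothetical and the published formula may differ.

```python
def evolutionary_difference_vector(pssm):
    """pssm: L rows (residues) x 20 columns (amino acids). Returns a 400-long vector."""
    length = len(pssm)
    if length < 2:
        raise ValueError("need at least two residues to form adjacent pairs")
    vector = []
    for i in range(20):
        for j in range(20):
            # Squared difference between column i of residue k and column j of residue k + 1.
            total = sum((pssm[k][i] - pssm[k + 1][j]) ** 2 for k in range(length - 1))
            vector.append(total / (length - 1))
    return vector
```

    The resulting fixed-length vectors can be passed directly to an SVM, which is the classifier named in the abstract.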

  17. Discrete sequence prediction and its applications

    NASA Technical Reports Server (NTRS)

    Laird, Philip

    1992-01-01

    Learning from experience to predict sequences of discrete symbols is a fundamental problem in machine learning with many applications. We apply sequence prediction using a simple and practical sequence-prediction algorithm, called TDAG. The TDAG algorithm is first tested by comparing its performance with some common data compression algorithms. Then it is adapted to the detailed requirements of dynamic program optimization, with excellent results.
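
    The abstract does not describe TDAG's internals. As a simpler stand-in (explicitly not TDAG), the sketch below predicts the next discrete symbol from the longest previously seen context, backing off to shorter contexts; the class name and the max_order parameter are hypothetical.

```python
from collections import Counter, defaultdict


class BackoffPredictor:
    """Predict the next symbol from the longest previously seen context (up to max_order)."""

    def __init__(self, max_order=2):
        self.max_order = max_order
        self.counts = defaultdict(Counter)  # context tuple -> counts of the next symbol
        self.history = []

    def _context(self, order):
        # The last `order` symbols; order 0 gives the empty context ().
        return tuple(self.history[len(self.history) - order:])

    def update(self, symbol):
        # Record `symbol` as the successor of every context of length 0..max_order.
        for order in range(self.max_order + 1):
            if len(self.history) >= order:
                self.counts[self._context(order)][symbol] += 1
        self.history.append(symbol)

    def predict(self):
        # Back off from the longest context to the empty context.
        for order in range(self.max_order, -1, -1):
            if len(self.history) >= order:
                seen = self.counts.get(self._context(order))
                if seen:
                    return seen.most_common(1)[0][0]
        return None
```

    Longer contexts make sharper predictions but need more data; backing off is one simple way to trade these off.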

  18. Amino acid composition predicts prion activity.

    PubMed

    Afsar Minhas, Fayyaz Ul Amir; Ross, Eric D; Ben-Hur, Asa

    2017-04-10

    Many prion-forming proteins contain glutamine/asparagine (Q/N) rich domains, and there are conflicting opinions as to the role of primary sequence in their conversion to the prion form: is this phenomenon driven primarily by amino acid composition, or, as a recent computational analysis suggested, dependent on the presence of short sequence elements with high amyloid-forming potential. The argument for the importance of short sequence elements hinged on the relatively high accuracy obtained using a method that utilizes a collection of length-six sequence elements with known amyloid-forming potential. We weigh in on this question and demonstrate that when those sequence elements are permuted, even higher accuracy is obtained; we also propose a novel multiple-instance machine learning method that uses sequence composition alone, and achieves better accuracy than all existing prion prediction approaches. While we expect there to be elements of primary sequence that affect the process, our experiments suggest that sequence composition alone is sufficient for predicting protein sequences that are likely to form prions. A web-server for the proposed method is available at http://faculty.pieas.edu.pk/fayyaz/prank.html, and the code for reproducing our experiments is available at http://doi.org/10.5281/zenodo.167136.
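
    A composition-only representation is easy to sketch. Below, each sequence (or each sliding window, assumed here to serve as an instance in a multiple-instance "bag") is reduced to the fractions of the 20 standard amino acids. The window size and the function names are hypothetical; the classifier itself is not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Fraction of each standard amino acid; order along the sequence is ignored."""
    standard = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    n = len(standard)
    return {aa: (standard.count(aa) / n if n else 0.0) for aa in AMINO_ACIDS}


def qn_fraction(seq):
    """Combined glutamine + asparagine content, the hallmark of Q/N-rich domains."""
    comp = composition(seq)
    return comp["Q"] + comp["N"]


def window_compositions(seq, window):
    """Composition of every full sliding window (assumed instances of one bag)."""
    return [composition(seq[i:i + window]) for i in range(len(seq) - window + 1)]
```

    Permuting a sequence leaves every output of composition unchanged, which is why composition-based features are unaffected by the permutation experiment described above.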

  19. Chip-based sequencing nucleic acids

    DOEpatents

    Beer, Neil Reginald

    2014-08-26

    A system for fast DNA sequencing by amplification of genetic material within microreactors, denaturing, demulsifying, and then sequencing the material, while retaining it in a PCR/sequencing zone by a magnetic field. One embodiment includes sequencing nucleic acids on a microchip that includes a microchannel flow channel in the microchip. The nucleic acids are isolated and hybridized to magnetic nanoparticles or to magnetic polystyrene-coated beads. Microreactor droplets are formed in the microchannel flow channel. The microreactor droplets containing the nucleic acids and the magnetic nanoparticles are retained in a magnetic trap in the microchannel flow channel and sequenced.

  20. KM+, a mannose-binding lectin from Artocarpus integrifolia: amino acid sequence, predicted tertiary structure, carbohydrate recognition, and analysis of the beta-prism fold.

    PubMed Central

    Rosa, J. C.; De Oliveira, P. S.; Garratt, R.; Beltramini, L.; Resing, K.; Roque-Barreira, M. C.; Greene, L. J.

    1999-01-01

    The complete amino acid sequence of the lectin KM+ from Artocarpus integrifolia (jackfruit), which contains 149 residues/mol, is reported and compared to those of other members of the Moraceae family, particularly that of jacalin, also from jackfruit, with which it shares 52% sequence identity. KM+ presents an acetyl-blocked N-terminus and is not posttranslationally modified by proteolytic cleavage as is the case for jacalin. Rather, it possesses a short, glycine-rich linker that unites the regions homologous to the alpha- and beta-chains of jacalin. The results of homology modeling implicate the linker sequence in sterically impeding rotation of the side chain of Asp141 within the binding site pocket. As a consequence, the aspartic acid is locked into a conformation adequate only for the recognition of equatorial hydroxyl groups on the C4 epimeric center (alpha-D-mannose, alpha-D-glucose, and their derivatives). In contrast, the internal cleavage of the jacalin chain permits free rotation of the homologous aspartic acid, rendering it capable of accepting hydrogen bonds from both possible hydroxyl configurations on C4. We suggest that, together with direct recognition of epimeric hydroxyls and the steric exclusion of disfavored ligands, conformational restriction of the lectin should be considered to be a new mechanism by which selectivity may be built into carbohydrate binding sites. Jacalin and KM+ adopt the beta-prism fold already observed in two unrelated protein families. Despite presenting little or no sequence similarity, an analysis of the beta-prism reveals a canonical feature repeatedly present in all such structures, which is based on six largely hydrophobic residues within a beta-hairpin containing two classic-type beta-bulges. We suggest the term beta-prism motif to describe this feature. PMID:10210179

  1. Dipeptide Sequence Determination: Analyzing Phenylthiohydantoin Amino Acids by HPLC

    NASA Astrophysics Data System (ADS)

    Barton, Janice S.; Tang, Chung-Fei; Reed, Steven S.

    2000-02-01

    Amino acid composition and sequence determination, important techniques for characterizing peptides and proteins, are essential for predicting conformation and studying sequence alignment. This experiment presents improved, fundamental methods of sequence analysis for an upper-division biochemistry laboratory. Working in pairs, students use the Edman reagent to prepare phenylthiohydantoin derivatives of amino acids for determination of the sequence of an unknown dipeptide. With a single HPLC technique, students identify both the N-terminal amino acid and the composition of the dipeptide. This method yields good precision of retention times and allows use of a broad range of amino acids as components of the dipeptide. Students learn fundamental principles and techniques of sequence analysis and HPLC.

  2. Distinguishing Proteins From Arbitrary Amino Acid Sequences

    PubMed Central

    Yau, Stephen S.-T.; Mao, Wei-Guang; Benson, Max; He, Rong Lucy

    2015-01-01

    What kinds of amino acid sequences could possibly be protein sequences? From all existing databases that we can find, known proteins are only a small fraction of all possible combinations of amino acids. Beginning with Sanger's first detailed determination of a protein sequence in 1952, previous studies have focused on describing the structure of existing protein sequences in order to construct the protein universe. No one, however, has developed a criterion for determining whether an arbitrary amino acid sequence can be a protein. Here we show that when the collection of arbitrary amino acid sequences is viewed in an appropriate geometric context, the protein sequences cluster together. This leads to a new computational test, described here, that has proved to be remarkably accurate at determining whether an arbitrary amino acid sequence can be a protein. Even more, if the results of this test indicate that the sequence can be a protein, and it is indeed a protein sequence, then its identity as a protein sequence is uniquely defined. We anticipate our computational test will be useful for those who are attempting to complete the job of discovering all proteins, or constructing the protein universe. PMID:25609314
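
    The abstract does not specify the "appropriate geometric context". One representation associated with this group, assumed here for illustration, is the natural vector: for each amino acid k, its count n_k, its mean position mu_k, and a normalized second central moment D_k = sum over positions t of (t - mu_k)^2 / (n_k * n), where n is the sequence length. This maps any sequence to a 60-dimensional point. The function name is hypothetical, and the published test may use additional moments or a different normalization.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def natural_vector(seq):
    """Counts, mean positions and normalized second moments for the 20 amino acids (60 values)."""
    n = len(seq)
    counts, means, moments = [], [], []
    for aa in AMINO_ACIDS:
        positions = [i + 1 for i, x in enumerate(seq) if x == aa]  # 1-based positions
        nk = len(positions)
        mu = sum(positions) / nk if nk else 0.0
        d2 = sum((t - mu) ** 2 for t in positions) / (nk * n) if nk else 0.0
        counts.append(nk)
        means.append(mu)
        moments.append(d2)
    return counts + means + moments
```

    Under this representation, asking whether an arbitrary sequence "can be a protein" becomes a question about where its point falls relative to the points of known proteins.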

  3. The complete amino acid sequence of prochymosin.

    PubMed Central

    Foltmann, B; Pedersen, V B; Jacobsen, H; Kauffman, D; Wybrandt, G

    1977-01-01

    The total sequence of 365 amino acid residues in bovine prochymosin is presented. Alignment with the amino acid sequence of porcine pepsinogen shows that 204 amino acid residues are common to the two zymogens. Further comparison and alignment with the amino acid sequence of penicillopepsin shows that 66 residues are located at identical positions in all three proteases. The three enzymes belong to a large group of proteases with two aspartate residues in the active center. This group forms a family derived from one common ancestor. PMID:329280
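
    The shared-residue count above can be expressed as a percent identity over an alignment. The sketch assumes the two sequences are already aligned (gaps written as "-") and, as one of several common conventions, divides by the number of residues in the first sequence. For the figures quoted, 204 identical residues out of 365 in prochymosin corresponds to about 55.9%. The function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Identical aligned positions as a percentage of the residues in aligned_a."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)
    residues_a = sum(1 for a in aligned_a if a != gap)
    return 100.0 * identical / residues_a
```

    Other conventions divide by the alignment length or the shorter sequence, so percent identities from different tools are not always directly comparable.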

  4. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-05-30

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e., the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase complexed with the target nucleic acid molecule is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.
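    The read-out logic described above (each detected label identifies the incorporated analog, and the template is the complement of the growing strand) can be sketched in Python as follows. The label names and the one-label-per-base mapping are hypothetical, and real instruments must also handle missed or spurious events, which this sketch ignores.

```python
# Hypothetical mapping from a distinguishable label to the analog it marks.
LABEL_TO_BASE = {"dye1": "A", "dye2": "C", "dye3": "G", "dye4": "T"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def nascent_strand(events: list[str]) -> str:
    """Bases added to the growing strand, in detection order (5' -> 3')."""
    return "".join(LABEL_TO_BASE[e] for e in events)


def template_sequence(events: list[str]) -> str:
    """Template read 5' -> 3': the reverse complement of the nascent strand."""
    strand = nascent_strand(events)
    return "".join(COMPLEMENT[b] for b in reversed(strand))


events = ["dye1", "dye3", "dye3", "dye4"]  # detected incorporations, in time order
print(nascent_strand(events))     # AGGT
print(template_sequence(events))  # ACCT
```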

  5. Method for sequencing nucleic acid molecules

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2006-06-06

    The present invention is directed to a method of sequencing a target nucleic acid molecule having a plurality of bases. In principle, the temporal order of base additions during the polymerization reaction is measured on a molecule of nucleic acid, i.e., the activity of a nucleic acid polymerizing enzyme on the template nucleic acid molecule to be sequenced is followed in real time. The sequence is deduced by identifying which base is being incorporated into the growing complementary strand of the target nucleic acid by the catalytic activity of the nucleic acid polymerizing enzyme at each step in the sequence of base additions. A polymerase complexed with the target nucleic acid molecule is provided in a position suitable to move along the target nucleic acid molecule and extend the oligonucleotide primer at an active site. A plurality of labelled types of nucleotide analogs are provided proximate to the active site, with each distinguishable type of nucleotide analog being complementary to a different nucleotide in the target nucleic acid sequence. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand at the active site, where the nucleotide analog being added is complementary to the nucleotide of the target nucleic acid at the active site. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The steps of providing labelled nucleotide analogs, polymerizing the growing nucleic acid strand, and identifying the added nucleotide analog are repeated so that the nucleic acid strand is further extended and the sequence of the target nucleic acid is determined.

  6. Protein structure prediction from sequence variation

    PubMed Central

    Marks, Debora S; Hopf, Thomas A; Sander, Chris

    2015-01-01

    Genomic sequences contain rich evolutionary information about functional constraints on macromolecules such as proteins. This information can be efficiently mined to detect evolutionary couplings between residues in proteins and address the long-standing challenge to compute protein three-dimensional structures from amino acid sequences. Substantial progress has recently been made on this problem owing to the explosive growth in available sequences and the application of global statistical methods. In addition to three-dimensional structure, the improved understanding of covariation may help identify functional residues involved in ligand binding, protein-complex formation and conformational changes. We expect computation of covariation patterns to complement experimental structural biology in elucidating the full spectrum of protein structures, their functional interactions and evolutionary dynamics. PMID:23138306
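    A simple, local measure of covariation between two alignment columns is their mutual information. The Python sketch below computes plain mutual information for a toy multiple sequence alignment. It is only an illustration: the global statistical methods mentioned in the abstract are designed to separate direct couplings from indirect (transitive) correlations, which mutual information alone does not do, and sequence weighting, pseudocounts and gap handling are omitted here.

```python
import math
from collections import Counter


def mutual_information(msa: list[str], i: int, j: int) -> float:
    """Mutual information (in nats) between columns i and j of an alignment.

    Plain frequency estimate: no sequence weighting, pseudocounts or
    average-product correction.
    """
    n = len(msa)
    if n == 0:
        raise ValueError("alignment is empty")
    col_i = Counter(seq[i] for seq in msa)
    col_j = Counter(seq[j] for seq in msa)
    pairs = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        p_a = col_i[a] / n
        p_b = col_j[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


# Toy alignment: columns 0 and 1 covary perfectly (A pairs with K, D with E).
toy_msa = ["AKL", "AKV", "DEL", "DEV"]
print(mutual_information(toy_msa, 0, 1))  # ln 2, about 0.693
print(mutual_information(toy_msa, 0, 2))  # 0.0 (independent columns)
```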

  7. Predicting pseudoknotted structures across two RNA sequences

    PubMed Central

    Sperschneider, Jana; Datta, Amitava; Wise, Michael J.

    2012-01-01

    Motivation: Laboratory RNA structure determination is demanding and costly and thus, computational structure prediction is an important task. Single sequence methods for RNA secondary structure prediction are limited by the accuracy of the underlying folding model; if a structure is supported by a family of evolutionarily related sequences, one can be more confident that the prediction is accurate. RNA pseudoknots are functional elements, which have highly conserved structures. However, few comparative structure prediction methods can handle pseudoknots due to the computational complexity. Results: A comparative pseudoknot prediction method called DotKnot-PW is introduced based on structural comparison of secondary structure elements and H-type pseudoknot candidates. DotKnot-PW outperforms other methods from the literature on a hand-curated test set of RNA structures with experimental support. Availability: DotKnot-PW and the RNA structure test set are available at the web site http://dotknot.csse.uwa.edu.au/pw. Contact: janaspe@csse.uwa.edu.au Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23044552
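    What makes a pseudoknot hard for standard folding algorithms is that its base pairs cross rather than nest. The Python sketch below parses extended dot-bracket notation (using "()", "[]" and "{}" as pair brackets, an assumed convention) and reports crossing pairs, the defining feature of pseudoknots, of which the H-type is the simplest case. It is not DotKnot-PW and performs no structure prediction.

```python
BRACKETS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = {close: open_ for open_, close in BRACKETS.items()}


def parse_pairs(structure: str) -> list[tuple[int, int]]:
    """Return base pairs (i, j), i < j, from extended dot-bracket notation."""
    stacks = {open_: [] for open_ in BRACKETS}
    pairs = []
    for pos, ch in enumerate(structure):
        if ch in BRACKETS:
            stacks[ch].append(pos)
        elif ch in CLOSERS:
            stack = stacks[CLOSERS[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if any(stacks.values()):
        raise ValueError("unclosed bracket(s) in structure")
    return sorted(pairs)


def crossing_pairs(pairs: list[tuple[int, int]]) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """Pairs (i, j), (k, l) with i < k < j < l, i.e. pairs that cannot nest."""
    return [
        (p, q)
        for idx, p in enumerate(pairs)
        for q in pairs[idx + 1:]
        if p[0] < q[0] < p[1] < q[1]
    ]


def has_pseudoknot(structure: str) -> bool:
    return bool(crossing_pairs(parse_pairs(structure)))


print(has_pseudoknot("((..[[..))..]]"))  # True: stems interleave (H-type-like)
print(has_pseudoknot("((..))..[[..]]"))  # False: two independent hairpins
```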

  8. Lossless Video Sequence Compression Using Adaptive Prediction

    NASA Technical Reports Server (NTRS)

    Li, Ying; Sayood, Khalid

    2007-01-01

    We present an adaptive lossless video compression algorithm based on predictive coding. The proposed algorithm exploits temporal, spatial, and spectral redundancies in a backward adaptive fashion with extremely low side information. The computational complexity is further reduced by using a caching strategy. We also study the relationship between the operational domain for the coder (wavelet or spatial) and the amount of temporal and spatial redundancy in the sequence being encoded. Experimental results show that the proposed scheme provides significant improvements in compression efficiencies.
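    The core idea of lossless predictive coding, transmitting only the prediction residual and reconstructing the data exactly at the decoder, can be illustrated with a fixed temporal predictor. This Python sketch is not the adaptive algorithm of the paper: it simply predicts each pixel from the co-located pixel of the previous frame, and the entropy coder that would actually compress the residuals is omitted.

```python
def encode(frames: list[list[int]]) -> list[list[int]]:
    """First frame sent as-is; each later pixel replaced by its temporal residual."""
    residuals = [list(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        residuals.append([c - p for p, c in zip(prev, cur)])
    return residuals


def decode(residuals: list[list[int]]) -> list[list[int]]:
    """Invert encode(): add each residual to the previously reconstructed frame."""
    frames = [list(residuals[0])]
    for res in residuals[1:]:
        frames.append([p + r for p, r in zip(frames[-1], res)])
    return frames


# Three tiny "frames" of three pixels each (hypothetical data).
video = [[10, 20, 30], [11, 20, 29], [11, 21, 29]]
res = encode(video)
print(res)                   # [[10, 20, 30], [1, 0, -1], [0, 1, 0]]
print(decode(res) == video)  # True: reconstruction is lossless
```

    When consecutive frames are similar, the residuals cluster near zero, which is what lets an entropy coder spend fewer bits on them than on the raw pixel values.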

  9. Analysis and Annotation of Nucleic Acid Sequence

    SciTech Connect

    States, David J.

    2004-07-28

    The aims of this project were to develop improved methods for computational genome annotation and to apply these methods to improve the annotation of genomic sequence data with a specific focus on human genome sequencing. The project resulted in a substantial body of published work. Notable contributions of this project were the identification of basecalling and lane tracking as error processes in genome sequencing and contributions to improved methods for these steps in genome sequencing. This technology improved the accuracy and throughput of genome sequence analysis. Probabilistic methods for physical map construction were developed. Improved methods for sequence alignment, alternative splicing analysis, promoter identification and NF kappa B response gene prediction were also developed.

  10. Prediction of protein function from protein sequence and structure.

    PubMed

    Whisstock, James C; Lesk, Arthur M

    2003-08-01

    The sequence of a genome contains the plans of the possible life of an organism, but implementation of genetic information depends on the functions of the proteins and nucleic acids that it encodes. Many individual proteins of known sequence and structure present challenges to the understanding of their function. In particular, a number of genes responsible for diseases have been identified but their specific functions are unknown. Whole-genome sequencing projects are a major source of proteins of unknown function. Annotation of a genome involves assignment of functions to gene products, in most cases on the basis of amino-acid sequence alone. 3D structure can aid the assignment of function, motivating the challenge of structural genomics projects to make structural information available for novel uncharacterized proteins. Structure-based identification of homologues often succeeds where sequence-alone-based methods fail, because in many cases evolution retains the folding pattern long after sequence similarity becomes undetectable. Nevertheless, prediction of protein function from sequence and structure is a difficult problem, because homologous proteins often have different functions. Many methods of function prediction rely on identifying similarity in sequence and/or structure between a protein of unknown function and one or more well-understood proteins. Alternative methods include inferring conservation patterns in members of a functionally uncharacterized family for which many sequences and structures are known. However, these inferences are tenuous. Such methods provide reasonable guesses at function, but are far from foolproof. It is therefore fortunate that the development of whole-organism approaches and comparative genomics permits other approaches to function prediction when the data are available. These include the use of protein-protein interaction patterns, and correlations between occurrences of related proteins in different organisms, as

  11. Phenolic acid esterases, coding sequences and methods

    DOEpatents

    Blum, David L.; Kataeva, Irina; Li, Xin-Liang; Ljungdahl, Lars G.

    2002-01-01

    Described herein are four phenolic acid esterases, three of which correspond to domains of previously unknown function within bacterial xylanases, from XynY and XynZ of Clostridium thermocellum and from a xylanase of Ruminococcus. The fourth specifically exemplified xylanase is a protein encoded within the genome of Orpinomyces PC-2. The amino acids of these polypeptides and nucleotide sequences encoding them are provided. Recombinant host cells, expression vectors and methods for the recombinant production of phenolic acid esterases are also provided.

  12. Method for identifying and quantifying nucleic acid sequence aberrations

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1998-07-21

    A method is disclosed for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe. 11 figs.

  13. Method for identifying and quantifying nucleic acid sequence aberrations

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1998-01-01

    A method for detecting nucleic acid sequence aberrations by detecting nucleic acid sequences having both a first and a second nucleic acid sequence type, the presence of the first and second sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. The method uses a first hybridization probe which includes a nucleic acid sequence that is complementary to a first sequence type and a first complexing agent capable of attaching to a second complexing agent and a second hybridization probe which includes a nucleic acid sequence that selectively hybridizes to the second nucleic acid sequence type over the first sequence type and includes a detectable marker for detecting the second hybridization probe.

  14. Structural gene and complete amino acid sequence of Vibrio alginolyticus collagenase.

    PubMed Central

    Takeuchi, H; Shibano, Y; Morihara, K; Fukushima, J; Inami, S; Keil, B; Gilles, A M; Kawamoto, S; Okuda, K

    1992-01-01

    The DNA encoding the collagenase of Vibrio alginolyticus was cloned, and its complete nucleotide sequence was determined. When the cloned gene was ligated to pUC18, the Escherichia coli expression vector, bacteria carrying the gene exhibited both collagenase antigen and collagenase activity. The open reading frame from the ATG initiation codon was 2442 bp in length for the collagenase structural gene. The amino acid sequence, deduced from the nucleotide sequence, revealed that the mature collagenase consists of 739 amino acids with an Mr of 81875. The amino acid sequences of 20 polypeptide fragments were completely identical with the deduced amino acid sequences of the collagenase gene. The amino acid composition predicted from the DNA sequence was similar to the chemically determined composition of purified collagenase reported previously. The analyses of both the DNA and amino acid sequences of the collagenase gene were rigorously performed, but we could not detect any significant sequence similarity to other collagenases. PMID:1311172

  15. Extensive amino acid sequence homologies between animal lectins

    SciTech Connect

    Paroutaud, P.; Levi, G.; Teichberg, V.I.; Strosberg, A.D.

    1987-09-01

    The authors have established the amino acid sequence of the β-D-galactoside binding lectin from the electric eel and the sequences of several peptides from a similar lectin isolated from human placenta. These sequences were compared with the published sequences of peptides derived from the β-D-galactoside binding lectin from human lung and with sequences deduced from cDNAs assigned to the β-D-galactoside binding lectins from chicken embryo skin and human hepatomas. Significant homologies were observed. One of the highly conserved regions that contains a tryptophan residue and two glutamic acid residues is probably part of the β-D-galactoside binding site, which, on the basis of spectroscopic studies of the electric eel lectin, is expected to contain such residues. The similarity of the hydropathy profiles and the predicted secondary structure of the lectins from chicken skin and electric eel, in spite of differences in their amino acid sequences, strongly suggests that these proteins have maintained structural homologies during evolution and together with the other β-D-galactoside binding lectins were derived from a common ancestor gene.
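    Hydropathy profiles like those compared above are typically computed as a sliding-window average of per-residue hydropathy values. The abstract does not state which scale or window the authors used; this Python sketch assumes the Kyte-Doolittle scale and a hypothetical default window of 9 residues.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the abstract does not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 9) -> list[float]:
    """Mean hydropathy over each run of `window` consecutive residues."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and the sequence length")
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


# Hypothetical peptide: a hydrophobic stretch followed by a charged stretch.
profile = hydropathy_profile("IIIKKK", window=3)
print([round(v, 2) for v in profile])  # [4.5, 1.7, -1.1, -3.9]
```

    Two sequences with low identity can still produce similar profiles, which is the kind of evidence the abstract uses to argue for conserved structure.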

  16. Methods for analyzing nucleic acid sequences

    DOEpatents

    Korlach, Jonas; Webb, Watt W.; Levene, Michael; Turner, Stephen; Craighead, Harold G.; Foquet, Mathieu

    2011-05-17

    The present invention is directed to a method of sequencing a target nucleic acid. The method provides a complex comprising a polymerase enzyme, a target nucleic acid molecule, and a primer, wherein the complex is immobilized on a support. A fluorescent label is attached to a terminal phosphate group of the nucleotide or nucleotide analog. The growing nucleic acid strand is extended by using the polymerase to add a nucleotide analog to the nucleic acid strand. The nucleotide analog added to the oligonucleotide primer as a result of the polymerizing step is identified. The time duration of the signal from labeled nucleotides or nucleotide analogs that become incorporated is distinguished from freely diffusing labels by a longer retention in the observation volume for the nucleotides or nucleotide analogs that become incorporated than for the freely diffusing labels.
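    The distinction drawn above, incorporated labels staying longer in the observation volume than freely diffusing ones, can be sketched as a simple duration threshold. In this Python sketch the 5 ms cutoff, the `Pulse` record and `call_bases` are hypothetical; a real instrument would calibrate the threshold and model noise.

```python
from dataclasses import dataclass


@dataclass
class Pulse:
    """A fluorescence pulse detected in the observation volume."""
    label: str        # base identified by the pulse's label (hypothetical encoding)
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def call_bases(pulses: list[Pulse], min_duration_ms: float = 5.0) -> str:
    """Keep pulses long enough to count as incorporations, in time order."""
    kept = [p for p in pulses if p.duration_ms >= min_duration_ms]
    kept.sort(key=lambda p: p.start_ms)
    return "".join(p.label for p in kept)


pulses = [
    Pulse("A", 0.0, 12.0),   # long retention: treated as an incorporation
    Pulse("C", 15.0, 15.5),  # brief flash: treated as a diffusing label
    Pulse("G", 20.0, 30.0),
]
print(call_bases(pulses))  # AG
```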

  17. Predictive uncertainty in auditory sequence processing

    PubMed Central

    Hansen, Niels Chr.; Pearce, Marcus T.

    2014-01-01

    Previous studies of auditory expectation have focused on the expectedness perceived by listeners retrospectively in response to events. In contrast, this research examines predictive uncertainty—a property of listeners' prospective state of expectation prior to the onset of an event. We examine the information-theoretic concept of Shannon entropy as a model of predictive uncertainty in music cognition. This is motivated by the Statistical Learning Hypothesis, which proposes that schematic expectations reflect probabilistic relationships between sensory events learned implicitly through exposure. Using probability estimates from an unsupervised, variable-order Markov model, 12 melodic contexts high in entropy and 12 melodic contexts low in entropy were selected from two musical repertoires differing in structural complexity (simple and complex). Musicians and non-musicians listened to the stimuli and provided explicit judgments of perceived uncertainty (explicit uncertainty). We also examined an indirect measure of uncertainty computed as the entropy of expectedness distributions obtained using a classical probe-tone paradigm where listeners rated the perceived expectedness of the final note in a melodic sequence (inferred uncertainty). Finally, we simulate listeners' perception of expectedness and uncertainty using computational models of auditory expectation. A detailed model comparison indicates which model parameters maximize fit to the data and how they compare to existing models in the literature. The results show that listeners experience greater uncertainty in high-entropy musical contexts than low-entropy contexts. This effect is particularly apparent for inferred uncertainty and is stronger in musicians than non-musicians. Consistent with the Statistical Learning Hypothesis, the results suggest that increased domain-relevant training is associated with an increasingly accurate cognitive model of probabilistic structure in music. PMID:25295018
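    The Shannon entropy used above as a model of predictive uncertainty is computed from a probability distribution over possible continuations. In the study those probabilities come from a variable-order Markov model; in this Python sketch they are simply supplied by hand as hypothetical predictions over four candidate next notes.

```python
import math


def shannon_entropy(probs: list[float], base: float = 2.0) -> float:
    """Entropy of a discrete predictive distribution (bits when base=2)."""
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must be non-negative and sum to 1")
    return -sum(p * math.log(p, base) for p in probs if p > 0)


# Hypothetical predictions over four candidate next notes.
low_entropy = [0.85, 0.05, 0.05, 0.05]   # one continuation strongly expected
high_entropy = [0.25, 0.25, 0.25, 0.25]  # all continuations equally expected
print(round(shannon_entropy(low_entropy), 3))
print(shannon_entropy(high_entropy))     # 2.0 bits, the maximum for four outcomes
```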

  18. SeqAPASS: Sequence alignment to predict across-species ...

    EPA Pesticide Factsheets

    Efforts to shift the toxicity testing paradigm from whole organism studies to those focused on the initiation of toxicity and relevant pathways have led to increased utilization of in vitro and in silico methods. Hence the emergence of high-throughput screening (HTS) programs, such as U.S. EPA ToxCast, and application of the adverse outcome pathway (AOP) framework for identifying and defining biological key events triggered upon perturbation of molecular initiating events and leading to adverse outcomes occurring at a level of organization relevant for risk assessment [1]. With these recent initiatives to harness the power of “the pathway” in describing and evaluating toxicity comes the need to extrapolate data beyond the model species. Sequence alignment to predict across-species susceptibility (SeqAPASS) is a web-based tool that allows the user to begin to understand how broadly HTS data or AOP constructs may plausibly be extrapolated across species, while describing the relative intrinsic susceptibility of different taxa to chemicals with known modes of action (e.g., pharmaceuticals and pesticides). The tool rapidly and strategically assesses available molecular target information to describe protein sequence similarity at the primary amino acid sequence, conserved domain, and individual amino acid residue levels. This in silico approach to species extrapolation was designed to automate and streamline the relatively complex and time-consuming process of co

  19. 77 FR 65537 - Requirements for Patent Applications Containing Nucleotide Sequence and/or Amino Acid Sequence...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-29

    ... Amino Acid Sequence Disclosures ACTION: Proposed collection; comment request. SUMMARY: The United States....'' SUPPLEMENTARY INFORMATION: I. Abstract Patent applications that contain nucleotide and/or amino acid...

  20. Amino acid sequence and comparative antigenicity of chicken metallothionein.

    PubMed Central

    McCormick, C C; Fullmer, C S; Garvey, J S

    1988-01-01

    The complete amino acid sequence of metallothionein (MT) from chicken liver is reported. The primary structure was determined by automated sequence analysis of peptides produced by limited acid hydrolysis and by trypsin digestion. The comparative antigenicity of chicken MT was determined by radioimmunoassay using rabbit anti-rat MT polyclonal antibody. Chicken MT consists of 63 amino acids as compared to 61 found in MTs from mammals. One insertion (and two substitutions) occurs in the amino-terminal region, a region considered invariant among mammalian MTs. Eighteen of the 20 cysteines in chicken MT were aligned with cysteines from mammalian sequences. Two cysteines near the carboxyl terminus are shifted by one residue due to the insertion of proline in that region. Overall, the chicken protein showed approximately 68% sequence identity in a comparison with various mammalian MTs. The affinity of the polyclonal antibody for chicken MT was decreased by 2 orders of magnitude in comparison to that of a mammalian MT (rat MT isoforms). This reduced affinity is attributed to major substitutions in chicken MT in the regions of the principal determinants of mammalian MTs. Theoretical analysis of the primary structure predicted the secondary structure to consist of reverse turns and random coils with no stable beta or helix conformations. There is no evidence that chicken MT differs functionally from mammalian MTs. PMID:2448773

  1. Sequence-Based Prediction of Type III Secreted Proteins

    PubMed Central

    Arnold, Roland; Brandmaier, Stefan; Kleine, Frederick; Tischler, Patrick; Heinz, Eva; Behrens, Sebastian; Niinikoski, Antti; Mewes, Hans-Werner; Horn, Matthias; Rattei, Thomas

    2009-01-01

    The type III secretion system (TTSS) is a key mechanism for host cell interaction used by a variety of bacterial pathogens and symbionts of plants and animals including humans. The TTSS represents a molecular syringe with which the bacteria deliver effector proteins directly into the host cell cytosol. Despite the importance of the TTSS for bacterial pathogenesis, recognition and targeting of type III secreted proteins has up until now been poorly understood. Several hypotheses are discussed, including an mRNA-based signal, a chaperone-mediated process, or an N-terminal signal peptide. In this study, we systematically analyzed the amino acid composition and secondary structure of N-termini of 100 experimentally verified effector proteins. Based on this, we developed a machine-learning approach for the prediction of TTSS effector proteins, taking into account N-terminal sequence features such as frequencies of amino acids, short peptides, or residues with certain physico-chemical properties. The resulting computational model revealed a strong type III secretion signal in the N-terminus that can be used to detect effectors with sensitivity of ∼71% and selectivity of ∼85%. This signal seems to be taxonomically universal and conserved among animal pathogens and plant symbionts, since we could successfully detect effector proteins if the respective group was excluded from training. The application of our prediction approach to 739 complete bacterial and archaeal genome sequences resulted in the identification of between 0% and 12% putative TTSS effector proteins. Comparison of effector proteins with orthologs that are not secreted by the TTSS showed no clear pattern of signal acquisition by fusion, suggesting convergent evolutionary processes shaping the type III secretion signal. The newly developed program EffectiveT3 (http://www.chlamydiaedb.org) is the first universal in silico prediction program for the identification of novel TTSS effectors. Our findings will
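    One of the feature types named above, amino acid frequencies in the N-terminus, can be computed as in the Python sketch below. The window length of 30 residues and the function name are hypothetical (the abstract does not give the window used by EffectiveT3), and only the 20 standard amino acid letters are counted. One such vector per protein is the kind of input a classifier would be trained on.

```python
STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def n_terminal_composition(seq: str, n: int = 30) -> dict[str, float]:
    """Relative frequency of each standard amino acid among the first n residues."""
    window = seq[:n].upper()
    if not window:
        raise ValueError("sequence is empty")
    return {aa: window.count(aa) / len(window) for aa in STANDARD_AMINO_ACIDS}


features = n_terminal_composition("MSSAKPLT", n=4)  # hypothetical N-terminus
print(features["S"], features["M"], features["K"])  # 0.5 0.25 0.0
```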

  2. Detection of nucleic acid sequences by invader-directed cleavage

    DOEpatents

    Brow, Mary Ann D.; Hall, Jeff Steven Grotelueschen; Lyamichev, Victor; Olive, David Michael; Prudent, James Robert

    1999-01-01

    The present invention relates to means for the detection and characterization of nucleic acid sequences, as well as variations in nucleic acid sequences. The present invention also relates to methods for forming a nucleic acid cleavage structure on a target sequence and cleaving the nucleic acid cleavage structure in a site-specific manner. The 5' nuclease activity of a variety of enzymes is used to cleave the target-dependent cleavage structure, thereby indicating the presence of specific nucleic acid sequences or specific variations thereof. The present invention further relates to methods and devices for the separation of nucleic acid molecules based on charge.

  3. Quantitative assessment of protein function prediction from metagenomics shotgun sequences.

    PubMed

    Harrington, E D; Singh, A H; Doerks, T; Letunic, I; von Mering, C; Jensen, L J; Raes, J; Bork, P

    2007-08-28

    To assess the potential of protein function prediction in environmental genomics data, we analyzed shotgun sequences from four diverse and complex habitats. Using homology searches as well as customized gene neighborhood methods that incorporate intergenic and evolutionary distances, we inferred specific functions for 76% of the 1.4 million predicted ORFs in these samples (83% when nonspecific functions are considered). Surprisingly, these fractions are only slightly smaller than the corresponding ones in completely sequenced genomes (83% and 86%, respectively, by using the same methodology) and considerably higher than previously thought. For as many as 75,448 ORFs (5% of the total), only neighborhood methods can assign functions, illustrated here by a previously undescribed gene associated with the well characterized heme biosynthesis operon and a potential transcription factor that might regulate a coupling between fatty acid biosynthesis and degradation. Our results further suggest that, although functions can be inferred for most proteins on earth, many functions remain to be discovered in numerous small, rare protein families.

  4. Selecting sequence variants to improve genomic predictions for dairy cattle

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Millions of genetic variants have been identified by population-scale sequencing projects, but subsets are needed for routine genomic predictions or to include on genotyping arrays. Methods of selecting sequence variants were compared using both simulated sequence genotypes and actual data from run ...

  5. Los Alamos sequence analysis package for nucleic acids and proteins.

    PubMed Central

    Kanehisa, M I

    1982-01-01

    An interactive system for computer analysis of nucleic acid and protein sequences has been developed for the Los Alamos DNA Sequence Database. It provides a convenient way to search or verify various sequence features, e.g., restriction enzyme sites, protein coding frames, and properties of coded proteins. Further, the comprehensive analysis package on a large-scale database can be used for comparative studies on sequence and structural homologies in order to find unnoted information stored in nucleic acid sequences. PMID:6174934
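    Searching for restriction enzyme sites, one of the features listed above, reduces to finding every occurrence of a short recognition string. This Python sketch is not the Los Alamos package; it is a minimal illustration using the EcoRI site GAATTC, which is palindromic, so scanning one strand finds sites on both.

```python
def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) occurrence of site."""
    if not site:
        raise ValueError("site must be non-empty")
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


ECORI = "GAATTC"  # EcoRI recognition site
print(find_sites("ttGAATTCaaGAATTC", ECORI))  # [2, 10]
```

    Non-palindromic sites would also need a search of the reverse complement, and degenerate sites (with IUPAC ambiguity codes) would need pattern matching rather than exact string search.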

  6. Gene and translation initiation site prediction in metagenomic sequences

    SciTech Connect

    Hyatt, Philip Douglas; LoCascio, Philip F; Hauser, Loren John; Uberbacher, Edward C

    2012-01-01

    Gene prediction in metagenomic sequences remains a difficult problem. Current sequencing technologies do not achieve sufficient coverage to assemble the individual genomes in a typical sample; consequently, sequencing runs produce a large number of short sequences whose exact origin is unknown. Since these sequences are usually smaller than the average length of a gene, algorithms must make predictions based on very little data. We present MetaProdigal, a metagenomic version of the gene prediction program Prodigal, that can identify genes in short, anonymous coding sequences with a high degree of accuracy. The novel value of the method consists of enhanced translation initiation site identification, ability to identify sequences that use alternate genetic codes and confidence values for each gene call. We compare the results of MetaProdigal with other methods and conclude with a discussion of future improvements.
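    For contrast with a trained gene finder, the Python sketch below is a naive forward-strand ORF scanner. It is far simpler than MetaProdigal: it uses no coding statistics, ignores the reverse strand, alternative start codons and alternate genetic codes, and does not report genes that run off the ends of a fragment, which is exactly the situation the abstract describes for short metagenomic reads. The `min_codons` threshold is a hypothetical parameter.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code only


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[int, int]]:
    """Forward-strand ORFs from ATG through the first in-frame stop codon.

    Returns 0-based, half-open (start, end) coordinates; the stop codon is
    included and counted toward min_codons.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop: longest ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


read = "CCATGAAATTTTAGCC"  # hypothetical short read
print(find_orfs(read, min_codons=2))  # [(2, 14)]
```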

  7. Hybridization and sequencing of nucleic acids using base pair mismatches

    DOEpatents

    Fodor, Stephen P. A.; Lipshutz, Robert J.; Huang, Xiaohua

    2001-01-01

    Devices and techniques for hybridization of nucleic acids and for determining the sequence of nucleic acids. Arrays of nucleic acids are formed by techniques, preferably high resolution, light-directed techniques. Positions of hybridization of a target nucleic acid are determined by, e.g., epifluorescence microscopy. Devices and techniques are proposed to determine the sequence of a target nucleic acid more efficiently and more quickly through such synthesis and detection techniques.

  8. Predictability affects the perception of audiovisual synchrony in complex sequences.

    PubMed

    Cook, Laura A; Van Valkenburg, David L; Badcock, David R

    2011-10-01

    The ability to make accurate audiovisual synchrony judgments is affected by the "complexity" of the stimuli: We are much better at making judgments when matching single beeps or flashes as opposed to video recordings of speech or music. In the present study, we investigated whether the predictability of sequences affects whether participants report that auditory and visual sequences appear to be temporally coincident. When we reduced their ability to predict both the next pitch in the sequence and the temporal pattern, we found that participants were increasingly likely to report that the audiovisual sequences were synchronous. However, when we manipulated pitch and temporal predictability independently, the same effect did not occur. By altering the temporal density (items per second) of the sequences, we further determined that the predictability effect occurred only in temporally dense sequences: If the sequences were slow, participants' responses did not change as a function of predictability. We propose that reduced predictability affects synchrony judgments by reducing the effective pitch and temporal acuity in perception of the sequences.

  9. Draft Genome Sequence of Cyanobacterium sp. Strain IPPAS B-1200 with a Unique Fatty Acid Composition

    PubMed Central

    Starikov, Alexander Y.; Usserbaeva, Aizhan A.; Sinetova, Maria A.; Sarsekeyeva, Fariza K.; Zayadan, Bolatkhan K.; Ustinova, Vera V.; Kupriyanova, Elena V.; Los, Dmitry A.

    2016-01-01

    Here, we report the draft genome of Cyanobacterium sp. IPPAS strain B-1200, isolated from Lake Balkhash, Kazakhstan, and characterized by the unique fatty acid composition of its membrane lipids, which are enriched with myristic and myristoleic acids. The approximate genome size is 3.4 Mb, and the predicted number of coding sequences is 3,119. PMID:27856596

  10. Selection of sequence variants to improve dairy cattle genomic predictions

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genomic prediction reliabilities improved when adding selected sequence variants from run 5 of the 1,000 bull genomes project. High density (HD) imputed genotypes for 26,970 progeny tested Holstein bulls were combined with sequence variants for 444 Holstein animals. The first test included 481,904 c...

  11. A Statistical Model of Protein Sequence Similarity and Function Similarity Reveals Overly-Specific Function Predictions

    PubMed Central

    Kolker, Eugene

    2009-01-01

    Background Predicting protein function from primary sequence is an important open problem in modern biology. Not only are there many thousands of proteins of unknown function, but current approaches for predicting function must also be improved upon. One problem in particular is overly-specific function predictions, which we address here with a new statistical model of the relationship between protein sequence similarity and protein function similarity. Methodology Our statistical model is based on sets of proteins with experimentally validated functions and numeric measures of function specificity and function similarity derived from the Gene Ontology. The model predicts the similarity of function between two proteins given their amino acid sequence similarity measured by statistics from the BLAST sequence alignment algorithm. A novel aspect of our model is that it predicts the degree of function similarity shared between two proteins over a continuous range of sequence similarity, facilitating prediction of function with an appropriate level of specificity. Significance Our model shows nearly exact function similarity for proteins with high sequence similarity (bit score >244.7, e-value <1e−62, non-redundant NCBI protein database (NRDB)) and only small likelihood of specific function match for proteins with low sequence similarity (bit score <54.6, e-value >1e−05, NRDB). For sequence similarity ranges in between, our annotation model shows an increasing relationship between function similarity and sequence similarity, but with considerable variability. We applied the model to a large set of proteins of unknown function, and predicted functions for thousands of these proteins ranging from general to very specific. We also applied the model to a data set of proteins with previously assigned, specific functions that were electronically based. We show that, on average, these prior function predictions are more specific (quite possibly overly-specific) compared to
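    For a fixed search space, BLAST bit scores and E-values move in opposite directions: a higher bit score means a smaller E-value. A minimal Python sketch of the Karlin-Altschul relation E = m * n * 2^(-S'), assuming m and n are the effective query and database lengths (BLAST applies length corrections that this sketch ignores); the function names and the lengths used below are illustrative only.

```python
import math


def evalue_from_bits(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance alignments scoring at least `bit_score` bits.

    Karlin-Altschul relation E = m * n * 2**(-S'), with m and n taken as the
    (effective) query and database lengths.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def bits_for_evalue(evalue: float, query_len: int, db_len: int) -> float:
    """Bit score needed to reach a given E-value in the same search space."""
    return math.log2(query_len * db_len / evalue)
```

    In any fixed search space, `evalue_from_bits(244.7, m, n)` is smaller than `evalue_from_bits(54.6, m, n)` by a factor of 2^190.1, which is why strong similarity corresponds to very small E-values.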

  12. Methods and compositions for efficient nucleic acid sequencing

    DOEpatents

    Drmanac, Radoje

    2006-07-04

    Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.

  13. Methods and compositions for efficient nucleic acid sequencing

    DOEpatents

    Drmanac, Radoje

    2002-01-01

    Disclosed are novel methods and compositions for rapid and highly efficient nucleic acid sequencing based upon hybridization with two sets of small oligonucleotide probes of known sequences. Extremely large nucleic acid molecules, including chromosomes and non-amplified RNA, may be sequenced without prior cloning or subcloning steps. The methods of the invention also solve various current problems associated with sequencing technology such as, for example, high noise to signal ratios and difficult discrimination, attaching many nucleic acid fragments to a surface, preparing many, longer or more complex probes and labelling more species.
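    Sequencing by hybridization recovers a target from the set of short words (its spectrum) that hybridize to it. The abstract does not describe the patented two-probe-set procedure, so the following is only a sketch of the textbook reconstruction step, assuming an error-free spectrum in which each k-mer is listed as often as it occurs: build a de Bruijn graph on (k-1)-mers and walk an Eulerian path with Hierholzer's algorithm. The function name is hypothetical.

```python
from collections import Counter, defaultdict


def reconstruct_from_spectrum(kmers: list[str]) -> str:
    """Return one sequence whose k-mer multiset equals `kmers` (an Eulerian path)."""
    graph: dict[str, list[str]] = defaultdict(list)
    indeg: Counter = Counter()
    outdeg: Counter = Counter()
    for kmer in kmers:
        prefix, suffix = kmer[:-1], kmer[1:]  # each k-mer is an edge prefix -> suffix
        graph[prefix].append(suffix)
        outdeg[prefix] += 1
        indeg[suffix] += 1

    # A path starts at the node with one more outgoing than incoming edge;
    # if every node is balanced the spectrum is circular and any node will do.
    start = next((n for n in graph if outdeg[n] - indeg[n] == 1), next(iter(graph)))

    # Hierholzer's algorithm, iterative form.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()

    if len(path) != len(kmers) + 1:
        raise ValueError("spectrum does not correspond to a single Eulerian path")
    return path[0] + "".join(node[-1] for node in path[1:])
```

    For the 3-mer spectrum of ATGGCGTGCA, this sketch returns ATGCGTGGCA, a different sequence with an identical spectrum; repeated words make the reconstruction ambiguous, which is one reason hybridization-based sequencing needs additional information for repetitive targets.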

  14. Kit for detecting nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    2001-01-01

    A kit is provided for detecting a target nucleic acid sequence in a sample, the kit comprising: a first hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the first hybridization probe including a first complexing agent for forming a binding pair with a second complexing agent; and a second hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the first hybridization probe does not selectively hybridize, the second hybridization probe including a detectable marker; a third hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a first portion of the target sequence, the third hybridization probe including the same detectable marker as the second hybridization probe; and a fourth hybridization probe which includes a nucleic acid sequence that is sufficiently complementary to selectively hybridize to a second portion of the target sequence to which the third hybridization probe does not selectively hybridize, the fourth hybridization probe including the first complexing agent for forming a binding pair with the second complexing agent; wherein the first and second hybridization probes are capable of simultaneously hybridizing to the target sequence and the third and fourth hybridization probes are capable of simultaneously hybridizing to the target sequence, the detectable marker is not present on the first or fourth hybridization probes and the first, second, third, and fourth hybridization probes each include a competitive nucleic acid sequence which is sufficiently complementary to a third portion of the target sequence that the competitive sequences of the first, second, third, and fourth hybridization probes compete with each other to hybridize to the third portion of the

  15. The influence of visual training on predicting complex action sequences.

    PubMed

    Cross, Emily S; Stadler, Waltraud; Parkinson, Jim; Schütz-Bosbach, Simone; Prinz, Wolfgang

    2013-02-01

    Linking observed and executable actions appears to be achieved by an action observation network (AON), comprising parietal, premotor, and occipitotemporal cortical regions of the human brain. AON engagement during action observation is thought to aid in effortless, efficient prediction of ongoing movements to support action understanding. Here, we investigate how the AON responds when observing and predicting actions we cannot readily reproduce before and after visual training. During pre- and posttraining neuroimaging sessions, participants watched gymnasts and wind-up toys moving behind an occluder and pressed a button when they expected each agent to reappear. Between scanning sessions, participants visually trained to predict when a subset of stimuli would reappear. Posttraining scanning revealed activation of inferior parietal, superior temporal, and cerebellar cortices when predicting occluded actions compared to perceiving them. Greater activity emerged when predicting untrained compared to trained sequences in occipitotemporal cortices and to a lesser degree, premotor cortices. The occipitotemporal responses when predicting untrained agents showed further specialization, with greater responses within body-processing regions when predicting gymnasts' movements and in object-selective cortex when predicting toys' movements. The results suggest that (1) select portions of the AON are recruited to predict the complex movements not easily mapped onto the observer's body and (2) greater recruitment of these AON regions supports prediction of less familiar sequences. We suggest that the findings inform both the premotor model of action prediction and the predictive coding account of AON function.

  16. Can computationally designed protein sequences improve secondary structure prediction?

    PubMed

    Bondugula, Rajkumar; Wallqvist, Anders; Lee, Michael S

    2011-05-01

    Computational sequence design methods are used to engineer proteins with desired properties such as increased thermal stability and novel function. In addition, these algorithms can be used to identify an envelope of sequences that may be compatible with a particular protein fold topology. In this regard, we hypothesized that sequence-property prediction, specifically secondary structure, could be significantly enhanced by using a large database of computationally designed sequences. We performed a large-scale test of this hypothesis with 6511 diverse protein domains and 50 designed sequences per domain. After analysis of the inherent accuracy of the designed sequences database, we found it necessary to constrain the fraction of the native sequence that was allowed to change. With mutational constraints, accuracy improved relative to no constraints, but the diversity of designed sequences, and hence the effective size of the database, was moderately reduced. Overall, the best three-state prediction accuracy (Q3) that we achieved was nearly one percentage point higher than that obtained with a natural sequence database alone, well below the theoretical possibility for improvement of 8-10 percentage points. Furthermore, our nascent method improved the accuracy of the state-of-the-art PSIPRED program by one percentage point.
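    Q3 is the percentage of residues whose three-state secondary structure (helix H, strand E, coil C) is predicted correctly. A minimal sketch, assuming the observed states come from an eight-state DSSP assignment reduced with one common convention (H, G, I to H; E, B to E; everything else to C); other reduction schemes exist and give slightly different Q3 values. The function names are illustrative.

```python
# One common 8-to-3 state reduction of DSSP codes; anything not listed becomes coil.
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(ss8: str) -> str:
    """Map an eight-state DSSP string to the three states H, E and C."""
    return "".join(DSSP_TO_3.get(state, "C") for state in ss8)


def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("predicted and observed strings must be non-empty and equal in length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```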

  17. Secondary Structure Predictions for Long RNA Sequences Based on Inversion Excursions and MapReduce.

    PubMed

    Yehdego, Daniel T; Zhang, Boyu; Kodimala, Vikram K R; Johnson, Kyle L; Taufer, Michela; Leung, Ming-Ying

    2013-05-01

    Secondary structures of ribonucleic acid (RNA) molecules play important roles in many biological processes including gene expression and regulation. Experimental observations and computing limitations suggest that we can approach the secondary structure prediction problem for long RNA sequences by segmenting them into shorter chunks, predicting the secondary structures of each chunk individually using existing prediction programs, and then assembling the results to give the structure of the original sequence. The selection of cutting points is a crucial component of the segmenting step. Noting that stem-loops and pseudoknots always contain an inversion, i.e., a stretch of nucleotides followed closely by its inverse complementary sequence, we developed two cutting methods for segmenting long RNA sequences based on inversion excursions: the centered and optimized methods. Each step of searching for inversions, chunking, and predictions can be performed in parallel. In this paper we use a MapReduce framework, i.e., Hadoop, to extensively explore meaningful inversion stem lengths and gap sizes for the segmentation and identify correlations between chunking methods and prediction accuracy. We show that for a set of long RNA sequences in the RFAM database, whose secondary structures are known to contain pseudoknots, our approach predicts secondary structures more accurately than methods that do not segment the sequence, when the latter predictions are possible computationally. We also show that, as sequences exceed certain lengths, some programs cannot computationally predict pseudoknots while our chunking methods can. Overall, our predicted structures still retain the accuracy level of the original prediction programs when compared with known experimental secondary structure.
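    An inversion as defined above (a stem followed closely by its inverse complement) can be located with a direct scan. A minimal sketch, assuming exact Watson-Crick pairing (no G-U wobble pairs or mismatches) and a user-chosen stem length and gap range; it illustrates the inversion definition only and is not the paper's centered or optimized cutting method, nor its Hadoop/MapReduce implementation. Function names are hypothetical.

```python
PAIR = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Watson-Crick reverse complement of an RNA string (no G-U wobble pairs)."""
    return rna.translate(PAIR)[::-1]


def find_inversions(seq: str, stem_len: int, min_gap: int, max_gap: int) -> list[tuple[int, int, int]]:
    """Return (stem_start, partner_start, gap) for every exact stem/partner pair."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    n = len(seq)
    hits = []
    for i in range(n - 2 * stem_len - min_gap + 1):
        target = reverse_complement(seq[i:i + stem_len])
        for gap in range(min_gap, max_gap + 1):
            j = i + stem_len + gap  # start of the candidate partner segment
            if j + stem_len > n:
                break
            if seq[j:j + stem_len] == target:
                hits.append((i, j, gap))
    return hits
```

    For example, `find_inversions("GGGAAAACCC", 3, 3, 5)` reports a single stem-loop whose GGG stem pairs with CCC across a four-nucleotide loop.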

  18. Solid phase sequencing of double-stranded nucleic acids

    DOEpatents

    Fu, Dong-Jing; Cantor, Charles R.; Koster, Hubert; Smith, Cassandra L.

    2002-01-01

    This invention relates to methods for detecting and sequencing target double-stranded nucleic acid sequences, to nucleic acid probes and arrays of probes useful in these methods, and to kits and systems which contain these probes. Useful methods involve hybridizing the nucleic acids or nucleic acids which represent complementary or homologous sequences of the target to an array of nucleic acid probes. These probes comprise a single-stranded portion, an optional double-stranded portion and a variable sequence within the single-stranded portion. The molecular weights of the hybridized nucleic acids of the set can be determined by mass spectrometry, and the sequence of the target determined from the molecular weights of the fragments. Nucleic acids whose sequences can be determined include nucleic acids in biological samples such as patient biopsies and environmental samples. Probes may be fixed to a solid support such as a hybridization chip to facilitate automated determination of molecular weights and identification of the target sequence.

  19. Analysis of cloned cDNA and genomic sequences for phytochrome: complete amino acid sequences for two gene products expressed in etiolated Avena.

    PubMed Central

    Hershey, H P; Barker, R F; Idler, K B; Lissemore, J L; Quail, P H

    1985-01-01

    Cloned cDNA and genomic sequences have been analyzed to deduce the amino acid sequence of phytochrome from etiolated Avena. Restriction endonuclease site polymorphism between clones indicates that at least four phytochrome genes are expressed in this tissue. Sequence analysis of two complete and one partial coding region shows approximately 98% homology at both the nucleotide and amino acid levels, with the majority of amino acid changes being conservative. High sequence homology is also found in the 5'-untranslated region but significant divergence occurs in the 3'-untranslated region. The phytochrome polypeptides are 1128 amino acid residues long, corresponding to a molecular mass of 125 kDa. The known protein sequence at the chromophore attachment site occurs only once in the polypeptide, establishing that phytochrome has a single chromophore per monomer covalently linked to Cys-321. Computer analyses of the amino acid sequences have provided predictions regarding a number of structural features of the phytochrome molecule. PMID:3001642
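    The link between residue count and molecular mass can be checked with average residue masses: at roughly 110 Da per residue, 1128 residues give about 124 kDa, in line with the 125 kDa quoted. A minimal sketch that sums average residue masses and adds one water for the free termini; the values are the commonly tabulated average masses (to our understanding, the ones used by tools such as ExPASy ProtParam), and the sketch ignores prosthetic groups such as the phytochrome chromophore and any post-translational modifications. Function names are illustrative.

```python
# Average residue masses in Da (free amino acid minus H2O).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # average mass of H2O, added once for the free N- and C-termini


def average_mass(protein: str) -> float:
    """Approximate average molecular mass (Da) of an unmodified polypeptide."""
    return sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER


def rough_mass_from_length(n_residues: int, mean_residue_mass: float = 110.0) -> float:
    """Back-of-the-envelope estimate using a typical mean residue mass of ~110 Da."""
    return n_residues * mean_residue_mass
```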

  20. Delineation of modular proteins: domain boundary prediction from sequence information.

    PubMed

    Kong, Lesheng; Ranganathan, Shoba

    2004-06-01

    The delineation of domain boundaries of a given sequence in the absence of known 3D structures or detectable sequence homology to known domains benefits many areas in protein science, such as protein engineering, protein 3D structure determination and protein structure prediction. With the exponential growth of newly determined sequences, our ability to predict domain boundaries rapidly and accurately from sequence information alone is essential for gene function annotation. Anyone attempting to predict domain boundaries for a single protein sequence is invariably confronted with a plethora of internet-accessible databases containing boundary information and a variety of methods for domain boundary prediction. How are these derived and how well do they work? What definition of 'domain' do they use? We will first clarify the different definitions of protein domains, and then describe the available public databases with domain boundary information. Finally, we will review existing domain boundary prediction methods and discuss their strengths and weaknesses.

  1. An Integrated Sequence-Structure Database incorporating matching mRNA sequence, amino acid sequence and protein three-dimensional structure data.

    PubMed Central

    Adzhubei, I A; Adzhubei, A A; Neidle, S

    1998-01-01

    We have constructed a non-homologous database, termed the Integrated Sequence-Structure Database (ISSD) which comprises the coding sequences of genes, amino acid sequences of the corresponding proteins, their secondary structure and straight phi,psi angles assignments, and polypeptide backbone coordinates. Each protein entry in the database holds the alignment of nucleotide sequence, amino acid sequence and the PDB three-dimensional structure data. The nucleotide and amino acid sequences for each entry are selected on the basis of exact matches of the source organism and cell environment. The current version 1.0 of ISSD is available on the WWW at http://www.protein.bio.msu.su/issd/ and includes 107 non-homologous mammalian proteins, of which 80 are human proteins. The database has been used by us for the analysis of synonymous codon usage patterns in mRNA sequences showing their correlation with the three-dimensional structure features in the encoded proteins. Possible ISSD applications include optimisation of protein expression, improvement of the protein structure prediction accuracy, and analysis of evolutionary aspects of the nucleotide sequence-protein structure relationship. PMID:9399866
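    The phi and psi angles stored in such a database are backbone dihedral angles: phi is defined by the atoms C(i-1), N, CA, C of residue i and psi by N, CA, C, N(i+1). A minimal sketch of computing a dihedral from four atom coordinates (for example, taken from PDB backbone records), using the atan2 formulation so that the sign follows the IUPAC convention; the function name and tuple-based input are illustrative, not part of ISSD.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

    Passing the C(i-1), N, CA and C coordinates of a residue gives its phi angle; shifting the window by one atom gives psi.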

  2. Learned spatiotemporal sequence recognition and prediction in primary visual cortex

    PubMed Central

    Gavornik, Jeffrey P.; Bear, Mark F.

    2014-01-01

    Learning to recognize and predict temporal sequences is fundamental to sensory perception, and is impaired in several neuropsychiatric disorders, but little is known about where and how this occurs in the brain. We discovered that repeated presentations of a visual sequence over the course of days cause evoked response potentiation in mouse V1 that is highly specific for stimulus order and timing. Remarkably, after V1 is trained to recognize a sequence, cortical activity regenerates the full sequence even when individual stimulus elements are omitted. This novel neurophysiological report of sequence learning advances the understanding of how the brain makes “intelligent guesses” based on limited information to form visual percepts and suggests that it is possible to study the mechanistic basis of this high-level cognitive ability by studying low-level sensory systems. PMID:24657967

  3. Amino acid sequence of mouse submaxillary gland renin.

    PubMed Central

    Misono, K S; Chang, J J; Inagami, T

    1982-01-01

    The complete amino acid sequences of the heavy chain and light chain of mouse submaxillary gland renin have been determined. The heavy chain consists of 288 amino acid residues having a Mr of 31,036 calculated from the sequence. The light chain contains 48 amino acid residues with a Mr of 5,458. The sequence of the heavy chain was determined by automated Edman degradations of the cyanogen bromide peptides and tryptic peptides generated after citraconylation, as well as other peptides generated therefrom. The sequence of the light chain was derived from sequence analyses of the peptides generated by cyanogen bromide cleavage or by digestion with Staphylococcus aureus protease. The sequences in the active site regions of renin containing the two catalytically essential aspartyl residues 32 and 215 were found to be identical to those in pepsin, chymosin, and penicillopepsin. Comparison of the amino acid sequence of renin with that of porcine pepsin indicated a 42% sequence identity of the heavy chain with the amino-terminal and middle regions and a 46% identity of the light chain with the carboxyl-terminal region of the porcine pepsin sequence. Residues identical in renin and pepsin are distributed throughout the length of the molecules, suggesting a similarity in their overall structures. PMID:6812055

  4. Prediction and prioritization of neoantigens: integration of RNA sequencing data with whole-exome sequencing.

    PubMed

    Karasaki, Takahiro; Nagayama, Kazuhiro; Kuwano, Hideki; Nitadori, Jun-Ichi; Sato, Masaaki; Anraku, Masaki; Hosoi, Akihiro; Matsushita, Hirokazu; Takazawa, Masaki; Ohara, Osamu; Nakajima, Jun; Kakimi, Kazuhiro

    2017-02-01

    The importance of neoantigens for cancer immunity is now widely acknowledged. However, there are diverse strategies for predicting and prioritizing candidate neoantigens, and thus reported neoantigen loads vary a great deal. To clarify this issue, we compared the numbers of neoantigen candidates predicted by four currently utilized strategies. Whole-exome sequencing and RNA sequencing (RNA-Seq) were carried out for four non-small-cell lung cancer patients. We identified 361 somatic missense mutations from which 224 candidate neoantigens were predicted using MHC class I binding affinity prediction software (strategy I). Of these mutations, 207 exceeded the set threshold of gene expression (fragments per kilobase of transcript per million fragments mapped ≥1), resulting in 124 candidate neoantigens (strategy II). To verify mutant mRNA expression, amplicons from tumor cDNA including each mutation were sequenced; 204 of the 207 mutations were successfully sequenced, yielding 121 mutant mRNA sequences and resulting in 75 candidate neoantigens (strategy III). Sequence information was also extracted from RNA-Seq to confirm the presence of mutated mRNA. Variant allele frequencies ≥0.04 in RNA-Seq were found for 117 of the 207 mutations, which were regarded as expressed in the tumor, and finally, 72 candidate neoantigens were predicted (strategy IV). Although it required no additional amplicon sequencing of cDNA, strategy IV performed comparably to strategy III. We therefore propose strategy IV as a practical and appropriate strategy to predict candidate neoantigens that fully utilizes currently available information. Notably, different neoantigen loads were deduced from the same tumors depending on the strategies applied.
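    The strategies differ mainly in which filters are stacked on top of the MHC class I binding prediction. A minimal sketch of that stacking, assuming a hypothetical per-mutation record that already holds the predicted binders, the gene expression (FPKM) and the RNA-Seq variant allele frequency, and assuming, as the reported counts suggest, that the VAF filter is applied to mutations that already pass the expression threshold. Strategy III depends on amplicon sequencing and is not modeled; all names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SomaticMutation:
    """Hypothetical record for one missense mutation and its predicted MHC-I binders."""
    gene: str
    fpkm: float      # expression of the gene from RNA-Seq
    rna_vaf: float   # variant allele frequency of the mutation in RNA-Seq reads
    binders: list[str] = field(default_factory=list)


def candidate_neoantigens(mutations: list[SomaticMutation],
                          min_fpkm: Optional[float] = None,
                          min_rna_vaf: Optional[float] = None) -> list[str]:
    """Collect predicted binders from mutations that pass the optional filters."""
    selected = []
    for m in mutations:
        if min_fpkm is not None and m.fpkm < min_fpkm:
            continue
        if min_rna_vaf is not None and m.rna_vaf < min_rna_vaf:
            continue
        selected.extend(m.binders)
    return selected
```

    Under these assumptions, strategy I corresponds to calling the function with no filters, strategy II to `min_fpkm=1.0`, and strategy IV to `min_fpkm=1.0, min_rna_vaf=0.04`.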

  5. A Fast Algorithm for Exonic Regions Prediction in DNA Sequences

    PubMed Central

    Saberkari, Hamidreza; Shamsi, Mousa; Heravi, Hamed; Sedaaghi, Mohammad Hossein

    2013-01-01

    The main purpose of this paper is to introduce a fast method for gene prediction in DNA sequences based on the period-3 property of exons. First, the symbolic DNA sequences were converted to a digital signal using the electron-ion interaction potential (EIIP) method. Then, to reduce the effect of background noise in the period-3 spectrum, we applied a three-level discrete wavelet transform to the input digital signal. Finally, the Goertzel algorithm was used to extract period-3 components from the filtered DNA sequence. The proposed algorithm reduces computational complexity and hence increases processing speed. Another advantage of the algorithm is its accurate detection of short exons in DNA sequences. The exon-prediction ability of the proposed algorithm was compared with several existing methods at the nucleotide level using: (i) specificity-sensitivity values; (ii) receiver operating characteristic (ROC) curves; and (iii) area under the ROC curve. Simulation results confirmed that the proposed method can be used as a promising tool for exon prediction in DNA sequences. PMID:24672762
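    The mapping and detection steps can be sketched as follows: nucleotides are replaced by their EIIP values, and the Goertzel recurrence evaluates spectral power at the single frequency 2π/3 over a sliding window, which is cheaper than a full FFT when only one frequency is needed. This is a minimal sketch, not the authors' implementation: the wavelet denoising step is omitted, and the window length of 351 bp and step of 3 are hypothetical defaults.

```python
import math

# Electron-ion interaction potential (EIIP) values commonly used to map nucleotides to numbers.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def eiip_signal(dna: str) -> list[float]:
    """Convert a DNA string into its numeric EIIP signal."""
    return [EIIP[base] for base in dna.upper()]


def goertzel_power(x: list[float], omega: float) -> float:
    """|X(omega)|^2 of the finite sequence x, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(omega)
    s_prev = s_prev2 = 0.0
    for sample in x:
        s = sample + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def period3_profile(dna: str, window: int = 351, step: int = 3) -> list[float]:
    """Period-3 power of each sliding window; peaks suggest protein-coding regions."""
    x = eiip_signal(dna)
    omega = 2.0 * math.pi / 3.0
    return [goertzel_power(x[i:i + window], omega)
            for i in range(0, len(x) - window + 1, step)]
```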

  6. Amino Acid Sequence of Human Cholinesterase

    DTIC Science & Technology

    1985-10-01

    liquid chromatography (HPLC). Activity testing of the aged, DFP-labeled cholinesterase showed that 99.8% of the active sites had been labeled, since...acids were quantitated by ninhydrin at the AAA Labs, or by derivatization with phenylisothiocyanate at the University of Michigan. The latter method

  7. Cystatin. Amino acid sequence and possible secondary structure.

    PubMed Central

    Schwabe, C; Anastasi, A; Crow, H; McDonald, J K; Barrett, A J

    1984-01-01

    The amino acid sequence of cystatin, the protein from chicken egg-white that is a tight-binding inhibitor of many cysteine proteinases, is reported. Cystatin is composed of 116 amino acid residues, and the Mr is calculated to be 13,143. No striking similarity to any other known sequence has been detected. The results of computer analysis of the sequence and circular dichroism (c.d.) spectroscopy indicate that the secondary structure includes relatively little alpha-helix (about 20%) and that the remainder is mainly beta-structure. PMID:6712597

  8. Mouse Vk gene classification by nucleic acid sequence similarity.

    PubMed

    Strohal, R; Helmberg, A; Kroemer, G; Kofler, R

    1989-01-01

    Analyses of immunoglobulin (Ig) variable (V) region gene usage in the immune response, estimates of V gene germline complexity, and other nucleic acid hybridization-based studies depend on the extent to which such genes are related (i.e., sequence similarity) and their organization in gene families. While mouse Igh heavy chain V region (VH) gene families are relatively well-established, a corresponding systematic classification of Igk light chain V region (Vk) genes has not been reported. The present analysis, in the course of which we reviewed the known extent of the Vk germline gene repertoire and Vk gene usage in a variety of responses to foreign and self antigens, provides a classification of mouse Vk genes in gene families composed of members with greater than 80% overall nucleic acid sequence similarity. This classification differed in several aspects from that of VH genes: only some Vk gene families were as clearly separated (by greater than 25% sequence dissimilarity) as typical VH gene families; most Vk gene families were closely related and, in several instances, members from different families were very similar (greater than 80%) over large sequence portions; frequently, classification by nucleic acid sequence similarity diverged from existing classifications based on amino-terminal protein sequence similarity. Our data have implications for Vk gene analyses by nucleic acid hybridization and describe potentially important differences in sequence organization between VH and Vk genes.
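    Grouping genes into families "composed of members with greater than 80% overall nucleic acid sequence similarity" can be sketched as single-linkage clustering: two genes share a family if a chain of pairs above the threshold connects them. This is a minimal sketch assuming pre-aligned, equal-length sequences and percent identity as the similarity measure; the abstract does not state the exact procedure, and the function names are hypothetical. Note that single linkage lets closely related families merge, which is one way such classifications can differ.

```python
def percent_identity(a: str, b: str) -> float:
    """Identity (%) between two pre-aligned, equal-length sequences; gap columns are skipped."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def cluster_families(aligned: dict[str, str], threshold: float = 80.0) -> list[list[str]]:
    """Single-linkage families: genes linked by a chain of >threshold% identity."""
    parent = {name: name for name in aligned}

    def find(name: str) -> str:
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    names = list(aligned)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if percent_identity(aligned[a], aligned[b]) > threshold:
                parent[find(a)] = find(b)

    families: dict[str, list[str]] = {}
    for name in names:
        families.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in families.values())
```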

  9. Draft Genome Sequence of Bacillus coagulans NL01, a Wonderful L-Lactic Acid Producer

    PubMed Central

    Zheng, Zhaojuan; Jiang, Ting; Lin, Xi; Zhou, Jie

    2015-01-01

    Here, we report the draft genome sequence of Bacillus coagulans NL01, which can produce L-lactic acid of high optical purity using xylose as the sole carbon source. The draft genome is 3,505,081 bp, with 144 contigs. About 3,903 protein-coding genes and 92 rRNAs are predicted from this assembly. PMID:26089419

  10. Shark myelin basic protein: amino acid sequence, secondary structure, and self-association.

    PubMed

    Milne, T J; Atkins, A R; Warren, J A; Auton, W P; Smith, R

    1990-09-01

    Myelin basic protein (MBP) from the Whaler shark (Carcharhinus obscurus) has been purified from acid extracts of a chloroform/methanol pellet from whole brains. The amino acid sequence of the majority of the protein has been determined and compared with the sequences of other MBPs. The shark protein has only 44% homology with the bovine protein, but, in common with other MBPs, it has basic residues distributed throughout the sequence and no extensive segments that are predicted to have an ordered secondary structure in solution. Shark MBP lacks the triproline sequence previously postulated to form a hairpin bend in the molecule. The region containing the putative consensus sequence for encephalitogenicity in the guinea pig contains several substitutions, thus accounting for the lack of activity of the shark protein. Studies of the secondary structure and self-association have shown that shark MBP possesses solution properties similar to those of the bovine protein, despite the extensive differences in primary structure.

  11. QGRS-H Predictor: a web server for predicting homologous quadruplex forming G-rich sequence motifs in nucleotide sequences

    PubMed Central

    Menendez, Camille; Frees, Scott; Bagga, Paramjeet S.

    2012-01-01

    Naturally occurring G-quadruplex structural motifs, formed by guanine-rich nucleic acids, have been reported in telomeric, promoter and transcribed regions of mammalian genomes. G-quadruplex structures have received significant attention because of growing evidence for their role in important biological processes, human disease and as therapeutic targets. Lately, there has been much interest in the potential roles of RNA G-quadruplexes as cis-regulatory elements of post-transcriptional gene expression. Large-scale computational genomics studies on G-quadruplexes have difficulty validating their predictions without laborious testing in ‘wet’ labs. We have developed a bioinformatics tool, QGRS-H Predictor that can map and analyze conserved putative Quadruplex forming 'G'-Rich Sequences (QGRS) in mRNAs, ncRNAs and other nucleotide sequences, e.g. promoter, telomeric and gene flanking regions. Identifying conserved regulatory motifs helps validate computations and enhances accuracy of predictions. The QGRS-H Predictor is particularly useful for mapping homologous G-quadruplex forming sequences as cis-regulatory elements in the context of 5′- and 3′-untranslated regions, and CDS sections of aligned mRNA sequences. QGRS-H Predictor features highly interactive graphic representation of the data. It is a unique and user-friendly application that provides many options for defining and studying G-quadruplexes. The QGRS-H Predictor can be freely accessed at: http://quadruplex.ramapo.edu/qgrs/app/start. PMID:22576365
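    Putative quadruplex-forming G-rich sequences are often located with a simple pattern of four runs of at least three guanines separated by short loops. A minimal sketch using a Quadparser-style regular expression (G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+); this is an assumption for illustration, not QGRS-H Predictor's algorithm, which also scores motifs and maps conserved motifs across aligned homologous sequences. It reports non-overlapping matches only.

```python
import re

# Four runs of >= 3 G separated by loops of 1-7 nucleotides (Quadparser-style).
_LOOP = "[ACGTUN]{1,7}"
G4_PATTERN = re.compile("G{3,}" + _LOOP + "G{3,}" + _LOOP + "G{3,}" + _LOOP + "G{3,}")


def find_g4_motifs(seq: str) -> list[tuple[int, str]]:
    """Non-overlapping putative G-quadruplex motifs as (start, motif) pairs."""
    return [(m.start(), m.group()) for m in G4_PATTERN.finditer(seq.upper())]
```

    For a telomere-like repeat such as `aagggttagggttagggttagggaa`, the sketch reports one motif, GGGTTAGGGTTAGGGTTAGGG, starting at position 2.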

  12. Prediction of Functional Class of Proteins and Peptides Irrespective of Sequence Homology by Support Vector Machines

    PubMed Central

    Tang, Zhi Qun; Lin, Hong Huang; Zhang, Hai Lei; Han, Lian Yi; Chen, Xin; Chen, Yu Zong

    2007-01-01

    Various computational methods have been used to predict protein and peptide function from sequence. A particular challenge is to derive functional properties from sequences that show low or no homology to proteins of known function. Recently, support vector machines (SVM), a machine learning method, have been explored for predicting the functional class of proteins and peptides from amino acid sequence-derived properties independent of sequence similarity, and they have shown promising potential for a wide spectrum of protein and peptide classes, including some low- and non-homologous proteins. This method can thus be explored as a potential tool to complement alignment-based, clustering-based, and structure-based methods for predicting protein function. This article reviews the strategies, current progress, and underlying difficulties in using SVM for predicting the functional class of proteins. The relevant software and web servers are described. The reported prediction performances of these methods are also presented. PMID:20066123
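    An SVM needs every protein encoded as a fixed-length numeric vector, whatever the sequence length. A minimal sketch of two widely used sequence-derived descriptors, amino acid composition (20 values) and dipeptide composition (400 values); the review may rely on other property-based descriptors as well, and the function names are illustrative. The resulting vectors could then be passed to any SVM implementation, for example scikit-learn's `sklearn.svm.SVC`.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
_DIPEPTIDE_INDEX = {d: i for i, d in enumerate(DIPEPTIDES)}


def aa_composition(seq: str) -> list[float]:
    """20-dim fraction of each standard residue (non-standard letters are ignored)."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


def dipeptide_composition(seq: str) -> list[float]:
    """400-dim fraction of each adjacent residue pair made of standard residues."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pairs = [p for p in pairs if p in _DIPEPTIDE_INDEX]
    if not pairs:
        raise ValueError("sequence has no standard dipeptides")
    vec = [0.0] * len(DIPEPTIDES)
    for p in pairs:
        vec[_DIPEPTIDE_INDEX[p]] += 1.0
    return [v / len(pairs) for v in vec]
```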

  13. Amino acid sequence repertoire of the bacterial proteome and the occurrence of untranslatable sequences

    PubMed Central

    Navon, Sharon Penias; Kornberg, Guy; Chen, Jin; Schwartzman, Tali; Tsai, Albert; Puglisi, Elisabetta Viani; Puglisi, Joseph D.; Adir, Noam

    2016-01-01

    Bioinformatic analysis of Escherichia coli proteomes revealed that all possible amino acid triplet sequences occur at their expected frequencies, with four exceptions. Two of the four underrepresented sequences (URSs) were shown to interfere with translation in vivo and in vitro. Enlarging the URS by a single amino acid resulted in increased translational inhibition. Single-molecule methods revealed stalling of translation at the entrance of the peptide exit tunnel of the ribosome, adjacent to ribosomal nucleotides A2062 and U2585. Interaction with these same ribosomal residues is involved in regulation of translation by longer, naturally occurring protein sequences. The E. coli proteome has evidently evolved to minimize interaction with the exit tunnel and maximize the sequence diversity of the proteome, although allowing some interactions for regulatory purposes. Bioinformatic analysis of the human proteome revealed no underrepresented triplet sequences, possibly reflecting an absence of regulation by interaction with the exit tunnel. PMID:27307442
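    Deciding whether a triplet occurs at its "expected frequency" requires an explicit null model. A minimal sketch, assuming the simplest one: residues occur independently at their overall proteome frequencies, so the expected count of triplet abc is the number of triplet positions times f(a)f(b)f(c). The abstract does not state the model actually used, and the 0.5 cutoff for flagging underrepresentation is an arbitrary placeholder; function names are hypothetical.

```python
from collections import Counter
from itertools import product


def triplet_obs_exp(proteins: list[str]) -> dict[str, float]:
    """Observed/expected ratio for every triplet built from the observed residues.

    Expected counts assume residues occur independently at their overall frequencies.
    """
    residues: Counter = Counter()
    triplets: Counter = Counter()
    for seq in proteins:
        residues.update(seq)
        triplets.update(seq[i:i + 3] for i in range(len(seq) - 2))
    total_res = sum(residues.values())
    total_tri = sum(triplets.values())
    freq = {aa: count / total_res for aa, count in residues.items()}
    ratios = {}
    for a, b, c in product(sorted(freq), repeat=3):
        expected = total_tri * freq[a] * freq[b] * freq[c]
        ratios[a + b + c] = triplets[a + b + c] / expected
    return ratios


def underrepresented(ratios: dict[str, float], cutoff: float = 0.5) -> list[str]:
    """Triplets observed at less than `cutoff` times their expected count."""
    return sorted(t for t, r in ratios.items() if r < cutoff)
```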

  14. Amino acid sequence repertoire of the bacterial proteome and the occurrence of untranslatable sequences.

    PubMed

    Navon, Sharon Penias; Kornberg, Guy; Chen, Jin; Schwartzman, Tali; Tsai, Albert; Puglisi, Elisabetta Viani; Puglisi, Joseph D; Adir, Noam

    2016-06-28

    Bioinformatic analysis of Escherichia coli proteomes revealed that all possible amino acid triplet sequences occur at their expected frequencies, with four exceptions. Two of the four underrepresented sequences (URSs) were shown to interfere with translation in vivo and in vitro. Enlarging the URS by a single amino acid resulted in increased translational inhibition. Single-molecule methods revealed stalling of translation at the entrance of the peptide exit tunnel of the ribosome, adjacent to ribosomal nucleotides A2062 and U2585. Interaction with these same ribosomal residues is involved in regulation of translation by longer, naturally occurring protein sequences. The E. coli proteome has evidently evolved to minimize interaction with the exit tunnel and maximize the sequence diversity of the proteome, although allowing some interactions for regulatory purposes. Bioinformatic analysis of the human proteome revealed no underrepresented triplet sequences, possibly reflecting an absence of regulation by interaction with the exit tunnel.

  15. Learning to predict: Exposure to temporal sequences facilitates prediction of future events

    PubMed Central

    Baker, Rosalind; Dexter, Matthew; Hardwicke, Tom E.; Goldstone, Aimee; Kourtzi, Zoe

    2014-01-01

    Previous experience is thought to facilitate our ability to extract spatial and temporal regularities from cluttered scenes. However, little is known about how we may use this knowledge to predict future events. Here we test whether exposure to temporal sequences facilitates the visual recognition of upcoming stimuli. We presented observers with a sequence of leftwards and rightwards oriented gratings that was interrupted by a test stimulus. Observers were asked to indicate whether the orientation of the test stimulus matched their expectation based on the preceding sequence. Our results demonstrate that exposure to temporal sequences without feedback facilitates our ability to predict an upcoming stimulus. In particular, observers’ performance improved following exposure to structured but not random sequences. Improved performance lasted for a prolonged period and generalized to untrained stimulus orientations rather than sequences of different global structure, suggesting that observers acquire knowledge of the sequence structure rather than its items. Further, this learning was compromised when observers performed a dual task resulting in increased attentional load. These findings suggest that exposure to temporal regularities in a scene allows us to accumulate knowledge about its global structure and predict future events. PMID:24231115

  16. Improved nucleic acid descriptors for siRNA efficacy prediction

    PubMed Central

    Sciabola, Simone; Cao, Qing; Orozco, Modesto; Faustino, Ignacio; Stanton, Robert V.

    2013-01-01

    Although considerable progress has been made recently in understanding how gene silencing is mediated by the RNAi pathway, the rational design of effective sequences is still a challenging task. In this article, we demonstrate that including three-dimensional descriptors improved the discrimination between active and inactive small interfering RNAs (siRNAs) in a statistical model. Five descriptor types were used: (i) nucleotide position along the siRNA sequence, (ii) nucleotide composition in terms of presence/absence of specific combinations of di- and trinucleotides, (iii) nucleotide interactions by means of a modified auto- and cross-covariance function, (iv) nucleotide thermodynamic stability derived by the nearest neighbor model representation and (v) nucleic acid structure flexibility. The duplex flexibility descriptors are derived from extended molecular dynamics simulations, which are able to describe the sequence-dependent elastic properties of RNA duplexes, even for non-standard oligonucleotides. The matrix of descriptors was analysed using three statistical packages in R (partial least squares, random forest, and support vector machine), and the most predictive model was implemented in a modeling tool we have made publicly available through SourceForge. Our implementation of new RNA descriptors coupled with appropriate statistical algorithms resulted in improved model performance for the selection of siRNA candidates when compared with publicly available siRNA prediction tools and previously published test sets. Additional validation studies based on in-house RNA interference projects confirmed the robustness of the scoring procedure in prospective studies. PMID:23241392
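    Descriptor type (ii), the presence or absence of specific di- and trinucleotides, can be illustrated with a short Python sketch. It assumes a plain RNA alphabet (A, C, G, U) and position-independent binary features. The authors' selection of "specific combinations" is not given in the abstract, so this sketch simply enumerates every 2-mer and 3-mer. `presence_descriptors` is a hypothetical name.

```python
from itertools import product

def presence_descriptors(sequence, alphabet="ACGU", k_values=(2, 3)):
    """Binary presence/absence feature for every k-mer of the given lengths."""
    seq = sequence.upper()
    features = {}
    for k in k_values:
        # Set of k-mers actually observed anywhere in the sequence.
        present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        for kmer in map("".join, product(alphabet, repeat=k)):
            features[kmer] = 1 if kmer in present else 0
    return features
```

    For a 21-nt siRNA strand this yields 16 + 64 = 80 binary features. These could be concatenated with positional, thermodynamic and flexibility descriptors before model fitting.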

  17. Amino acid sequences of proteins from Leptospira serovar pomona.

    PubMed

    Alves, S F; Lefebvre, R B; Probert, W

    2000-01-01

    This report describes partial amino acid sequences from three putative outer envelope proteins from Leptospira serovar pomona. To obtain internal fragments for protein sequencing, enzymatic and chemical digestion was performed. The enzyme clostripain was used to digest the 32 and 45 kDa proteins. In situ digestion of the 40 kDa protein was accomplished using cyanogen bromide. The 32 kDa protein generated two fragments, one of 21 kDa and another of 10 kDa that yielded five residues. A 24 kDa fragment that yielded nineteen amino acid residues was obtained from the 45 kDa protein. A 20 kDa fragment obtained from the 40 kDa protein yielded a twenty-amino-acid sequence.

  18. Translating HIV sequences into quantitative fitness landscapes predicts viral vulnerabilities for rational immunogen design.

    PubMed

    Ferguson, Andrew L; Mann, Jaclyn K; Omarjee, Saleha; Ndung'u, Thumbi; Walker, Bruce D; Chakraborty, Arup K

    2013-03-21

    A prophylactic or therapeutic vaccine offers the best hope to curb the HIV-AIDS epidemic gripping sub-Saharan Africa, but it remains elusive. A major challenge is the extreme viral sequence variability among strains. Systematic means to guide immunogen design for highly variable pathogens like HIV are not available. Using computational models, we have developed an approach to translate available viral sequence data into quantitative landscapes of viral fitness as a function of the amino acid sequences of its constituent proteins. Predictions emerging from our computationally defined landscapes for the proteins of HIV-1 clade B Gag were positively tested against new in vitro fitness measurements and were consistent with previously defined in vitro measurements and clinical observations. These landscapes chart the peaks and valleys of viral fitness as protein sequences change and inform the design of immunogens and therapies that can target regions of the virus most vulnerable to selection pressure.

  19. Translating HIV sequences into quantitative fitness landscapes predicts viral vulnerabilities for rational immunogen design

    PubMed Central

    Ferguson, Andrew L.; Mann, Jaclyn K.; Omarjee, Saleha; Ndung’u, Thumbi; Walker, Bruce D.; Chakraborty, Arup K.

    2013-01-01

    Summary A prophylactic or therapeutic vaccine offers the best hope to curb the HIV-AIDS epidemic gripping sub-Saharan Africa, but remains elusive. A major challenge is the extreme viral sequence variability among strains. Systematic means to guide immunogen design for highly variable pathogens like HIV are not available. Using computational models, we have developed an approach to translate available viral sequence data into quantitative landscapes of viral fitness as a function of the amino acid sequences of its constituent proteins. Predictions emerging from our computationally defined landscapes for the proteins of HIV-1 clade B Gag were positively tested against new in vitro fitness measurements, and were consistent with previously defined in vitro measurements and clinical observations. These landscapes chart the peaks and valleys of viral fitness as protein sequences change, and inform the design of immunogens and therapies that can target regions of the virus most vulnerable to selection pressure. PMID:23521886

  20. SVM-PB-Pred: SVM based protein block prediction method using sequence profiles and secondary structures.

    PubMed

    Suresh, V; Parthasarathy, S

    2014-01-01

    We developed a support vector machine (SVM) based web server, SVM-PB-Pred, to predict Protein Blocks for any given amino acid sequence. The input features of SVM-PB-Pred are (i) sequence profiles (PSSM) and (ii) either actual secondary structures (SS) from the DSSP method or secondary structures predicted by the NPS@ and GOR4 methods. Three combined input feature sets, PSSM+SS(DSSP), PSSM+SS(NPS@) and PSSM+SS(GOR4), were used to train and test the SVM models. Four datasets, RS90, DB433, LI1264 and SP1577, were used to develop the SVM models. The four resulting SVM models were evaluated with three benchmarking tests: (i) self-consistency, (ii) seven-fold cross-validation and (iii) an independent test. The highest prediction accuracy, ~70%, was observed in the self-consistency test for the SVM models of both the LI1264 and SP1577 datasets when PSSM+SS(DSSP) input features were used. In the independent test, the accuracies of the SVM models for these same two datasets dropped to ~53% with PSSM+SS(NPS@) and ~43% with PSSM+SS(GOR4). Using our method, protein block letters can be predicted for any query protein sequence with ~53% accuracy when the SP1577 dataset and secondary structures predicted by the NPS@ server are used. The SVM-PB-Pred server is freely accessible at http://bioinfo.bdu.ac.in/~svmpbpred.
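    The three benchmarks differ mainly in how data are split. A self-consistency test trains and tests on the same data, so its accuracy usually acts as an upper bound. Seven-fold cross-validation holds out each seventh of the data in turn. The sketch below generates seven-fold train/test index splits in plain Python. The shuffling seed and the round-robin fold assignment are assumptions, not the authors' protocol, and `k_fold_indices` is a hypothetical helper.

```python
import random

def k_fold_indices(n_samples, k=7, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Round-robin assignment keeps fold sizes within one of each other.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test
```

    Every sample appears in exactly one test fold, so the accuracy averaged over folds estimates performance on unseen proteins more honestly than the self-consistency score.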

  1. Prediction of ribosome footprint profile shapes from transcript sequences

    PubMed Central

    Liu, Tzu-Yu; Song, Yun S.

    2016-01-01

    Motivation: Ribosome profiling is a useful technique for studying translational dynamics and quantifying protein synthesis. Applications of this technique have shown that ribosomes are not uniformly distributed along mRNA transcripts. Understanding how each transcript-specific distribution arises is important for unraveling the translation mechanism. Results: Here, we apply kernel smoothing to construct predictive features and build a sparse model to predict the shape of ribosome footprint profiles from transcript sequences alone. Our results on Saccharomyces cerevisiae data show that the marginal ribosome densities can be predicted with high accuracy. The proposed novel method has a wide range of applications, including inferring isoform-specific ribosome footprints, designing transcripts with fast translation speeds and discovering unknown modulation during translation. Availability and implementation: A software package called riboShape is freely available at https://sourceforge.net/projects/riboshape Contact: yss@berkeley.edu PMID:27307616
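    Kernel smoothing, as used here to build predictive features, replaces each position's value with a weighted average of its neighbours. The Python sketch below applies Nadaraya-Watson smoothing with a Gaussian kernel to a per-codon footprint count profile. It is not riboShape's code. The Gaussian kernel, the bandwidth value and the name `gaussian_smooth` are all assumptions for illustration.

```python
import math

def gaussian_smooth(values, bandwidth=2.0):
    """Nadaraya-Watson smoothing of a 1-D profile with a Gaussian kernel."""
    n = len(values)
    smoothed = []
    for i in range(n):
        # Weight every position j by its distance from the target position i.
        weights = [math.exp(-0.5 * ((i - j) / bandwidth) ** 2) for j in range(n)]
        total = sum(weights)
        smoothed.append(sum(w * v for w, v in zip(weights, values)) / total)
    return smoothed
```

    Larger bandwidths produce smoother profiles at the cost of blurring sharp features such as ribosome pile-ups. A sparse model could then be fitted to predict such smoothed shapes from sequence-derived features.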

  2. Predicting sequences and structures of MHC-binding peptides: a computational combinatorial approach

    NASA Astrophysics Data System (ADS)

    Zeng, Jun; Treutlein, Herbert R.; Rudy, George B.

    2001-06-01

    Peptides bound to MHC molecules on the surface of cells convey critical information about the cellular milieu to immune system T cells. Predicting which peptides can bind an MHC molecule, and understanding their modes of binding, are important in order to design better diagnostic and therapeutic agents for infectious and autoimmune diseases. Due to the difficulty of obtaining sufficient experimental binding data for each human MHC molecule, computational modeling of MHC peptide-binding properties is necessary. This paper describes a computational combinatorial design approach to the prediction of peptides that bind an MHC molecule of known X-ray crystallographic or NMR-determined structure. The procedure uses chemical fragments as models for amino acid residues and produces a set of sequences for peptides predicted to bind in the MHC peptide-binding groove. The probabilities for specific amino acids occurring at each position of the peptide are calculated based on these sequences, and these probabilities show good agreement with amino acid distributions derived from an MHC-binding peptide database. The method also enables prediction of the three-dimensional structure of MHC-peptide complexes. Docking, linking, and optimization procedures were performed with the XPLOR program [1].
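    The per-position amino acid probabilities described above amount to a position-specific frequency profile over a set of equal-length peptides. A minimal Python sketch follows. It uses no pseudocounts, which is an assumption, and the peptides in the usage check are toy inputs rather than the paper's designed sequences. `position_probabilities` is a hypothetical name.

```python
from collections import Counter

def position_probabilities(peptides):
    """Per-position amino acid probabilities for equal-length peptides."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    profile = []
    for pos in range(length):
        # Count residues in this column and normalise to probabilities.
        counts = Counter(p[pos] for p in peptides)
        total = sum(counts.values())
        profile.append({aa: c / total for aa, c in counts.items()})
    return profile
```

    A profile like this could be compared column by column with one derived from a database of known MHC-binding peptides.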

  3. Amino acid sequence of porcine spleen cathepsin D.

    PubMed Central

    Shewale, J G; Tang, J

    1984-01-01

    The amino acid sequence of porcine spleen cathepsin D heavy chain has been determined and, hence, the complete structure of this enzyme is now known. The sequence of heavy chain was constructed by aligning the structures of peptides generated by cyanogen bromide, trypsin, and endo-proteinase Lys C cleavages. The structure of the light chain has been published previously. The cathepsin D molecule contains 339 amino acid residues in two polypeptide chains: a 97-residue light chain and a 242-residue heavy chain, with a combined Mr of 36,779 (without carbohydrate). There are two carbohydrate units linked to asparagine residues 70 and 192. The disulfide bond arrangement in cathepsin D is probably similar to that of pepsin, because the positions of six half-cystine residues are conserved. The active site aspartyl residues, corresponding to aspartic acid-32 and -215 of pepsin, are located at residues 33 and 224 in the cathepsin D molecule. The amino acid sequence around these aspartyl residues is strongly conserved. Cathepsin D shows a strong homology with other acid proteases. When the sequences of cathepsin D, renin, and pepsin are aligned, 32.7% of the residues are identical. The homology is observed throughout the length of the molecules, indicating that the three-dimensional structures of all three molecules are similar. PMID:6587385
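    A figure such as "32.7% of the residues are identical" across three aligned sequences is a column-wise identity count. The Python sketch below counts alignment columns where every sequence carries the same residue. Gaps never count as identical, and the denominator is the full alignment length. Both choices are assumptions, since the abstract does not state how the percentage was normalized. `percent_identity` is a hypothetical helper.

```python
def percent_identity(*aligned):
    """Percent of alignment columns in which all sequences share one residue.

    Gap characters ('-') never count as identical; the denominator is the
    full alignment length (an assumed normalisation).
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(
        1 for column in zip(*aligned)
        if column[0] != "-" and all(c == column[0] for c in column)
    )
    return 100.0 * identical / length
```

    Normalizing by the length of the shortest ungapped sequence instead would give a higher value for the same alignment, so reported identities are only comparable when the convention is known.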

  4. Sequencing and computational analysis of complete genome sequences of Citrus yellow mosaic badna virus from acid lime and pummelo.

    PubMed

    Borah, Basanta K; Johnson, A M Anthony; Sai Gopal, D V R; Dasgupta, Indranil

    2009-08-01

    Citrus yellow mosaic badna virus (CMBV), a member of the Family Caulimoviridae, Genus Badnavirus, is the causative agent of Citrus mosaic disease in India. Although the virus has been detected in several citrus species, only two full-length genomes, one each from Sweet orange and Rangpur lime, are available in publicly accessible databases. In order to obtain a better understanding of the genetic variability of the virus in other citrus mosaic-affected citrus species, we performed the cloning and sequence analysis of complete genomes of CMBV from two additional citrus species, Acid lime and Pummelo. We show that CMBV genomes from the two hosts share high homology with previously reported CMBV sequences and hence conclude that the new isolates represent variants of the virus present in these species. Based on in silico sequence analysis, we predict the possible function of the protein encoded by one of the five ORFs.

  5. De Novo Structure Prediction of Globular Proteins Aided by Sequence Variation-Derived Contacts

    PubMed Central

    Kosciolek, Tomasz; Jones, David T.

    2014-01-01

    The advent of high accuracy residue-residue intra-protein contact prediction methods enabled a significant boost in the quality of de novo structure predictions. Here, we investigate the potential benefits of combining a well-established fragment-based folding algorithm – FRAGFOLD, with PSICOV, a contact prediction method which uses sparse inverse covariance estimation to identify co-varying sites in multiple sequence alignments. Using a comprehensive set of 150 diverse globular target proteins, up to 266 amino acids in length, we are able to address the effectiveness and some limitations of such approaches to globular proteins in practice. Overall we find that using fragment assembly with both statistical potentials and predicted contacts is significantly better than either statistical potentials or contacts alone. Results show up to nearly 80% of correct predictions (TM-score ≥0.5) within the analysed dataset and a mean TM-score of 0.54. Unsuccessful modelling cases emerged either from conformational sampling problems, or insufficient contact prediction accuracy. Nevertheless, a strong dependency of the quality of final models on the fraction of satisfied predicted long-range contacts was observed. This not only highlights the importance of these contacts in determining the protein fold, but also (combined with other ensemble-derived qualities) provides a powerful guide as to the choice of correct models and the global quality of the selected model. A proposed quality assessment scoring function achieves 0.93 precision and 0.77 recall for the discrimination of correct folds on our dataset of decoys. These findings suggest the approach is well-suited for blind predictions on a variety of globular proteins of unknown 3D structure, provided that enough homologous sequences are available to construct a large and accurate multiple sequence alignment for the initial contact prediction step. PMID:24637808
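    The fraction of satisfied predicted long-range contacts, which the authors found to track model quality, can be computed directly from model coordinates. In the Python sketch below, a contact counts as long-range when the residues are at least 24 positions apart, and as satisfied when their representative atoms lie within 8 Å. These are common conventions in contact-prediction assessment but are assumptions here, not necessarily the paper's definitions. `satisfied_contact_fraction` is a hypothetical name.

```python
import math

def satisfied_contact_fraction(coords, contacts, min_separation=24, cutoff=8.0):
    """Fraction of predicted long-range contacts satisfied by a model.

    coords: list of (x, y, z) tuples, one per residue (e.g. C-beta atoms).
    contacts: list of (i, j) residue index pairs predicted to be in contact.
    Assumed thresholds: |i - j| >= min_separation, distance < cutoff (angstrom).
    """
    long_range = [(i, j) for i, j in contacts if abs(i - j) >= min_separation]
    if not long_range:
        return 0.0
    satisfied = sum(
        1 for i, j in long_range if math.dist(coords[i], coords[j]) < cutoff
    )
    return satisfied / len(long_range)
```

    `math.dist` requires Python 3.8 or later. Ranking decoys by this fraction, possibly combined with other ensemble statistics, is one way such a score could feed a quality-assessment function.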

  6. De novo structure prediction of globular proteins aided by sequence variation-derived contacts.

    PubMed

    Kosciolek, Tomasz; Jones, David T

    2014-01-01

    The advent of high accuracy residue-residue intra-protein contact prediction methods enabled a significant boost in the quality of de novo structure predictions. Here, we investigate the potential benefits of combining a well-established fragment-based folding algorithm--FRAGFOLD, with PSICOV, a contact prediction method which uses sparse inverse covariance estimation to identify co-varying sites in multiple sequence alignments. Using a comprehensive set of 150 diverse globular target proteins, up to 266 amino acids in length, we are able to address the effectiveness and some limitations of such approaches to globular proteins in practice. Overall we find that using fragment assembly with both statistical potentials and predicted contacts is significantly better than either statistical potentials or contacts alone. Results show up to nearly 80% of correct predictions (TM-score ≥0.5) within the analysed dataset and a mean TM-score of 0.54. Unsuccessful modelling cases emerged either from conformational sampling problems, or insufficient contact prediction accuracy. Nevertheless, a strong dependency of the quality of final models on the fraction of satisfied predicted long-range contacts was observed. This not only highlights the importance of these contacts in determining the protein fold, but also (combined with other ensemble-derived qualities) provides a powerful guide as to the choice of correct models and the global quality of the selected model. A proposed quality assessment scoring function achieves 0.93 precision and 0.77 recall for the discrimination of correct folds on our dataset of decoys. These findings suggest the approach is well-suited for blind predictions on a variety of globular proteins of unknown 3D structure, provided that enough homologous sequences are available to construct a large and accurate multiple sequence alignment for the initial contact prediction step.

  7. Sequence-based prediction of protein-binding sites in DNA: comparative study of two SVM models.

    PubMed

    Park, Byungkyu; Im, Jinyong; Tuvshinjargal, Narankhuu; Lee, Wook; Han, Kyungsook

    2014-11-01

    As the structures of many protein-DNA complexes have become known in recent years, several computational methods have been developed to predict DNA-binding sites in proteins. However, the inverse problem (i.e., predicting protein-binding sites in DNA) has received much less attention. One of the reasons is that the differences between the interaction propensities of nucleotides are much smaller than those between amino acids. Another reason is that DNA exhibits less diverse sequence patterns than protein. Therefore, predicting protein-binding DNA nucleotides is much harder than predicting DNA-binding amino acids. We computed the interaction propensity (IP) of nucleotide triplets with amino acids using an extensive dataset of protein-DNA complexes, and developed two support vector machine (SVM) models that predict protein-binding nucleotides from sequence data alone. One SVM model predicts protein-binding nucleotides using DNA sequence data alone, and the other SVM model predicts protein-binding nucleotides using both DNA and protein sequences. In a 10-fold cross-validation with 1519 DNA sequences, the SVM model that uses DNA sequence data only predicted protein-binding nucleotides with an accuracy of 67.0%, an F-measure of 67.1%, and a Matthews correlation coefficient (MCC) of 0.340. With an independent dataset of 181 DNAs that were not used in training, it achieved an accuracy of 66.2%, an F-measure of 66.3% and an MCC of 0.324. Another SVM model that uses both DNA and protein sequences achieved an accuracy of 69.6%, an F-measure of 69.6%, and an MCC of 0.383 in a 10-fold cross-validation with 1519 DNA sequences and 859 protein sequences. With an independent dataset of 181 DNAs and 143 proteins, it showed an accuracy of 67.3%, an F-measure of 66.5% and an MCC of 0.329. Both in cross-validation and independent testing, the second SVM model that used both DNA and protein sequence data showed better performance than the first model that used DNA sequence data only. 
To the best of
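    Accuracy, F-measure and the Matthews correlation coefficient (MCC) reported for the two SVM models are all functions of the binary confusion matrix. The Python sketch below uses the standard definitions. It computes the F-measure for the positive (protein-binding) class. That choice is an assumption, since the abstract does not say whether the authors averaged over classes. `binary_metrics` is a hypothetical helper.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Accuracy, positive-class F-measure and MCC from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, f_measure, mcc
```

    MCC stays informative when binding and non-binding nucleotides are imbalanced, which is why it is often reported alongside accuracy.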

  8. Active site amino acid sequence of human factor D.

    PubMed

    Davis, A E

    1980-08-01

    Factor D was isolated from human plasma by chromatography on CM-Sephadex C50, Sephadex G-75, and hydroxylapatite. Digestion of reduced, S-carboxymethylated factor D with cyanogen bromide resulted in three peptides which were isolated by chromatography on Sephadex G-75 (superfine) equilibrated in 20% formic acid. NH2-Terminal sequences were determined by automated Edman degradation with a Beckman 890C sequencer using a 0.1 M Quadrol program. The smallest peptide (CNBr III) consisted of the NH2-terminal 14 amino acids. The other two peptides had molecular weights of 17,000 (CNBr I) and 7000 (CNBr II). Overlap of the NH2-terminal sequence of factor D with the NH2-terminal sequence of CNBr I established the order of the peptides. The NH2-terminal 53 residues of factor D are somewhat more homologous with the group-specific protease of rat intestine than with other serine proteases. The NH2-terminal sequence of CNBr II revealed the active site serine of factor D. The typical serine protease active site sequence (Gly-Asp-Ser-Gly-Gly-Pro) was found at residues 12-17. The region surrounding the active site serine does not appear to be more highly homologous with any one of the other serine proteases. The structural data obtained point out the similarities between factor D and the other proteases. However, complete definition of the degree of relationship between factor D and other proteases will require determination of the remainder of the primary structure.

  9. The amino acid sequence of iguana (Iguana iguana) pancreatic ribonuclease.

    PubMed

    Zhao, W; Beintema, J J; Hofsteenge, J

    1994-01-15

    The pyrimidine-specific ribonuclease superfamily constitutes a group of homologous proteins so far found only in higher vertebrates. Four separate families are found in mammals, which have resulted from gene duplications in mammalian ancestors. To learn more about the evolutionary history of this superfamily, the primary structure and other characteristics of the pancreatic enzyme from iguana (Iguana iguana), a herbivorous lizard species belonging to the reptiles, have been determined. The polypeptide chain consists of 119 amino acid residues. The positions of insertions and deletions in the sequence are identical to those in the enzyme from snapping turtle. However, the two enzymes differ at 54% of the amino acid positions. Iguana ribonuclease contains no carbohydrate, although the enzyme possesses three recognition sites for carbohydrate attachment, and has a high number of acidic residues in a localized part of the sequence.

  10. Development of a sugar-binding residue prediction system from protein sequences using support vector machine.

    PubMed

    Banno, Masaki; Komiyama, Yusuke; Cao, Wei; Oku, Yuya; Ueki, Kokoro; Sumikoshi, Kazuya; Nakamura, Shugo; Terada, Tohru; Shimizu, Kentaro

    2017-02-01

    Several methods have been proposed for protein-sugar binding site prediction using machine learning algorithms. However, they are not effective at learning the various properties of binding site residues caused by the various interactions between proteins and sugars. In this study, we classified sugars into acidic and nonacidic sugars and showed that their binding sites have different amino acid occurrence frequencies. Using this result, we developed sugar-binding residue predictors dedicated to the two classes of sugars: an acidic sugar binding predictor and a nonacidic sugar binding predictor. We also developed a combination predictor that combines the results of the two predictors. We showed that when a sugar is known to be acidic, the acidic sugar binding predictor achieves the best performance, and that when a sugar is known to be nonacidic or is not known to belong to either class, the combination predictor achieves the best performance. Our method uses only amino acid sequences for prediction. A support vector machine was used as the machine learning algorithm, and the position-specific scoring matrix created by the position-specific iterative basic local alignment search tool (PSI-BLAST) was used as the feature vector. We evaluated the performance of the predictors using five-fold cross-validation. We have released our system as an open source freeware tool on the GitHub repository (https://doi.org/10.5281/zenodo.61513).
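    Using a position-specific scoring matrix (PSSM) as the SVM feature vector for per-residue prediction usually means taking the PSSM rows in a window around each residue. The Python sketch below builds such windowed vectors. The window size of 5 and the zero-padding at the termini are assumptions, as the abstract does not specify them. `window_features` is a hypothetical name.

```python
def window_features(pssm, window=5):
    """Concatenate PSSM rows in a window centred on each residue.

    pssm: list of rows, one per residue, each a list of 20 scores.
    Positions beyond either terminus are padded with zeros (an assumption).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    n_cols = len(pssm[0])
    pad = [0.0] * n_cols
    features = []
    for i in range(len(pssm)):
        vector = []
        for j in range(i - half, i + half + 1):
            vector.extend(pssm[j] if 0 <= j < len(pssm) else pad)
        features.append(vector)
    return features
```

    Each residue then becomes a fixed-length vector (window × 20 values), which any standard SVM implementation can consume with binding or non-binding labels.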

  11. Prediction of Spontaneous Protein Deamidation from Sequence-Derived Secondary Structure and Intrinsic Disorder

    PubMed Central

    Lorenzo, J. Ramiro; Alonso, Leonardo G.; Sánchez, Ignacio E.

    2015-01-01

    Asparagine residues in proteins undergo spontaneous deamidation, a post-translational modification that may act as a molecular clock for the regulation of protein function and turnover. Asparagine deamidation is modulated by protein local sequence, secondary structure and hydrogen bonding. We present NGOME, an algorithm able to predict non-enzymatic deamidation of internal asparagine residues in proteins in the absence of structural data, using sequence-based predictions of secondary structure and intrinsic disorder. Compared to previous algorithms, NGOME does not require three-dimensional structures yet yields better predictions than available sequence-only methods. Four case studies of specific proteins show how NGOME may help the user identify deamidation-prone asparagine residues, often related to protein gain of function, protein degradation or protein misfolding in pathological processes. A fifth case study applies NGOME at a proteomic scale and unveils a correlation between asparagine deamidation and protein degradation in yeast. NGOME is freely available as a webserver at the National EMBnet node Argentina, URL: http://www.embnet.qb.fcen.uba.ar/ in the subpage “Protein and nucleic acid structure and sequence analysis”. PMID:26674530
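    Because deamidation is modulated by local sequence, a first-pass screen often looks at the residue following each asparagine. Asn-Gly is widely reported as the fastest-deamidating dipeptide. The Python sketch below is a crude heuristic, not NGOME. It ignores the secondary structure and disorder predictions NGOME relies on. The risk labels for Ser and His neighbours and the name `asn_deamidation_candidates` are illustrative assumptions.

```python
# Heuristic ranking of residues following Asn (illustrative assumption,
# not NGOME's model); anything not listed is labelled "low".
FAST_NEIGHBOURS = {"G": "high", "S": "medium", "H": "medium"}

def asn_deamidation_candidates(sequence):
    """List (position, dipeptide, heuristic_risk) for internal Asn residues.

    Positions are 1-based; the first and last residues are skipped so that
    only internal asparagines are reported.
    """
    hits = []
    for i in range(1, len(sequence) - 1):
        if sequence[i] == "N":
            nxt = sequence[i + 1]
            hits.append((i + 1, "N" + nxt, FAST_NEIGHBOURS.get(nxt, "low")))
    return hits
```

    Flagged sites would still need structural context, since sites in ordered, hydrogen-bonded regions tend to deamidate much more slowly than sites in flexible or disordered regions.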

  12. Prediction of Spontaneous Protein Deamidation from Sequence-Derived Secondary Structure and Intrinsic Disorder.

    PubMed

    Lorenzo, J Ramiro; Alonso, Leonardo G; Sánchez, Ignacio E

    2015-01-01

    Asparagine residues in proteins undergo spontaneous deamidation, a post-translational modification that may act as a molecular clock for the regulation of protein function and turnover. Asparagine deamidation is modulated by protein local sequence, secondary structure and hydrogen bonding. We present NGOME, an algorithm able to predict non-enzymatic deamidation of internal asparagine residues in proteins in the absence of structural data, using sequence-based predictions of secondary structure and intrinsic disorder. Compared to previous algorithms, NGOME does not require three-dimensional structures yet yields better predictions than available sequence-only methods. Four case studies of specific proteins show how NGOME may help the user identify deamidation-prone asparagine residues, often related to protein gain of function, protein degradation or protein misfolding in pathological processes. A fifth case study applies NGOME at a proteomic scale and unveils a correlation between asparagine deamidation and protein degradation in yeast. NGOME is freely available as a webserver at the National EMBnet node Argentina, URL: http://www.embnet.qb.fcen.uba.ar/ in the subpage "Protein and nucleic acid structure and sequence analysis".

  13. Prediction of protein function improving sequence remote alignment search by a fuzzy logic algorithm.

    PubMed

    Gómez, Antonio; Cedano, Juan; Espadaler, Jordi; Hermoso, Antonio; Piñol, Jaume; Querol, Enrique

    2008-02-01

    The functional annotation of new protein sequences represents a major drawback for genomic science. The best way to suggest the function of a protein from its sequence is to find a related one for which biological information is available. Current alignment algorithms display a list of protein sequence stretches presenting significant similarity to different protein targets, ordered by their respective mathematical scores. However, statistical and biological significance do not always coincide; therefore, rearranging the program output according to biological characteristics rather than mathematical scores alone would help functional annotation. A new method is described that predicts the putative function of a protein by integrating the results from the PSI-BLAST program with a fuzzy logic algorithm. Several protein sequence characteristics were tested for their ability to rearrange a PSI-BLAST profile so that it better reflects biological function. Four of them (amino acid content, matched segment length, and hydropathy and flexibility profiles) contributed positively to the accurate prediction of protein function from sequence when integrated by a fuzzy logic algorithm into a program, BYPASS.

  14. Amino acid sequence of bovine heart coupling factor 6.

    PubMed Central

    Fang, J K; Jacobs, J W; Kanner, B I; Racker, E; Bradshaw, R A

    1984-01-01

    The amino acid sequence of bovine heart mitochondrial coupling factor 6 (F6) has been determined by automated Edman degradation of the whole protein and derived peptides. Preparations based on heat precipitation and ethanol extraction showed allotypic variation at three positions, while material further purified by HPLC yielded only one sequence, which also differed by a Phe-Thr replacement at residue 62. The mature protein contains 76 amino acids with a calculated molecular weight of 9006 and a pI of approximately 5, in good agreement with experimentally measured values. The charged amino acids are mainly clustered at the termini and in one section in the middle; these three polar segments are separated by two segments relatively rich in nonpolar residues. Chou-Fasman analysis suggests three stretches of alpha-helix coinciding with (or lying within) the high-charge-density sequences, with a single beta-turn at the first polar-nonpolar junction. Comparison of the F6 sequence with those of other proteins did not reveal any homologous structures. PMID:6149548
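    A calculated molecular weight such as the 9006 quoted for F6 is the sum of the average residue masses plus one water molecule for the free termini. The Python sketch below uses commonly tabulated average residue masses in daltons. It ignores post-translational modifications, which is an assumption. The F6 sequence itself is not given here, so the check uses glycine and diglycine. `average_molecular_weight` is a hypothetical helper.

```python
# Commonly tabulated average residue masses in daltons (free amino acid
# minus one water); modifications are not considered.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # added once for the N- and C-terminal groups

def average_molecular_weight(sequence):
    """Average molecular weight (Da) of an unmodified polypeptide."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER
```

    Applied to the 76-residue mature F6 sequence, a sum of this kind yields the calculated molecular weight. A pI estimate would additionally need pKa values for the ionizable groups.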

  15. Using next generation transcriptome sequencing to predict an ectomycorrhizal metabolome

    PubMed Central

    2011-01-01

    Background Mycorrhizae, symbiotic interactions between soil fungi and tree roots, are ubiquitous in terrestrial ecosystems. The fungi contribute phosphorus, nitrogen and mobilized nutrients from organic matter in the soil and in return receive photosynthetically-derived carbohydrates. This union of plant and fungal metabolisms is the mycorrhizal metabolome. Understanding this symbiotic relationship at a molecular level provides important contributions to the understanding of forest ecosystems and global carbon cycling. Results We generated next generation short-read transcriptomic sequencing data from fully-formed ectomycorrhizae between Laccaria bicolor and aspen (Populus tremuloides) roots. The transcriptomic data was used to identify statistically significantly expressed gene models using a bootstrap-style approach, and these expressed genes were mapped to specific metabolic pathways. Integration of expressed genes that code for metabolic enzymes and the set of expressed membrane transporters generates a predictive model of the ectomycorrhizal metabolome. The generated model of mycorrhizal metabolome predicts that the specific compounds glycine, glutamate, and allantoin are synthesized by L. bicolor and that these compounds or their metabolites may be used for the benefit of aspen in exchange for the photosynthetically-derived sugars fructose and glucose. Conclusions The analysis illustrates an approach to generate testable biological hypotheses to investigate the complex molecular interactions that drive ectomycorrhizal symbiosis. These models are consistent with experimental environmental data and provide insight into the molecular exchange processes for organisms in this complex ecosystem. The method used here for predicting metabolomic models of mycorrhizal systems from deep RNA sequencing data can be generalized and is broadly applicable to transcriptomic data derived from complex systems. PMID:21569493

  16. Sequences Of Amino Acids For Human Serum Albumin

    NASA Technical Reports Server (NTRS)

    Carter, Daniel C.

    1992-01-01

    Sequences of amino acids defined for use in making polypeptides one-third to one-sixth as large as parent human serum albumin molecule. Smaller, chemically stable peptides have diverse applications including service as artificial human serum and as active components of biosensors and chromatographic matrices. In applications involving production of artificial sera from new sequences, little or no concern about viral contaminants. Smaller genetically engineered polypeptides more easily expressed and produced in large quantities, making commercial isolation and production more feasible and profitable.

  17. Improving HIV coreceptor usage prediction in the clinic using hints from next-generation sequencing data

    PubMed Central

    Pfeifer, Nico; Lengauer, Thomas

    2012-01-01

    Motivation: Due to the high mutation rate of human immunodeficiency virus (HIV), drug-resistant variants emerge frequently. Therefore, researchers are constantly searching for new ways to attack the virus. One new class of anti-HIV drugs is the class of coreceptor antagonists that block cell entry by occupying a coreceptor on CD4 cells. This type of drug only has an effect on the subset of HIV variants that use the inhibited coreceptor. An accurate prediction of whether the viral population inside a patient is susceptible to the treatment is hence very important for therapy decisions and a prerequisite for administering the respective drug. The first prediction models were based on data from Sanger sequencing of the V3 loop of HIV. Recently, a method based on next-generation sequencing (NGS) data was introduced that predicts labels for each read separately and decides on the patient label through a percentage threshold for the resistant viral minority. Results: We model the prediction problem on the patient level, taking the information of all reads from NGS data jointly into account. This not only improves prediction performance for NGS data; the trained model can also be used to improve predictions based on Sanger sequencing data. Therefore, laboratories without NGS capabilities can also benefit from the improvements. Furthermore, we show which amino acids at which positions are important for prediction success, giving clues on how the interaction mechanism between the V3 loop and the particular coreceptors might be influenced. Availability: A webserver is available at http://coreceptor.bioinf.mpi-inf.mpg.de. Contact: nico.pfeifer@mpi-inf.mpg.de PMID:22962486
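The read-level baseline described in the Motivation (one label per read, then a percentage threshold on the resistant minority) can be sketched as follows. This is a minimal illustration, assuming each read has already been classified as X4-capable or not; the function name and the 2% cutoff are hypothetical. The authors' own method instead models all reads jointly at the patient level.

```python
from typing import Sequence


def patient_label_from_reads(read_is_x4: Sequence[bool], threshold: float = 0.02) -> str:
    """Assign a patient-level label from per-read coreceptor predictions.

    threshold is a hypothetical minority cutoff, not a published value.
    """
    if len(read_is_x4) == 0:
        raise ValueError("need at least one classified read")
    fraction_x4 = sum(read_is_x4) / len(read_is_x4)
    # A CXCR4-using minority at or above the cutoff flags the patient as
    # unlikely to respond to a CCR5 antagonist
    return "X4-capable" if fraction_x4 >= threshold else "R5"


reads = [False] * 97 + [True] * 3  # hypothetical: 3 of 100 reads predicted X4-capable
print(patient_label_from_reads(reads))  # X4-capable
```

The weakness of this design is visible in the code: every read is judged in isolation and only the final fraction matters, which is what a joint patient-level model tries to improve on.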

  18. Nanopores and nucleic acids: prospects for ultrarapid sequencing

    NASA Technical Reports Server (NTRS)

    Deamer, D. W.; Akeson, M.

    2000-01-01

    DNA and RNA molecules can be detected as they are driven through a nanopore by an applied electric field at rates ranging from several hundred microseconds to a few milliseconds per molecule. The nanopore can rapidly discriminate between pyrimidine and purine segments along a single-stranded nucleic acid molecule. Nanopore detection and characterization of single molecules represents a new method for directly reading information encoded in linear polymers. If single-nucleotide resolution can be achieved, it is possible that nucleic acid sequences can be determined at rates exceeding a thousand bases per second.

  19. Interrogating noise in protein sequences from the perspective of protein-protein interactions prediction.

    PubMed

    Wang, Yongcui; Ren, Xianwen; Zhang, Chunhua; Deng, Naiyang; Zhang, Xiangsun

    2012-12-21

    The past decades have witnessed extensive efforts to study the relationships among proteins. In particular, sequence-based prediction of protein-protein interactions (PPIs) is fundamentally important for speeding up the mapping of the interactomes of organisms. High-throughput experimental methodologies have made the PPIs of many model organisms known, which allows us to apply machine learning methods to learn understandable rules from the available PPIs. Within the machine learning framework, composition vectors are usually applied to encode proteins as real-valued vectors. However, composition vector values may be highly correlated with the background distribution of amino acids; i.e., amino acids that are frequently observed in nature tend to have large composition vector values. Thus, a way to estimate the noise induced by the background distribution of amino acids may be needed when building representations. Here, we introduce two kinds of denoising composition vectors, which have been used successfully in the construction of phylogenetic trees, to eliminate this noise. When these two denoising composition vectors were validated on Escherichia coli (E. coli), Saccharomyces cerevisiae (S. cerevisiae) and human PPI datasets, the predictive performance surprisingly did not improve and was even worse than non-denoised prediction. These results suggest that what counts as noise in phylogenetic tree construction may be valuable information in PPI prediction.
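The abstract does not spell out its two denoising variants. One widely used correction from composition-vector phylogenetics (the CVTree approach) subtracts the k-mer frequency expected from a (k-2)-order Markov model built on shorter overlapping words. The sketch below assumes that scheme as a representative example; it is not necessarily either of the variants the authors tested, and the function names are hypothetical.

```python
from collections import Counter


def kmer_freqs(seq: str, k: int) -> dict[str, float]:
    """Relative frequency of each overlapping k-mer in seq."""
    total = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(total))
    return {kmer: n / total for kmer, n in counts.items()}


def denoised_cv(seq: str, k: int = 3) -> dict[str, float]:
    """Observed k-mer frequency relative to a (k-2)-order Markov background."""
    f_k = kmer_freqs(seq, k)
    f_k1 = kmer_freqs(seq, k - 1)
    f_k2 = kmer_freqs(seq, k - 2)
    vector = {}
    for kmer, p in f_k.items():
        # Expected frequency predicted from the two overlapping (k-1)-mers
        p0 = f_k1[kmer[:-1]] * f_k1[kmer[1:]] / f_k2[kmer[1:-1]]
        # Positive: over-represented relative to background; negative: under-represented
        vector[kmer] = (p - p0) / p0
    return vector
```

Because every sub-word of an observed k-mer also occurs in the sequence, the lookups never fail and p0 is never zero.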

  20. Optimal coding of vectorcardiographic sequences using spatial prediction.

    PubMed

    Augustyniak, Piotr

    2007-05-01

    This paper discusses principles, implementation details, and advantages of a sequence coding algorithm applied to the compression of vectorcardiograms (VCG). The main novelty of the proposed method is the automatic management of distortion distribution, controlled by the local signal contents in both technical and medical aspects. As in clinical practice, the VCG loops representing P, QRS, and T waves in three-dimensional (3-D) space are considered here as three simultaneous sequences of objects. Because neighboring loops are similar, encoding the prediction error values significantly reduces the data set volume. The residual values are de-correlated with the discrete cosine transform (DCT) and truncated at a certain energy threshold. The presented method is based on the irregular temporal distribution of medical data in the signal and takes advantage of variable sampling frequency for automatically detected VCG loops. The features of the proposed algorithm are confirmed by the results of a numerical experiment carried out on a wide range of real records. The average data reduction ratio reaches 8.15, while the percent root-mean-square difference (PRD) distortion ratio for the most important sections of the signal does not exceed 1.1%.
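The prediction-plus-transform pipeline described above (predict a loop from its neighbour, de-correlate the residual with a DCT, truncate at an energy threshold) can be sketched for a single lead as follows. It assumes the simplest possible predictor, namely the previous loop already resampled to the same length, and a hypothetical 99% energy threshold; the paper's variable sampling and distortion management are not reproduced, and all function names are illustrative.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis; row k holds the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0] /= np.sqrt(2.0)  # scale the DC row so the matrix is orthonormal
    return basis


def encode_loop(current: np.ndarray, previous: np.ndarray, energy: float = 0.99) -> np.ndarray:
    """Predict `current` from `previous`, DCT the residual, keep the highest-energy coefficients."""
    residual = current - previous                  # prediction error
    coeffs = dct_matrix(len(residual)) @ residual  # de-correlated residual
    order = np.argsort(coeffs ** 2)[::-1]          # largest energy first
    cumulative = np.cumsum(coeffs[order] ** 2)
    # Smallest number of coefficients whose energy reaches the threshold
    n_keep = int(np.searchsorted(cumulative, energy * cumulative[-1])) + 1
    truncated = np.zeros_like(coeffs)
    truncated[order[:n_keep]] = coeffs[order[:n_keep]]
    return truncated


def decode_loop(truncated: np.ndarray, previous: np.ndarray) -> np.ndarray:
    """Inverse DCT (the transpose, since the basis is orthonormal) plus the prediction."""
    return previous + dct_matrix(len(truncated)).T @ truncated
```

For a 3-D VCG the same functions would be applied to each of the three leads; with `energy=1.0` the round trip is lossless up to floating-point error.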

  1. A machine-learning approach for predicting palmitoylation sites from integrated sequence-based features.

    PubMed

    Li, Liqi; Luo, Qifa; Xiao, Weidong; Li, Jinhui; Zhou, Shiwen; Li, Yongsheng; Zheng, Xiaoqi; Yang, Hua

    2017-02-01

    Palmitoylation is the covalent attachment of lipids to amino acid residues in proteins. As an important form of protein posttranslational modification, it increases the hydrophobicity of proteins, which contributes to protein transport, organelle localization, and function, and therefore plays an important role in a variety of cell biological processes. Identification of palmitoylation sites is necessary for understanding protein-protein interaction, protein stability, and activity. Since conventional experimental techniques to determine palmitoylation sites in proteins are both labor intensive and costly, a fast and accurate computational approach to predict palmitoylation sites from protein sequences is urgently needed. In this study, a support vector machine (SVM)-based method was proposed that integrates PSI-BLAST profiles, physicochemical properties, [Formula: see text]-mer amino acid compositions (AACs), and [Formula: see text]-mer pseudo AACs into the principal feature vector. A recursive feature selection scheme was subsequently implemented to single out the most discriminative features. Finally, an SVM method was implemented to predict palmitoylation sites in proteins based on the optimal features. The proposed method achieved an accuracy of 99.41% and a Matthews Correlation Coefficient of 0.9773 on a benchmark dataset. The results indicate the efficiency and accuracy of our method in predicting palmitoylation sites from protein sequences.
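Accuracy alone can look flattering when one class dominates a benchmark, which is why the Matthews Correlation Coefficient is often reported alongside it. A minimal sketch of both metrics from a binary confusion matrix follows; the function name is illustrative and the counts in the usage line are hypothetical, not the paper's.

```python
import math


def accuracy_and_mcc(tp: int, tn: int, fp: int, fn: int) -> tuple[float, float]:
    """Accuracy and Matthews correlation coefficient from a binary confusion matrix."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal count is zero
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


acc, mcc = accuracy_and_mcc(tp=90, tn=85, fp=10, fn=15)  # hypothetical counts
```

MCC ranges from -1 to 1 and only approaches 1 when both classes are predicted well, so it penalizes a classifier that simply favours the majority class.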

  2. Prediction of substrate specificity in NS3/4A serine protease by biased sequence search threading.

    PubMed

    Ozdemir Isik, Gonca; Ozer, A Nevra

    2017-04-01

    Proteases recognize specific substrate sequences and catalyze the hydrolysis of targeted peptide bonds to activate or degrade them. It is particularly important to identify the recognition and binding mechanisms of protease-substrate complex structures in studies of drug development. Cleavage specificity in protease systems is generally determined by the amino acid profile, structural features, and distinct molecular interactions. In this work, substrate variability and substrate specificity of the NS3/4A serine protease encoded by the hepatitis C virus (HCV) was investigated by the biased sequence search threading (BSST) methodology. The available crystal structures of peptide-bound protease were used as templates as well as new complex structures that were generated via docking calculations. Threading various binding and nonbinding sequences as starting sequences over multiple templates, the potential sequence space was efficiently explored by a low-resolution knowledge-based scoring potential. The low-energy substrate sequences generated by the biased search are correlated with the natural substrates with conserved amino acid preferences, although some positions exhibit variability. Specifically, the amino acids which play essential roles in cleavage are mostly preferred. Potential substrate sequences were predicted by statistical probability approaches that consider the pairwise and triplewise interdependencies among residue positions in the low-energy sequences. The predicted substrate sequences also reproduce most of the natural substrate sequences, implying the complex interdependence between the different substrate residues. Consequently, the BSST seems to provide a powerful methodology for predicting the substrate specificity for the NS3/4A protease, which is a target in drug discovery studies for HCV.
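The abstract does not name its interdependence statistic. Mutual information between two positions of aligned low-energy sequences is one standard way to quantify pairwise coupling, so the sketch below assumes it; the function name and example sequences are hypothetical, and a triplewise version would count residue triples in the same way.

```python
import math
from collections import Counter
from typing import Sequence


def positional_mutual_information(seqs: Sequence[str], i: int, j: int) -> float:
    """Mutual information (bits) between residue positions i and j in aligned sequences."""
    n = len(seqs)
    pi = Counter(s[i] for s in seqs)
    pj = Counter(s[j] for s in seqs)
    pij = Counter((s[i], s[j]) for s in seqs)
    mi = 0.0
    for (a, b), c in pij.items():
        # p(a,b) * log2(p(a,b) / (p(a) p(b))), written with raw counts
        mi += (c / n) * math.log2(c * n / (pi[a] * pj[b]))
    return mi


coupled = ["AD", "AD", "CE", "CE"]      # position 1 is fully determined by position 0
independent = ["AD", "AE", "CD", "CE"]  # every combination equally frequent
```

A value near zero means the two positions vary independently; larger values indicate residue choices that co-vary across the low-energy sequence set.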

  3. Sequence-specific thermodynamic properties of nucleic acids influence both transcriptional pausing and backtracking in yeast

    PubMed Central

    2017-01-01

    RNA Polymerase II pauses and backtracks during transcription, with many consequences for gene expression and cellular physiology. Here, we show that the energy required to melt double-stranded nucleic acids in the transcription bubble predicts pausing in Saccharomyces cerevisiae far more accurately than nucleosome roadblocks do. In addition, the same energy difference also determines when the RNA polymerase backtracks instead of continuing to move forward. This data-driven model corroborates—in a genome wide and quantitative manner—previous evidence that sequence-dependent thermodynamic features of nucleic acids influence both transcriptional pausing and backtracking. PMID:28301878
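As a rough illustration of a sequence-dependent melting energy, the sketch below sums nearest-neighbour stacking free energies over a sliding window. It assumes the unified DNA/DNA ΔG°37 parameters of SantaLucia (1998), ignores initiation and terminal corrections, and uses a hypothetical 9-nt window; any RNA-DNA hybrid contribution in the transcription bubble is not modelled here, and the function names are illustrative.

```python
# Unified nearest-neighbour ΔG°37 values (kcal/mol) for DNA/DNA duplexes,
# SantaLucia (1998), keyed by the 5'->3' dinucleotide on the top strand.
NN_DG37 = {
    "AA": -1.00, "TT": -1.00, "AT": -0.88, "TA": -0.58,
    "CA": -1.45, "TG": -1.45, "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28, "GA": -1.30, "TC": -1.30,
    "CG": -2.17, "GC": -2.24, "GG": -1.84, "CC": -1.84,
}


def melting_energy(seq: str) -> float:
    """Energy (kcal/mol) needed to melt a duplex: minus the summed stacking ΔG."""
    seq = seq.upper()
    return -sum(NN_DG37[seq[i:i + 2]] for i in range(len(seq) - 1))


def windowed_melting_energy(seq: str, window: int = 9) -> list[float]:
    """Melting energy of every window, e.g. a bubble-sized stretch of template."""
    return [melting_energy(seq[i:i + window]) for i in range(len(seq) - window + 1)]
```

GC-rich windows score higher than AT-rich ones, which is the intuition behind using melting energy as a predictor of where the polymerase slows down.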

  4. Genetic Prediction in the Genetic Analysis Workshop 18 Sequencing Data

    PubMed Central

    Ziegler, Andreas; Bohossian, Nora; Diego, Vincent P.; Yao, Chen

    2015-01-01

    High-throughput sequencing data can be used to predict phenotypes from genotypes, and this corresponds to establishing a prognostic model. In extended pedigrees the relatedness of subjects provides additional information so that genetic values, fixed or random genetic components, and heritability can be estimated. At the Genetic Analysis Workshop 18 the working group on genetic prediction dealt with both establishing a prognostic model and, in one contribution, comparing standard logistic regression with robust logistic regression in a sample of unrelated affected or unaffected individuals. Results of both logistic regression approaches were similar. All other contributions to this group used extended family data, in general using the quantitative trait blood pressure. The individual contributions varied in several important aspects, such as the estimation of the kinship matrix and the estimation method. Contributors chose various approaches for model validation, including different versions of cross-validation or within-family validation. Within-family validation included model building in the upper generations and validation in later generations. The choice of the statistical model and the computational algorithm had substantial effects on computation time. If decorrelation approaches were applied, the computational burden was substantially reduced. Some software packages estimated negative eigenvalues, although eigenvalues of correlation matrices should be nonnegative. Most statistical models and software packages have been developed for experimental crosses and planned breeding programs. With their specialized pedigree structures, they are not sufficiently flexible to accommodate the variability of human pedigrees in general, and improved implementations are required. PMID:25112190
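One common repair when an estimated kinship or correlation matrix comes out with negative eigenvalues is to clip them and rebuild the matrix, which gives the nearest positive semidefinite matrix in the Frobenius norm. The sketch below illustrates that generic technique and is not a description of what any workshop contributor did; the function name is hypothetical. Clipping can move the diagonal away from 1, so a correlation matrix may need rescaling afterwards.

```python
import numpy as np


def nearest_psd(matrix: np.ndarray, floor: float = 0.0) -> np.ndarray:
    """Clip eigenvalues of a symmetric matrix at `floor` and rebuild it."""
    sym = (matrix + matrix.T) / 2           # enforce exact symmetry first
    eigvals, eigvecs = np.linalg.eigh(sym)  # eigh assumes a symmetric input
    clipped = np.clip(eigvals, floor, None)
    return (eigvecs * clipped) @ eigvecs.T  # V diag(lambda) V^T


# A symmetric matrix with unit diagonal that is not a valid correlation matrix
A = np.array([[1.0, 0.9, 0.9],
              [0.9, 1.0, -0.9],
              [0.9, -0.9, 1.0]])
```

Checking `np.linalg.eigvalsh` on an estimated matrix before fitting a mixed model is a cheap way to catch the problem the workshop reported.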

  5. The complementary deoxyribonucleic acid sequence of guinea pig endometrial prorelaxin.

    PubMed

    Lee, Y A; Bryant-Greenwood, G D; Mandel, M; Greenwood, F C

    1992-03-01

    The nucleotide sequence of the relaxin gene transcript in the endometrium of the late pregnant guinea pig has been determined. The strategy used was a combination of polymerase chain reaction (PCR) with primers designed from the mRNA sequence of porcine preprorelaxin, rapid amplification of cDNA ends-PCR, and blunt end cloning in M13 mp18. With heterologous primers, a 226-basepair (bp) segment of the guinea pig relaxin gene sequence was obtained and was used to design a guinea pig-specific primer for use with the rapid amplification of cDNA ends-PCR method. The latter allowed completion of the sequence of 336 bp, with a 96-bp overlap. The sequence obtained shows greater homology at both the nucleotide and amino acid levels with porcine and human relaxins H1 and H2 than with rat relaxin, supporting the thesis that the guinea pig is not a rodent. The transcription of the guinea pig endometrial relaxin gene during pregnancy was confirmed by Northern analysis of guinea pig endometrial tissues with a species-specific cDNA probe. The endometrial relaxin gene is transcribed during pregnancy, but not in lactation, consistent with the observed immunostaining for relaxin.

  6. Quantum-Sequencing: Biophysics of quantum tunneling through nucleic acids

    NASA Astrophysics Data System (ADS)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    Tunneling microscopy and spectroscopy have been used extensively in physical surface science to study quantum tunneling, to measure the electronic local density of states of nanomaterials, and to characterize adsorbed species. Quantum-Sequencing (Q-Seq) is a new method based on tunneling microscopy for electronic sequencing of single nucleic acid molecules. A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free single-molecule sequencing method. Here, we present the unique "electronic fingerprints" of all nucleotides of DNA and RNA obtained using Q-Seq, along with their intrinsic biophysical parameters. We analyzed tunneling spectra for the nucleotides at different pH conditions and determined the HOMO, LUMO and energy gap for each of them. In addition, we report a number of biophysical parameters that further characterize all nucleobases (electron and hole transition voltages and energy barriers). These results highlight the robustness of Q-Seq as a technique for next-generation sequencing.

  7. MitoFates: improved prediction of mitochondrial targeting sequences and their cleavage sites.

    PubMed

    Fukasawa, Yoshinori; Tsuji, Junko; Fu, Szu-Chin; Tomii, Kentaro; Horton, Paul; Imai, Kenichiro

    2015-04-01

    Mitochondria provide numerous essential functions for cells and their dysfunction leads to a variety of diseases. Thus, obtaining a complete mitochondrial proteome should be a crucial step toward understanding the roles of mitochondria. Many mitochondrial proteins have been identified experimentally but a complete list is not yet available. To fill this gap, methods to computationally predict mitochondrial proteins from amino acid sequence have been developed and are widely used, but unfortunately, their accuracy is far from perfect. Here we describe MitoFates, an improved prediction method for cleavable N-terminal mitochondrial targeting signals (presequences) and their cleavage sites. MitoFates introduces novel sequence features including positively charged amphiphilicity, presequence motifs, and position weight matrices modeling the presequence cleavage sites. These features are combined with classical ones such as amino acid composition and physico-chemical properties as input to a standard support vector machine classifier. On independent test data, MitoFates attains better performance than existing predictors in both detection of presequences and in predicting their cleavage sites. We used MitoFates to look for undiscovered mitochondrial proteins from 42,217 human proteins (including isoforms such as alternative splicing or translation initiation variants). MitoFates predicts 1167 genes to have at least one isoform with a presequence. Five hundred and eighty of these genes were not annotated as mitochondrial in either UniProt or Gene Ontology. Interestingly, these include candidate regulators of parkin translocation to damaged mitochondria, and also many genes with known disease mutations, suggesting that careful investigation of MitoFates predictions may be helpful in elucidating the role of mitochondria in health and disease. MitoFates is open source with a convenient web server publicly available.
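MitoFates uses position weight matrices to model presequence cleavage sites. The generic technique looks roughly like the sketch below: build log-odds scores from aligned example sites and slide them along a sequence. The uniform background, the pseudocount of 1, the function names and the three-residue example sites are hypothetical choices for illustration, not MitoFates' trained matrices.

```python
import math
from collections import Counter
from typing import Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(sites: Sequence[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log2-odds position weight matrix against a uniform amino acid background."""
    background = 1 / len(AMINO_ACIDS)
    pwm = []
    for column in zip(*sites):  # one column per aligned position
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pwm.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                    for aa in AMINO_ACIDS})
    return pwm


def score_windows(seq: str, pwm: list[dict[str, float]]) -> list[float]:
    """Score every window of seq; the highest score marks the most site-like window."""
    w = len(pwm)
    return [sum(col[aa] for col, aa in zip(pwm, seq[i:i + w]))
            for i in range(len(seq) - w + 1)]


pwm = build_pwm(["RAS", "RAS", "RGS"])  # hypothetical aligned sites
scores = score_windows("LLRASLL", pwm)
```

The pseudocount keeps unseen residues from scoring negative infinity; in a real predictor the background would be estimated from protein sequences rather than assumed uniform.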

  8. Molecular cloning and amino acid sequence of human 5-lipoxygenase

    SciTech Connect

    Matsumoto, T.; Funk, C.D.; Radmark, O.; Hoeoeg, J.O.; Joernvall, H.; Samuelsson, B.

    1988-01-01

    5-Lipoxygenase (EC 1.13.11.34), a Ca2+- and ATP-requiring enzyme, catalyzes the first two steps in the biosynthesis of the peptidoleukotrienes and the chemotactic factor leukotriene B4. A cDNA clone corresponding to 5-lipoxygenase was isolated from a human lung lambda gt11 expression library by immunoscreening with a polyclonal antibody. Additional clones from a human placenta lambda gt11 cDNA library were obtained by plaque hybridization with the 32P-labeled lung cDNA clone. Sequence data obtained from several overlapping clones indicate that the composite DNAs contain the complete coding region for the enzyme. The deduced primary structure indicates that the 5-lipoxygenase cDNA encodes a 673-amino-acid protein with a calculated molecular weight of 77,839. Direct analysis of the native protein and its proteolytic fragments confirmed the deduced composition, the amino-terminal amino acid sequence, and the structure of many internal segments. 5-Lipoxygenase has no apparent sequence homology with leukotriene A4 hydrolase or Ca2+-binding proteins. RNA blot analysis indicated substantial amounts of an mRNA species of approximately 2700 nucleotides in leukocytes, lung, and placenta.
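A calculated molecular weight such as the one quoted above is obtained by summing residue masses over the deduced sequence and adding one water for the free termini. The sketch assumes standard average residue masses and a hypothetical function name; mass tables differ slightly in the last decimals, so it is not expected to reproduce the published figure exactly.

```python
# Average residue masses (Da), i.e. free amino acid mass minus one water.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # average mass of H2O added back for the termini


def average_molecular_weight(protein: str) -> float:
    """Sum of average residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER
```

Post-translational modifications and bound cofactors are not included, which is one reason measured and calculated weights can disagree.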

  9. Nucleic acid sequence detection using multiplexed oligonucleotide PCR

    DOEpatents

    Nolan, John P.; White, P. Scott

    2006-12-26

    Methods for rapidly detecting single or multiple sequence alleles in a sample nucleic acid are described. Provided are all of the oligonucleotide pairs capable of annealing specifically to a target allele and discriminating among possible sequences thereof, and ligating to each other to form an oligonucleotide complex when a particular sequence feature is present (or, alternatively, absent) in the sample nucleic acid. The design of each oligonucleotide pair permits the subsequent high-level PCR amplification of a specific amplicon when the oligonucleotide complex is formed, but not when the oligonucleotide complex is not formed. The presence or absence of the specific amplicon is used to detect the allele. Detection of the specific amplicon may be achieved using a variety of methods well known in the art, including without limitation, oligonucleotide capture onto DNA chips or microarrays, oligonucleotide capture onto beads or microspheres, electrophoresis, and mass spectrometry. Various labels and address-capture tags may be employed in the amplicon detection step of multiplexed assays, as further described herein.

  10. Nine-amino-acid transactivation domain: establishment and prediction utilities.

    PubMed

    Piskacek, Simona; Gregor, Martin; Nemethova, Maria; Grabner, Martin; Kovarik, Pavel; Piskacek, Martin

    2007-06-01

    Here we describe the establishment and prediction utilities for a novel nine-amino-acid transactivation domain, 9aa TAD, that is common to the transactivation domains of a large number of yeast and animal transcription factors. We show that the 9aa TAD motif is required for the function of the transactivation domain of Gal4 and the related transcription factors Oaf1 and Pip2. The 9aa TAD possesses an autonomous transactivation activity in yeast and mammalian cells. Using sequence alignment and experimental data we derived a pattern that can be used for 9aa TAD prediction. The pattern allows the identification of 9aa TAD in other Gal4 family members or unrelated yeast, animal, and viral transcription factors. Thus, the 9aa TAD represents the smallest known denominator for a broad range of transcription factors. The wide occurrence of the 9aa TAD suggests that this domain mediates conserved interactions with general transcriptional cofactors. A computational search for the 9aa TAD is available online from National EMBnet-Node Austria at http://www.at.embnet.org/toolbox/9aatad/.

  11. The amino acid sequence of chymopapain from Carica papaya.

    PubMed Central

    Watson, D C; Yaguchi, M; Lynn, K R

    1990-01-01

    Chymopapain is a polypeptide of 218 amino acid residues. It has considerable structural similarity with papain and papaya proteinase omega, including conservation of the catalytic site and of the disulphide bonding. Chymopapain is like papaya proteinase omega in carrying four extra residues between papain positions 168 and 169, but differs from both papaya proteinases in the composition of its S2 subsite, as well as in having a second thiol group, Cys-117. Some evidence for the amino acid sequence of chymopapain has been deposited as Supplementary Publication SUP 50153 (12 pages) at the British Library Document Supply Centre, Boston Spa, Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies may be obtained on the terms indicated in Biochem. J. (1990) 265, 5. The information comprises Supplement Tables 1-4, which contain, in order, amino acid compositions of peptides from tryptic, peptic, CNBr and mild acid cleavages; Supplement Fig. 1, showing re-fractionation of selected peaks from Fig. 2 of the main paper; Supplement Fig. 2, showing cation-exchange chromatography of the earliest-eluted peak of Fig. 3 of the main paper; Supplement Fig. 3, showing reverse-phase h.p.l.c. of the later-eluted peak from Fig. 3 of the main paper; and Supplement Fig. 4, showing the separation of peptides after mild acid hydrolysis of CNBr-cleavage fragment CB3. PMID:2106878

  12. Prediction of Staphylococcus aureus antimicrobial resistance by whole-genome sequencing.

    PubMed

    Gordon, N C; Price, J R; Cole, K; Everitt, R; Morgan, M; Finney, J; Kearns, A M; Pichon, B; Young, B; Wilson, D J; Llewelyn, M J; Paul, J; Peto, T E A; Crook, D W; Walker, A S; Golubchik, T

    2014-04-01

    Whole-genome sequencing (WGS) could potentially provide a single platform for extracting all the information required to predict an organism's phenotype. However, its ability to provide accurate predictions has not yet been demonstrated in large independent studies of specific organisms. In this study, we aimed to develop a genotypic prediction method for antimicrobial susceptibilities. The whole genomes of 501 unrelated Staphylococcus aureus isolates were sequenced, and the assembled genomes were interrogated using BLASTn for a panel of known resistance determinants (chromosomal mutations and genes carried on plasmids). Results were compared with phenotypic susceptibility testing for 12 commonly used antimicrobial agents (penicillin, methicillin, erythromycin, clindamycin, tetracycline, ciprofloxacin, vancomycin, trimethoprim, gentamicin, fusidic acid, rifampin, and mupirocin) performed by the routine clinical laboratory. We investigated discrepancies by repeat susceptibility testing and manual inspection of the sequences and used this information to optimize the resistance determinant panel and BLASTn algorithm. We then tested performance of the optimized tool in an independent validation set of 491 unrelated isolates, with phenotypic results obtained in duplicate by automated broth dilution (BD Phoenix) and disc diffusion. In the validation set, the overall sensitivity and specificity of the genomic prediction method were 0.97 (95% confidence interval [95% CI], 0.95 to 0.98) and 0.99 (95% CI, 0.99 to 1), respectively, compared to standard susceptibility testing methods. The very major error rate was 0.5%, and the major error rate was 0.7%. WGS was as sensitive and specific as routine antimicrobial susceptibility testing methods. WGS is a promising alternative to culture methods for resistance prediction in S. aureus and ultimately other major bacterial pathogens.
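Sensitivity, specificity and the categorical error rates above can all be computed from the same four counts of isolate-drug comparisons, treating resistance as the positive class. Denominators for the error rates differ between guidelines: this sketch divides them by all compared pairs, which is an assumption and not necessarily the paper's convention (some guidelines divide by the number of phenotypically resistant or susceptible pairs instead). The function name and the usage counts are hypothetical.

```python
def prediction_summary(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Compare genotypic calls against phenotype, with 'resistant' as the positive class.

    tp: predicted R, phenotype R    tn: predicted S, phenotype S
    fp: predicted R, phenotype S (major error)
    fn: predicted S, phenotype R (very major error)
    """
    total = tp + tn + fp + fn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # Assumed denominator: all compared isolate-drug pairs
        "very_major_error_rate": fn / total,
        "major_error_rate": fp / total,
    }


summary = prediction_summary(tp=90, tn=900, fp=5, fn=5)  # hypothetical counts
```

Very major errors (calling a resistant isolate susceptible) are singled out because they could lead to an ineffective drug being given.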

  13. Prediction of Staphylococcus aureus Antimicrobial Resistance by Whole-Genome Sequencing

    PubMed Central

    Price, J. R.; Cole, K.; Everitt, R.; Morgan, M.; Finney, J.; Kearns, A. M.; Pichon, B.; Young, B.; Wilson, D. J.; Llewelyn, M. J.; Paul, J.; Peto, T. E. A.; Crook, D. W.; Walker, A. S.; Golubchik, T.

    2014-01-01

    Whole-genome sequencing (WGS) could potentially provide a single platform for extracting all the information required to predict an organism's phenotype. However, its ability to provide accurate predictions has not yet been demonstrated in large independent studies of specific organisms. In this study, we aimed to develop a genotypic prediction method for antimicrobial susceptibilities. The whole genomes of 501 unrelated Staphylococcus aureus isolates were sequenced, and the assembled genomes were interrogated using BLASTn for a panel of known resistance determinants (chromosomal mutations and genes carried on plasmids). Results were compared with phenotypic susceptibility testing for 12 commonly used antimicrobial agents (penicillin, methicillin, erythromycin, clindamycin, tetracycline, ciprofloxacin, vancomycin, trimethoprim, gentamicin, fusidic acid, rifampin, and mupirocin) performed by the routine clinical laboratory. We investigated discrepancies by repeat susceptibility testing and manual inspection of the sequences and used this information to optimize the resistance determinant panel and BLASTn algorithm. We then tested performance of the optimized tool in an independent validation set of 491 unrelated isolates, with phenotypic results obtained in duplicate by automated broth dilution (BD Phoenix) and disc diffusion. In the validation set, the overall sensitivity and specificity of the genomic prediction method were 0.97 (95% confidence interval [95% CI], 0.95 to 0.98) and 0.99 (95% CI, 0.99 to 1), respectively, compared to standard susceptibility testing methods. The very major error rate was 0.5%, and the major error rate was 0.7%. WGS was as sensitive and specific as routine antimicrobial susceptibility testing methods. WGS is a promising alternative to culture methods for resistance prediction in S. aureus and ultimately other major bacterial pathogens. PMID:24501024

  14. The amino acid sequence of rabbit cardiac troponin I.

    PubMed Central

    Grand, R J; Wilkinson, J M

    1976-01-01

    The complete amino acid sequence of troponin I from rabbit cardiac muscle was determined by the isolation of four unique CNBr fragments, together with overlapping tryptic peptides containing radioactive methionine residues. Overlap data for residues 35-36, 93-94 and 140-145 are incomplete, the sequence at these positions being based on homology with the sequence of the fast-skeletal-muscle protein. Cardiac troponin I is a single polypeptide chain of 206 residues with mol.wt. 23550 and an extinction coefficient, E(1%, 1 cm) at 280 nm, of 4.37. The protein has a net positive charge of 14 and is thus somewhat more basic than troponin I from fast-skeletal muscle. Comparison of the sequences of troponin I from cardiac and fast skeletal muscle shows that the cardiac protein has 26 extra residues at the N-terminus, which account for its larger size. In the remainder of the sequence there is a considerable degree of homology, which is greater in the C-terminal two-thirds of the molecule. The region in the cardiac protein corresponding to the peptide with inhibitory activity from the fast-skeletal-muscle protein is very similar, and it seems unlikely that this is the cause of the difference in inhibitory activity between the two proteins. The region responsible for binding troponin C, however, possesses a lower degree of homology. Detailed evidence on which the sequence is based has been deposited as Supplementary Publication SUP 50072 (20 pages), at the British Library Lending Division, Boston Spa, Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies may be obtained on the terms given in Biochem. J. (1976) 153, 5. PMID:1008822
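A net charge like the one quoted above is often estimated by simple counting: basic Lys and Arg residues minus acidic Asp and Glu residues. The sketch below assumes that crude rule and uses a hypothetical function name; it ignores His, the free termini and any pKa shifts, so it is only an approximation at neutral pH.

```python
def approximate_net_charge(protein: str) -> int:
    """Count Lys + Arg minus Asp + Glu; ignores His, termini and pKa shifts."""
    p = protein.upper()
    return (p.count("K") + p.count("R")) - (p.count("D") + p.count("E"))
```

A more careful estimate would sum Henderson-Hasselbalch fractional charges for each ionizable group at the pH of interest.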

  15. Sequence-Based Prediction of RNA-Binding Proteins Using Random Forest with Minimum Redundancy Maximum Relevance Feature Selection

    PubMed Central

    Ma, Xin; Guo, Jing; Sun, Xiao

    2015-01-01

    The prediction of RNA-binding proteins is one of the most challenging problems in computational biology. Although some studies have investigated this problem, prediction accuracy is still not sufficient. In this study, a highly accurate method was developed to predict RNA-binding proteins from amino acid sequences using random forests with the minimum redundancy maximum relevance (mRMR) method, followed by incremental feature selection (IFS). We incorporated conjoint triad features and three novel features: binding propensity (BP), nonbinding propensity (NBP), and evolutionary information combined with physicochemical properties (EIPP). The results showed that these novel features play important roles in improving the performance of the predictor. Using the mRMR-IFS method, our predictor achieved its best performance (86.62% accuracy and a Matthews correlation coefficient of 0.737). The high prediction accuracy suggests that our method can be a useful approach to identify RNA-binding proteins from sequence information. PMID:26543860
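The mRMR-then-IFS pipeline can be sketched on discrete features as follows: rank features greedily by relevance to the label minus average redundancy with the features already chosen, then keep the prefix of that ranking that scores best under a user-supplied evaluation function. The difference criterion, the discrete-feature assumption, the function names and the `evaluate` callback (where, for example, a cross-validated random forest would go) are illustrative choices, not details taken from the paper.

```python
import math
from collections import Counter
from typing import Callable, Sequence


def mutual_information(x: Sequence, y: Sequence) -> float:
    """Mutual information (bits) between two discrete variables of equal length."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log2(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr_order(features: Sequence[Sequence], labels: Sequence) -> list[int]:
    """Greedy mRMR ranking using the difference criterion (relevance minus mean redundancy)."""
    relevance = [mutual_information(f, labels) for f in features]
    remaining = list(range(len(features)))
    chosen: list[int] = []

    def score(j: int) -> float:
        if not chosen:
            return relevance[j]
        redundancy = sum(mutual_information(features[j], features[k]) for k in chosen)
        return relevance[j] - redundancy / len(chosen)

    while remaining:
        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
    return chosen


def incremental_feature_selection(order: list[int],
                                  evaluate: Callable[[list[int]], float]) -> list[int]:
    """Evaluate the top-1, top-2, ... feature sets and return the best-scoring prefix."""
    best_k = max(range(1, len(order) + 1), key=lambda k: evaluate(order[:k]))
    return order[:best_k]
```

Continuous features such as propensity scores would need to be discretized first, or the mutual information estimated with a method suited to continuous data.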

  16. Amino acid sequence of a mouse immunoglobulin mu chain.

    PubMed Central

    Kehry, M; Sibley, C; Fuhrman, J; Schilling, J; Hood, L E

    1979-01-01

    The complete amino acid sequence of the mouse mu chain from the BALB/c myeloma tumor MOPC 104E is reported. The C mu region contains four consecutive homology regions of approximately 110 residues and a COOH-terminal region of 19 residues. A comparison of this mu chain from mouse with a complete mu sequence from human (Ou) and a partial mu chain sequence from dog (Moo) reveals a striking gradient of increasing homology from the NH2-terminal to the COOH-terminal portion of these mu chains, with the former being the least and the latter the most highly conserved. Four of the five sites of carbohydrate attachment appear to be at identical residue positions when the constant regions of the mouse and human mu chains are compared. The mu chain of MOPC 104E has a carbohydrate moiety attached in the second hypervariable region. This is particularly interesting in view of the fact that MOPC 104E binds alpha-(1→3)-dextran, a simple carbohydrate. The structural and functional constraints imposed by these comparative sequence analyses are discussed. PMID:111247

  17. Innovations in host and microbial sialic acid biosynthesis revealed by phylogenomic prediction of nonulosonic acid structure

    PubMed Central

    Lewis, Amanda L.; Desa, Nolan; Hansen, Elizabeth E.; Knirel, Yuriy A.; Gordon, Jeffrey I.; Gagneux, Pascal; Nizet, Victor; Varki, Ajit

    2009-01-01

    Sialic acids (Sias) are nonulosonic acid (NulO) sugars prominently displayed on vertebrate cells and occasionally mimicked by bacterial pathogens using homologous biosynthetic pathways. It has been suggested that Sias were an animal innovation and later emerged in pathogens by convergent evolution or horizontal gene transfer. To better illuminate the evolutionary processes underlying the phenomenon of Sia molecular mimicry, we performed phylogenomic analyses of biosynthetic pathways for Sias and related higher sugars derived from 5,7-diamino-3,5,7,9-tetradeoxynon-2-ulosonic acids. Examination of ≈1,000 sequenced microbial genomes indicated that such biosynthetic pathways are far more widely distributed than previously realized. Phylogenetic analysis, validated by targeted biochemistry, was used to predict NulO types (i.e., neuraminic, legionaminic, or pseudaminic acids) expressed by various organisms. This approach uncovered previously unreported occurrences of Sia pathways in pathogenic and symbiotic bacteria and identified at least one instance in which a human archaeal symbiont tentatively reported to express Sias in fact expressed the related pseudaminic acid structure. Evaluation of targeted phylogenies and protein domain organization revealed that the “unique” Sia biosynthetic pathway of animals was instead a much more ancient innovation. Pathway phylogenies suggest that bacterial pathogens may have acquired Sia expression via adaptation of pathways for legionaminic acid biosynthesis, one of at least 3 evolutionary paths for de novo Sia synthesis. Together, these data indicate that some of the long-standing paradigms in Sia biology should be reconsidered in a wider evolutionary context of the extended family of NulO sugars. PMID:19666579

  18. Coevolutionary modeling of protein sequences: Predicting structure, function, and mutational landscapes

    NASA Astrophysics Data System (ADS)

    Weigt, Martin

    Over the last few years, biological research has been revolutionized by experimental high-throughput techniques, in particular next-generation sequencing technology. Unprecedented amounts of data are accumulating, and there is a growing demand for computational methods that unveil the information hidden in raw data, thereby increasing our understanding of complex biological systems. Statistical-physics models based on the maximum-entropy principle have, in the last few years, played an important role in this context. To give a specific example, proteins and many non-coding RNAs show a remarkable degree of structural and functional conservation in the course of evolution, despite large variability in their sequences. We have developed a statistical-mechanics-inspired inference approach, called Direct-Coupling Analysis, to link this sequence variability (easy to observe in sequence alignments, which are available in public sequence databases) to biomolecular structure and function. In my presentation I will show how this methodology can be used (i) to infer contacts between residues and thus to guide tertiary and quaternary protein structure prediction and RNA structure prediction, (ii) to discriminate interacting from non-interacting protein families, and thus to infer conserved protein-protein interaction networks, and (iii) to reconstruct mutational landscapes and thus to predict the phenotypic effect of mutations. References: [1] M. Figliuzzi, H. Jacquier, A. Schug, O. Tenaillon and M. Weigt, "Coevolutionary landscape inference and the context-dependence of mutations in beta-lactamase TEM-1", Mol. Biol. Evol. (2015), doi: 10.1093/molbev/msv211 [2] E. De Leonardis, B. Lutz, S. Ratz, S. Cocco, R. Monasson, A. Schug, M. Weigt, "Direct-Coupling Analysis of nucleotide coevolution facilitates RNA secondary and tertiary structure prediction", Nucleic Acids Research (2015), doi: 10.1093/nar/gkv932 [3] F. Morcos, A. Pagnani, B. Lunt, A. Bertolino, D. Marks, C
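
    To make the idea of Direct-Coupling Analysis more concrete, here is a rough mean-field sketch in Python with NumPy. It is not the author's implementation: the gap-plus-20-letter alphabet, the pseudocount of 0.5, the Frobenius-norm score with average product correction (APC) and the name `mfdca_scores` are all assumptions, and the sequence reweighting normally used in practice is omitted for brevity. Couplings are estimated as the negative inverse of the regularised covariance matrix of one-hot encoded alignment columns; high-scoring residue pairs are candidate contacts.

```python
import numpy as np

ALPHABET = "-ACDEFGHIKLMNPQRSTVWY"  # gap plus the 20 amino acids (assumed)
Q = len(ALPHABET)


def encode(msa: list[str]) -> np.ndarray:
    """Map equal-length aligned sequences to integer states; unknown letters become gaps."""
    idx = {c: k for k, c in enumerate(ALPHABET)}
    return np.array([[idx.get(c, 0) for c in seq.upper()] for seq in msa])


def mfdca_scores(msa: list[str], pseudocount: float = 0.5) -> np.ndarray:
    """Mean-field DCA contact scores (Frobenius norm + APC) for an L-column MSA."""
    X = encode(msa)
    M, L = X.shape
    # One-hot encode, keeping Q-1 states per column (last state is the reference).
    onehot = np.zeros((M, L, Q))
    onehot[np.arange(M)[:, None], np.arange(L)[None, :], X] = 1.0
    onehot = onehot[:, :, : Q - 1].reshape(M, L * (Q - 1))
    # Single- and pair-site frequencies, mixed with a uniform distribution.
    fi = (1 - pseudocount) * onehot.mean(axis=0) + pseudocount / Q
    fij = (1 - pseudocount) * (onehot.T @ onehot) / M + pseudocount / Q**2
    for i in range(L):  # within one column, f_ii(a, b) = f_i(a) * delta(a, b)
        block = slice(i * (Q - 1), (i + 1) * (Q - 1))
        fij[block, block] = np.diag(fi[block])
    C = fij - np.outer(fi, fi)
    # Mean-field couplings: J_ij(a, b) = -(C^-1)_{ia, jb}.
    J = -np.linalg.inv(C)
    J = J.reshape(L, Q - 1, L, Q - 1).transpose(0, 2, 1, 3)
    Jfull = np.zeros((L, L, Q, Q))
    Jfull[:, :, : Q - 1, : Q - 1] = J
    # Shift to the zero-sum gauge before taking norms.
    Jfull = (Jfull - Jfull.mean(axis=2, keepdims=True)
             - Jfull.mean(axis=3, keepdims=True)
             + Jfull.mean(axis=(2, 3), keepdims=True))
    F = np.sqrt((Jfull**2).sum(axis=(2, 3)))
    np.fill_diagonal(F, 0.0)
    # Average product correction removes background signal shared by all pairs.
    S = F - np.outer(F.mean(axis=0), F.mean(axis=0)) / F.mean()
    np.fill_diagonal(S, 0.0)
    return S
```

    On a real alignment one would rank the off-diagonal pairs by score, usually ignoring pairs that are close in sequence, and take the top-ranked pairs as predicted contacts.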

  19. Nucleotide sequences of the Pseudomonas savastanoi indoleacetic acid genes show homology with Agrobacterium tumefaciens T-DNA

    PubMed Central

    Yamada, Tetsuji; Palm, Curtis J.; Brooks, Bob; Kosuge, Tsune

    1985-01-01

    We report the nucleotide sequences of iaaM and iaaH, the genetic determinants of, respectively, tryptophan 2-monooxygenase and indoleacetamide hydrolase, the enzymes that catalyze the conversion of L-tryptophan to indoleacetic acid in the tumor-forming bacterium Pseudomonas syringae pv. savastanoi. Sequence analysis indicates that the iaaM locus contains an open reading frame encoding 557 amino acids, corresponding to a protein with a molecular weight of 61,783; the iaaH locus contains an open reading frame of 455 amino acids, corresponding to a protein with a molecular weight of 48,515. Significant amino acid sequence homology was found between the predicted sequence of the P. savastanoi tryptophan monooxygenase and the deduced product of the T-DNA tms-1 gene of the octopine-type plasmid pTiA6NC from Agrobacterium tumefaciens. Strong homology was found in a 25-amino-acid stretch in the putative FAD-binding region of tryptophan monooxygenase. Homology was also found in the amino acid sequences representing the central regions of the putative products of iaaH and tms-2 T-DNA. The results suggest a strong similarity in the pathways for indoleacetic acid synthesis encoded by genes in P. savastanoi and in A. tumefaciens T-DNA. PMID:16593610
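
    Molecular weights like those quoted for the deduced iaaM and iaaH products follow from summing residue masses along the predicted sequence. A minimal sketch, assuming standard average residue masses and an unmodified chain; the sequences themselves are not given here, so the hypothetical helper `protein_mw` is not claimed to reproduce the 61,783 and 48,515 figures, which also depend on the mass table the authors used.

```python
# Average residue masses (Da) of the 20 standard amino acids, i.e. the
# free amino acid mass minus one water (standard textbook values).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water added back for the free N- and C-termini


def protein_mw(seq: str) -> float:
    """Average molecular weight (Da) of an unmodified polypeptide chain."""
    seq = seq.upper().rstrip("*")  # drop a trailing stop-codon symbol, if any
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER
```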

  20. Yeast prions and human prion-like proteins: sequence features and prediction methods.

    PubMed

    Cascarina, Sean M; Ross, Eric D

    2014-06-01

    Prions are self-propagating infectious protein isoforms. A growing number of prions have been identified in yeast, each resulting from the conversion of soluble proteins into an insoluble amyloid form. These yeast prions have served as a powerful model system for studying the causes and consequences of prion aggregation. Remarkably, a number of human proteins containing prion-like domains, defined as domains with compositional similarity to yeast prion domains, have recently been linked to various human degenerative diseases, including amyotrophic lateral sclerosis. This suggests that the lessons learned from yeast prions may help in understanding these human diseases. In this review, we examine what has been learned about the amino acid sequence basis for prion aggregation in yeast, and how this information has been used to develop methods to predict aggregation propensity. We then discuss how this information is being applied to understand human disease, and the challenges involved in applying yeast prediction methods to higher organisms.

  1. Improved peptide elution time prediction for reversed-phase liquid chromatography-MS by incorporating peptide sequence information

    SciTech Connect

    Petritis, Konstantinos; Kangas, Lars J.; Yan, Bo; Monroe, Matthew E.; Strittmatter, Eric F.; Qian, Weijun; Adkins, Joshua N.; Moore, Ronald J.; Xu, Ying; Lipton, Mary S.; Camp, David G.; Smith, Richard D.

    2006-07-15

    We describe an improved artificial neural network (ANN)-based method for predicting peptide retention times in reversed-phase liquid chromatography. In addition to the peptide amino acid composition, this study investigated several other peptide descriptors to improve the predictive capability, such as peptide length, sequence, hydrophobicity and hydrophobic moment, and nearest-neighbor amino acid, as well as predicted peptide structural configurations (i.e., helix, sheet, coil). An ANN architecture consisting of 1052 input nodes, 24 hidden nodes, and 1 output node was used to fully consider the amino acid residue sequence in each peptide. The network was trained using ~345,000 non-redundant peptides identified from a total of 12,059 LC-MS/MS analyses of more than 20 different organisms, and the predictive capability of the model was tested using 1303 confidently identified peptides that were not included in the training set. The model demonstrated an average elution time precision of ~1.5% and was able to distinguish among isomeric peptides based upon the inclusion of peptide sequence information. The prediction power represents a significant improvement over our earlier report (Petritis et al., Anal. Chem. 2003, 75, 1039-1048) and other previously reported models.
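
    A rough sketch of the 1052-24-1 network shape described above, in Python with NumPy. The layer sizes come from the abstract; the sigmoid hidden activation, linear output, weight initialisation and the names `init_params` and `predict_elution` are assumptions, and neither the 1052-dimensional peptide encoding nor the training procedure is reproduced.

```python
import numpy as np

N_INPUT, N_HIDDEN, N_OUTPUT = 1052, 24, 1  # layer sizes from the abstract


def init_params(rng: np.random.Generator) -> dict[str, np.ndarray]:
    """Randomly initialise weights for a 1052-24-1 feed-forward network."""
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(N_INPUT), (N_INPUT, N_HIDDEN)),
        "b1": np.zeros(N_HIDDEN),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(N_HIDDEN), (N_HIDDEN, N_OUTPUT)),
        "b2": np.zeros(N_OUTPUT),
    }


def predict_elution(params: dict[str, np.ndarray], x: np.ndarray) -> np.ndarray:
    """Map encoded peptides of shape (n, 1052) to n predicted (normalised) elution times."""
    hidden = 1.0 / (1.0 + np.exp(-(x @ params["W1"] + params["b1"])))  # sigmoid layer
    return (hidden @ params["W2"] + params["b2"]).ravel()  # single linear output


def n_params(params: dict[str, np.ndarray]) -> int:
    """Total number of trainable weights and biases."""
    return sum(p.size for p in params.values())
```

    Counting weights and biases, a network of this shape has 1052 × 24 + 24 + 24 × 1 + 1 = 25,297 trainable parameters.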

  2. Ultrasensitive nucleic acid sequence detection by single-molecule electrophoresis

    SciTech Connect

    Castro, A; Shera, E.B.

    1996-09-01

    This is the final report of a one-year laboratory-directed research and development project at Los Alamos National Laboratory. There has been considerable interest in the development of very sensitive clinical diagnostic techniques over the last few years. Many pathogenic agents are present in extremely small concentrations in clinical samples, especially at the initial stages of infection, making their detection very difficult. This project sought to develop a new technique for the detection and accurate quantification of specific bacterial and viral nucleic acid sequences in clinical samples. The scheme involved the use of novel hybridization probes for the detection of nucleic acids combined with our recently developed technique of single-molecule electrophoresis. This project is directly relevant to the DOE's Defense Programs strategic directions in the area of biological warfare counter-proliferation.

  3. Deduced amino acid sequence of human pulmonary surfactant proteolipid: SPL(pVal)

    SciTech Connect

    Whitsett, J.A.; Glasser, S.W.; Korfhagen, T.R.; Weaver, T.E.; Clark, J.; Pilot-Matias, T.; Meuth, J.; Fox, J.L.

    1987-05-01

    A hydrophobic, proteolipid-like protein of Mr 6500 was isolated from ether/ethanol extracts of human, canine and bovine pulmonary surfactant. Amino acid composition of the protein demonstrated a remarkable abundance of hydrophobic residues, particularly valine and leucine. The N-terminal amino acid sequence of the human protein was determined: N-Leu-Ile-Pro-Cys-Cys-Pro-Val-Asn-Leu-Lys-Arg-Leu-Leu-Ile-Val4... An oligonucleotide probe was used to screen an adult human lung cDNA library, resulting in the detection of cDNA clones whose predicted amino acid sequences closely matched the N-terminal amino acid sequence of the human peptide. SPL(pVal) was found within the reading frame of a larger peptide and results from proteolytic processing of a larger preprotein. Northern blot analysis detected a single 1.0-kilobase SPL(pVal) RNA, which was less abundant in fetal than in adult lung. Mixtures of purified canine and bovine SPL(pVal) and synthetic phospholipids display the rapid adsorption and surface-tension-lowering activity characteristic of surfactant. Human SPL(pVal) is a pulmonary surfactant proteolipid which may therefore be useful in combination with phospholipids and/or other surfactant proteins for the treatment of surfactant deficiency, such as hyaline membrane disease in newborn infants.

  4. Predictive genomics: a cancer hallmark network framework for predicting tumor clinical phenotypes using genome sequencing data.

    PubMed

    Wang, Edwin; Zaman, Naif; Mcgee, Shauna; Milanese, Jean-Sébastien; Masoudi-Nejad, Ali; O'Connor-McCourt, Maureen

    2015-02-01

    Tumor genome sequencing leads to the documentation of thousands of DNA mutations and other genomic alterations. At present, these data cannot be analyzed adequately to aid in understanding tumorigenesis and its evolution. Moreover, we have little insight into how to use these data to predict clinical phenotypes and tumor progression to better design patient treatment. To meet these challenges, we discuss a cancer hallmark network framework for modeling genome sequencing data to predict cancer clonal evolution and associated clinical phenotypes. The framework includes: (1) cancer hallmarks that can be represented by a few molecular/signaling networks. 'Network operational signatures', which represent gene regulatory logics/strengths, make it possible to quantify state transitions and measures of hallmark traits. Thus, sets of genomic alterations associated with network operational signatures could be linked to the state/measure of hallmark traits. The network operational signature transforms genotypic data (i.e., genomic alterations) into regulatory phenotypic profiles (i.e., regulatory logics/strengths), and then into cellular phenotypic profiles (i.e., hallmark traits), which lead to clinical phenotypic profiles (i.e., a collection of hallmark traits). Furthermore, the framework considers regulatory logics of the hallmark networks under tumor evolutionary dynamics and therefore also includes: (2) a self-promoting positive feedback loop, dominated by a genomic instability network and a cell survival/proliferation network, which is the main driver of tumor clonal evolution; the surrounding tumor stroma and host immune system shape the evolutionary paths; (3) cell motility initiating metastasis is a byproduct of the above self-promoting loop activity during tumorigenesis; (4) an emerging hallmark network that triggers genome duplication dominates a feed-forward loop, which in turn could act as a rate-limiting step for tumor formation; (5) mutations and other genomic alterations have

  5. Sequence-Based Predictions of Lipooligosaccharide Diversity in the Neisseriaceae and Their Implication in Pathogenicity

    PubMed Central

    Stein, Daniel C.; Miller, Clinton J.; Bhoopalan, Senthil V.; Sommer, Daniel D.

    2011-01-01

    Endotoxin [lipopolysaccharide (LPS)/lipooligosaccharide (LOS)] is an important virulence determinant in Gram-negative bacteria. While the genetic basis of endotoxin production and its role in disease in the pathogenic Neisseria have been extensively studied, little research has focused on the genetic basis of LOS biosynthesis in commensal Neisseria. We determined the genomic sequences of a variety of commensal Neisseria strains and compared these sequences, along with other genomic sequences of commensal and pathogenic strains available from various sequencing centers, to identify genes involved in LOS biosynthesis. This allowed us to make structural predictions about the differences in LOS between commensal and pathogenic strains. We determined that all neisserial strains possess a conserved set of genes needed to make a common 3-deoxy-D-manno-octulosonic acid-heptose core structure. However, significant genomic differences in glycosyl transferase genes support the published literature indicating compositional differences in the terminal oligosaccharides. This was most pronounced in commensal strains that were distantly related to the gonococcus and meningococcus. These strains possessed a homolog of heptosyltransferase III, suggesting that they differ from the pathogenic strains by the presence of a third heptose. Furthermore, most commensal strains possess homologs of genes needed to synthesize lipopolysaccharide (LPS). N. cinerea, a commensal species that is closely related to the gonococcus, has lost the ability to make sialyltransferase. Overall genomic comparisons of various neisserial strains indicate that significant recombination, genetic acquisition and loss have occurred within the genus, and this blurs species boundaries. PMID:21533118

  6. Acid mine drainage prediction and remediation

    SciTech Connect

    Robb, G.; Robinson, J.

    1996-12-31

    The use of constructed wetlands for treatment of acid mine drainage is discussed in the article. Drainage characteristics and mine water flow rate are identified as important predictors of remediation success. Aerobic and anaerobic chemical reaction processes are described. Problems and potential uses of wetlands are briefly described.

  7. Nucleic acid (cDNA) and amino acid sequences of alpha-type gliadins from wheat (Triticum aestivum).

    PubMed Central

    Kasarda, D D; Okita, T W; Bernardin, J E; Baecker, P A; Nimmo, C C; Lew, E J; Dietler, M D; Greene, F C

    1984-01-01

    The complete amino acid sequence for an alpha-type gliadin protein of wheat (Triticum aestivum Linnaeus) endosperm has been derived from a cloned cDNA sequence. An additional cDNA clone that corresponds to about 75% of a similar alpha-type gliadin has been sequenced and shows some important differences. About 97% of the composite sequence of A-gliadin (an alpha-type gliadin fraction) has also been obtained by direct amino acid sequencing. This sequence shows a high degree of similarity with amino acid sequences derived from both cDNA clones and is virtually identical to one of them. On the basis of sequence information, after loss of the signal sequence, the mature alpha-type gliadins may be divided into five different domains, two of which may have evolved from an ancestral gliadin gene, whereas the remaining three contain repeating sequences that may have developed independently. PMID:6589619

  8. Prediction of enzymes and non-enzymes from protein sequences based on sequence derived features and PSSM matrix using artificial neural network.

    PubMed

    Naik, Pradeep Kumar; Mishra, Viplav Shankar; Gupta, Mukul; Jaiswal, Kunal

    2007-12-05

    Predicting enzymes and non-enzymes from protein sequence information is still an open problem in bioinformatics, and it is becoming more important as the amount of sequence information grows exponentially over time. We describe a novel approach for predicting enzymes and non-enzymes from their amino acid sequences using an artificial neural network (ANN). Using 61 sequence-derived features alone, we achieved 79% correct prediction of enzymes/non-enzymes (on a set of 660 proteins). For the complete set of 61 parameters with 5-fold cross-validated classification, the ANN yields a superior model (accuracy = 78.79 ± 6.86%, Q(pred) = 74.734 ± 17.08%, sensitivity = 84.48 ± 6.73%, specificity = 77.13 ± 13.39%). The second ANN module is based on the PSSM. Using the same 5-fold cross-validation sets, this ANN model predicts enzymes/non-enzymes with higher accuracy (accuracy = 80.37 ± 6.59%, Q(pred) = 67.466 ± 12.41%, sensitivity = 90.70 ± 3.37%, specificity = 74.66 ± 7.17%).
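
    The accuracy, sensitivity and specificity figures above come from confusion-matrix counts. The sketch below computes them, plus precision and the Matthews correlation coefficient, and summarises per-fold values as mean ± standard deviation. Reading Q(pred) as precision (positive predictive value) is an assumption about the authors' notation, as is the use of the sample standard deviation; `binary_metrics` and `mean_sd` are hypothetical names.

```python
import math
import statistics


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Confusion-matrix metrics for a two-class (e.g. enzyme / non-enzyme) predictor."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # fraction of true positives recovered
        "specificity": tn / (tn + fp),  # fraction of true negatives recovered
        "precision": tp / (tp + fp),    # positive predictive value (assumed Q(pred))
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def mean_sd(fold_values: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of one metric across CV folds."""
    return statistics.mean(fold_values), statistics.stdev(fold_values)
```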

  9. Genome-Wide Prediction and Analysis of 3D-Domain Swapped Proteins in the Human Genome from Sequence Information

    PubMed Central

    Upadhyay, Atul Kumar; Sowdhamini, Ramanathan

    2016-01-01

    3D-domain swapping is one of the mechanisms of protein oligomerization, and proteins exhibiting this phenomenon have many biological functions. These proteins have attracted much attention owing to their involvement in human diseases, such as conformational diseases, amyloidosis, serpinopathies and proteinopathies, among others. Early identification of proteins in the whole human genome that tend to domain swap will help many aspects of disease control management. Predictive models were developed using machine learning approaches, with an average accuracy of 78% (85.6% sensitivity, 87.5% specificity and an MCC of 0.72), to predict putative domain swapping in protein sequences. These models were applied to many complete genomes, with special emphasis on the human genome. Nearly 44% of the protein sequences in the human genome were predicted positive for domain swapping. Enrichment analysis was performed on the positively predicted human sequences for their domain distribution, disease association and functional importance based on Gene Ontology (GO), to better understand the functional importance of these sequences. Finally, we developed hinge region prediction for a given putative domain-swapped sequence, using important physicochemical properties of amino acids. PMID:27467780

  10. Stereochemical Sequence Ion Selectivity: Proline versus Pipecolic-acid-containing Protonated Peptides

    NASA Astrophysics Data System (ADS)

    Abutokaikah, Maha T.; Guan, Shanshan; Bythell, Benjamin J.

    2017-01-01

    Substitution of proline by pipecolic acid, the six-membered ring congener of proline, results in vastly different tandem mass spectra. The well-known proline effect is eliminated and amide bond cleavage C-terminal to pipecolic acid dominates instead. Why do these two ostensibly similar residues produce dramatically differing spectra? Recent evidence indicates that the proton affinities of these residues are similar, and so are unlikely to explain the result [Raulfs et al., J. Am. Soc. Mass Spectrom. 25, 1705-1715 (2014)]. An additional hypothesis based on increased flexibility was also advocated. Here, we provide a computational investigation of the "pipecolic acid effect" to test this and other hypotheses and to determine whether theory can shed additional light on this fascinating result. Our calculations provide evidence for both the increased flexibility of pipecolic-acid-containing peptides and structural changes in the transition structures necessary to produce the sequence ions. The most striking computational finding is inversion of the stereochemistry of the transition structures leading to "proline effect"-type amide bond fragmentation between the proline/pipecolic acid congeners: R (proline) to S (pipecolic acid). Additionally, our calculations predict substantial stabilization of the amide bond cleavage barriers for the pipecolic acid congeners by reduction in deleterious steric interactions and provide evidence for the importance of the experimental energy regime in rationalizing the spectra.

  11. Development of a Machine Learning Method to Predict Membrane Protein-Ligand Binding Residues Using Basic Sequence Information

    PubMed Central

    Suresh, M. Xavier; Gromiha, M. Michael; Suwa, Makiko

    2015-01-01

    Locating ligand binding sites and finding functionally important residues from protein sequences as well as structures has become one of the challenges in understanding protein function. Hence a Naïve Bayes classifier has been trained to predict whether a given amino acid residue in a membrane protein sequence is a ligand binding residue, using only sequence-based information. The input to the classifier consists of the features of the target residue and two sequence neighbors on each side of it. The classifier is trained and evaluated on a nonredundant set of 42 sequences (chains with at least one transmembrane domain) from 31 alpha-helical membrane proteins. It achieves an overall accuracy of 70.7%, with 72.5% specificity and 61.1% sensitivity, in identifying ligand binding residues from sequence. The classifier performs better when the sequence is encoded by PSI-BLAST-generated PSSM profiles. Assessment of the predictions in the context of three-dimensional protein structures shows the effectiveness of this method in identifying ligand binding sites from sequence information: in 83.3% (35 out of 42) of the proteins, the classifier identifies the ligand binding sites by correctly recognizing more than half of the binding residues. This will be useful to protein engineers in identifying potential residues for functional assessment. PMID:25802517
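
    The residue-window classifier described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' model: each residue is represented only by the identities of itself and two neighbours on each side (padded with `-` at the chain ends), and a categorical Naive Bayes with Laplace smoothing is used; the PSSM encoding that the abstract reports as better is not shown. The names `windows` and `WindowNaiveBayes` are hypothetical.

```python
import math
from collections import Counter, defaultdict

PAD = "-"  # filler for window positions beyond the chain ends


def windows(seq: str, half: int = 2) -> list[tuple[str, ...]]:
    """For every residue, return the residue plus `half` neighbours on each side."""
    padded = PAD * half + seq + PAD * half
    return [tuple(padded[i:i + 2 * half + 1]) for i in range(len(seq))]


class WindowNaiveBayes:
    """Categorical Naive Bayes over window positions with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0, n_symbols: int = 21):
        self.alpha = alpha          # Laplace pseudocount
        self.n_symbols = n_symbols  # 20 amino acids plus the padding symbol

    def fit(self, X, y):
        self.class_counts = Counter(y)
        # counts[label][position][symbol] = number of training windows
        self.counts = defaultdict(lambda: defaultdict(Counter))
        for window, label in zip(X, y):
            for pos, sym in enumerate(window):
                self.counts[label][pos][sym] += 1
        return self

    def log_posterior(self, window) -> dict:
        """Unnormalised log posterior for each class label."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            s = math.log(n / total)  # log prior
            for pos, sym in enumerate(window):
                c = self.counts[label][pos][sym]
                s += math.log((c + self.alpha) / (n + self.alpha * self.n_symbols))
            scores[label] = s
        return scores

    def predict(self, window):
        scores = self.log_posterior(window)
        return max(scores, key=scores.get)
```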

  12. Prediction of Protein Structural Classes for Low-Similarity Sequences Based on Consensus Sequence and Segmented PSSM.

    PubMed

    Liang, Yunyun; Liu, Sanyang; Zhang, Shengli

    2015-01-01

    Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature extraction is central to the prediction of protein structural class; it mainly uses the protein primary sequence, the predicted secondary structure sequence, and the position-specific scoring matrix (PSSM). Currently, prediction based solely on the PSSM has played a key role in improving prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP that fuses the consensus sequence (CS), segmented PsePSSM, and segmented autocovariance transformation (ACT) based on the PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640) are adopted. A 700-dimensional (700D) feature vector is constructed, and its dimension is reduced to 224D using principal component analysis (PCA). To verify the performance of our method, rigorous jackknife cross-validation tests are performed on the 1189, 25PDB, and 640 datasets. Comparison of our results with existing PSSM-based methods demonstrates that our method achieves favorable and competitive performance. It offers an important complement to other PSSM-based methods for predicting the structural classes of low-similarity protein sequences.
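
    The reduction of a 700-dimensional feature vector to 224 dimensions by principal component analysis can be sketched with NumPy's SVD. The input matrix in any example is a placeholder; the consensus-sequence, segmented PsePSSM and ACT features themselves are not computed, and `pca_reduce` is a hypothetical name. The sketch assumes there are at least `k` samples: with fewer, `Vt` has fewer than `k` rows and the output has fewer columns.

```python
import numpy as np


def pca_reduce(X: np.ndarray, k: int) -> np.ndarray:
    """Project the rows of X (samples x features) onto its top-k principal components."""
    Xc = X - X.mean(axis=0)  # centre each feature
    # Rows of Vt are the principal axes, ordered by decreasing singular value
    # (i.e. decreasing explained variance).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T
```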

  13. [Prediction of lipases types by different scale pseudo-amino acid composition].

    PubMed

    Zhang, Guangya; Li, Hongchun; Gao, Jiaqiang; Fang, Baishan

    2008-11-01

    Lipases are widely used enzymes in biotechnology. Although they catalyze the same reaction, their sequences vary. It is therefore highly desirable to develop a fast and reliable method to identify lipase types from their sequences, or even just to confirm whether a protein is a lipase. By proposing two scale-based pseudo amino acid composition approaches to extract sequence features, a powerful predictor based on k-nearest neighbors was introduced to address these problems. The overall success rates obtained by 10-fold cross-validation were as follows: for distinguishing lipases from non-lipases, the success rates were 92.8%, 91.4% and 91.3%, respectively. For lipase types, the success rates were 92.3%, 90.3% and 89.7%, respectively. Among them, the Z-scale-based pseudo amino acid composition performed best and the T scales second. They significantly outperformed six other frequently used sequence feature extraction methods. The high success rates obtained on such a stringent dataset indicate that predicting lipase types is feasible and that different-scale pseudo amino acid composition might be a useful tool for extracting protein sequence features, or at least can play a complementary role to many other existing approaches.
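
    A pseudo amino acid composition combined with a k-nearest-neighbour vote can be sketched as follows. It follows the general form of Chou's type-1 pseudo amino acid composition with a single property scale. Because the Z-scale and T-scale values are not given here, the Kyte-Doolittle hydropathy scale is swapped in as a stand-in, and `lam`, `w`, `k` and the function names `pseaac` and `knn_predict` are hypothetical choices.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy, used only as a stand-in property scale;
# the Z-scale and T-scale descriptors from the abstract are not reproduced.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

# Standardise the scale to zero mean and unit variance over the 20 residues.
_mu = sum(KD.values()) / 20
_sd = math.sqrt(sum((v - _mu) ** 2 for v in KD.values()) / 20)
H = {aa: (v - _mu) / _sd for aa, v in KD.items()}


def pseaac(seq: str, lam: int = 3, w: float = 0.05) -> list[float]:
    """Type-1 pseudo amino acid composition: 20 + lam components summing to 1."""
    seq = [aa for aa in seq.upper() if aa in H]
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    counts = Counter(seq)
    freqs = [counts[aa] / len(seq) for aa in AMINO_ACIDS]
    # theta_j: mean squared property difference between residues j apart.
    thetas = [
        sum((H[seq[i + j]] - H[seq[i]]) ** 2 for i in range(len(seq) - j)) / (len(seq) - j)
        for j in range(1, lam + 1)
    ]
    norm = sum(freqs) + w * sum(thetas)
    return [f / norm for f in freqs] + [w * t / norm for t in thetas]


def knn_predict(train_X, train_y, x, k: int = 3):
    """Majority vote among the k training vectors closest to x (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

    In use, each sequence would be turned into a `pseaac` vector and classified against labelled training vectors with `knn_predict`; larger `w` gives the sequence-order terms more weight relative to plain composition.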

  14. Predicted Molecular Effects of Sequence Variants Link to System Level of Disease

    PubMed Central

    Bromberg, Yana; Rost, Burkhard

    2016-01-01

    Developments in experimental and computational biology are advancing our understanding of how protein sequence variation impacts molecular protein function. However, the leap from the micro level of molecular function to the macro level of the whole organism, e.g. disease, remains out of reach. Here, we present new results emphasizing earlier work that suggested some links from molecular function to disease. We focused on non-synonymous single nucleotide variants, also referred to as single amino acid variants (SAVs). Building upon OMIA (Online Mendelian Inheritance in Animals), we introduced a curated set of 117 disease-causing SAVs in animals. Methods optimized to capture effects upon molecular function often correctly predict human (OMIM) and animal (OMIA) Mendelian disease-causing variants. We also predicted effects of human disease-causing variants in the mouse model, i.e. we put OMIM SAVs into mouse orthologs. Overall, fewer variants were predicted to have an effect in the model organism than in the original organism. Our results, along with other recent studies, demonstrate that predictions of molecular effects capture some important aspects of disease. Thus, in silico methods focusing on the micro level of molecular function can help to understand the macro system level of disease. PMID:27536940

  15. AMS 4.0: consensus prediction of post-translational modifications in protein sequences.

    PubMed

    Plewczynski, Dariusz; Basu, Subhadip; Saha, Indrajit

    2012-08-01

    We present here the 2011 update of the AutoMotif Service (AMS 4.0), which predicts a wide selection of 88 different types of single amino acid post-translational modifications (PTMs) in protein sequences. The experimentally confirmed modifications used for training are acquired from the latest UniProt and Phospho.ELM databases. The sequence vicinity of each modified residue is represented using amino acid physicochemical features encoded with high quality indices (HQI) obtained by automatic clustering of known indices extracted from the AAindex database. For each type of numerical representation, the method builds an ensemble of Multi-Layer Perceptron (MLP) pattern classifiers, each optimising a different objective during training (for example recall, precision or area under the ROC curve (AUC)). The consensus is built using brainstorming technology, which combines multi-objective instances of a machine learning algorithm with data fusion of different training-object representations, in order to boost the overall prediction accuracy for conserved short sequence motifs. The performance of AMS 4.0 is compared with the accuracy of previous versions, which were constructed using single machine learning methods (artificial neural networks, support vector machine). Our software improves the average AUC score of the earlier version by close to 7%, as calculated on the test datasets of all 88 PTM types. Moreover, for the selected most difficult sequence motif types, it improves prediction performance by almost 32% compared with previously used single machine learning methods. In summary, the brainstorming consensus meta-learning methodology boosts the AUC score to around 89% on average over all 88 PTM types. Detailed results for single machine learning methods and the consensus methodology are also provided, together with a comparison to previously published methods and state-of-the-art software tools. 
The
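
    The AUC figures for AMS 4.0 can be computed with the rank (Mann-Whitney) formulation sketched below. Simple score averaging across classifiers stands in for the brainstorming consensus, which the abstract describes only at a high level; treat `consensus` and `roc_auc` as hypothetical helpers rather than AMS 4.0 code.

```python
def roc_auc(scores, labels) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    This equals the normalised Mann-Whitney U statistic; ties count as 1/2.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def consensus(score_lists):
    """Average per-example scores from several classifiers (simple stand-in)."""
    return [sum(col) / len(col) for col in zip(*score_lists)]
```

    The pairwise loop is quadratic in the number of examples, which is fine for a sketch; a sort-based rank computation gives the same value more efficiently.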

  16. Complete amino acid sequence of a histidine-rich proteolytic fragment of human ceruloplasmin.

    PubMed

    Kingston, I B; Kingston, B L; Putnam, F W

    1979-04-01

    The complete amino acid sequence has been determined for a fragment of human ceruloplasmin [ferroxidase; iron(II):oxygen oxidoreductase, EC 1.16.3.1]. The fragment (designated Cp F5) contains 159 amino acid residues and has a molecular weight of 18,650; it lacks carbohydrate, is rich in histidine, and contains one free cysteine that may be part of a copper-binding site. This fragment is present in most commercial preparations of ceruloplasmin, probably owing to proteolytic degradation, but can also be obtained by limited cleavage of single-chain ceruloplasmin with plasmin. Cp F5 probably is an intact domain attached to the COOH-terminal end of single-chain ceruloplasmin via a labile interdomain peptide bond. A model of the secondary structure predicted by empirical methods suggests that almost one-third of the amino acid residues are distributed in alpha helices, about a third in beta-sheet structure, and the remainder in beta turns and unidentified structures. Computer analysis of the amino acid sequence has not demonstrated a statistically significant relationship between this ceruloplasmin fragment and any other protein, but there is some evidence for an internal duplication.

  17. The evolution of proteins from random amino acid sequences: II. Evidence from the statistical distributions of the lengths of modern protein sequences.

    PubMed

    White, S H

    1994-04-01

    This paper continues an examination of the hypothesis that modern proteins evolved from random heteropeptide sequences. In support of the hypothesis, White and Jacobs (1993, J Mol Evol 36:79-95) have shown that any sequence chosen randomly from a large collection of nonhomologous proteins has a 90% or better chance of having a lengthwise distribution of amino acids that is indistinguishable from the random expectation, regardless of amino acid type. The goal of the present study was to investigate whether the random-origin hypothesis could explain the lengths of modern protein sequences without invoking specific mechanisms such as gene duplication or exon splicing. The sets of sequences examined were taken from the 1989 PIR database and consisted of 1,792 "super-family" proteins selected to have little sequence identity, 623 E. coli sequences, and 398 human sequences. The length distributions of the proteins could be described with high significance by either of two closely related probability density functions: the gamma distribution with parameter 2, or the distribution of the sum of two independent exponentially distributed random variables. A simple theory for the distributions was developed which assumes that (1) protoprotein sequences had exponentially distributed, random, independent lengths, (2) the length dependence of protein stability determined which of these protoproteins could fold into compact primitive proteins and thereby attain the potential for biochemical activity, (3) the useful protein sequences were preserved by the primitive genome, and (4) the resulting distribution of sequence lengths is reflected by modern proteins. The theory successfully predicts the two observed distributions, which can be distinguished by the functional form of the dependence of protein stability on length. The theory leads to three interesting conclusions. First, it predicts that a tetra-nucleotide was the signal for primitive translation termination. This prediction is ...
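    The two length distributions named above are closely related. A gamma distribution with shape parameter 2 is exactly the distribution of the sum of two independent exponential variables that share the same rate, and the density of the sum of two exponentials with different rates approaches it as the rates converge. A minimal Python sketch of the two densities follows. The abstract gives no fitted rate values, so the rates used in the checks are arbitrary.

```python
import math


def gamma2_pdf(x: float, rate: float) -> float:
    """Gamma density with shape 2 and the given rate: rate^2 * x * exp(-rate * x) for x >= 0."""
    if x < 0:
        return 0.0
    return rate * rate * x * math.exp(-rate * x)


def two_exponential_sum_pdf(x: float, a: float, b: float) -> float:
    """Density of X + Y for independent X ~ Exp(a) and Y ~ Exp(b), with rates a, b > 0."""
    if x < 0:
        return 0.0
    if a == b:
        # Equal rates: the sum is exactly gamma with shape 2.
        return gamma2_pdf(x, a)
    return a * b / (b - a) * (math.exp(-a * x) - math.exp(-b * x))


def integrate(f, upper: float, steps: int = 100_000) -> float:
    """Midpoint-rule integral of f over [0, upper]."""
    h = upper / steps
    return h * sum(f((i + 0.5) * h) for i in range(steps))


# Nearly equal rates give nearly the gamma(2) density...
print(two_exponential_sum_pdf(1.0, 1.0, 1.0 + 1e-6), gamma2_pdf(1.0, 1.0))
# ...and both densities integrate to approximately 1.
print(integrate(lambda x: two_exponential_sum_pdf(x, 1.0, 2.0), 60.0))
print(integrate(lambda x: gamma2_pdf(x, 1.0), 60.0))
```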

  18. Nucleic acid (cDNA) and amino acid sequences of the maize endosperm protein glutelin-2.

    PubMed Central

    Prat, S; Cortadas, J; Puigdomènech, P; Palau, J

    1985-01-01

    The cDNA coding for a glutelin-2 protein from maize endosperm has been cloned and the complete amino acid sequence of the protein derived for the first time. An immature maize endosperm cDNA bank was screened for the expression of a beta-lactamase:glutelin-2 (G2) fusion polypeptide by using antibodies against the purified 28 kd G2 protein. A clone corresponding to the 28 kd G2 protein was sequenced and the primary structure of this protein was derived. Five regions can be defined in the protein sequence: an 11 residue N-terminal part, a repeated region formed by eight units of the sequence Pro-Pro-Pro-Val-His-Leu, an alternating Pro-X stretch 21 residues long, a Cys rich domain and a C-terminal part rich in Gln. The protein sequence is preceded by 19 residues which have the characteristics of the signal peptide found in secreted proteins. Unlike zeins, the main maize storage proteins, 28 kd glutelin-2 has several homologous sequences in common with other cereal storage proteins. PMID:3839076

  19. Environment-specific amino acid substitution tables: tertiary templates and prediction of protein folds.

    PubMed Central

    Overington, J.; Donnelly, D.; Johnson, M. S.; Sali, A.; Blundell, T. L.

    1992-01-01

    The local environment of an amino acid in a folded protein determines the acceptability of mutations at that position. In order to characterize and quantify these structural constraints, we have made a comparative analysis of families of homologous proteins. Residues in each structure are classified according to amino acid type, secondary structure, accessibility of the side chain, and existence of hydrogen bonds from the side chains. Analysis of the pattern of observed substitutions as a function of local environment shows that there are distinct patterns, especially for buried polar residues. The substitution data tables are available on diskette with Protein Science. Given the fold of a protein, one is able to predict sequences compatible with the fold (profiles or templates) and potentially to discriminate between a correctly folded and misfolded protein. Conversely, analysis of residue variation across a family of aligned sequences in terms of substitution profiles can allow prediction of secondary structure or tertiary environment. PMID:1304904
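    The bookkeeping behind environment-specific substitution tables can be sketched as follows: for each local environment and each residue in the structure of known fold, count which residues appear at the equivalent aligned position in homologues, then normalise the counts. The Python below is a minimal sketch under simplifying assumptions. The environments are hypothetical tuples (for example secondary structure and side-chain accessibility), and raw relative frequencies are reported without any smoothing or weighting the original tables may use.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Tuple

# (environment, residue in the structure, residue at the aligned position in a homologue)
Observation = Tuple[Hashable, str, str]


def substitution_tables(observations: Iterable[Observation]) -> Dict[Hashable, Dict[str, Dict[str, float]]]:
    """Return {environment: {structure_residue: {aligned_residue: relative frequency}}}."""
    counts: Dict[Tuple[Hashable, str], Counter] = defaultdict(Counter)
    for env, src, dst in observations:
        counts[(env, src)][dst] += 1
    tables: Dict[Hashable, Dict[str, Dict[str, float]]] = defaultdict(dict)
    for (env, src), c in counts.items():
        total = sum(c.values())
        tables[env][src] = {dst: n / total for dst, n in c.items()}
    return dict(tables)


# Hypothetical environments and toy observations for a serine in two different settings.
buried_helix = ("helix", "buried")
exposed_coil = ("coil", "exposed")
obs = [
    (buried_helix, "S", "S"), (buried_helix, "S", "S"), (buried_helix, "S", "T"),
    (exposed_coil, "S", "A"), (exposed_coil, "S", "N"),
]
tables = substitution_tables(obs)
print(tables[exposed_coil]["S"])  # {'A': 0.5, 'N': 0.5}
```

    Keeping a separate table per environment is what lets the same residue type show different substitution patterns, for instance when buried versus exposed.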

  20. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2010-07-01 2010-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide...

  1. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2012-07-01 2012-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide...

  2. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2014-07-01 2014-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide...

  3. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2011-07-01 2011-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide...

  4. 37 CFR 1.821 - Nucleotide and/or amino acid sequence disclosures in patent applications.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 37 Patents, Trademarks, and Copyrights 1 2013-07-01 2013-07-01 false Nucleotide and/or amino acid... Biotechnology Invention Disclosures Application Disclosures Containing Nucleotide And/or Amino Acid Sequences § 1.821 Nucleotide and/or amino acid sequence disclosures in patent applications. (a) Nucleotide...

  5. Computational Prediction of Phylogenetically Conserved Sequence Motifs for Five Different Candidate Genes in Type II Diabetic Nephropathy

    PubMed Central

    Sindhu, T; Rajamanikandan, S; Srinivasan, P

    2012-01-01

    Background: Computational identification of phylogenetic motifs helps in understanding known functional features, including catalytic sites, substrate-binding epitopes, and protein-protein interfaces. Such motifs are strongly conserved among orthologs, indicating their evolutionary importance. The study aimed to analyze five candidate genes involved in type II diabetic nephropathy and to predict phylogenetic motifs from their corresponding orthologous protein sequences. Methods: AKR1B1, APOE, ENPP1, ELMO1 and IGFBP1 have been identified through experimental studies as important targets for type II diabetic nephropathy. Their protein sequences, structures, and orthologous sequences were retrieved from the UniProtKB, PDB, and PHOG databases, respectively. Multiple sequence alignments were constructed using ClustalW and phylogenetic motifs were identified using MINER. The occurrence of amino acids in the obtained phylogenetic motifs was depicted using WebLogo, and false positive expectations were calculated against phylogenetic similarity. Results: In total, 17 phylogenetic motifs (PMs) were identified from the five proteins; residues such as glycine, leucine, tryptophan, and aspartic acid occurred at appreciable frequency, and arginine was found in all the predicted PMs. This implies that these residues may be important to the functional and structural roles of the proteins, and the calculated false positive expectations imply that the motifs were generally conserved in the traditional sense. Conclusion: The prediction of phylogenetic motifs is an accurate method for detecting functionally important conserved residues. The conserved motifs can be used as potential drug targets for type II diabetic nephropathy. PMID:23113206
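    Sequence logos such as those produced by WebLogo typically scale each alignment column by its information content: log2 of the alphabet size minus the Shannon entropy of the residue frequencies, so strongly conserved columns stand out. A minimal Python sketch of that calculation follows. It ignores gaps, omits the small-sample correction that logo software may apply, and does not reproduce MINER's motif detection; the toy alignment is hypothetical.

```python
import math
from collections import Counter
from typing import List


def column_information(column: str, alphabet_size: int = 20) -> float:
    """Information content (bits) of one column: log2(alphabet_size) minus Shannon entropy.

    Gap characters ('-') are ignored and no small-sample correction is applied.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


def conservation_profile(alignment: List[str]) -> List[float]:
    """Per-column information content for equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to the same length")
    return [column_information("".join(s[i] for s in alignment)) for i in range(length)]


alignment = ["GRWD", "GRWE", "GKWD", "GRFD"]  # hypothetical orthologous motif
profile = conservation_profile(alignment)
print([round(v, 2) for v in profile])  # [4.32, 3.51, 3.51, 3.51]
```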

  6. EASE-MM: Sequence-Based Prediction of Mutation-Induced Stability Changes with Feature-Based Multiple Models.

    PubMed

    Folkman, Lukas; Stantic, Bela; Sattar, Abdul; Zhou, Yaoqi

    2016-03-27

    Protein engineering and characterisation of non-synonymous single nucleotide variants (SNVs) require accurate prediction of protein stability changes (ΔΔGu) induced by single amino acid substitutions. Here, we have developed a new prediction method called Evolutionary, Amino acid, and Structural Encodings with Multiple Models (EASE-MM), which comprises five specialised support vector machine (SVM) models and makes the final prediction from a consensus of two models selected based on the predicted secondary structure and accessible surface area of the mutated residue. The new method is applicable to single-domain monomeric proteins and can predict ΔΔGu with a protein sequence and mutation as the only inputs. EASE-MM yielded a Pearson correlation coefficient of 0.53-0.59 in 10-fold cross-validation and independent testing and was able to outperform other sequence-based methods. When compared to structure-based energy functions, EASE-MM achieved a comparable or better performance. The application to a large dataset of human germline non-synonymous SNVs showed that the disease-causing variants tend to be associated with larger magnitudes of ΔΔGu predicted with EASE-MM. The EASE-MM web-server is available at http://sparks-lab.org/server/ease.

  7. Predicting Shine–Dalgarno Sequence Locations Exposes Genome Annotation Errors

    PubMed Central

    Starmer, J; Stomp, A; Vouk, M; Bitzer, D

    2006-01-01

    In prokaryotes, Shine–Dalgarno (SD) sequences, nucleotides upstream from start codons on messenger RNAs (mRNAs) that are complementary to ribosomal RNA (rRNA), facilitate the initiation of protein synthesis. The location of SD sequences relative to start codons and the stability of the hybridization between the mRNA and the rRNA correlate with the rate of synthesis. Thus, accurate characterization of SD sequences enhances our understanding of how an organism's transcriptome relates to its cellular proteome. We implemented the Individual Nearest Neighbor Hydrogen Bond model for oligo–oligo hybridization and created a new metric, relative spacing (RS), to identify both the location and the hybridization potential of SD sequences by simulating the binding between mRNAs and single-stranded 16S rRNA 3′ tails. In 18 prokaryote genomes, we identified 2,420 genes out of 58,550 where the strongest binding in the translation initiation region included the start codon, deviating from the expected location for the SD sequence of five to ten bases upstream. We designated these as RS+1 genes. Additional analysis uncovered an unusual bias of the start codon in that the majority of the RS+1 genes used GUG, not AUG. Furthermore, of the 624 RS+1 genes whose SD sequence was associated with a free energy release of less than −8.4 kcal/mol (strong RS+1 genes), 384 were within 12 nucleotides upstream of in-frame initiation codons. The most likely explanation for the unexpected location of the SD sequence for these 384 genes is mis-annotation of the start codon. In this way, the new RS metric provides an improved method for gene sequence annotation. The remaining strong RS+1 genes appear to have their SD sequences in an unexpected location that includes the start codon. Thus, our RS metric provides a new way to explore the role of rRNA–mRNA nucleotide hybridization in translation initiation. PMID:16710451

  8. Human liver apolipoprotein B-100 cDNA: complete nucleic acid and derived amino acid sequence.

    PubMed Central

    Law, S W; Grant, S M; Higuchi, K; Hospattankar, A; Lackner, K; Lee, N; Brewer, H B

    1986-01-01

    Human apolipoprotein B-100 (apoB-100), the ligand on low density lipoproteins that interacts with the low density lipoprotein receptor and initiates receptor-mediated endocytosis and low density lipoprotein catabolism, has been cloned, and the complete nucleic acid and derived amino acid sequences have been determined. ApoB-100 cDNAs were isolated from normal human liver cDNA libraries utilizing immunoscreening as well as filter hybridization with radiolabeled apoB-100 oligodeoxynucleotides. The apoB-100 mRNA is 14.1 kilobases long encoding a mature apoB-100 protein of 4536 amino acids with a calculated amino acid molecular weight of 512,723. ApoB-100 contains 20 potential glycosylation sites, and 12 of a total of 25 cysteine residues are located in the amino-terminal region of the apolipoprotein providing a potential globular structure of the amino terminus of the protein. ApoB-100 contains relatively few regions of amphipathic helices, but compared to other human apolipoproteins it is enriched in beta-structure. The delineation of the entire human apoB-100 sequence will now permit a detailed analysis of the conformation of the protein, the low density lipoprotein receptor binding domain(s), and the structural relationship between apoB-100 and apoB-48 and will provide the basis for the study of genetic defects in apoB-100 in patients with dyslipoproteinemias. PMID:3464946

  9. Computer selection of oligonucleotide probes from amino acid sequences for use in gene library screening.

    PubMed

    Yang, J H; Ye, J H; Wallace, D C

    1984-01-11

    We present a computer program, FINPROBE, which utilizes known amino acid sequence data to deduce minimum redundancy oligonucleotide probes for use in screening cDNA or genomic libraries or in primer extension. The user enters the amino acid sequence of interest, the desired probe length, the number of probes sought, and the constraints on oligonucleotide synthesis. The computer generates a table of possible probes listed in increasing order of redundancy and provides the location of each probe in the protein and mRNA coding sequence. Activation of a next function provides the amino acid and mRNA sequences of each probe of interest as well as the complementary sequence and the minimum dissociation temperature of the probe. A final routine prints out the amino acid sequence of the protein in parallel with the mRNA sequence listing all possible codons for each amino acid.
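    The redundancy ranking described above can be sketched in Python. Here the redundancy of a probe is the product of the number of synonymous codons for each amino acid it encodes, and candidate windows are listed in increasing order of redundancy. This is a hypothetical reconstruction rather than FINPROBE itself: it assumes the standard genetic code and probes that span whole codons, and it ignores the synthesis constraints and dissociation temperature that the program also handles.

```python
import math
from typing import List, Tuple

# Synonymous codon counts in the standard genetic code (61 sense codons in total).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def redundancy(peptide: str) -> int:
    """Number of distinct oligonucleotides that could encode `peptide`."""
    return math.prod(CODONS_PER_AA[aa] for aa in peptide.upper())


def rank_probes(protein: str, probe_length_nt: int) -> List[Tuple[int, int]]:
    """Return (redundancy, 0-based start residue) for every window, least redundant first."""
    if probe_length_nt <= 0 or probe_length_nt % 3:
        raise ValueError("this sketch only handles probes that span whole codons")
    k = probe_length_nt // 3
    windows = [(redundancy(protein[i:i + k]), i) for i in range(len(protein) - k + 1)]
    return sorted(windows)


print(rank_probes("MWKCLS", 9))  # [(2, 0), (4, 1), (24, 2), (72, 3)]
```

    Windows rich in Met and Trp (one codon each) naturally rise to the top, while Leu, Ser and Arg (six codons each) push a window down the list.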

  10. A simple and fast approach to prediction of protein secondary structure from multiply aligned sequences with accuracy above 70%.

    PubMed Central

    Mehta, P. K.; Heringa, J.; Argos, P.

    1995-01-01

    To improve secondary structure predictions in protein sequences, the information residing in multiple sequence alignments of substituted but structurally related proteins is exploited. A database comprised of 70 protein families and a total of 2,500 sequences, some of which were aligned by tertiary structural superpositions, was used to calculate residue exchange weight matrices within alpha-helical, beta-strand, and coil substructures, respectively. Secondary structure predictions were made based on the observed residue substitutions in local regions of the multiple alignments and the largest possible associated exchange weights in each of the three matrix types. Comparison of the observed and predicted secondary structure on a per-residue basis yielded a mean accuracy of 72.2%. Individual alpha-helix, beta-strand, and coil states were respectively predicted at 66.7, and 75.8% correctness, representing a well-balanced three-state prediction. The accuracy level, verified by cross-validation through jack-knife tests on all protein families, dropped, on average, to only 70.9%, indicating the rigor of the prediction procedure. On the basis of robustness, conceptual clarity, accuracy, and executable efficiency, the method has considerable advantage, especially with its sole reliance on amino acid substitutions within structurally related proteins. PMID:8580842
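    The per-residue accuracy quoted above is the usual three-state (Q3) score: the fraction of residues whose predicted state matches the observed one. A short Python sketch follows, together with a per-state score. The abstract does not define how its per-state "correctness" was computed, so the per-state function here is an assumption (the fraction of residues observed in a state that were also predicted in that state). The H/E/C strings are toy data.

```python
from typing import Dict


def q3(observed: str, predicted: str) -> float:
    """Fraction of residues whose predicted state (H, E or C) equals the observed state."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("strings must be non-empty and of equal length")
    return sum(o == p for o, p in zip(observed, predicted)) / len(observed)


def per_state_accuracy(observed: str, predicted: str, states: str = "HEC") -> Dict[str, float]:
    """For each state present in `observed`, the fraction of its residues predicted in that state."""
    if len(observed) != len(predicted):
        raise ValueError("strings must be of equal length")
    result = {}
    for s in states:
        positions = [i for i, o in enumerate(observed) if o == s]
        if positions:
            result[s] = sum(predicted[i] == s for i in positions) / len(positions)
    return result


obs = "HHHHEEECCC"
pred = "HHHCEECCCH"
print(q3(obs, pred))                  # 0.7
print(per_state_accuracy(obs, pred))  # {'H': 0.75, 'E': 0.6666666666666666, 'C': 0.6666666666666666}
```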

  11. PreSSAPro: a software for the prediction of secondary structure by amino acid properties.

    PubMed

    Costantini, Susan; Colonna, Giovanni; Facchiano, Angelo M

    2007-10-01

    PreSSAPro is software, available to the scientific community as a free web service, designed to predict secondary structures starting from the amino acid sequence of a given protein. Predictions are based on our recently published work on amino acid propensities for secondary structures, both in large but non-homogeneous protein data sets and in smaller but homogeneous data sets corresponding to protein structural classes, i.e. all-alpha, all-beta, or alpha-beta proteins. Predictions are improved by using the propensities evaluated for the correct protein class. PreSSAPro predicts the secondary structure according to the correct protein class, if known, or gives multiple predictions, one for each structural class. Comparing these predictions provides a novel tool for evaluating which sequence regions can assume different secondary structures depending on the structural class assignment, with a view to identifying proteins able to fold in different conformations. The service is available at the URL http://bioinformatica.isa.cnr.it/PRESSAPRO/.

  12. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data...

  13. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data...

  14. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data...

  15. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data...

  16. 37 CFR 1.822 - Symbols and format to be used for nucleotide and/or amino acid sequence data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... for nucleotide and/or amino acid sequence data. 1.822 Section 1.822 Patents, Trademarks, and... Amino Acid Sequences § 1.822 Symbols and format to be used for nucleotide and/or amino acid sequence data. (a) The symbols and format to be used for nucleotide and/or amino acid sequence data...

  17. Multilign: an algorithm to predict secondary structures conserved in multiple RNA sequences

    PubMed Central

    Xu, Zhenjiang; Mathews, David H.

    2011-01-01

    Motivation: With recent advances in sequencing, structural and functional studies of RNA lag behind the discovery of sequences. Computational analysis of RNA is increasingly important to reveal structure–function relationships with low cost and speed. The purpose of this study is to use multiple homologous sequences to infer a conserved RNA structure. Results: A new algorithm, called Multilign, is presented to find the lowest free energy RNA secondary structure common to multiple sequences. Multilign is based on Dynalign, which is a program that simultaneously aligns and folds two sequences to find the lowest free energy conserved structure. For Multilign, Dynalign is used to progressively construct a conserved structure from multiple pairwise calculations, with one sequence used in all pairwise calculations. A base pair is predicted only if it is contained in the set of low free energy structures predicted by all Dynalign calculations. In this way, Multilign improves prediction accuracy by keeping the genuine base pairs and excluding competing false base pairs. Multilign has computational complexity that scales linearly in the number of sequences. Multilign was tested on extensive datasets of sequences with known structure and its prediction accuracy is among the best of available algorithms. Multilign can run on long sequences (> 1500 nt) and an arbitrarily large number of sequences. Availability: The algorithm is implemented in ANSI C++ and can be downloaded as part of the RNAstructure package at: http://rna.urmc.rochester.edu Contact: david_mathews@urmc.rochester.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21193521
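    The consensus rule described above, keeping a base pair only if every pairwise calculation supports it, amounts to a set intersection. A minimal Python sketch follows. It assumes each pairwise result is supplied as a list of low free energy structures in dot-bracket notation, already expressed in the coordinates of the sequence shared by all calculations. The folding and alignment performed by Dynalign are not reproduced, and the function names are hypothetical.

```python
from typing import List, Set, Tuple

Pair = Tuple[int, int]


def dot_bracket_pairs(structure: str) -> Set[Pair]:
    """Parse a dot-bracket string into a set of (i, j) base pairs with i < j (0-based)."""
    stack: List[int] = []
    pairs: Set[Pair] = set()
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def conserved_pairs(pairwise_results: List[List[str]]) -> Set[Pair]:
    """Keep the pairs found among the low-energy structures of every pairwise calculation."""
    supported = []
    for structures in pairwise_results:
        # Union over the low free energy structures of one pairwise calculation.
        union: Set[Pair] = set()
        for s in structures:
            union |= dot_bracket_pairs(s)
        supported.append(union)
    return set.intersection(*supported) if supported else set()


# Three hypothetical pairwise calculations sharing one 6-nt sequence.
results = [["((..))", "(....)"], ["(....)"], ["((..))"]]
print(sorted(conserved_pairs(results)))  # [(0, 5)]
```

    The pair (1, 4) is dropped because the second calculation never proposes it, which mirrors how the rule discards competing pairs that lack support across all comparisons.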

  18. Human retroviruses and AIDS 1996. A compilation and analysis of nucleic acid and amino acid sequences

    SciTech Connect

    Myers, G.; Foley, B.; Korber, B.; Mellors, J.W.; Jeang, K.T.; Wain-Hobson, S.

    1997-04-01

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (1) Nucleic Acid Alignments and Sequences; (2) Amino Acid Alignments; (3) Analysis; (4) Related Sequences; and (5) Database Communications. Information within all the parts is updated throughout the year on the Web site, http://hiv-web.lanl.gov. While this publication could take the form of a review or sequence monograph, it is not so conceived. Instead, the literature from which the database is derived has simply been summarized and some elementary computational analyses have been performed upon the data. Interpretation and commentary have been avoided insofar as possible so that the reader can form his or her own judgments concerning the complex information. In addition to the general descriptions of the parts of the compendium, the user should read the individual introductions for each part.

  19. Transcriptome Sequencing in Response to Salicylic Acid in Salvia miltiorrhiza

    PubMed Central

    Zhang, Xiaoru; Dong, Juane; Liu, Hailong; Wang, Jiao; Qi, Yuexin; Liang, Zongsuo

    2016-01-01

    Salvia miltiorrhiza is a traditional Chinese herbal medicine, whose quality and yield are often affected by diseases and environmental stresses during its growing season. Salicylic acid (SA) plays a significant role in plants responding to biotic and abiotic stresses, but the involved regulatory factors and their signaling mechanisms are largely unknown. In order to identify the genes involved in SA signaling, the RNA sequencing (RNA-seq) strategy was employed to evaluate the transcriptional profiles in S. miltiorrhiza cell cultures. A total of 50,778 unigenes were assembled, in which 5,316 unigenes were differentially expressed among 0-, 2-, and 8-h SA induction. The up-regulated genes were mainly involved in stimulus response and multi-organism process. A core set of candidate novel genes coding SA signaling component proteins was identified. Many transcription factors (e.g., WRKY, bHLH and GRAS) and genes involved in hormone signal transduction were differentially expressed in response to SA induction. Detailed analysis revealed that genes associated with defense signaling, such as antioxidant system genes, cytochrome P450s and ATP-binding cassette transporters, were significantly overexpressed, which can be used as genetic tools to investigate disease resistance. Our transcriptome analysis will help understand SA signaling and its mechanism of defense systems in S. miltiorrhiza. PMID:26808150

  20. Next Generation Sequencing in Predicting Gene Function in Podophyllotoxin Biosynthesis*

    PubMed Central

    Marques, Joaquim V.; Kim, Kye-Won; Lee, Choonseok; Costa, Michael A.; May, Gregory D.; Crow, John A.; Davin, Laurence B.; Lewis, Norman G.

    2013-01-01

    Podophyllum species are sources of (−)-podophyllotoxin, an aryltetralin lignan used for semi-synthesis of various powerful and extensively employed cancer-treating drugs. Its biosynthetic pathway, however, remains largely unknown, with the last unequivocally demonstrated intermediate being (−)-matairesinol. Herein, massively parallel sequencing of Podophyllum hexandrum and Podophyllum peltatum transcriptomes and subsequent bioinformatics analyses of the corresponding assemblies were carried out. Validation of the assembly process was first achieved through confirmation of assembled sequences with those of various genes previously established as involved in podophyllotoxin biosynthesis as well as other candidate biosynthetic pathway genes. This contribution describes characterization of two of the latter, namely the cytochrome P450s, CYP719A23 from P. hexandrum and CYP719A24 from P. peltatum. Both enzymes were capable of converting (−)-matairesinol into (−)-pluviatolide by catalyzing methylenedioxy bridge formation and did not act on other possible substrates tested. Interestingly, the enzymes described herein were highly similar to methylenedioxy bridge-forming enzymes from alkaloid biosynthesis, whereas candidates more similar to lignan biosynthetic enzymes were catalytically inactive with the substrates employed. This overall strategy has thus enabled facile further identification of enzymes putatively involved in (−)-podophyllotoxin biosynthesis and underscores the deductive power of next generation sequencing and bioinformatics to probe and deduce medicinal plant biosynthetic pathways. PMID:23161544

  1. SeqAPASS (Sequence Alignment to Predict Across Species Susceptibility) software and documentation

    EPA Science Inventory

    SeqAPASS is a software application that facilitates rapid and streamlined, yet transparent, comparisons of the similarity of toxicologically significant molecular targets across species. The present application facilitates analysis of primary amino acid sequence similarity (including ...

  2. Human retroviruses and aids, 1992. A compilation and analysis of nucleic acid and amino acid sequences

    SciTech Connect

    Myers, G.; Korber, B.; Berzofsky, J.A.; Pavlakis, G.N.; Smith, R.F.

    1992-10-01

    This compendium and the accompanying floppy diskettes are the result of an effort to compile and rapidly publish all relevant molecular data concerning the human immunodeficiency viruses (HIV) and related retroviruses. The scope of the compendium and database is best summarized by the five parts that it comprises: (I) HIV and SIV Nucleotide Sequences; (II) Amino Acid Sequences; (III) Analyses; (IV) Related Sequences; and (V) Database Communications. Information within all the parts is updated at least twice each year, which accounts for the modes of binding and pagination in the compendium. While this publication could take the form of a review or sequence monograph, it is not so conceived. Instead, the literature from which the database is derived has simply been summarized and some elementary computational analyses have been performed upon the data. Interpretation and commentary have been avoided insofar as possible so that the reader can form his or her own judgments concerning the complex information. In addition to the general descriptions below of the parts of the compendium, the user should read the individual introductions for each part.

  3. Prediction of Out-of-Sequence Development by BSID Scores.

    ERIC Educational Resources Information Center

    Richards, Ruth C.; And Others

    The primary purpose of this study was to examine uneven early development in premature infants. A multiple regression analysis was performed in which birth weight, length of gestation, length of assisted feeding, and length of ventilation were used to predict the discrepancy between a child's Psychomotor and Mental Scale scores on the Bayley…

  4. Structure Prediction and Analysis of Neuraminidase Sequence Variants

    ERIC Educational Resources Information Center

    Thayer, Kelly M.

    2016-01-01

    Analyzing protein structure has become an integral aspect of understanding systems of biochemical import. The laboratory experiment endeavors to introduce protein folding to ascertain structures of proteins for which the structure is unavailable, as well as to critically evaluate the quality of the prediction obtained. The model system used is the…

  5. Robust prediction of B-factor profile from sequence using two-stage SVR based on random forest feature selection.

    PubMed

    Pan, Xiao-Yong; Shen, Hong-Bin

    2009-01-01

    B-factor, which measures the uncertainty in the position of an atom within a crystal structure, is highly correlated with protein internal motion. Although the rapid progress of structural biology in recent years makes more accurate protein structures available than ever, the avalanche of new protein sequences emerging in the post-genomic era means that the gap between known protein sequences and known protein structures keeps widening. It is therefore urgent to develop automated methods that predict the B-factor profile directly from amino acid sequences, so that such predictions can be used promptly in basic research. In this article, we propose a novel approach, called PredBF, to predict the real value of B-factor. We first extract both global and local features from the protein sequences as well as their evolutionary information; random forest feature selection is then applied to rank the importance of these features, and the most important features are input to a two-stage support vector regression (SVR), in which the initial outputs of the 1st SVR are passed to the 2nd-layer SVR for final refinement. Our results reveal that a systematic analysis of the importance of different features gives deep insight into their different contributions and is necessary for developing effective B-factor prediction tools. The two-layer SVR prediction model designed in this study further enhanced the robustness of predicting the B-factor profile. As a web server, PredBF is freely available at: http://www.csbio.sjtu.edu.cn/bioinf/PredBF for academic use.

  6. A prediction of the amino acids and structures involved in DNA recognition by type I DNA restriction and modification enzymes.

    PubMed Central

    Sturrock, S S; Dryden, D T

    1997-01-01

    The S subunits of type I DNA restriction/modification enzymes are responsible for recognising the DNA target sequence for the enzyme. They contain two domains of approximately 150 amino acids, each of which is responsible for recognising one half of the bipartite asymmetric target. In the absence of any known tertiary structure for type I enzymes or recognisable DNA recognition motifs in the highly variable amino acid sequences of the S subunits, it has previously not been possible to predict which amino acids are responsible for sequence recognition. Using a combination of sequence alignment and secondary structure prediction methods to analyse the sequences of S subunits, we predict that all of the 51 known target recognition domains (TRDs) have the same tertiary structure. Furthermore, this structure is similar to the structure of the TRD of the C5-cytosine methyltransferase, Hha I, which recognises its DNA target via interactions with two short polypeptide loops and a beta strand. Our results predict the location of these sequence recognition structures within the TRDs of all type I S subunits. PMID:9254696

  7. Urinary intestinal fatty acid binding protein predicts necrotizing enterocolitis.

    PubMed

    Gregory, Katherine E; Winston, Abigail B; Yamamoto, Hidemi S; Dawood, Hassan Y; Fashemi, Titilayo; Fichorova, Raina N; Van Marter, Linda J

    2014-06-01

    Necrotizing enterocolitis, characterized by sudden onset and rapid progression, remains the most significant gastrointestinal disorder among premature infants. In seeking a predictive biomarker, we found intestinal fatty acid binding protein, an indicator of enterocyte damage, was substantially increased within three and seven days before the diagnosis of necrotizing enterocolitis.

  8. Comparative analysis of grapevine whole-genome gene predictions, functional annotation, categorization and integration of the predicted gene sequences

    PubMed Central

    2012-01-01

    Background The first draft assembly and gene prediction of the grapevine genome (8X base coverage) was made available to the scientific community in 2007, and functional annotation was developed on this gene prediction. Since then additional Sanger sequences were added to the 8X sequences pool and a new version of the genomic sequence with superior base coverage (12X) was produced. Results In order to more efficiently annotate the function of the genes predicted in the new assembly, it is important to build on as much of the previous work as possible, by transferring 8X annotation of the genome to the 12X version. The 8X and 12X assemblies and gene predictions of the grapevine genome were compared to answer the question, “Can we uniquely map 8X predicted genes to 12X predicted genes?” The results show that while the assemblies and gene structure predictions are too different to make a complete mapping between them, most genes (18,725) showed a one-to-one relationship between 8X predicted genes and the last version of 12X predicted genes. In addition, reshuffled genomic sequence structures appeared. These highlight regions of the genome where the gene predictions need to be taken with caution. Based on the new grapevine gene functional annotation and in-depth functional categorization, twenty-eight new molecular networks have been created for VitisNet while the existing networks were updated. Conclusions The outcomes of this study provide a functional annotation of the 12X genes, an update of VitisNet, the system of the grapevine molecular networks, and a new functional categorization of genes. Data are available at the VitisNet website (http://www.sdstate.edu/ps/research/vitis/pathways.cfm). PMID:22554261
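The question of whether 8X predicted genes map uniquely to 12X predicted genes reduces to a bookkeeping step once pairwise matches between the two gene sets are available (for example from sequence alignment, which this sketch does not perform). A minimal sketch, assuming such a list of matches already exists; the gene IDs are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def one_to_one_pairs(matches: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only (old, new) gene pairs where each side matches exactly one gene on the other side."""
    old_to_new: Dict[str, Set[str]] = defaultdict(set)
    new_to_old: Dict[str, Set[str]] = defaultdict(set)
    for old, new in matches:
        old_to_new[old].add(new)
        new_to_old[new].add(old)

    pairs = []
    for old, news in old_to_new.items():
        if len(news) == 1:
            (new,) = news
            if len(new_to_old[new]) == 1:  # the 12X gene is not shared with another 8X gene
                pairs.append((old, new))
    return sorted(pairs)


# Hypothetical gene IDs: g2 is split across two 12X genes, and n4 merges g3 and g4.
matches = [("g1", "n1"), ("g2", "n2"), ("g2", "n3"), ("g3", "n4"), ("g4", "n4")]
print(one_to_one_pairs(matches))  # [('g1', 'n1')]
```

Splits (one old gene, several new ones) and merges (several old genes, one new one) are exactly the cases where annotation cannot simply be copied across assemblies.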

  9. Genome Sequence Analysis of the Naphthenic Acid Degrading and Metal Resistant Bacterium Cupriavidus gilardii CR3

    PubMed Central

    Xiao, Jingfa; Hao, Lirui; Crowley, David E.; Zhang, Zhewen; Yu, Jun; Huang, Ning; Huo, Mingxin; Wu, Jiayan

    2015-01-01

    Cupriavidus sp. are generally heavy metal tolerant bacteria with the ability to degrade a variety of aromatic hydrocarbon compounds, although the degradation pathways and substrate versatilities remain largely unknown. Here we studied the bacterium Cupriavidus gilardii strain CR3, which was isolated from a natural asphalt deposit, and which was shown to utilize naphthenic acids as a sole carbon source. Genome sequencing of C. gilardii CR3 was carried out to elucidate possible mechanisms for the naphthenic acid biodegradation. The genome of C. gilardii CR3 was composed of two circular chromosomes, chr1 and chr2, of 3,539,530 bp and 2,039,213 bp, respectively. The genome of strain CR3 encoded 4,502 putative protein-coding genes, 59 tRNA genes, and many other non-coding genes. Many genes were associated with xenobiotic biodegradation and metal resistance functions. Pathway prediction for degradation of cyclohexanecarboxylic acid, a representative naphthenic acid, suggested that naphthenic acid undergoes initial ring-cleavage, after which the ring fission products can be degraded via several plausible degradation pathways including a mechanism similar to that used for fatty acid oxidation. The final metabolic products of these pathways are unstable or volatile compounds that were not toxic to CR3. Strain CR3 was also shown to have tolerance to at least 10 heavy metals, which was mainly achieved by self-detoxification through ion efflux, metal-complexation and metal-reduction, and a powerful DNA self-repair mechanism. Our genomic analysis suggests that CR3 is well adapted to survive the harsh environment in natural asphalts containing naphthenic acids and high concentrations of heavy metals. PMID:26301592

  10. Genome Sequence Analysis of the Naphthenic Acid Degrading and Metal Resistant Bacterium Cupriavidus gilardii CR3.

    PubMed

    Wang, Xiaoyu; Chen, Meili; Xiao, Jingfa; Hao, Lirui; Crowley, David E; Zhang, Zhewen; Yu, Jun; Huang, Ning; Huo, Mingxin; Wu, Jiayan

    2015-01-01

    Cupriavidus sp. are generally heavy metal tolerant bacteria with the ability to degrade a variety of aromatic hydrocarbon compounds, although the degradation pathways and substrate versatilities remain largely unknown. Here we studied the bacterium Cupriavidus gilardii strain CR3, which was isolated from a natural asphalt deposit, and which was shown to utilize naphthenic acids as a sole carbon source. Genome sequencing of C. gilardii CR3 was carried out to elucidate possible mechanisms for the naphthenic acid biodegradation. The genome of C. gilardii CR3 was composed of two circular chromosomes, chr1 and chr2, of 3,539,530 bp and 2,039,213 bp, respectively. The genome of strain CR3 encoded 4,502 putative protein-coding genes, 59 tRNA genes, and many other non-coding genes. Many genes were associated with xenobiotic biodegradation and metal resistance functions. Pathway prediction for degradation of cyclohexanecarboxylic acid, a representative naphthenic acid, suggested that naphthenic acid undergoes initial ring-cleavage, after which the ring fission products can be degraded via several plausible degradation pathways including a mechanism similar to that used for fatty acid oxidation. The final metabolic products of these pathways are unstable or volatile compounds that were not toxic to CR3. Strain CR3 was also shown to have tolerance to at least 10 heavy metals, which was mainly achieved by self-detoxification through ion efflux, metal-complexation and metal-reduction, and a powerful DNA self-repair mechanism. Our genomic analysis suggests that CR3 is well adapted to survive the harsh environment in natural asphalts containing naphthenic acids and high concentrations of heavy metals.

  11. Representation of DNA sequences in genetic codon context with applications in exon and intron prediction.

    PubMed

    Yin, Changchuan

    2015-04-01

    To apply digital signal processing (DSP) methods to analyze DNA sequences, the sequences first must be mapped into numerical sequences. Thus, effective numerical mappings of DNA sequences play key roles in the effectiveness of DSP-based methods such as exon prediction. Despite numerous mappings of symbolic DNA sequences to numerical series, the existing mapping methods do not include the genetic coding features of DNA sequences. We present a novel numerical representation of DNA sequences using genetic codon context (GCC) in which the numerical values are optimized by simulated annealing to maximize the 3-periodicity signal-to-noise ratio (SNR). The optimized GCC representation is then applied in exon and intron prediction by a Short-Time Fourier Transform (STFT) approach. The results show the GCC method enhances the SNR values of exon sequences and thus increases the accuracy of predicting protein coding regions in genomes compared with the commonly used 4D binary representation. In addition, this study offers a novel way to reveal specific features of DNA sequences by optimizing numerical mappings of symbolic DNA sequences.
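The 3-periodicity SNR can be illustrated with the baseline 4D binary (indicator) mapping that the GCC representation is compared against; the optimized GCC values themselves are not given in the abstract, so they are not reproduced here. This sketch assumes one common definition, the total power at frequency N/3 divided by the average power over all N frequencies, and uses a plain O(N²) DFT over the whole sequence instead of an STFT:

```python
import cmath
from typing import Dict, List


def indicator_sequences(seq: str) -> Dict[str, List[int]]:
    """4D binary mapping: one 0/1 indicator sequence per nucleotide."""
    s = seq.upper()
    return {base: [1 if c == base else 0 for c in s] for base in "ACGT"}


def dft_coefficient(x: List[int], k: int) -> complex:
    """k-th DFT coefficient of x, computed as a direct sum."""
    n = len(x)
    return sum(v * cmath.exp(-2j * cmath.pi * k * i / n) for i, v in enumerate(x))


def period3_snr(seq: str) -> float:
    """Power at frequency N/3 divided by the mean power over all N frequencies."""
    n = len(seq)
    if n == 0 or n % 3 != 0:
        raise ValueError("sequence length must be a positive multiple of 3")
    indicators = list(indicator_sequences(seq).values())
    # Total spectrum: sum of |X_b[k]|^2 over the four indicator sequences.
    spectrum = [sum(abs(dft_coefficient(x, k)) ** 2 for x in indicators) for k in range(n)]
    return spectrum[n // 3] / (sum(spectrum) / n)
```

A perfectly codon-periodic sequence such as `"ATG" * 30` gives a large ratio, while a period-4 sequence such as `"ACGT" * 30` gives essentially zero. For exon prediction the same ratio would be computed in a sliding window along the genome, which is the role of the STFT.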

  12. DNA sequencing and predictions of the cosmic theory of life

    NASA Astrophysics Data System (ADS)

    Wickramasinghe, N. Chandra

    2013-01-01

    The theory of cometary panspermia, developed by the late Sir Fred Hoyle and the present author argues that life originated cosmically as a unique event in one of a great multitude of comets or planetary bodies in the Universe. Life on Earth did not originate here but was introduced by impacting comets, and its further evolution was driven by the subsequent acquisition of cosmically derived genes. Explicit predictions of this theory published in 1979-1981, stating how the acquisition of new genes drives evolution, are compared with recent developments in relation to horizontal gene transfer, and the role of retroviruses in evolution. Precisely-stated predictions of the theory of cometary panspermia are shown to have been verified.

  13. DNA Sequencing and Predictions of the Cosmic Theory of Life

    NASA Astrophysics Data System (ADS)

    Wickramasinghe, N. Chandra

    The theory of cometary panspermia, developed by the late Sir Fred Hoyle and the present author argues that life originated cosmically as a unique event in one of a great multitude of comets or planetary bodies in the Universe. Life on Earth did not originate here but was introduced by impacting comets, and its further evolution was driven by the subsequent acquisition of cosmically derived genes. Explicit predictions of this theory published in 1979-1981, stating how the acquisition of new genes drives evolution, are compared with recent developments in relation to horizontal gene transfer, and the role of retroviruses in evolution. Precisely-stated predictions of the theory of cometary panspermia are shown to have been verified.

  14. Risk assessment prediction from genome sequences: promises and dreams.

    PubMed

    Wassenaar, Trudy M

    2004-09-01

    The application of bacterial genomics opens new avenues of research on foodborne pathogens. Foodborne pathogens must be able to colonize their hosts and survive transmission from host to host. Different groups of genes are involved in the processes of survival, colonization, and virulence, and such genes are potential targets for risk assessment and intervention strategies. Filtering from genome sequences the genes relevant to these processes is a major challenge, and although many tools are already available for analyses, this type of data mining is just beginning. For the simplest application, gene comparison, it is important to know how gene function, for instance in virulence, is being defined and tested. In other genomic applications, researchers look for specific properties or characteristics of (virulence) genes to identify novel gene candidates. Each approach has pitfalls, and gene candidates must be tested in the lab to confirm their function. Models for colonization and virulence are available for most although not all pathogens. Models for survival and stress responses are needed to increase the utilization of genomic approaches to risk assessment. Here, I discuss how genome sequences are likely to help in microbial risk assessment of foodborne pathogens and how dreams may become promises.

  15. Completion of the amino acid sequence of the alpha 1 chain from type I calf skin collagen. Amino acid sequence of alpha 1(I)B8.

    PubMed Central

    Glanville, R W; Breitkreutz, D; Meitinger, M; Fietzek, P P

    1983-01-01

    The complete amino acid sequence of the 279-residue CNBr peptide CB8 from the alpha 1 chain of type I calf skin collagen is presented. It was determined by sequencing overlapping fragments of CB8 produced by Staphylococcus aureus V8 proteinase, trypsin, endoproteinase Arg-C and hydroxylamine. Tryptic cleavages were also made specific for lysine by blocking arginine residues with cyclohexane-1,2-dione. This completes the amino acid sequence analysis of the 1054-residue alpha 1(I) chain of calf skin collagen. PMID:6354180

  16. Well-characterized sequence features of eukaryote genomes and implications for ab initio gene prediction.

    PubMed

    Huang, Ying; Chen, Shi-Yi; Deng, Feilong

    2016-01-01

    In silico analysis of DNA sequences is an important area of computational biology in the post-genomic era. Over the past two decades, computational approaches for ab initio prediction of gene structure from genome sequence alone have greatly advanced our understanding of a variety of biological questions. Although the computational prediction of protein-coding genes is already well established, robustly finding non-coding RNA genes, such as miRNA and lncRNA, remains challenging. Ab initio gene prediction has two main aspects: the values computed to describe sequence features, and the algorithm used to train the discriminant function; different combinations of the two are employed in various bioinformatic tools. Herein, we briefly review these well-characterized sequence features of eukaryote genomes and their applications to ab initio gene prediction. The main purpose of this article is to provide an overview for beginners who aim to develop related bioinformatic tools.
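As a concrete, deliberately simple example of "values computed to describe sequence features", the sketch below turns a DNA sequence into a GC-content value plus a fixed-length k-mer frequency vector (k = 3 matches codon size) that a discriminant function could be trained on. The choice of features is our illustration, not a list taken from the review:

```python
from itertools import product
from typing import Dict


def gc_content(seq: str) -> float:
    """Fraction of G and C among all characters of the sequence."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s) if s else 0.0


def kmer_frequencies(seq: str, k: int) -> Dict[str, float]:
    """Relative frequency of every DNA k-mer (4**k of them).

    Windows containing symbols other than A, C, G, T (e.g. N) are skipped.
    """
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    s = seq.upper()
    total = 0
    for i in range(len(s) - k + 1):
        word = s[i:i + k]
        if word in counts:
            counts[word] += 1
            total += 1
    return {word: (c / total if total else 0.0) for word, c in counts.items()}


# A fixed-length feature vector: 1 GC value + 64 trinucleotide frequencies.
seq = "ATGGCGTACGCTTAGNNATGCCC"
features = [gc_content(seq)] + list(kmer_frequencies(seq, 3).values())
```

Because the k-mers are enumerated in a fixed order, every sequence yields a vector of the same length, which is what most classifiers expect.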

  17. Prediction of Protein–Protein Interaction Sites in Sequences and 3D Structures by Random Forests

    PubMed Central

    Šikić, Mile; Tomić, Sanja; Vlahoviček, Kristian

    2009-01-01

    Identifying interaction sites in proteins provides important clues to the function of a protein and is becoming increasingly relevant in topics such as systems biology and drug discovery. Although there are numerous papers on the prediction of interaction sites using information derived from structure, there are only a few case reports on the prediction of interaction residues based solely on protein sequence. Here, a sliding window approach is combined with the Random Forests method to predict protein interaction sites using (i) a combination of sequence- and structure-derived parameters and (ii) sequence information alone. For sequence-based prediction we achieved a precision of 84% with a 26% recall and an F-measure of 40%. When combined with structural information, the prediction performance increases to a precision of 76% and a recall of 38% with an F-measure of 51%. We also present an attempt to rationalize the sliding window size and demonstrate that a nine-residue window is the most suitable for predictor construction. Finally, we demonstrate the applicability of our prediction methods by modeling the Ras–Raf complex using predicted interaction sites as target binding interfaces. Our results suggest that it is possible to predict protein interaction sites with quite a high accuracy using only sequence information. PMID:19180183
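The sliding-window encoding and the reported scores can both be made concrete. The sketch below extracts the nine-residue window centered on each residue, assuming padding with a placeholder symbol `X` at the chain ends (our choice; the abstract does not say how ends are handled), and checks that the reported F-measures follow from the stated precision and recall via F = 2PR / (P + R):

```python
from typing import List


def residue_windows(seq: str, size: int = 9, pad: str = "X") -> List[str]:
    """Return the window of `size` residues centered on each residue, padding past the chain ends."""
    if size % 2 == 0:
        raise ValueError("window size must be odd so it has a central residue")
    half = size // 2
    padded = pad * half + seq + pad * half
    return [padded[i:i + size] for i in range(len(seq))]


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


print(residue_windows("MKTAYIAKQR")[0])  # XXXXMKTAY
print(round(f_measure(0.84, 0.26), 2))   # 0.4  (sequence only)
print(round(f_measure(0.76, 0.38), 2))   # 0.51 (sequence + structure)
```

Each window would then be encoded numerically (for example one-hot per position) before being passed to the Random Forest classifier.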

  18. Computational prediction of the tolerance to amino-acid deletion in green-fluorescent protein

    PubMed Central

    Jackson, Eleisha L.; Spielman, Stephanie J.

    2017-01-01

    Proteins evolve through two primary mechanisms: substitution, where mutations alter a protein’s amino-acid sequence, and insertions and deletions (indels), where amino acids are either added to or removed from the sequence. Protein structure has been shown to influence the rate at which substitutions accumulate across sites in proteins, but whether structure similarly constrains the occurrence of indels has not been rigorously studied. Here, we investigate the extent to which structural properties known to covary with protein evolutionary rates might also predict protein tolerance to indels. Specifically, we analyze a publicly available dataset of single-amino-acid deletion mutations in enhanced green fluorescent protein (eGFP) to assess how well the functional effect of deletions can be predicted from protein structure. We find that weighted contact number (WCN), which measures how densely packed a residue is within the protein’s three-dimensional structure, provides the best single predictor for whether eGFP will tolerate a given deletion. We additionally find that using protein design to explicitly model deletions results in improved predictions of functional status when combined with other structural predictors. Our work suggests that structure plays a fundamental role in constraining deletions at sites in proteins, and further that similar biophysical constraints influence both substitutions and deletions. This study therefore provides a solid foundation for future work to examine how protein structure influences tolerance of more complex indel events, such as insertions or large deletions. PMID:28369116
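Weighted contact number is commonly defined as the sum of inverse squared distances from a residue to every other residue, so residues buried in densely packed regions get large values. A sketch assuming one representative 3D point per residue (for instance the Cα atom; the abstract does not specify which atom was used):

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def weighted_contact_numbers(coords: Sequence[Point]) -> List[float]:
    """WCN_i = sum over j != i of 1 / r_ij**2, where r_ij is the distance between residues i and j.

    Coincident points (r_ij == 0) would raise ZeroDivisionError.
    """
    wcn = []
    for i, ci in enumerate(coords):
        total = 0.0
        for j, cj in enumerate(coords):
            if i != j:
                total += 1.0 / math.dist(ci, cj) ** 2
        wcn.append(total)
    return wcn


# Toy coordinates in angstroms: the middle residue has the closest neighbors.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(weighted_contact_numbers(coords))
```

The double loop is O(N²), which is acceptable for a protein the size of eGFP.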

  19. Complete amino acid sequence and structure characterization of the taste-modifying protein, miraculin.

    PubMed

    Theerasilp, S; Hitotsuya, H; Nakajo, S; Nakaya, K; Nakamura, Y; Kurihara, Y

    1989-04-25

    The taste-modifying protein, miraculin, has the unusual property of modifying sour taste into sweet taste. The complete amino acid sequence of miraculin purified from miracle fruits by a newly developed method (Theerasilp, S., and Kurihara, Y. (1988) J. Biol. Chem. 263, 11536-11539) was determined by an automatic Edman degradation method. Miraculin was a single polypeptide with 191 amino acid residues. The calculated molecular weight based on the amino acid sequence and the carbohydrate content (13.9%) was 24,600. Asn-42 and Asn-186 were linked N-glycosidically to carbohydrate chains. High homology was found between the amino acid sequences of miraculin and soybean trypsin inhibitor.

  20. Three Dimensional Structure Prediction of Fatty Acid Binding Site on Human Transmembrane Receptor CD36.

    PubMed

    Tarhda, Zineb; Semlali, Oussama; Kettani, Anas; Moussa, Ahmed; Abumrad, Nada A; Ibrahimi, Azeddine

    2013-01-01

    CD36 is an integral membrane protein which is thought to have a hairpin-like structure with alpha-helices at the C and N termini projecting through the membrane as well as a larger extracellular loop. This receptor interacts with a number of ligands including oxidized low density lipoprotein and long chain fatty acids (LCFAs). It is also implicated in lipid metabolism and heart disease. It is therefore important to determine the 3D structure of the CD36 site involved in lipid binding. In this study, we predict the 3D structure of the fatty acid (FA) binding site [127-279 aa] of the CD36 receptor based on homology modeling with the X-ray structure of Human Muscle Fatty Acid Binding Protein (PDB code: 1HMT). Qualitative and quantitative analysis of the resulting model suggests that it is reliable and stable, given that over 97.8% of the residues lie in the most favored regions and that the overall quality factor is significant. Protein analysis, which relied on the secondary structure prediction of the target sequence and the comparison of 1HMT and CD36 [127-279 aa] secondary structures, led to the determination of the amino acid sequence consensus. These results also led to the identification of the functional sites on CD36 and revealed the presence of residues which may play a major role during ligand-protein interactions.

  1. Three Dimensional Structure Prediction of Fatty Acid Binding Site on Human Transmembrane Receptor CD36

    PubMed Central

    Tarhda, Zineb; Semlali, Oussama; Kettani, Anas; Moussa, Ahmed; Abumrad, Nada A.; Ibrahimi, Azeddine

    2013-01-01

    CD36 is an integral membrane protein which is thought to have a hairpin-like structure with alpha-helices at the C and N termini projecting through the membrane as well as a larger extracellular loop. This receptor interacts with a number of ligands including oxidized low density lipoprotein and long chain fatty acids (LCFAs). It is also implicated in lipid metabolism and heart disease. It is therefore important to determine the 3D structure of the CD36 site involved in lipid binding. In this study, we predict the 3D structure of the fatty acid (FA) binding site [127–279 aa] of the CD36 receptor based on homology modeling with the X-ray structure of Human Muscle Fatty Acid Binding Protein (PDB code: 1HMT). Qualitative and quantitative analysis of the resulting model suggests that it is reliable and stable, given that over 97.8% of the residues lie in the most favored regions and that the overall quality factor is significant. Protein analysis, which relied on the secondary structure prediction of the target sequence and the comparison of 1HMT and CD36 [127–279 aa] secondary structures, led to the determination of the amino acid sequence consensus. These results also led to the identification of the functional sites on CD36 and revealed the presence of residues which may play a major role during ligand-protein interactions. PMID:24348024

  2. Detection and isolation of nucleic acid sequences using a bifunctional hybridization probe

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    2000-01-01

    A method for detecting and isolating a target sequence in a sample of nucleic acids is provided using a bifunctional hybridization probe capable of hybridizing to the target sequence that includes a detectable marker and a first complexing agent capable of forming a binding pair with a second complexing agent. A kit is also provided for detecting a target sequence in a sample of nucleic acids using a bifunctional hybridization probe according to this method.

  3. Complete amino acid sequence of a Lolium perenne (perennial rye grass) pollen allergen, Lol p II.

    PubMed

    Ansari, A A; Shenbagamurthi, P; Marsh, D G

    1989-07-05

    The complete amino acid sequence of a Lolium perenne (rye grass) pollen allergen, Lol p II was determined by automated Edman degradation of the protein and selected fragments. Cleavage of the protein by enzymatic and chemical techniques established an unambiguous sequence for the protein. Lol p II contains 97 amino acid residues, with a calculated molecular weight of 10,882. The protein lacks cysteine and glutamine and shows no evidence of glycosylation. Theoretical predictions by Fraga's (Fraga, S. (1982) Can. J. Chem. 60, 2606-2610) and Hopp and Woods' (Hopp, T. P., and Woods, K. R. (1981) Proc. Natl. Acad. Sci. U.S.A. 78, 3824-3828) methods indicate the presence of four hydrophilic regions, which may contribute to sequential or parts of conformational B-cell epitopes. Analysis of amphipathic regions by Berzofsky's method indicates the presence of a highly amphipathic region, which may contain, or contribute to, an Ia/T-cell epitope. This latter segment of Lol p II was found to be highly homologous with an antibody-binding segment of the major rye allergen Lol p I and may explain why immune responsiveness to both the allergens is associated with HLA-DR3.
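The Hopp and Woods method assigns each amino acid a hydrophilicity value and averages it over a short sliding window; peaks in the averaged profile flag candidate hydrophilic (potentially antigenic) regions. A sketch using the scale values as they are commonly tabulated and a six-residue window; verify both against the 1981 paper before relying on them:

```python
from typing import Dict, List

# Hopp-Woods hydrophilicity values as commonly tabulated (Hopp & Woods, 1981).
HOPP_WOODS: Dict[str, float] = {
    "R": 3.0, "K": 3.0, "D": 3.0, "E": 3.0,
    "S": 0.3, "N": 0.2, "Q": 0.2, "G": 0.0, "P": 0.0,
    "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0, "M": -1.3,
    "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def hydrophilicity_profile(seq: str, window: int = 6) -> List[float]:
    """Mean Hopp-Woods value per window; element i covers residues i .. i+window-1.

    Returns an empty list if the sequence is shorter than the window;
    unknown residue letters raise KeyError.
    """
    values = [HOPP_WOODS[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


profile = hydrophilicity_profile("KKKKKKWWWWWW")
most_hydrophilic_start = max(range(len(profile)), key=profile.__getitem__)
```

Regions where the profile stays well above zero over several consecutive windows are the ones such methods report as hydrophilic.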

  4. PFP: Automated prediction of gene ontology functional annotations with confidence scores using protein sequence data.

    PubMed

    Hawkins, Troy; Chitale, Meghana; Luban, Stanislav; Kihara, Daisuke

    2009-02-15

    Protein function prediction is a central problem in bioinformatics, increasing in importance recently due to the rapid accumulation of biological data awaiting interpretation. Sequence data represents the bulk of this new stock and is the obvious target for consideration as input, as newly sequenced organisms often lack any other type of biological characterization. We have previously introduced PFP (Protein Function Prediction) as our sequence-based predictor of Gene Ontology (GO) functional terms. PFP interprets the results of a PSI-BLAST search by extracting and scoring individual functional attributes, searching a wide range of E-value sequence matches, and utilizing conventional data mining techniques to fill in missing information. We have shown it to be effective in predicting both specific and low-resolution functional attributes when sufficient data is unavailable. Here we describe (1) significant improvements to the PFP infrastructure, including the addition of prediction significance and confidence scores, (2) a thorough benchmark of performance and comparisons to other related prediction methods, and (3) applications of PFP predictions to genome-scale data. We applied PFP predictions to uncharacterized protein sequences from 15 organisms. Among these sequences, 60-90% could be annotated with a GO molecular function term at high confidence (≥80%). We also applied our predictions to the protein-protein interaction network of the malaria parasite Plasmodium falciparum. High confidence GO biological process predictions (≥90%) from PFP increased the number of fully enriched interactions in this dataset from 23% of interactions to 94%. Our benchmark comparison shows significant performance improvement of PFP relative to GOtcha, InterProScan, and PSI-BLAST predictions. This is consistent with the performance of PFP as the overall best predictor in both the AFP-SIG '05 and CASP7 function (FN) assessments. 
PFP is available as a web service at http://dragon.bio.purdue.edu/pfp/.
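PFP's central idea, pooling functional annotations from many PSI-BLAST hits and weighting each hit by its significance, can be illustrated with a toy aggregation. This is a hypothetical scoring rule written for illustration, not the published PFP formula: each hit contributes max(0, -log10(E) + b) to every GO term it carries, so hits with E-values below 10**b still contribute a little. The GO labels, E-values and offset `b` below are made up.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def score_go_terms(hits: Iterable[Tuple[float, List[str]]], b: float = 2.0) -> Dict[str, float]:
    """Hypothetical rule: each hit adds max(0, -log10(E) + b) to every GO term it is annotated with."""
    scores: Dict[str, float] = defaultdict(float)
    for e_value, go_terms in hits:
        e_value = max(e_value, 1e-300)  # search tools can report E = 0; clamp to avoid log10(0)
        weight = max(0.0, -math.log10(e_value) + b)
        for term in go_terms:
            scores[term] += weight
    return dict(scores)


# Placeholder GO labels, not real GO identifiers.
hits = [
    (1e-50, ["GO:term_A"]),
    (1.0, ["GO:term_A", "GO:term_B"]),
    (1000.0, ["GO:term_C"]),  # too weak: weight is clipped to 0
]
ranked = sorted(score_go_terms(hits).items(), key=lambda kv: kv[1], reverse=True)
```

Summing over many weak hits, rather than keeping only the best one, is what lets this style of scoring recover low-resolution function terms when no close homolog exists.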

  5. Improved Prediction of Non-methylated Islands in Vertebrates Highlights Different Characteristic Sequence Patterns

    PubMed Central

    Vingron, Martin

    2016-01-01

    Non-methylated islands (NMIs) of DNA are genomic regions that are important for gene regulation and development. A recent study of genome-wide non-methylation data in vertebrates by Long et al. (eLife 2013;2:e00348) has shown that many experimentally identified non-methylated regions do not overlap with classically defined CpG islands which are computationally predicted using simple DNA sequence features. This is especially true in cold-blooded vertebrates such as Danio rerio (zebrafish). In order to investigate how predictive DNA sequence is of a region’s methylation status, we applied a supervised learning approach using a spectrum kernel support vector machine, to see if a more complex model and supervised learning can be used to improve non-methylated island prediction and to understand the sequence properties of these regions. We demonstrate that DNA sequence is highly predictive of methylation status, and that in contrast to existing CpG island prediction methods our method is able to provide more useful predictions of NMIs genome-wide in all vertebrate organisms that were studied. Our results also show that in cold-blooded vertebrates (Anolis carolinensis, Xenopus tropicalis and Danio rerio) where genome-wide classical CpG island predictions consist primarily of false positives, longer primarily AT-rich DNA sequence features are able to identify these regions much more accurately. PMID:27984582
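The spectrum kernel compares two sequences through their k-mer content: each sequence is mapped to a vector of k-mer counts, and the kernel value is the inner product of those vectors. A minimal sketch of that kernel (the k used in the study is not stated in the abstract; k = 2 below is only for illustration), whose Gram matrix could be passed to an SVM implementation that accepts precomputed kernels:

```python
from collections import Counter
from typing import List


def spectrum(seq: str, k: int) -> Counter:
    """Counts of every overlapping k-mer in the sequence."""
    s = seq.upper()
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def spectrum_kernel(a: str, b: str, k: int) -> int:
    """Inner product of the k-mer count vectors of a and b."""
    sa, sb = spectrum(a, k), spectrum(b, k)
    # Indexing a Counter with a missing key returns 0 without inserting it.
    return sum(count * sb[kmer] for kmer, count in sa.items())


seqs: List[str] = ["CGCGAT", "ATATAT", "CGCGCG"]
gram = [[spectrum_kernel(x, y, 2) for y in seqs] for x in seqs]
```

Because only shared k-mers contribute, CpG-rich and AT-rich sequences end up far apart in this feature space, which is the kind of distinction the study exploits.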

  6. Affinity regression predicts the recognition code of nucleic acid binding proteins

    PubMed Central

    Pelossof, Raphael; Singh, Irtisha; Yang, Julie L.; Weirauch, Matthew T.; Hughes, Timothy R.; Leslie, Christina S.

    2016-01-01

    Predicting the affinity profiles of nucleic acid-binding proteins directly from the protein sequence is a major unsolved problem. We present a statistical approach for learning the recognition code of a family of transcription factors (TFs) or RNA-binding proteins (RBPs) from high-throughput binding assays. Our method, called affinity regression, trains on protein binding microarray (PBM) or RNAcompete experiments to learn an interaction model between proteins and nucleic acids, using only protein domain and probe sequences as inputs. By training on mouse homeodomain PBM profiles, our model correctly identifies residues that confer DNA-binding specificity and accurately predicts binding motifs for an independent set of divergent homeodomains. Similarly, learning from RNAcompete profiles for diverse RBPs, our model can predict the binding affinities of held-out proteins and identify key RNA-binding residues. More broadly, we envision applying our method to model and predict biological interactions in any setting where there is a high-throughput ‘affinity’ readout. PMID:26571099
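Affinity regression, as we understand it, fits a bilinear interaction model: for probe features D (probes × probe features), protein features P (proteins × protein features) and binding readouts Y (probes × proteins), it learns a matrix W such that Y ≈ D W P^T. The sketch shows only the prediction step for a given W using plain lists; learning W (the regularized fit) is omitted, and all numbers are toy values:

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x m) and b (m x p)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def bilinear_predict(D: Matrix, W: Matrix, P: Matrix) -> Matrix:
    """Predicted readouts Y_hat = D W P^T, shaped probes x proteins."""
    return matmul(matmul(D, W), transpose(P))


D = [[1.0, 0.0], [0.0, 1.0]]  # 2 probes x 2 probe features (toy values)
W = [[2.0, 0.0], [0.0, 3.0]]  # 2 probe features x 2 protein features
P = [[1.0, 1.0], [1.0, 0.0]]  # 2 proteins x 2 protein features
print(bilinear_predict(D, W, P))  # [[2.0, 2.0], [3.0, 0.0]]
```

Because W couples every probe feature (for example a k-mer) with every protein feature (for example a residue-level k-mer), its large entries can be read as candidate residue-to-nucleotide recognition rules.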

  7. Amino acid sequences of alpha-helical segments from S-carboxymethylkerateine-A. Complete sequence of a type-I segment.

    PubMed Central

    Gough, K H; Inglis, A S; Crewther, W G

    1978-01-01

    The amino acid sequence of a type-I helical segment from the low-sulphur protein (S-carboxymethylkerateine-A) of wool was determined by combining automatic and manual-sequencing data. Whereas in the type-II helical segment most of the cationic groups occur in pairs, 11 of the 22 anionic residues in the sequence of the type-I segment were situated next to a second anionic residue. This suggests possible interactions between type-I and type-II helical segments in alpha-keratin. As observed with the sequence of a type-II helical segment, a model constructed on 3.6 residues per turn of helix shows a line of hydrophobic residues along the helix, thereby supporting the physicochemical evidence that the molecule is predominantly helical and forms part of a coiled-coil structure. Examination of the sequence data by predictive methods indicates the possibility of extensive sections of alpha-helix interspersed with discontinuities. The molecule contains a number of regions with peptide sequences identical with those found by other workers after enzymic digestion of fractions from oxidized wool. PMID:697725
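The observation that a 3.6-residues-per-turn model places hydrophobic residues along one line of the helix follows from the geometry: each residue advances 100° around the helix axis, so residues i, i + 3, i + 4 and i + 7 sit close together in angle. A small sketch of that calculation:

```python
def wheel_angle(position: int, residues_per_turn: float = 3.6) -> float:
    """Angle (degrees) of a residue around the helix axis; 3.6 residues/turn gives 100 degrees per residue."""
    return (position * 360.0 / residues_per_turn) % 360.0


def angular_separation(i: int, j: int, residues_per_turn: float = 3.6) -> float:
    """Smallest angle between residues i and j when projected onto a helical wheel."""
    d = abs(wheel_angle(i, residues_per_turn) - wheel_angle(j, residues_per_turn))
    return min(d, 360.0 - d)


for offset in (1, 3, 4, 7):
    print(offset, round(angular_separation(0, offset)))
# 1 100
# 3 60
# 4 40
# 7 20
```

The small 20° offset at i + 7 is why a heptad-like spacing of hydrophobic residues forms a stripe that slowly winds around an isolated helix, and why two such helices can pack as a coiled coil.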

  8. FTIR spectroscopy and sequence prediction: Structure of human α2-macroglobulin

    NASA Astrophysics Data System (ADS)

    Dukor, Rina K.; Liebman, Michael N.; Yuan, Anna I.; Feinman, Richard D.

    1998-06-01

    The structure of the plasma proteinase inhibitor α2-macroglobulin (α2M) is determined by FTIR spectroscopy and a number of sequence-structure prediction algorithms. In addition, α2M dimers and complexes with methylamine and trypsin are examined. Our FTIR results estimate a helix content of 5-15% and a β-sheet content of 28-36%. None of the sequence prediction algorithms used in this study predicted values close to the experimental data. Considerable differences in the FTIR spectra of the α2M dimer are observed, and somewhat smaller changes are seen upon reaction of α2M with methylamine and dithiodipyridine (DTP).

  9. Theoretical prediction of binding modes and hot sequences for allopsoralen DNA interaction

    NASA Astrophysics Data System (ADS)

    Méndez, Patricia Saenz; Guedes, Rita C.; dos Santos, Daniel J. V. A.; Eriksson, Leif A.

    2007-12-01

    Molecular docking studies of two duplex DNA sequences as target fragments and allopsoralen as ligand were performed. The calculated interaction energies showed that the ligand can be docked into the minor groove as well as become intercalated. However, unlike psoralen, allopsoralen's preferred binding mode for non-poly-TA sequences is minor groove binding. Calculated energies for intercalation between different base pairs suggest that the predicted sequence selectivity for allopsoralen is analogous to that observed for psoralen. Intercalation is favored at 5'-TpA sites in poly-TA sequences.
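Locating candidate intercalation sites of the favored 5'-TpA type is a simple scan for the dinucleotide TA read 5' to 3'. A sketch, assuming the input strand is written 5' to 3'; it finds sites only and says nothing about binding energetics. Because TpA is self-complementary, the complementary strand carries the same steps.

```python
from typing import List


def tpa_steps(seq: str) -> List[int]:
    """0-based positions of each 5'-TpA-3' step in a sequence written 5' to 3'."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "TA"]


print(tpa_steps("GCTATATAGC"))  # [2, 4, 6]
```

Note that the reverse step, ApT (`"AT"`), is a different dinucleotide and is deliberately not matched.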

  10. Prediction of human rotavirus serotype by nucleotide sequence analysis of the VP7 protein gene.

    PubMed Central

    Green, K Y; Sears, J F; Taniguchi, K; Midthun, K; Hoshino, Y; Gorziglia, M; Nishikawa, K; Urasawa, S; Kapikian, A Z; Chanock, R M

    1988-01-01

    Human rotavirus field isolates were characterized by direct sequence analysis of the gene encoding the serotype-specific major neutralization protein (VP7). Single-stranded RNA transcripts were prepared from virus particles obtained directly from stool specimens or after two or three passages in MA-104 cells. Two regions of the gene (nucleotides 307 through 351 and 670 through 711) which had previously been shown to contain regions of sequence divergence among rotavirus serotypes were sequenced by the dideoxynucleotide method with two different synthetic oligonucleotide primers. The resulting nucleotide sequences were compared with the corresponding sequences from rotaviruses of known serotype (serotype 1, 2, 3, or 4). A total of 25 field isolates and 10 laboratory strains examined by this method exhibited marked sequence identity in both areas of the gene with the corresponding regions of 1 of the 4 reference strains. In addition, the predicted serotype from the sequence analysis correlated in each case with the serotype determined when the rotaviruses were examined by plaque reduction neutralization or reactivity with serotype-specific monoclonal antibodies. These data suggest that as a result of the high degree of sequence conservation observed among rotaviruses of the same serotype, it is possible to predict the serotype of a rotavirus isolate by direct sequence analysis of its VP7 gene. PMID:2833626
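
    The serotype assignment described above amounts to finding the reference strain whose VP7 region is most identical to the isolate. A minimal sketch, assuming the query and reference regions are already aligned and of equal length; the reference sequences in the usage test are hypothetical toy strings, not real VP7 sequences.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be pre-aligned and of equal, non-zero length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def predict_serotype(query: str, references: dict[str, str]) -> tuple[str, float]:
    """Return the reference serotype with the highest identity and that identity."""
    best = max(references, key=lambda name: percent_identity(query, references[name]))
    return best, percent_identity(query, references[best])
```

In practice both sequenced regions (nucleotides 307-351 and 670-711) would be compared, and a low best identity would flag an isolate that matches none of the references.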

  11. Predicting subcellular location of apoptosis proteins with pseudo amino acid composition: approach from amino acid substitution matrix and auto covariance transformation.

    PubMed

    Yu, Xiaoqing; Zheng, Xiaoqi; Liu, Taigang; Dou, Yongchao; Wang, Jun

    2012-05-01

    Apoptosis proteins are very important for understanding the mechanism of programmed cell death, and information on their subcellular location helps clarify the apoptosis mechanism. In this paper, based on an amino acid substitution matrix and the auto covariance transformation, we introduce a new sequence-based model that not only quantitatively describes the differences between amino acids but also partially incorporates sequence-order information. The method is applied with a support vector machine classifier to predict the subcellular location of apoptosis proteins in two widely used datasets. The results of the jackknife test are quite promising, indicating that the proposed method may serve as an efficient model for predicting the subcellular location of apoptosis proteins.
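
    The auto covariance (AC) transformation turns a variable-length sequence into a fixed-length vector that captures how residue properties co-vary at a given sequence separation (lag). A minimal sketch, assuming each residue is mapped to a numeric vector such as a row of a substitution matrix; the mapping in the test is a hypothetical one-dimensional toy, and the paper's exact normalization may differ.

```python
def auto_covariance(seq: str, props: dict[str, list[float]], max_lag: int) -> list[float]:
    """AC features for lags 1..max_lag, flattened as [lag1 dims..., lag2 dims..., ...].

    For lag g and property j, with L residues and property values P[i][j]:
        AC(g, j) = 1/(L-g) * sum_{i=0}^{L-g-1} (P[i][j] - mean_j) * (P[i+g][j] - mean_j)
    """
    vectors = [props[aa] for aa in seq]
    length = len(vectors)
    if not 1 <= max_lag < length:
        raise ValueError("max_lag must be between 1 and len(seq) - 1")
    dims = len(vectors[0])
    means = [sum(v[j] for v in vectors) / length for j in range(dims)]
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(dims):
            s = sum((vectors[i][j] - means[j]) * (vectors[i + lag][j] - means[j])
                    for i in range(length - lag))
            features.append(s / (length - lag))
    return features
```

The resulting fixed-length vectors are what a classifier such as an SVM would take as input.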

  12. [Cloning of full-length coding sequence of tree shrew CD4 and prediction of its molecular characteristics].

    PubMed

    Tian, Wei-Wei; Gao, Yue-Dong; Guo, Yan; Huang, Jing-Fei; Xiao, Chang; Li, Zuo-Sheng; Zhang, Hua-Tang

    2012-02-01

    The tree shrew, an ideal animal model receiving extensive attention in human disease research, requires essential research tools, in particular cellular markers and monoclonal antibodies for immunological studies. In this paper, the 1,365 bp full-length CD4 coding sequence was cloned from total RNA of tree shrew peripheral blood; the sequence fills two unknown fragment gaps in the predicted tree shrew CD4 cDNA in the GenBank database. Its molecular characteristics were analyzed and compared with those of other mammals using bioinformatics software such as Clustal W2.0. The results showed that the extracellular and intracellular domains of the tree shrew CD4 amino acid sequence are conserved, and that the sequence is closely related to those of Homo sapiens and Macaca mulatta. As in humans, most regions of the tree shrew CD4 molecular surface carry positive charges. However, compared with human CD4 extracellular domain D1, the tree shrew D1 surface shows more negative charges and two additional N-glycosylation sites, which may affect antibody binding. This study provides a theoretical basis for the preparation and functional study of CD4 monoclonal antibodies.
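
    Candidate N-glycosylation sites such as those compared above are commonly located by scanning for the Asn-X-Ser/Thr sequon, where X is any residue except proline. A minimal sketch of that motif scan; it is not necessarily the software the authors used, and a motif match only marks a candidate, not an occupied site.

```python
import re

# N-X-S/T sequon with X != proline; the lookahead lets overlapping sequons match.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glyc_sites(protein: str) -> list[int]:
    """1-based positions of Asn residues that start a candidate N-glycosylation sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]
```

Running this on aligned human and tree shrew D1 sequences and comparing the position lists is one way to count the additional sites.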

  13. Trichomonas vaginalis acidic phospholipase A2: isolation and partial amino acid sequence.

    PubMed

    Escobedo-Guajardo, Brenda L; González-Salazar, Francisco; Palacios-Corona, Rebeca; Torres de la Cruz, Víctor M; Morales-Vallarta, Mario; Mata-Cárdenas, Benito D; Garza-González, Jesús N; Rivera-Silva, Gerardo; Vargas-Villarreal, Javier

    2013-12-01

    Sexually transmitted diseases are a major cause of acute disease worldwide, and trichomoniasis is the most common curable one, with more than 170 million cases annually worldwide. Trichomonas vaginalis is the causal agent of trichomoniasis and can destroy vaginal mucosa cell monolayers in vitro; phospholipases A2 (PLA2) have been reported as potential virulence factors. These enzymes have been partially characterized from the subcellular fraction S30 of pathogenic T. vaginalis strains. The main objective of this study was to purify a phospholipase A2 from T. vaginalis, partially characterize it, obtain a partial amino acid sequence, and determine its enzymatic participation as a hemolytic factor causing lysis of erythrocytes. Trichomonas S30, RF30 and UFF30 sub-fractions from the GT-15 strain can hydrolyze [2-(14)C-PA]-PC at pH 6.0. Proteins from the UFF30 sub-fraction were separated by affinity chromatography into two eluted fractions with detectable PLA2 activity. The EDTA-eluted fraction was analyzed by on-line HPLC-tandem mass spectrometry, and two protein peaks were observed at 8.2 and 13 kDa. Peptide sequences were identified from the proteins present in the EDTA-eluted UFF30 fraction; bioinformatic analysis using Protein Link Global Server with a T. vaginalis protein database suggests that the eluted peptides correspond to a putative ubiquitin protein in the 8.2 kDa fraction and a conserved phospholipase in the 13 kDa fraction. The EDTA-eluted fraction hydrolyzed [2-(14)C-PA]-PC and lysed erythrocytes from Sprague-Dawley rats in a time- and dose-dependent manner. The acidic hemolytic activity decreased by 84% with the addition of 100 μM of Rosenthal's inhibitor.

  14. Frequencies of amino acid strings in globular protein sequences indicate suppression of blocks of consecutive hydrophobic residues

    PubMed Central

    Schwartz, Russell; Istrail, Sorin; King, Jonathan

    2001-01-01

    Patterns of hydrophobic and hydrophilic residues play a major role in protein folding and function. Long, predominantly hydrophobic strings of 20–22 amino acids each are associated with transmembrane helices and have been used to identify such sequences. Much less attention has been paid to hydrophobic sequences within globular proteins. In prior work on computer simulations of the competition between on-pathway folding and off-pathway aggregate formation, we found that long sequences of consecutive hydrophobic residues promoted aggregation within the model, even controlling for overall hydrophobic content. We report here on an analysis of the frequencies of different lengths of contiguous blocks of hydrophobic residues in a database of amino acid sequences of proteins of known structure. Sequences of three or more consecutive hydrophobic residues are found to be significantly less common in actual globular proteins than would be predicted if residues were selected independently. The result may reflect selection against long blocks of hydrophobic residues within globular proteins relative to what would be expected if residue hydrophobicities were independent of those of nearby residues in the sequence. PMID:11316883
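
    The comparison above contrasts observed hydrophobic block lengths with what independent residue selection would predict. A minimal sketch of both sides, assuming a hypothetical hydrophobic set (`HYDROPHOBIC`); the study's own classification may differ. Under independence with hydrophobic probability p, the expected number of maximal runs of exactly k hydrophobic residues in a sequence of length n (k < n) is 2(1-p)p^k for runs touching either end plus (n-k-1)(1-p)^2 p^k for interior runs.

```python
# Assumed hydrophobic classification for illustration only.
HYDROPHOBIC = set("AVILMFWC")


def hydrophobic_run_lengths(seq: str) -> list[int]:
    """Lengths of maximal blocks of consecutive hydrophobic residues."""
    runs, current = [], 0
    for aa in seq.upper():
        if aa in HYDROPHOBIC:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def expected_runs(n: int, k: int, p: float) -> float:
    """Expected count of maximal runs of exactly k hydrophobic residues.

    Assumes each of the n residues is hydrophobic independently with probability p.
    """
    if not 1 <= k <= n:
        return 0.0
    if k == n:
        return p ** n
    q = 1.0 - p
    return 2 * q * p ** k + (n - k - 1) * q * q * p ** k
```

Summing observed run counts and `expected_runs` over all sequences in a database, with p set to the overall hydrophobic fraction, gives the observed-versus-expected comparison by block length.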

  15. Predicting backbone Cα angles and dihedrals from protein sequences by stacked sparse auto-encoder deep neural network.

    PubMed

    Lyons, James; Dehzangi, Abdollah; Heffernan, Rhys; Sharma, Alok; Paliwal, Kuldip; Sattar, Abdul; Zhou, Yaoqi; Yang, Yuedong

    2014-10-30

    Because the distance between two neighbouring Cα atoms is nearly constant, the local backbone structure of proteins can be represented accurately by the angle θ formed by Cα(i-1)-Cα(i)-Cα(i+1) and the dihedral angle τ rotated about the Cα(i)-Cα(i+1) bond. As representatives of the structural properties of three to four amino-acid residues, θ and τ angles offer a description of backbone conformations that is complementary to φ and ψ angles (single residue) and secondary structures (>3 residues). Here, we report the first machine-learning technique for sequence-based prediction of θ and τ angles. On an independent test set, predicted angles have a mean absolute error of 9° for θ and 34° for τ, with a distribution on the θ-τ plane close to that of native values. The average root-mean-square distance of 10-residue fragment structures constructed from predicted θ and τ angles is only 1.9 Å from their corresponding native structures. Predicted θ and τ angles are expected to complement predicted φ and ψ angles and secondary structures for use in model validation and in template-based as well as template-free structure prediction. The deep neural network learning technique is available as an online server called Structural Property prediction with Integrated DEep neuRal network (SPIDER) at http://sparks-lab.org.
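
    The native θ and τ values that such a predictor is trained against can be computed directly from Cα coordinates. A minimal standard-library sketch, assuming coordinates are given as (x, y, z) tuples; the dihedral uses the common IUPAC-style sign convention (clockwise positive), which is an assumption about how SPIDER defines τ.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a: Vec) -> float:
    return math.sqrt(_dot(a, a))


def theta(ca_prev: Vec, ca_i: Vec, ca_next: Vec) -> float:
    """Angle (degrees) at Ca(i) formed by Ca(i-1)-Ca(i)-Ca(i+1)."""
    u, v = _sub(ca_prev, ca_i), _sub(ca_next, ca_i)
    cos_angle = _dot(u, v) / (_norm(u) * _norm(v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def tau(ca0: Vec, ca1: Vec, ca2: Vec, ca3: Vec) -> float:
    """Dihedral (degrees, -180..180) of Ca(i-1), Ca(i), Ca(i+1), Ca(i+2) about Ca(i)-Ca(i+1)."""
    b1, b2, b3 = _sub(ca1, ca0), _sub(ca2, ca1), _sub(ca3, ca2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = _norm(b2) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

Because τ is periodic, an error metric such as the reported mean absolute error is normally computed on the wrapped difference (at most 180°) rather than the raw difference.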

  16. Identification of random nucleic acid sequence aberrations using dual capture probes which hybridize to different chromosome regions

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1998-03-24

    A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample that contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate, from within the first set, a second set of nucleic acids that contain the second nucleic acid sequence type. The second set of nucleic acids is then detected, its presence indicating the presence of a nucleic acid sequence aberration. 14 figs.
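
    The logic of the two capture steps can be modelled as two successive filters: only fragments carrying both sequence types survive, and any survivor signals an aberration such as a translocation. A purely in-silico sketch; `Fragment` and the chromosome labels are hypothetical and do not describe the laboratory protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A nucleic acid fragment labelled by the sequence types it contains (hypothetical model)."""
    name: str
    sequence_types: frozenset[str]


def dual_capture(fragments: list[Fragment], first_type: str, second_type: str) -> list[Fragment]:
    # First immobilization: keep fragments hybridizing to the first probe.
    first_set = [f for f in fragments if first_type in f.sequence_types]
    # Second immobilization: from that set, keep fragments also hybridizing to the second probe.
    return [f for f in first_set if second_type in f.sequence_types]


def aberration_detected(fragments: list[Fragment], first_type: str, second_type: str) -> bool:
    """True if any fragment carries both sequence types."""
    return bool(dual_capture(fragments, first_type, second_type))
```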

  17. Identification of random nucleic acid sequence aberrations using dual capture probes which hybridize to different chromosome regions

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1998-01-01

    A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample that contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate, from within the first set, a second set of nucleic acids that contain the second nucleic acid sequence type. The second set of nucleic acids is then detected, its presence indicating the presence of a nucleic acid sequence aberration.

  18. The amino acid sequence of protein CM-3 from Dendroaspis polylepis polylepis (black mamba) venom.

    PubMed

    Joubert, F J

    1985-01-01

    Protein CM-3 from Dendroaspis polylepis polylepis venom was purified by gel filtration and ion exchange chromatography. It comprises 65 amino acids, including eight half-cystines. The complete amino acid sequence of protein CM-3 has been elucidated. Residues 1-50 resemble the N-terminal sequence of the subunits of a synergistic-type protein, and residues 51-65 resemble the C-terminal sequence of an angusticeps-type protein. Mixtures of protein CM-3 and angusticeps-type proteins showed no apparent synergistic effect, in that their combined toxicity was no greater than the sum of their individual toxicities.

  19. The amino acid sequences of the Fd fragments of two human γ heavy chains

    PubMed Central

    Press, E. M.; Hogg, N. M.

    1970-01-01

    The amino acid sequences of the Fd fragments of two human pathological immunoglobulins of the immunoglobulin G1 class are reported. Comparison of the two sequences shows that the heavy-chain variable regions are similar in length to those of the light chains. The existence of heavy-chain variable-region subgroups is also deduced from a comparison of these two sequences with those of another γ1 chain, Eu, a μ chain, Ou, and the partial sequence of a fourth γ1 chain, Ste. Carbohydrate has been found to be linked to an aspartic acid residue in the variable region of one of the γ1 chains, Cor. PMID:5449120

  20. EST-PAC a web package for EST annotation and protein sequence prediction.

    PubMed

    Strahm, Yvan; Powell, David; Lefèvre, Christophe

    2006-10-12

    With the decreasing cost of DNA sequencing technology and the vast diversity of biological resources, researchers increasingly face the basic challenge of annotating large numbers of expressed sequence tags (ESTs) from a variety of species. This typically consists of a series of repetitive tasks that should be automated and easy to use. The results of these annotation tasks need to be stored and organized in a consistent way. All these operations should be self-installing, platform independent, easy to customize and amenable to using distributed bioinformatics resources available on the Internet. To address these issues, we present EST-PAC, a web-oriented multi-platform software package for expressed sequence tag (EST) annotation. EST-PAC provides a solution for the administration of EST and protein sequence annotations accessible through a web interface. Three aspects of EST annotation are automated: 1) searching local or remote biological databases for sequence similarities using BLAST services, 2) predicting protein coding sequences from EST data, and 3) annotating predicted protein sequences with functional domain predictions. In practice, EST-PAC integrates the BLASTALL suite, EST-Scan2 and HMMER in a relational database system accessible through a simple web interface. EST-PAC also takes advantage of the relational database to allow consistent storage, powerful queries of results and management of the annotation process. The system allows users to customize annotation strategies and provides an open-source data-management environment for research and education in bioinformatics.
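
    EST-PAC delegates coding-sequence prediction to EST-Scan2, which uses a hidden Markov model that tolerates sequencing errors. The sketch below swaps in a much simpler technique, a six-frame search for the longest stop-free stretch using the standard genetic code, only to illustrate what "predicting protein coding sequence from EST data" involves; it is not EST-Scan2's algorithm.

```python
import itertools

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna: str) -> str:
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def longest_stop_free_peptide(est: str) -> tuple[str, str, int]:
    """Longest stop-free peptide over six frames, as (peptide, strand, frame)."""
    est = est.upper()
    best = ("", "+", 0)
    for strand, seq in (("+", est), ("-", reverse_complement(est))):
        for frame in range(3):
            for peptide in translate(seq[frame:]).split("*"):
                if len(peptide) > len(best[0]):
                    best = (peptide, strand, frame)
    return best
```

Because ESTs are often partial, the stretch is not required to start with Met or end at a stop codon.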

  1. Developmental variation and amino acid sequences of cytochromes c of the fruit fly Drosophila melanogaster and the flesh fly Boettcherisca peregrina.

    PubMed

    Inoue, S; Inoue, H; Hiroyoshi, T; Matsubara, H; Yamanaka, T

    1986-10-01

    The amino acid sequences of cytochromes c purified from the fruit fly Drosophila melanogaster and the flesh fly Boettcherisca peregrina were determined. In contrast with the case of the housefly, isocytochromes c were not detected in these flies at any developmental stage. The sequence of fruit fly cytochrome c differed from that reported previously but was identical with that predicted from the nucleotide sequence of the fruit fly cytochrome c gene (DC4) (Limbach, K.J. & Wu, R. (1985) Nucl. Acids Res. 13, 631-644). Isocytochrome c of the fruit fly, reported to be encoded by the DC3 gene, was not detected as a functional cytochrome c molecule.

  2. State of the art and challenges in sequence based T-cell epitope prediction

    PubMed Central

    2010-01-01

    Sequence-based T-cell epitope predictions have improved immensely in the last decade. From predictions of peptide binding to major histocompatibility complex (MHC) molecules with moderate accuracy, limited allele coverage, and no good estimates of the other events in the antigen-processing pathway, the field has evolved significantly. Methods have now been developed that produce highly accurate binding predictions for many alleles and integrate both proteasomal cleavage and transport events. Moreover, so-called pan-specific methods have been developed that allow prediction of peptide binding to MHC alleles characterized by limited or no peptide-binding data. Most of the developed methods are publicly available and have proven very useful as a shortcut in epitope discovery. Here, we go through some of the history of sequence-based prediction of helper as well as cytotoxic T-cell epitopes, focusing on some of the most accurate methods and their basic background. PMID:21067545
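
    One of the simplest sequence-based binding predictors is a position-specific scoring matrix (PSSM): every peptide window of fixed length is scored by summing per-position residue weights. A minimal sketch of that scan; the matrix in the test is a hypothetical toy, and the accurate modern methods the review covers typically use trained models that go beyond a single additive matrix.

```python
def scan_peptides(protein: str, pssm: list[dict[str, float]],
                  length: int = 9) -> list[tuple[float, int, str]]:
    """Score every window of `length` residues; return (score, start, peptide), best first.

    Residues absent from a position's dictionary contribute 0.
    """
    if len(pssm) != length:
        raise ValueError("PSSM must have one column per peptide position")
    hits = []
    for start in range(len(protein) - length + 1):
        peptide = protein[start:start + length]
        score = sum(pssm[i].get(aa, 0.0) for i, aa in enumerate(peptide))
        hits.append((score, start, peptide))
    return sorted(hits, reverse=True)
```

Nine residues is the default because it is the typical length of MHC class I ligands.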

  3. Phenotype-optimized sequence ensembles substantially improve prediction of disease-causing mutation in cystic fibrosis.

    PubMed

    Masica, David L; Sosnay, Patrick R; Cutting, Garry R; Karchin, Rachel

    2012-08-01

    Cystic fibrosis transmembrane conductance regulator (CFTR) mutation is associated with a phenotypic spectrum that includes cystic fibrosis (CF). The disease liability of some common CFTR mutations is known, but rare mutations are seen in too few patients to categorize unequivocally, making genetic diagnosis difficult. Computational methods can predict the impact of mutation, but prediction specificity is often below that required for clinical utility. Here, we present a novel supervised learning approach for predicting CF from CFTR missense mutation. The algorithm begins by constructing custom multiple sequence alignments called phenotype-optimized sequence ensembles (POSEs). POSEs are constructed iteratively, by selecting sequences that optimize predictive performance on a training set of CFTR mutations of known clinical significance. Next, we predict CF disease liability from a different set of CFTR mutations (test-set mutations). This approach achieves improved prediction performance relative to popular methods recently assessed using the same test-set mutations. Of clinical significance, our method achieves 94% prediction specificity. Because databases such as HGMD and locus-specific mutation databases are growing rapidly, methods that automatically tailor their predictions for a specific phenotype may be of immediate utility. If the performance achieved here generalizes to other systems, the approach could be an excellent tool to help establish genetic diagnoses.
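
    The iterative POSE construction described above is a form of feature-set search: sequences are added to the ensemble as long as they improve predictive performance on the training mutations. A minimal greedy forward-selection sketch, assuming a hypothetical `score` callable that would, in a real pipeline, build the alignment from the chosen sequences, run the predictor and return training-set performance; the authors' exact search strategy may differ.

```python
from typing import Callable, Optional, Sequence


def greedy_ensemble(candidates: Sequence[str],
                    score: Callable[[list[str]], float],
                    max_size: Optional[int] = None) -> tuple[list[str], float]:
    """Add the best-scoring candidate each round; stop when nothing improves the score."""
    selected: list[str] = []
    best = score(selected)
    remaining = list(candidates)
    while remaining and (max_size is None or len(selected) < max_size):
        trial_score, choice = max(((score(selected + [c]), c) for c in remaining),
                                  key=lambda t: t[0])
        if trial_score <= best:
            break
        selected.append(choice)
        remaining.remove(choice)
        best = trial_score
    return selected, best
```

Greedy selection is cheap but can miss combinations that only help jointly, which is why the final model is evaluated on separate test-set mutations.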

  4. The Chinese hamster Alu-equivalent sequence: a conserved highly repetitious, interspersed deoxyribonucleic acid sequence in mammals has a structure suggestive of a transposable element.

    PubMed Central

    Haynes, S R; Toomey, T P; Leinwand, L; Jelinek, W R

    1981-01-01

    A consensus sequence has been determined for a major interspersed deoxyribonucleic acid repeat in the genome of Chinese hamster ovary cells (CHO cells). This sequence is extensively homologous to (i) the human Alu sequence (P. L. Deininger et al., J. Mol. Biol., in press), (ii) the mouse B1 interspersed repetitious sequence (Krayev et al., Nucleic Acids Res. 8:1201-1215, 1980), (iii) an interspersed repetitious sequence from African green monkey deoxyribonucleic acid (Dhruva et al., Proc. Natl. Acad. Sci. U.S.A. 77:4514-4518, 1980) and (iv) the CHO and mouse 4.5S ribonucleic acid (this report; F. Harada and N. Kato, Nucleic Acids Res. 8:1273-1285, 1980). Because the CHO consensus sequence shows significant homology to the human Alu sequence, it is termed the CHO Alu-equivalent sequence. A conserved structure surrounding CHO Alu-equivalent family members can be recognized. It is similar to that surrounding the human Alu and the mouse B1 sequences, and is represented as follows: direct repeat-CHO-Alu-A-rich sequence-direct repeat. A composite interspersed repetitious sequence has been identified. Its structure is represented as follows: direct repeat-residue 47 to 107 of CHO-Alu-non-Alu repetitious sequence-A-rich sequence-direct repeat. Because the Alu flanking sequences resemble those that flank known transposable elements, we think it likely that the Alu sequence dispersed throughout the mammalian genome by transposition. PMID:9279371
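
    The conserved arrangement direct repeat-Alu-A-rich sequence-direct repeat can be checked on a candidate element by looking for identical sequence at both ends and an A-rich 3' tail inside them. A minimal sketch, assuming the input is already trimmed so that the flanking repeats sit exactly at its ends and contain no mismatches; real target-site duplications need alignment with mismatch tolerance. `min_len` and `window` are hypothetical parameters.

```python
def flanking_direct_repeat(seq: str, min_len: int = 5) -> int:
    """Length of the longest identical prefix/suffix pair (0 if shorter than min_len)."""
    seq = seq.upper()
    for k in range(len(seq) // 2, min_len - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


def tail_a_fraction(core: str, window: int) -> float:
    """Fraction of A in the last `window` bases of the element between the repeats."""
    tail = core[-window:]
    return tail.upper().count("A") / len(tail)
```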

  5. Predicting the Genetic Stability of Engineered DNA Sequences with the EFM Calculator.

    PubMed

    Jack, Benjamin R; Leonard, Sean P; Mishler, Dennis M; Renda, Brian A; Leon, Dacia; Suárez, Gabriel A; Barrick, Jeffrey E

    2015-08-21

    Unwanted evolution can rapidly degrade the performance of genetically engineered circuits and metabolic pathways installed in living organisms. We created the Evolutionary Failure Mode (EFM) Calculator to computationally detect common sources of genetic instability in an input DNA sequence. It predicts two types of mutational hotspots: deletions mediated by homologous recombination and indels caused by replication slippage on simple sequence repeats. We tested the performance of our algorithm on genetic circuits that were previously redesigned for greater evolutionary reliability and analyzed the stability of sequences in the iGEM Registry of Standard Biological Parts. More than half of the parts in the Registry are predicted to experience >100-fold elevated mutation rates due to the inclusion of unstable sequence configurations. We anticipate that the EFM Calculator will be a useful negative design tool for avoiding volatile DNA encodings, thereby increasing the evolutionary lifetimes of synthetic biology devices.
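
    One of the two hotspot types described above, replication slippage, occurs on simple sequence repeats: short units repeated in tandem. A minimal detector for such repeats, assuming hypothetical thresholds (`max_unit`, `min_repeats`) that are not the EFM Calculator's actual settings; the recombination-mediated deletions it also predicts would require a separate search for long repeated k-mers.

```python
def find_ssrs(seq: str, max_unit: int = 3, min_repeats: int = 4) -> list[tuple[int, str, int]]:
    """Return (start, unit, copies) for tandem repeats of 1..max_unit bp units."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    for unit_len in range(1, max_unit + 1):
        i = 0
        while i + unit_len <= n:
            unit = seq[i:i + unit_len]
            # Skip units that are themselves repeats of a shorter unit (e.g. "AA").
            if any(unit_len % d == 0 and unit == unit[:d] * (unit_len // d)
                   for d in range(1, unit_len)):
                i += 1
                continue
            copies = 1
            while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
                copies += 1
            if copies >= min_repeats:
                hits.append((i, unit, copies))
                i += copies * unit_len  # jump past this repeat
            else:
                i += 1
    return hits
```

Longer repeats with more copies are generally more slippage-prone, so a fuller tool would attach an estimated mutation rate to each hit rather than a simple yes/no.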

  6. Severe accident source term characteristics for selected Peach Bottom sequences predicted by the MELCOR Code

    SciTech Connect

    Carbajo, J.J.

    1993-09-01

    The purpose of this report is to compare in-containment source terms developed for NUREG-1159, which used the Source Term Code Package (STCP), with those generated by MELCOR to identify significant differences. For this comparison, two short-term depressurized station blackout sequences (with a dry cavity and with a flooded cavity) and a Loss-of-Coolant Accident (LOCA) concurrent with complete loss of the Emergency Core Cooling System (ECCS) were analyzed for the Peach Bottom Atomic Power Station (a BWR-4 with a Mark I containment). The results indicate that for the sequences analyzed, the two codes predict similar total in-containment release fractions for each of the element groups. However, the MELCOR/CORBH Package predicts significantly longer times for vessel failure and reduced energy of the released material for the station blackout sequences (when compared to the STCP results). MELCOR also calculated smaller releases into the environment than STCP for the station blackout sequences.

  7. Genomic prediction for beef fatty acid profile in Nellore cattle.

    PubMed

    Chiaia, Hermenegildo Lucas Justino; Peripoli, Elisa; Silva, Rafael Medeiros de Oliveira; Aboujaoude, Carolyn; Feitosa, Fabiele Loise Braga; Lemos, Marcos Vinicius Antunes de; Berton, Mariana Piatto; Olivieri, Bianca Ferreira; Espigolan, Rafael; Tonussi, Rafael Lara; Gordo, Daniel Gustavo Mansan; Bresolin, Tiago; Magalhães, Ana Fabrícia Braga; Júnior, Gerardo Alves Fernandes; Albuquerque, Lúcia Galvão de; Oliveira, Henrique Nunes de; Furlan, Joyce de Jesus Mangini; Ferrinho, Adrielle Mathias; Mueller, Lenise Freitas; Tonhati, Humberto; Pereira, Angélica Simone Cravo; Baldi, Fernando

    2017-06-01

    The objective of this study was to compare SNP-BLUP, BayesCπ, BayesC and Bayesian Lasso methodologies for predicting the direct genomic value (DGV) for saturated, monounsaturated and polyunsaturated fatty acid profiles, omega 3 and omega 6 in the Longissimus thoracis muscle of feedlot-finished Nellore cattle. A total of 963 Nellore bulls with fatty acid profile phenotypes were genotyped using the Illumina BovineHD BeadChip (Illumina, San Diego, CA) with 777,962 SNPs. Predictive ability was evaluated using cross-validation. To compare the methodologies, the correlation between DGV and pseudo-phenotypes was calculated. The accuracy varied from -0.40 to 0.62. None of the methods excelled in terms of accuracy; however, SNP-BLUP yields less biased genomic evaluations and is therefore more feasible when the analyses' operating cost is taken into account. Although the lowest bias was observed for EBV, the adjusted phenotype is the preferred pseudo-phenotype, considering the genomic prediction accuracies obtained in the present study.
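
    SNP-BLUP can be written as ridge regression of the pseudo-phenotype on centered marker genotypes, with shrinkage equal to the ratio of residual variance to per-SNP effect variance; accuracy is then the correlation between DGV and pseudo-phenotype in held-out folds. A minimal NumPy sketch on simulated data, assuming a fixed shrinkage `lam` (which a real analysis would derive from variance components); the Bayesian alternatives compared in the study are not shown.

```python
import numpy as np


def snp_blup_effects(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Ridge-regression (SNP-BLUP) marker effects.

    X: (n_animals, n_snps) genotypes coded 0/1/2; y: pseudo-phenotypes;
    lam: shrinkage, residual variance / per-SNP effect variance.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)


def cv_accuracy(X: np.ndarray, y: np.ndarray, lam: float, k: int = 5, seed: int = 0) -> float:
    """Mean correlation between DGV and pseudo-phenotype over k validation folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accuracies = []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        effects = snp_blup_effects(X[train], y[train], lam)
        # DGV for validation animals, centered on training allele means.
        dgv = (X[test] - X[train].mean(axis=0)) @ effects
        accuracies.append(np.corrcoef(dgv, y[test])[0, 1])
    return float(np.mean(accuracies))
```

With 777,962 SNPs the p-by-p system above is impractical; real implementations solve the equivalent n-by-n (animal) form or use iterative solvers.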

  8. The amino acid sequence of goat beta-lactoglobulin.

    PubMed

    Préaux, G; Braunitzer, G; Schrank, B; Stangl, A

    1979-11-01

    The isolation of beta-lactoglobulin from milk of the goat is described. The purified protein was checked for purity and has been characterized by its gross composition and end groups. The native or the modified protein was then degraded by tryptic and cyanogen bromide cleavage. The cleavage products were isolated and sequenced in the sequenator using a Quadrol and propyne program. These data provide the complete sequence of beta-lactoglobulin of the goat. The results are discussed and compared particularly with bovine beta-lactoglobulin components AB. Some biological aspects are described.

  9. Layered materials with coexisting acidic and basic sites for catalytic one-pot reaction sequences.

    PubMed

    Motokura, Ken; Tada, Mizuki; Iwasawa, Yasuhiro

    2009-06-17

    Acidic montmorillonite-immobilized primary amines (H-mont-NH(2)) were found to be excellent acid-base bifunctional catalysts for one-pot reaction sequences; they are the first materials with coexisting acid and base sites active in acid-base tandem reactions. For example, tandem deacetalization-Knoevenagel condensation proceeded successfully with H-mont-NH(2), affording the corresponding condensation product in quantitative yield. The acidity of H-mont-NH(2) was strongly influenced by the preparation solvent, and the base-catalyzed reactions were enhanced by interlayer acid sites.

  10. Synthesis of gamma,delta-unsaturated glycolic acids via sequenced Brook and Ireland-Claisen rearrangements.

    PubMed

    Schmitt, Daniel C; Johnson, Jeffrey S

    2010-03-05

    Organozinc, -magnesium, and -lithium nucleophiles initiate a Brook/Ireland-Claisen rearrangement sequence of allylic silyl glyoxylates resulting in the formation of gamma,delta-unsaturated alpha-silyloxy acids.

  11. Computer Simulation of the Determination of Amino Acid Sequences in Polypeptides

    ERIC Educational Resources Information Center

    Daubert, Stephen D.; Sontum, Stephen F.

    1977-01-01

    Describes a computer program that generates a random string of amino acids and guides the student in determining the correct sequence of a given protein by using experimental analytic data for that protein. (MLH)

  12. In Silico Prediction of Scaffold/Matrix Attachment Regions in Large Genomic Sequences

    PubMed Central

    Frisch, Matthias; Frech, Kornelie; Klingenhoff, Andreas; Cartharius, Kerstin; Liebich, Ines; Werner, Thomas

    2002-01-01

    Scaffold/matrix attachment regions (S/MARs) are essential regulatory DNA elements of eukaryotic cells. They are major determinants of locus control of gene expression and can shield gene expression from position effects. Experimental detection of S/MARs requires substantial effort and is not suitable for large-scale screening of genomic sequences. In silico prediction of S/MARs can provide a crucial first selection step to reduce the number of candidates. We used experimentally defined S/MAR sequences as the training set and generated a library of new S/MAR-associated, AT-rich patterns described as weight matrices. A new tool called SMARTest was developed that identifies potential S/MARs by performing a density analysis based on the S/MAR matrix library (http://www.genomatix.de/cgi-bin/smartest_pd/smartest.pl). S/MAR predictions were evaluated by using six genomic sequences from animal and plant for which S/MARs and non-S/MARs were experimentally mapped. SMARTest reached a sensitivity of 38% and a specificity of 68%. In contrast to previous algorithms, the SMARTest approach does not depend on the sequence context and is suitable to analyze long genomic sequences up to the size of whole chromosomes. To demonstrate the feasibility of large-scale S/MAR prediction, we analyzed the recently published chromosome 22 sequence and found 1198 S/MAR candidates. PMID:11827955
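
    SMARTest scores density of matches to a library of S/MAR weight matrices. The sketch below substitutes a much simpler signal, AT content in a sliding window, only to illustrate how a density-style scan can cover chromosome-length sequences in a single pass; it is not SMARTest's method, and `window` and `min_at` are hypothetical parameters.

```python
def at_rich_regions(seq: str, window: int = 20, min_at: float = 0.7) -> list[tuple[int, int]]:
    """Merged (start, end) half-open intervals of windows with AT fraction >= min_at."""
    seq = seq.upper()
    regions: list[tuple[int, int]] = []
    for start in range(len(seq) - window + 1):
        w = seq[start:start + window]
        at = (w.count("A") + w.count("T")) / window
        if at >= min_at:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend overlapping region
            else:
                regions.append((start, end))
    return regions
```

Recomputing each window from scratch is O(n * window); a running count would make a whole-chromosome scan linear.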

  13. Genome sequence of the acid-tolerant strain Rhizobium sp. LPU83.

    PubMed

    Wibberg, Daniel; Tejerizo, Gonzalo Torres; Del Papa, María Florencia; Martini, Carla; Pühler, Alfred; Lagares, Antonio; Schlüter, Andreas; Pistorio, Mariano

    2014-04-20

    Rhizobia are important members of the soil microbiome since they enter into nitrogen-fixing symbiosis with different legume host plants. Rhizobium sp. LPU83 is an acid-tolerant Rhizobium strain with a broad host range. However, it is ineffective in nitrogen fixation. Here, the improved draft genome sequence of this strain is reported. The genome sequence provides the basis for analyzing its acid tolerance, symbiotic properties and taxonomic classification.

  14. αIIbβ3 variants defined by next-generation sequencing: Predicting variants likely to cause Glanzmann thrombasthenia

    PubMed Central

    Buitrago, Lorena; Rendon, Augusto; Liang, Yupu; Simeoni, Ilenia; Negri, Ana; Filizola, Marta; Ouwehand, Willem H.; Coller, Barry S.; Alessi, Marie-Christine; Ballmaier, Matthias; Bariana, Tadbir; Bellissimo, Daniel; Bertoli, Marta; Bray, Paul; Bury, Loredana; Carrell, Robin; Cattaneo, Marco; Collins, Peter; French, Deborah; Favier, Remi; Freson, Kathleen; Furie, Bruce; Germeshausen, Manuela; Ghevaert, Cedric; Gomez, Keith; Goodeve, Anne; Gresele, Paolo; Guerrero, Jose; Hampshire, Dan J.; Hadinnapola, Charaka; Heemskerk, Johan; Henskens, Yvonne; Hill, Marian; Hogg, Nancy; Johnsen, Jill; Kahr, Walter; Kerr, Ron; Kunishima, Shinji; Laffan, Michael; Natwani, Amit; Neerman-Arbez, Marguerite; Nurden, Paquita; Nurden, Alan; Ormiston, Mark; Othman, Maha; Ouwehand, Willem; Perry, David; Vilk, Shoshana Ravel; Reitsma, Pieter; Rondina, Matthew; Simeoni, Ilenia; Smethurst, Peter; Stephens, Jonathan; Stevenson, William; Szkotak, Artur; Turro, Ernest; Van Geet, Christel; Vries, Minka; Ward, June; Waye, John; Westbury, Sarah; Whiteheart, Sidney; Wilcox, David; Zhang, Bi

    2015-01-01

    Next-generation sequencing is transforming our understanding of human genetic variation but assessing the functional impact of novel variants presents challenges. We analyzed missense variants in the integrin αIIbβ3 receptor subunit genes ITGA2B and ITGB3 identified by whole-exome or -genome sequencing in the ThromboGenomics project, comprising ∼32,000 alleles from 16,108 individuals. We analyzed the results in comparison with 111 missense variants in these genes previously reported as being associated with Glanzmann thrombasthenia (GT), 20 associated with alloimmune thrombocytopenia, and 5 associated with aniso/macrothrombocytopenia. We identified 114 novel missense variants in ITGA2B (affecting ∼11% of the amino acids) and 68 novel missense variants in ITGB3 (affecting ∼9% of the amino acids). Of the variants, 96% had minor allele frequencies (MAF) < 0.1%, indicating their rarity. Based on sequence conservation, MAF, and location on a complete model of αIIbβ3, we selected three novel variants that affect amino acids previously associated with GT for expression in HEK293 cells. αIIb P176H and β3 C547G severely reduced αIIbβ3 expression, whereas αIIb P943A partially reduced αIIbβ3 expression and had no effect on fibrinogen binding. We used receiver operating characteristic curves of combined annotation-dependent depletion (CADD), PolyPhen-2 HDIV, and sorting intolerant from tolerant (SIFT) to estimate the percentage of novel variants likely to be deleterious. At optimal cut-off values, which had 69–98% sensitivity in detecting GT mutations, between 27% and 71% of the novel αIIb or β3 missense variants were predicted to be deleterious. Our data have implications for understanding the evolutionary pressure on αIIbβ3 and highlight the challenges in predicting the clinical significance of novel missense variants. PMID:25827233
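
    The cut-off procedure above can be sketched as: build a ROC curve from predictor scores on variants of known status, pick a cut-off, then count how many novel variants exceed it. A minimal sketch, assuming higher scores mean "more deleterious" (SIFT scores, where lower is more damaging, would need inverting) and assuming Youden's J as the definition of "optimal"; the abstract does not state the criterion used.

```python
def roc_points(scores: list[float], labels: list[int]) -> list[tuple[float, float, float]]:
    """(threshold, sensitivity, specificity) for each distinct score, calling score >= threshold positive."""
    positives = sum(1 for label in labels if label)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both deleterious and benign examples")
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, label in zip(scores, labels) if s >= t and label)
        fp = sum(1 for s, label in zip(scores, labels) if s >= t and not label)
        points.append((t, tp / positives, 1 - fp / negatives))
    return points


def youden_cutoff(scores: list[float], labels: list[int]) -> tuple[float, float, float]:
    """ROC point maximizing sensitivity + specificity - 1."""
    return max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1)


def fraction_called_deleterious(scores: list[float], cutoff: float) -> float:
    return sum(1 for s in scores if s >= cutoff) / len(scores)
```

Different predictors and cut-off criteria yield different fractions, which is consistent with the wide 27-71% range reported.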

  15. αIIbβ3 variants defined by next-generation sequencing: predicting variants likely to cause Glanzmann thrombasthenia.

    PubMed

    Buitrago, Lorena; Rendon, Augusto; Liang, Yupu; Simeoni, Ilenia; Negri, Ana; Filizola, Marta; Ouwehand, Willem H; Coller, Barry S

    2015-04-14

    Next-generation sequencing is transforming our understanding of human genetic variation but assessing the functional impact of novel variants presents challenges. We analyzed missense variants in the integrin αIIbβ3 receptor subunit genes ITGA2B and ITGB3 identified by whole-exome or -genome sequencing in the ThromboGenomics project, comprising ∼32,000 alleles from 16,108 individuals. We analyzed the results in comparison with 111 missense variants in these genes previously reported as being associated with Glanzmann thrombasthenia (GT), 20 associated with alloimmune thrombocytopenia, and 5 associated with aniso/macrothrombocytopenia. We identified 114 novel missense variants in ITGA2B (affecting ∼11% of the amino acids) and 68 novel missense variants in ITGB3 (affecting ∼9% of the amino acids). Of the variants, 96% had minor allele frequencies (MAF) < 0.1%, indicating their rarity. Based on sequence conservation, MAF, and location on a complete model of αIIbβ3, we selected three novel variants that affect amino acids previously associated with GT for expression in HEK293 cells. αIIb P176H and β3 C547G severely reduced αIIbβ3 expression, whereas αIIb P943A partially reduced αIIbβ3 expression and had no effect on fibrinogen binding. We used receiver operating characteristic curves of combined annotation-dependent depletion, Polyphen 2-HDIV, and sorting intolerant from tolerant to estimate the percentage of novel variants likely to be deleterious. At optimal cut-off values, which had 69-98% sensitivity in detecting GT mutations, between 27% and 71% of the novel αIIb or β3 missense variants were predicted to be deleterious. Our data have implications for understanding the evolutionary pressure on αIIbβ3 and highlight the challenges in predicting the clinical significance of novel missense variants.

  16. SGP-1: Prediction and Validation of Homologous Genes Based on Sequence Alignments

    PubMed Central

    Wiehe, Thomas; Gebauer-Jung, Steffi; Mitchell-Olds, Thomas; Guigó, Roderic

    2001-01-01

    Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, the coding potential or the comparison of a genomic sequence with a cDNA, EST, or protein database. Reasons for limited accuracy in many circumstances are species-specific training and the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention. Several analysis tools that are based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or the nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful to validate gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors. PMID:11544202

  17. The amino acid sequence of monal pheasant lysozyme and its activity.

    PubMed

    Araki, T; Matsumoto, T; Torikata, T

    1998-10-01

    The amino acid sequence of monal pheasant lysozyme and its activity were analyzed. Carboxymethylated lysozyme was digested with trypsin and the resulting peptides were sequenced. The established amino acid sequence had one amino acid substitution at position 102 (Arg to Gly) compared with Indian peafowl lysozyme, and four amino acid substitutions at positions 3 (Phe to Tyr), 15 (His to Leu), 41 (Gln to His), and 121 (Gln to His) compared with chicken lysozyme. Analysis of the reaction time courses using N-acetylglucosamine pentamer as a substrate showed a difference in binding free energy change (-0.4 kcal/mol) at subsite A between monal pheasant and Indian peafowl lysozymes. This difference was assumed to be caused by the amino acid substitution at subsite A, which removes a positive charge at position 102 (Arg102 to Gly).

  18. Complete genome sequence of Enterococcus mundtii QU 25, an efficient L-(+)-lactic acid-producing bacterium.

    PubMed

    Shiwa, Yuh; Yanase, Hiroaki; Hirose, Yuu; Satomi, Shohei; Araya-Kojima, Tomoko; Watanabe, Satoru; Zendo, Takeshi; Chibazakura, Taku; Shimizu-Kadota, Mariko; Yoshikawa, Hirofumi; Sonomoto, Kenji

    2014-08-01

    Enterococcus mundtii QU 25, a non-dairy bacterial strain of ovine faecal origin, can ferment both cellobiose and xylose to produce L-lactic acid. The use of this strain is highly desirable for economical L-lactate production from renewable biomass substrates. Genome sequence determination is necessary for the genetic improvement of this strain. We report the complete genome sequence of strain QU 25, primarily determined using Pacific Biosciences sequencing technology. The E. mundtii QU 25 genome comprises a 3 022 186-bp single circular chromosome (GC content, 38.6%) and five circular plasmids: pQY182, pQY082, pQY039, pQY024, and pQY003. In all, 2900 protein-coding sequences, 63 tRNA genes, and 6 rRNA operons were predicted in the QU 25 chromosome. Plasmid pQY024 harbours genes for mundticin production. We found that strain QU 25 produces a bacteriocin, suggesting that the mundticin-encoding genes on plasmid pQY024 are functional. For lactic acid fermentation, two gene clusters were identified: one involved in the initial metabolism of xylose and the uptake of pentose, and the other containing genes for the pentose phosphate pathway and the uptake of related sugars. This is the first complete genome sequence of an E. mundtii strain. The data provide insights into lactate production in this bacterium and its evolution among enterococci.
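The GC content quoted for the chromosome (38.6%) is the share of G and C among its bases. The minimal sketch below computes that figure for any sequence string. It assumes that ambiguous symbols such as N are excluded from the denominator, which is one common convention; the paper's exact counting rule is not stated here.

```python
def gc_content(seq: str) -> float:
    """Percentage of G+C among unambiguous bases (A, C, G, T).

    Ambiguity codes such as N are ignored (an assumed convention).
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / acgt


print(gc_content("ATGCGCNNAT"))  # 4 G/C out of 8 unambiguous bases
```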

  19. Single-chain structure of human ceruloplasmin: the complete amino acid sequence of the whole molecule.

    PubMed Central

    Takahashi, N; Ortel, T L; Putnam, F W

    1984-01-01

    We have determined the amino acid sequence of the amino-terminal 67,000-dalton (67-kDa) fragment of human ceruloplasmin and have established overlapping sequences between the 67-kDa and 50-kDa fragments and between the 50-kDa and 19-kDa fragments. The 67-kDa fragment contains 480 amino acid residues and three glucosamine oligosaccharides. These results together with our previous sequence data for the 50-kDa and 19-kDa fragments complete the amino acid sequence of human ceruloplasmin. The polypeptide chain has a total of 1,046 amino acid residues (Mr 120,085) and has attachment sites for four glucosamine oligosaccharides; together these account for the total molecular mass of human ceruloplasmin (132 kDa). The sequence analysis of the peptides overlapping the fragments showed that one additional amino acid, arginine, is present between the 67-kDa and 50-kDa fragments, and another, lysine, is between the 50-kDa and 19-kDa fragments. Only two apparent sites of amino acid interchange have been identified in the polypeptide chain. Both involve a single-point interchange of glycine and lysine that would result in a difference in charge. The results of the complete sequence analysis verified that human ceruloplasmin is composed of a single polypeptide chain and that the subunit-like fragments are produced by proteolytic cleavage during purification (and possibly also in vivo). PMID:6582496
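If the polypeptide (Mr 120,085) and its four glucosamine oligosaccharides together account for the 132 kDa total, as the abstract states, then the carbohydrate share follows by subtraction. The per-chain value below is only an arithmetic average. It says nothing about how the mass is actually distributed among the four attachment sites.

```latex
\begin{aligned}
M_{\text{carbohydrate}} &\approx 132\ \text{kDa} - 120.085\ \text{kDa} \approx 11.9\ \text{kDa} \quad (\approx 9\%\ \text{of the total}),\\
\bar{M}_{\text{per oligosaccharide}} &\approx \frac{11.9\ \text{kDa}}{4} \approx 3.0\ \text{kDa}.
\end{aligned}
```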

  20. Multiple Genome Sequences of Important Beer-Spoiling Lactic Acid Bacteria

    PubMed Central

    Geissler, Andreas J.; Vogel, Rudi F.

    2016-01-01

    Seven strains of important beer-spoiling lactic acid bacteria were sequenced using single-molecule real-time sequencing. Complete genomes were obtained for strains of Lactobacillus paracollinoides, Lactobacillus lindneri, and Pediococcus claussenii. The analysis of these genomes emphasizes the role of plasmids as the genomic foundation of beer-spoiling ability. PMID:27795248

  1. Complete genome sequence of the probiotic lactic acid bacterium Lactobacillus acidophilus NCFM

    PubMed Central

    Altermann, Eric; Russell, W. Michael; Azcarate-Peril, M. Andrea; Barrangou, Rodolphe; Buck, B. Logan; McAuliffe, Olivia; Souther, Nicole; Dobson, Alleson; Duong, Tri; Callanan, Michael; Lick, Sonja; Hamrick, Alice; Cano, Raul; Klaenhammer, Todd R.

    2005-01-01

    Lactobacillus acidophilus NCFM is a probiotic bacterium that has been produced commercially since 1972. The complete genome is 1,993,564 nt and devoid of plasmids. The average GC content is 34.71% with 1,864 predicted ORFs, of which 72.5% were functionally classified. Nine phage-related integrases were predicted, but no complete prophages were found. However, three unique regions designated as potential autonomous units (PAUs) were identified. These units resemble a unique structure and bear characteristics of both plasmids and phages. Analysis of the three PAUs revealed the presence of two R/M systems and a prophage maintenance system killer protein. A spacers interspersed direct repeat locus containing 32 nearly perfect 29-bp repeats was discovered and may provide a unique molecular signature for this organism. In silico analyses predicted 17 transposase genes and a chromosomal locus for lactacin B, a class II bacteriocin. Several mucus- and fibronectin-binding proteins, implicated in adhesion to human intestinal cells, were also identified. Gene clusters for transport of a diverse group of carbohydrates, including fructooligosaccharides and raffinose, were present and often accompanied by transcriptional regulators of the lacI family. For protein degradation and peptide utilization, the organism encoded 20 putative peptidases, homologs for PrtP and PrtM, and two complete oligopeptide transport systems. Nine two-component regulatory systems were predicted, some associated with determinants implicated in bacteriocin production and acid tolerance. Collectively, these features within the genome sequence of L. acidophilus are likely to contribute to the organism's gastric survival and promote interactions with the intestinal mucosa and microbiota. PMID:15671160

  2. All-atom 3D structure prediction of transmembrane β-barrel proteins from sequences

    PubMed Central

    Hayat, Sikander; Sander, Chris; Marks, Debora S.

    2015-01-01

    Transmembrane β-barrels (TMBs) carry out major functions in substrate transport and protein biogenesis but experimental determination of their 3D structure is challenging. Encouraged by successful de novo 3D structure prediction of globular and α-helical membrane proteins from sequence alignments alone, we developed an approach to predict the 3D structure of TMBs. The approach combines the maximum-entropy evolutionary coupling method for predicting residue contacts (EVfold) with a machine-learning approach (boctopus2) for predicting β-strands in the barrel. In a blinded test for 19 TMB proteins of known structure that have a sufficient number of diverse homologous sequences available, this combined method (EVfold_bb) predicts hydrogen-bonded residue pairs between adjacent β-strands at an accuracy of ∼70%. This accuracy is sufficient for the generation of all-atom 3D models. In the transmembrane barrel region, the average 3D structure accuracy [template-modeling (TM) score] of top-ranked models is 0.54 (ranging from 0.36 to 0.85), with a higher (44%) number of residue pairs in correct strand–strand registration than in earlier methods (18%). Although the nonbarrel regions are predicted less accurately overall, the evolutionary couplings identify some highly constrained loop residues and, for FecA protein, the barrel including the structure of a plug domain can be accurately modeled (TM score = 0.68). Lower prediction accuracy tends to be associated with insufficient sequence information and we therefore expect increasing numbers of β-barrel families to become accessible to accurate 3D structure prediction as the number of available sequences increases. PMID:25858953
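The abstract scores models by template-modeling (TM) score. The sketch below evaluates the standard TM-score sum for one fixed superposition, given per-residue distances between aligned model and native residues. The full TM-score is the maximum of this quantity over all superpositions, a search that tools such as TM-score/TM-align perform and this sketch omits. The 0.5 Å floor on the distance scale d0 for very short chains is an assumption about common practice.

```python
from typing import Sequence


def tm_d0(l_target: int) -> float:
    """Distance scale d0 (angstroms) from the TM-score definition."""
    if l_target <= 15:
        return 0.5  # assumed floor; the formula below is undefined/negative here
    return max(0.5, 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances: Sequence[float], l_target: int) -> float:
    """TM-score sum for a single, already-applied superposition.

    distances: distances (angstroms) between aligned residue pairs after superposition.
    l_target:  length of the native (target) structure, used for normalisation.
    """
    d0 = tm_d0(l_target)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_target


# 50 perfectly placed residues out of a 100-residue target contribute 0.5.
print(tm_score([0.0] * 50, 100))
```

Because each aligned residue contributes at most 1/L, unaligned or badly placed residues pull the score down smoothly. This is why a barrel-region average of 0.54 can coexist with individual models ranging from 0.36 to 0.85.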

  3. All-atom 3D structure prediction of transmembrane β-barrel proteins from sequences.

    PubMed

    Hayat, Sikander; Sander, Chris; Marks, Debora S; Elofsson, Arne

    2015-04-28

    Transmembrane β-barrels (TMBs) carry out major functions in substrate transport and protein biogenesis but experimental determination of their 3D structure is challenging. Encouraged by successful de novo 3D structure prediction of globular and α-helical membrane proteins from sequence alignments alone, we developed an approach to predict the 3D structure of TMBs. The approach combines the maximum-entropy evolutionary coupling method for predicting residue contacts (EVfold) with a machine-learning approach (boctopus2) for predicting β-strands in the barrel. In a blinded test for 19 TMB proteins of known structure that have a sufficient number of diverse homologous sequences available, this combined method (EVfold_bb) predicts hydrogen-bonded residue pairs between adjacent β-strands at an accuracy of ∼70%. This accuracy is sufficient for the generation of all-atom 3D models. In the transmembrane barrel region, the average 3D structure accuracy [template-modeling (TM) score] of top-ranked models is 0.54 (ranging from 0.36 to 0.85), with a higher (44%) number of residue pairs in correct strand-strand registration than in earlier methods (18%). Although the nonbarrel regions are predicted less accurately overall, the evolutionary couplings identify some highly constrained loop residues and, for FecA protein, the barrel including the structure of a plug domain can be accurately modeled (TM score = 0.68). Lower prediction accuracy tends to be associated with insufficient sequence information and we therefore expect increasing numbers of β-barrel families to become accessible to accurate 3D structure prediction as the number of available sequences increases.

  4. PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences.

    PubMed

    Mirarab, Siavash; Nguyen, Nam; Guo, Sheng; Wang, Li-San; Kim, Junhyong; Warnow, Tandy

    2015-05-01

    We introduce PASTA, a new multiple sequence alignment algorithm. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy and scalability of the leading alignment methods (including SATé). We also show that trees estimated on PASTA alignments are highly accurate: slightly better than SATé trees, and substantially better than trees from other methods. Finally, PASTA is faster than SATé, highly parallelizable, and requires relatively little memory.

  5. SETG: Nucleic Acid Extraction and Sequencing for In Situ Life Detection on Mars

    NASA Astrophysics Data System (ADS)

    Mojarro, A.; Hachey, J.; Tani, J.; Smith, A.; Bhattaru, S. A.; Pontefract, A.; Doebler, R.; Brown, M.; Ruvkun, G.; Zuber, M. T.; Carr, C. E.

    2016-10-01

    We are developing an integrated nucleic acid extraction and sequencing instrument: the Search for Extra-Terrestrial Genomes (SETG) for in situ life detection on Mars. Our goals are to identify related or unrelated nucleic acid-based life on Mars.

  6. Parvalbumins from coelacanth muscle. III. Amino acid sequence of the major component.

    PubMed

    Jauregui-Adell, J; Pechere, J F

    1978-09-26

    The primary structure of the major parvalbumin (pI = 4.52) from coelacanth muscle (Latimeria chalumnae) has been determined. Sequence analysis of the tryptic peptides, in some cases obtained with beta-trypsin, accounts for the total amino acid content of the protein. Chymotryptic peptides provide appropriate sequence overlaps, to complete the localization of the tryptic peptides. Examination of the amino acid sequence of this protein shows the typical structure of a beta-parvalbumin. Its position in the dendrogram of related calcium-binding proteins corresponds to that usually accepted for crossopterygians.

  7. Applying a predict-observe-explain sequence in teaching of buoyant force

    NASA Astrophysics Data System (ADS)

    Radovanović, Jelena; Sliško, Josip

    2013-01-01

    An active learning sequence based on the predict-observe-explain teaching strategy is applied to a lesson on buoyant force. The results obtained clearly justify the use of this teaching method and suggest devising a series of activities to enable more effective removal of students’ commonly held alternative conceptions regarding floating and sinking.

  8. Combining structure and sequence information allows automated prediction of substrate specificities within enzyme families.

    PubMed

    Röttig, Marc; Rausch, Christian; Kohlbacher, Oliver

    2010-01-08

    An important aspect of the functional annotation of enzymes is not only the type of reaction catalysed by an enzyme, but also the substrate specificity, which can vary widely within the same family. In many cases, prediction of family membership and even substrate specificity is possible from enzyme sequence alone, using a nearest neighbour classification rule. However, the combination of structural information and sequence information can improve the interpretability and accuracy of predictive models. The method presented here, Active Site Classification (ASC), automatically extracts the residues lining the active site from one representative three-dimensional structure and the corresponding residues from sequences of other members of the family. From a set of representatives with known substrate specificity, a Support Vector Machine (SVM) can then learn a model of substrate specificity. Applied to a sequence of unknown specificity, the SVM can then predict the most likely substrate. The models can also be analysed to reveal the underlying structural reasons determining substrate specificities and thus yield valuable insights into mechanisms of enzyme specificity. We illustrate the high prediction accuracy achieved on two benchmark data sets and the structural insights gained from ASC by a detailed analysis of the family of decarboxylating dehydrogenases. The ASC web service is available at http://asc.informatik.uni-tuebingen.de/.
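The abstract contrasts ASC's SVM with the simpler nearest-neighbour rule applied to sequences. The sketch below shows only that baseline. It is not ASC itself: each enzyme is reduced to a string of active-site residues (its "signature", assumed already extracted and aligned to equal length), and a query takes the label of the closest labelled signature by Hamming distance. The signatures and substrate labels are hypothetical.

```python
from typing import Dict


def hamming(a: str, b: str) -> int:
    """Number of active-site positions at which two signatures differ."""
    if len(a) != len(b):
        raise ValueError("active-site signatures must have equal length")
    return sum(x != y for x, y in zip(a, b))


def predict_substrate(query: str, labelled: Dict[str, str]) -> str:
    """1-nearest-neighbour prediction; ties go to the first signature listed."""
    best = min(labelled, key=lambda sig: hamming(query, sig))
    return labelled[best]


# Hypothetical signatures and labels, for illustration only.
training = {
    "SRKDY": "substrate A",
    "SRKNF": "substrate B",
    "AHKDY": "substrate C",
}
print(predict_substrate("SRKDW", training))
```

An SVM, as used by ASC, would instead learn per-position weights from a numeric (for example one-hot) encoding of these signatures. This is what lets the model be inspected for the residues that drive specificity.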

  9. Applying a Predict-Observe-Explain Sequence in Teaching of Buoyant Force

    ERIC Educational Resources Information Center

    Radovanovic, Jelena; Slisko, Josip

    2013-01-01

    An active learning sequence based on the predict-observe-explain teaching strategy is applied to a lesson on buoyant force. The results obtained clearly justify the use of this teaching method and suggest devising a series of activities to enable more effective removal of students' commonly held alternative conceptions regarding floating and…

  10. Linguistic and Spatial Skills Predict Early Arithmetic Development via Counting Sequence Knowledge

    ERIC Educational Resources Information Center

    Zhang, Xiao; Koponen, Tuire; Räsänen, Pekka; Aunola, Kaisa; Lerkkanen, Marja-Kristiina; Nurmi, Jari-Erik

    2014-01-01

    Utilizing a longitudinal sample of Finnish children (ages 6-10), two studies examined how early linguistic (spoken vs. written) and spatial skills predict later development of arithmetic, and whether counting sequence knowledge mediates these associations. In Study 1 (N = 1,880), letter knowledge and spatial visualization, measured in…

  11. Predicting Salmonella enterica subsp. enterica Serotypes by Repetitive Extragenic Palindromic Sequence-Based PCR

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The DiversiLab™ System, which employs repetitive extragenic palindromic sequence-based PCR (rep-PCR) to genotype microorganisms, was evaluated as a method to predict the serotype of Salmonella isolates. Two hundred and thirty-three Salmonella isolates belonging to 14 frequently isolated serotypes f...

  12. Cross-species protein sequence and gene structure prediction with fine-tuned Webscipio 2.0 and Scipio

    PubMed Central

    2011-01-01

    Background Obtaining transcripts of homologs of closely related organisms and retrieving the reconstructed exon-intron patterns of the genes is a very important process during the analysis of the evolution of a protein family and the comparative analysis of the exon-intron structure of a certain gene from different species. Due to the ever-increasing speed of genome sequencing, the gap to genome annotation is growing. Thus, tools for the correct prediction and reconstruction of genes in related organisms become more and more important. The tool Scipio, which can also be used via the graphical interface WebScipio, performs significant hit processing of the output of the Blat program to account for sequencing errors, missing sequence, and fragmented genome assemblies. However, Scipio has so far been limited to high sequence similarity and unable to reconstruct short exons. Results Scipio and WebScipio have fundamentally been extended to better reconstruct very short exons and intron splice sites and to be better suited for cross-species gene structure predictions. The Needleman-Wunsch algorithm has been implemented for the search for short parts of the query sequence that were not recognized by Blat. Those regions might either be short exons, divergent sequence at intron splice sites, or very divergent exons. We have shown the benefit and use of new parameters with several protein examples from completely different protein families in searches against species from several kingdoms of the eukaryotes. The performance of the new Scipio version has been tested in comparison with several similar tools. Conclusions With the new version of Scipio very short exons, terminal and internal, of even just one amino acid can correctly be reconstructed. Scipio is also able to correctly predict almost all genes in cross-species searches even if the ancestors of the species separated more than 100 Myr ago and if the protein sequence identity is below 80%. For our test cases Scipio
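The abstract says the new Scipio version uses the Needleman-Wunsch algorithm to recover short query segments missed by BLAT. The sketch below is a minimal textbook Needleman-Wunsch global alignment with a linear gap penalty. The match, mismatch, and gap scores are illustrative defaults, not Scipio's parameters, which are not given here.

```python
from typing import Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[int, str, str]:
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


s, x, y = needleman_wunsch("GATTACA", "GATCA")
print(s)
print(x)
print(y)
```

Because the alignment is global, every residue of both inputs is placed. That property is useful when a short, divergent stretch between two BLAT hits must be accounted for in full.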

  13. Prediction and identification of some forbidden lines in the Ne I sequence. [in solar spectrum]

    NASA Technical Reports Server (NTRS)

    Kastner, S. O.

    1974-01-01

    A magnetic quadrupole transition which, according to a prediction by Garstang (1969), should have an appreciable transition probability in the higher ions of the Ne I sequence has recently been observed in Fe XVII with high resolution by Parkinson (1973), at 17.086 Å. Values of an interval predicted by calculations of Crance (1973) are plotted in a graph. Interval values obtained from the curve are used to predict the values of certain transition wavelengths in the ions Si V through Cr XV.

  14. Prediction of high-risk types of human papillomaviruses using statistical model of protein "sequence space".

    PubMed

    Wang, Cong; Hai, Yabing; Liu, Xiaoqing; Liu, Nanfang; Yao, Yuhua; He, Pingan; Dai, Qi

    2015-01-01

    Discrimination of high-risk types of human papillomaviruses plays an important role in the diagnosis and treatment of cervical cancer. Recently, several computational methods have been proposed based on protein sequence-based and structure-based information, but information from their related proteins has not been used until now. In this paper, we proposed using protein "sequence space" to explore this information and used it to predict high-risk types of HPVs. The proposed method was tested on 68 samples with known HPV types and 4 samples without HPV types and further compared with the available approaches. The results show that the proposed method achieved the best performance among all the evaluated methods, with an accuracy of 95.59% and an F1-score of 90.91%, which indicates that protein "sequence space" could potentially be used to improve prediction of high-risk types of HPVs.
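Accuracy and F1-score weight errors differently. Accuracy counts all correct calls, whereas F1 ignores true negatives and balances precision against recall for the positive (high-risk) class. The sketch below computes both from a confusion matrix. The counts in the example are hypothetical and are not the study's results.

```python
from typing import Tuple


def accuracy_and_f1(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float]:
    """Return (accuracy, F1) for a binary classifier.

    F1 is the harmonic mean of precision and recall, written here in the
    equivalent form 2*TP / (2*TP + FP + FN).
    """
    total = tp + fp + tn + fn
    if total == 0 or (2 * tp + fp + fn) == 0:
        raise ValueError("metrics undefined for these counts")
    accuracy = (tp + tn) / total
    f1 = 2 * tp / (2 * tp + fp + fn)
    return accuracy, f1


# Hypothetical confusion-matrix counts.
print(accuracy_and_f1(tp=8, fp=2, tn=8, fn=2))
```

When the positive class is small, as high-risk types may be in a 68-sample set, F1 can sit well below accuracy even with only a few errors. This is consistent with the gap between the two reported figures.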

  15. In-silico prediction of disorder content using hybrid sequence representation

    PubMed Central

    2011-01-01

    Background Intrinsically disordered proteins play important roles in various cellular activities, and their prevalence has been implicated in a number of human diseases. Knowledge of the intrinsic disorder content of proteins is useful for a variety of studies, including estimation of the abundance of disorder in protein families, classes, and complete proteomes, and analysis of disorder-related protein functions. These investigations currently use disorder content derived from per-residue disorder predictions. We show that these predictions may over- or under-predict the overall amount of disorder, which motivates the development of novel tools for direct and accurate sequence-based prediction of disorder content. Results We hypothesize that sequence-level aggregation of input information may provide more accurate content prediction than the content extracted from local window-based residue-level disorder predictors. We propose a novel predictor, DisCon, that takes advantage of a small set of 29 custom-designed descriptors that aggregate and hybridize information concerning sequence, evolutionary profiles, and predicted secondary structure, solvent accessibility, flexibility, and annotation of globular domains. Using these descriptors and a ridge regression model, DisCon predicts the content with a low mean squared error (0.05) and a high Pearson correlation (0.68). This is a statistically significant improvement over the content computed from outputs of ten modern disorder predictors on a test dataset with proteins that share low sequence identity with the training sequences. The proposed predictive model is analyzed to discuss factors related to the prediction of disorder content. Conclusions DisCon is a high-quality alternative for high-throughput annotation of disorder content. We also empirically demonstrate that DisCon's predictions can be used to improve binary annotations of the disordered residues from
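The baseline the abstract criticises derives disorder content by aggregating per-residue predictions. DisCon is evaluated against true content using mean squared error and Pearson correlation. The sketch below shows that baseline aggregation and the two metrics. The 0.5 threshold for calling a residue disordered is an assumed convention, and the sketch does not reproduce DisCon's descriptors or its ridge regression.

```python
import math
from typing import Sequence


def disorder_content(per_residue_scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of residues whose disorder score is at or above threshold (assumed 0.5)."""
    if not per_residue_scores:
        raise ValueError("empty sequence")
    return sum(s >= threshold for s in per_residue_scores) / len(per_residue_scores)


def mean_squared_error(pred: Sequence[float], true: Sequence[float]) -> float:
    if len(pred) != len(true) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


print(disorder_content([0.9, 0.2, 0.6, 0.1]))  # 2 of 4 residues called disordered
```

Thresholding each residue independently is exactly where over- or under-prediction can creep in: a small shift in scores near the threshold flips many residues at once and moves the content estimate, whereas a sequence-level regressor predicts the fraction directly.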

  16. Purification, characterization and partial amino acid sequence of glycogen synthase from Saccharomyces cerevisiae.

    PubMed Central

    Carabaza, A; Arino, J; Fox, J W; Villar-Palasi, C; Guinovart, J J

    1990-01-01

    Glycogen synthase from Saccharomyces cerevisiae was purified to homogeneity. The enzyme showed a subunit molecular mass of 80 kDa. The holoenzyme appears to be a tetramer. Antibodies developed against purified yeast glycogen synthase inactivated the enzyme in yeast extracts and allowed the detection of the protein in Western blots. Amino acid analysis showed that the enzyme is very rich in glutamate and/or glutamine residues. The N-terminal sequence (11 amino acid residues) was determined. In addition, selected tryptic-digest peptides were purified by reverse-phase h.p.l.c. and submitted to gas-phase sequencing. Up to eight sequences (79 amino acid residues) could be aligned with the human muscle enzyme sequence. Levels of identity range between 37 and 100%, indicating that, although human and yeast glycogen synthases probably share some conserved regions, significant differences in their primary structure should be expected. PMID:2114092
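The identity levels quoted (37 to 100%) come from comparing aligned yeast peptides with the human enzyme. Percent identity has several definitions that differ in the denominator. The sketch below assumes one common choice, counting identical residues over alignment columns where neither sequence has a gap. The example alignment is hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identity over ungapped columns of a pairwise alignment ('-' marks a gap)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


print(percent_identity("MKV-LE", "MKIALE"))  # 4 identical of 5 ungapped columns
```

For short peptides like those sequenced here, a single mismatch moves the value a lot, which helps explain why identities span such a wide range.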

  17. Amino acid sequence of anionic peroxidase from the windmill palm tree Trachycarpus fortunei.

    PubMed

    Baker, Margaret R; Zhao, Hongwei; Sakharov, Ivan Yu; Li, Qing X

    2014-12-10

    Palm peroxidases are extremely stable and have uncommon substrate specificity. This study was designed to fill in the knowledge gap about the structures of a peroxidase from the windmill palm tree Trachycarpus fortunei. The complete amino acid sequence and partial glycosylation were determined by MALDI-top-down sequencing of native windmill palm tree peroxidase (WPTP), MALDI-TOF/TOF MS/MS of WPTP tryptic peptides, and cDNA sequencing. The propeptide of WPTP contained N- and C-terminal signal sequences which contained 21 and 17 amino acid residues, respectively. Mature WPTP was 306 amino acids in length, and its carbohydrate content ranged from 21% to 29%. Comparison to closely related royal palm tree peroxidase revealed structural features that may explain differences in their substrate specificity. The results can be used to guide engineering of WPTP and its novel applications.

  18. TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations.

    PubMed

    Abascal, Federico; Zardoya, Rafael; Telford, Maximilian J

    2010-07-01

    We present TranslatorX, a web server designed to align protein-coding nucleotide sequences based on their corresponding amino acid translations. Many comparisons between biological sequences (nucleic acids and proteins) involve the construction of multiple alignments. Alignments represent a statement regarding the homology between individual nucleotides or amino acids within homologous genes. As protein-coding DNA sequences evolve as triplets of nucleotides (codons) and it is known that sequence similarity degrades more rapidly at the DNA than at the amino acid level, alignments are generally more accurate when based on amino acids than on their corresponding nucleotides. TranslatorX novelties include: (i) use of all documented genetic codes and the possibility of assigning different genetic codes for each sequence; (ii) a battery of different multiple alignment programs; (iii) translation of ambiguous codons when possible; (iv) an innovative criterion to clean nucleotide alignments with GBlocks based on protein information; and (v) a rich output, including Jalview-powered graphical visualization of the alignments, codon-based alignments coloured according to the corresponding amino acids, measures of compositional bias and first, second and third codon position specific alignments. The TranslatorX server is freely available at http://translatorx.co.uk.
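The core idea described above is to align the translated proteins and then map that alignment back onto the nucleotide sequences codon by codon. The minimal sketch below shows only that back-translation step for one sequence. It assumes the coding sequence translates exactly to the ungapped protein, with an optional trailing stop codon left unused. TranslatorX's handling of alternative genetic codes, ambiguous codons, and GBlocks cleaning is not reproduced.

```python
def back_translate(aligned_protein: str, cds: str) -> str:
    """Thread the codons of `cds` onto a gapped protein sequence.

    Each residue consumes the next codon; each '-' becomes '---', so codons
    stay in frame and aligned columns correspond to whole codons.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    out = []
    k = 0
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")
        else:
            if k >= len(codons):
                raise ValueError("protein has more residues than the CDS has codons")
            out.append(codons[k])
            k += 1
    return "".join(out)


print(back_translate("M-K", "ATGAAA"))     # ATG---AAA
print(back_translate("MK-", "ATGAAATAA"))  # trailing stop codon is not used
```

Applying this to every sequence in a protein alignment yields a codon alignment in which gaps never split a codon, which is why the result preserves the reading frame.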

  19. Amino acid sequence of homologous rat atrial peptides: natriuretic activity of native and synthetic forms.

    PubMed Central

    Seidah, N G; Lazure, C; Chrétien, M; Thibault, G; Garcia, R; Cantin, M; Genest, J; Nutt, R F; Brady, S F; Lyle, T A

    1984-01-01

    A substance called atrial natriuretic factor (ANF), localized in secretory granules of atrial cardiocytes, was isolated as four homologous natriuretic peptides from homogenates of rat atria. The complete sequence of the longest form showed that it is composed of 33 amino acids. The three other shorter forms (2-33, 3-33, and 8-33) represent amino-terminally truncated versions of the 33 amino acid parent molecule as shown by analysis of sequence, amino acid composition, or both. The proposed primary structure agrees entirely with the amino acid composition and reveals no significant sequence homology with any known protein or segment of protein. The short form ANF-(8-33) was synthesized by a multi-fragment condensation approach and the synthetic product was shown to exhibit specific activity comparable to that of the natural ANF-(3-33). PMID:6232612

  20. Nucleotide and deduced amino acid sequences of a new subtilisin from an alkaliphilic Bacillus isolate.

    PubMed

    Saeki, Katsuhisa; Magallones, Marietta V; Takimura, Yasushi; Hatada, Yuji; Kobayashi, Tohru; Kawai, Shuji; Ito, Susumu

    2003-10-01

    The gene for a new subtilisin from the alkaliphilic Bacillus sp. KSM-LD1 was cloned and sequenced. The open reading frame of the gene encoded a 97 amino-acid prepro-peptide plus a 307 amino-acid mature enzyme that contained a possible catalytic triad of residues, Asp32, His66, and Ser224. The deduced amino acid sequence of the mature enzyme (LD1) showed approximately 65% identity to those of subtilisins SprC and SprD from alkaliphilic Bacillus sp. LG12. The amino acid sequence identities of LD1 to those of previously reported true subtilisins and high-alkaline proteases were below 60%. LD1 was characteristically stable during incubation with surfactants and chemical oxidants. Interestingly, an oxidizable Met residue is located next to the catalytic Ser224 of the enzyme as in the cases of the oxidation-susceptible subtilisins reported to date.

  1. FrameD: A flexible program for quality check and gene prediction in prokaryotic genomes and noisy matured eukaryotic sequences.

    PubMed

    Schiex, Thomas; Gouzy, Jérôme; Moisan, Annick; de Oliveira, Yannick

    2003-07-01

    We describe FrameD, a program that predicts coding regions in prokaryotic and matured eukaryotic sequences. Initially targeted at gene prediction in bacterial GC-rich genomes, the gene model used in FrameD also allows genes to be predicted in the presence of frameshifts and partially undetermined sequences, which also makes it well suited to gene prediction and frameshift correction in unfinished sequences such as EST and EST cluster sequences. Like recent eukaryotic gene prediction programs, FrameD can also take protein similarity information into account, both in its predictions and in its graphical output. Its performance is evaluated on different bacterial genomes. The web site (http://genopole.toulouse.inra.fr/bioinfo/FrameD/FD) allows direct prediction, sequence correction and translation, and the learning of new models for new organisms.

  2. FrameD: a flexible program for quality check and gene prediction in prokaryotic genomes and noisy matured eukaryotic sequences

    PubMed Central

    Schiex, Thomas; Gouzy, Jérôme; Moisan, Annick; de Oliveira, Yannick

    2003-01-01

    We describe FrameD, a program that predicts coding regions in prokaryotic and matured eukaryotic sequences. Initially targeted at gene prediction in bacterial GC-rich genomes, the gene model used in FrameD also allows genes to be predicted in the presence of frameshifts and partially undetermined sequences, which also makes it well suited to gene prediction and frameshift correction in unfinished sequences such as EST and EST cluster sequences. Like recent eukaryotic gene prediction programs, FrameD can also take protein similarity information into account, both in its predictions and in its graphical output. Its performance is evaluated on different bacterial genomes. The web site (http://genopole.toulouse.inra.fr/bioinfo/FrameD/FD) allows direct prediction, sequence correction and translation, and the learning of new models for new organisms. PMID:12824407
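FrameD's own gene model is not described here, so the following is only a minimal sketch of the underlying idea it has to handle: the same DNA reads differently in each of the three forward frames, and a frameshift moves a coding stretch from one frame to another. The helper names `translate` and `longest_orf` are hypothetical; the codon table is the standard genetic code.

```python
# Standard genetic code; codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate one forward reading frame (0, 1 or 2); unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(frame, len(dna) - 2, 3)
    )


def longest_orf(dna: str, frame: int) -> str:
    """Longest Met-to-stop stretch in one frame (stop codon not included)."""
    protein = translate(dna, frame)
    best, start = "", None
    for idx, aa in enumerate(protein):
        if aa == "M" and start is None:
            start = idx
        elif aa == "*" and start is not None:
            if idx - start > len(best):
                best = protein[start:idx]
            start = None
    return best


read = "CCATGAAATTTTAAGG"  # toy sequence, not from the FrameD paper
for f in range(3):
    print(f, translate(read, f), longest_orf(read, f))
```

A real gene finder scores frames statistically rather than taking the longest open reading frame; this sketch only shows why frame bookkeeping matters when sequences contain frameshifts or undetermined bases (here rendered as `X`).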

  3. Comprehensive red blood cell and platelet antigen prediction from whole genome sequencing: proof of principle

    PubMed Central

    Westhoff, Connie M.; Uy, Jon Michael; Aguad, Maria; Smeland‐Wagman, Robin; Kaufman, Richard M.; Rehm, Heidi L.; Green, Robert C.; Silberstein, Leslie E.

    2015-01-01

    BACKGROUND There are 346 serologically defined red blood cell (RBC) antigens and 33 serologically defined platelet (PLT) antigens, most of which have known genetic changes in 45 RBC or six PLT genes that correlate with antigen expression. Polymorphic sites associated with antigen expression in the primary literature and reference databases are annotated according to nucleotide positions in cDNA. This makes antigen prediction from next‐generation sequencing data challenging, since such data use genomic coordinates. STUDY DESIGN AND METHODS The conventional cDNA reference sequences for all known RBC and PLT genes that correlate with antigen expression were aligned to the human reference genome. The alignments allowed conversion of conventional cDNA nucleotide positions to the corresponding genomic coordinates. RBC and PLT antigen prediction was then performed using the human reference genome and whole genome sequencing (WGS) data with serologic confirmation. RESULTS Some major differences and alignment issues were found when attempting to convert the conventional cDNA to human reference genome sequences for the following genes: ABO, A4GALT, RHD, RHCE, FUT3, ACKR1 (previously DARC), ACHE, FUT2, CR1, GCNT2, and RHAG. However, it was possible to create usable alignments, which facilitated the prediction of all RBC and PLT antigens with a known molecular basis from WGS data. Traditional serologic typing for 18 RBC antigens was in agreement with the WGS‐based antigen predictions, providing proof of principle for this approach. CONCLUSION Detailed mapping of conventional cDNA annotated RBC and PLT alleles can enable accurate prediction of RBC and PLT antigens from whole genome sequencing data. PMID:26634332
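The conversion step described above (cDNA position to genomic coordinate) can be illustrated with a minimal sketch, assuming a transcript given as an ordered list of exons on one strand and counting cDNA position 1 from the first base of the first exon. Conventional clinical c. numbering instead places c.1 at the A of the ATG start codon, so a real conversion would also need the 5' UTR offset and handling of intronic positions; both are omitted here. The function name and exon coordinates are hypothetical.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive genomic coordinates


def cdna_to_genomic(cdna_pos: int, exons: List[Exon], strand: str = "+") -> int:
    """Map a 1-based transcript position to a genomic coordinate.

    `exons` must be listed in transcript order (5' to 3' of the mRNA).
    Position 1 is assumed to be the first base of the first exon.
    """
    if cdna_pos < 1:
        raise ValueError("cDNA positions are 1-based")
    offset = cdna_pos
    for start, end in exons:
        length = end - start + 1
        if offset <= length:
            # Plus strand counts up from the exon start; minus strand counts down from its end.
            return start + offset - 1 if strand == "+" else end - offset + 1
        offset -= length
    raise ValueError("position lies beyond the end of the transcript")


# Hypothetical two-exon transcripts, for illustration only.
plus_exons = [(1000, 1099), (2000, 2049)]
minus_exons = [(5000, 5099), (3000, 3049)]
print(cdna_to_genomic(101, plus_exons))        # 2000 (first base of exon 2)
print(cdna_to_genomic(101, minus_exons, "-"))  # 3049
```

The exon-walking design makes the alignment problems mentioned in the abstract concrete: if the reference cDNA and the genome disagree (an indel, a differing exon boundary), every downstream position shifts.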

  4. Complete cDNA and derived amino acid sequence of human factor V

    SciTech Connect

    Jenny, R.J.; Pittman, D.D.; Toole, J.J.; Kriz, R.W.; Aldape, R.A.; Hewick, R.M.; Kaufman, R.J.; Mann, K.G.

    1987-07-01

    cDNA clones encoding human factor V have been isolated from an oligo(dT)-primed human fetal liver cDNA library prepared with vector Charon 21A. The cDNA sequence of factor V from three overlapping clones includes a 6672-base-pair (bp) coding region, a 90-bp 5' untranslated region, and a 163-bp 3' untranslated region within which is a poly(A) tail. The deduced amino acid sequence consists of 2224 amino acids inclusive of a 28-amino acid leader peptide. Direct comparison with human factor VIII reveals considerable homology between the proteins in amino acid sequence and domain structure: a triplicated A domain and duplicated C domain show approx. 40% identity with the corresponding domains in factor VIII. As in factor VIII, the A domains of factor V share approx. 40% amino acid-sequence homology with the three highly conserved domains in ceruloplasmin. The B domain of factor V contains 35 tandem and approx. 9 additional semiconserved repeats of nine amino acids of the form Asp-Leu-Ser-Gln-Thr-Thr/Asn-Leu-Ser-Pro and 2 additional semiconserved repeats of 17 amino acids. Factor V contains 37 potential N-linked glycosylation sites, 25 of which are in the B domain, and a total of 19 cysteine residues.
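Two quantitative statements above can be checked directly: a 6672-bp coding region corresponds to 6672 / 3 = 2224 codons, matching the stated length, and removing the 28-residue leader leaves 2196 residues. The nonapeptide consensus can also be written as a regular expression in one-letter code. This is a minimal sketch; the fragment it scans is hypothetical, and because the pattern requires an exact match it will not count the semiconserved variants the abstract mentions.

```python
import re

# Arithmetic check on the figures quoted above.
CODING_BP = 6672
TOTAL_AA = CODING_BP // 3           # 2224 codons
LEADER_AA = 28
MATURE_AA = TOTAL_AA - LEADER_AA    # 2196 residues after leader removal

# Consensus Asp-Leu-Ser-Gln-Thr-(Thr/Asn)-Leu-Ser-Pro in one-letter code.
REPEAT = re.compile(r"DLSQT[TN]LSP")


def count_consensus_repeats(protein: str) -> int:
    """Count exact consensus matches; semiconserved variants are not counted."""
    return len(REPEAT.findall(protein.upper()))


print(TOTAL_AA, MATURE_AA)
print(count_consensus_repeats("DLSQTTLSPDLSQTNLSPGGG"))  # hypothetical fragment
```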

  5. Improved sequence-based prediction of disordered regions with multilayer fusion of multiple information sources

    PubMed Central

    Mizianty, Marcin J.; Stach, Wojciech; Chen, Ke; Kedarisetti, Kanaka Durga; Disfani, Fatemeh Miri; Kurgan, Lukasz

    2010-01-01

    Motivation: Intrinsically disordered proteins play a crucial role in numerous regulatory processes. Their abundance and ubiquity combined with a relatively low quantity of their annotations motivate research toward the development of computational models that predict disordered regions from protein sequences. Although the prediction quality of these methods continues to rise, novel and improved predictors are urgently needed. Results: We propose a novel method, named MFDp (Multilayered Fusion-based Disorder predictor), that aims to improve over the current disorder predictors. MFDp is an ensemble of three Support Vector Machines specialized for the prediction of short, long, and generic disordered regions. It combines three complementary disorder predictors with information from the sequence, sequence profiles, predicted secondary structure, solvent accessibility, backbone dihedral torsion angles, residue flexibility, and B-factors. Our method utilizes a custom-designed set of features that are based on raw predictions and aggregated raw values and recognizes various types of disorder. MFDp is compared at the residue level on two datasets against eight recent disorder predictors and top-performing methods from the most recent CASP8 experiment. In spite of using training chains with ≤25% similarity to the test sequences, our method consistently and significantly outperforms the other methods based on the MCC index. MFDp outperforms modern disorder predictors for the binary disorder assignment and provides competitive real-valued predictions. MFDp's outputs are also shown to outperform the other methods in the identification of proteins with long disordered regions. Availability: http://biomine.ece.ualberta.ca/MFDp.html Supplementary information: Supplementary data are available at Bioinformatics online. Contact: lkurgan@ece.ualberta.ca PMID:20823312
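The comparison above is reported with the MCC index, the Matthews correlation coefficient, computed at the residue level. A minimal sketch of that metric, assuming disordered residues are the positive class (label 1) and ordered residues the negative class (label 0); the helper names are hypothetical and this is not MFDp's own evaluation code.

```python
import math
from typing import Sequence


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def residue_mcc(true: Sequence[int], pred: Sequence[int]) -> float:
    """Residue-level MCC with 1 = disordered, 0 = ordered."""
    pairs = list(zip(true, pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return mcc(tp, tn, fp, fn)
```

MCC ranges from -1 to 1 and, unlike raw accuracy, is not inflated when one class dominates, which matters because ordered residues usually far outnumber disordered ones.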

  6. An analysis of amino acid sequences surrounding archaeal glycoprotein sequons.

    PubMed

    Abu-Qarn, Mehtap; Eichler, Jerry

    2007-05-01

    Although Archaea provided the first example of a prokaryal glycoprotein, little is known of the rules governing the N-glycosylation process in these organisms. As in Eukarya and Bacteria, archaeal N-glycosylation takes place at the Asn residues of Asn-X-Ser/Thr sequons. Since not all sequons are utilized, it is clear that other factors, including the context in which a sequon exists, affect glycosylation efficiency. As yet, the contribution to N-glycosylation made by sequon-bordering residues and other related factors in Archaea remains unaddressed. In the following, the surroundings of Asn residues experimentally confirmed as modified were analyzed in an attempt to define sequence rules and requirements for archaeal N-glycosylation.
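Locating Asn-X-Ser/Thr sequons and the residues bordering them is the first step of an analysis like the one described. A minimal sketch, assuming a one-letter protein sequence; a lookahead is used so overlapping sequons (e.g. in "NNST") are both reported. The optional proline exclusion reflects the commonly cited eukaryotic restriction X ≠ Pro and is an assumption, not a rule stated in the abstract for Archaea. Function names are hypothetical.

```python
import re
from typing import List, Tuple

# Lookahead so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=(N[A-Z][ST]))")


def find_sequons(protein: str, exclude_proline: bool = False) -> List[Tuple[int, str]]:
    """Return (1-based Asn position, motif) for every Asn-X-Ser/Thr sequon."""
    hits = []
    for m in SEQUON.finditer(protein.upper()):
        motif = m.group(1)
        if exclude_proline and motif[1] == "P":
            continue  # optional eukaryote-style X != Pro filter (assumption)
        hits.append((m.start() + 1, motif))
    return hits


def sequon_context(protein: str, asn_pos: int, flank: int = 5) -> str:
    """Sequon plus up to `flank` bordering residues on each side."""
    i = asn_pos - 1
    return protein[max(0, i - flank): i + 3 + flank]
```

Comparing the contexts of sequons known to be modified with those of unmodified ones is then a matter of tabulating residues at each flanking offset.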

  7. Computer Aided Prediction of Biological Activity Spectra: Study of Correlation between Predicted and Observed Activities for Coumarin-4-Acetic Acids

    PubMed Central

    Basanagouda, M.; Jadhav, V. B.; Kulkarni, M. V.; Rao, R. Nagendra

    2011-01-01

    Coumarin-4-acetic acids have been synthesized from various phenols and citric acid under Pechmann cyclisation conditions. All the compounds have been evaluated for anti-inflammatory and analgesic activity in acute models, as well as for their ulcerogenic potential. Using the predictions of the computer program Prediction of Activity Spectra for Substances (PASS) together with the PharmaExpert software, we found a correlation between the observed and predicted anti-inflammatory activity. PMID:22131629

  8. Using sequence-specific chemical and structural properties of DNA to predict transcription factor binding sites.

    PubMed

    Bauer, Amy L; Hlavacek, William S; Unkefer, Pat J; Mu, Fangping

    2010-11-18

    An important step in understanding gene regulation is to identify the DNA binding sites recognized by each transcription factor (TF). Conventional approaches to prediction of TF binding sites involve the definition of consensus sequences or position-specific weight matrices and rely on statistical analysis of DNA sequences of known binding sites. Here, we present a method called SiteSleuth in which DNA structure prediction, computational chemistry, and machine learning are applied to develop models for TF binding sites. In this approach, binary classifiers are trained to discriminate between true and false binding sites based on the sequence-specific chemical and structural features of DNA. These features are determined via molecular dynamics calculations in which we consider each base in different local neighborhoods. For each of 54 TFs in Escherichia coli, for which at least five DNA binding sites are documented in RegulonDB, the TF binding sites and portions of the non-coding genome sequence are mapped to feature vectors and used in training. According to cross-validation analysis and a comparison of computational predictions against ChIP-chip data available for the TF Fis, SiteSleuth outperforms three conventional approaches: Match, MATRIX SEARCH, and the method of Berg and von Hippel. SiteSleuth also outperforms QPMEME, a method similar to SiteSleuth in that it involves a learning algorithm. The main advantage of SiteSleuth is a lower false positive rate.
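The abstract contrasts SiteSleuth with conventional position-specific weight matrices. For reference, a minimal sketch of that conventional baseline (not SiteSleuth, whose molecular-dynamics features are not reproduced here): a log2-odds matrix built from aligned, equal-length known sites with a pseudocount and an assumed uniform background of 0.25 per base, then used to scan a sequence. Function names and the threshold are hypothetical.

```python
import math
from typing import Dict, List

ALPHABET = "ACGT"


def build_pwm(sites: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Log2-odds PWM from aligned, equal-length binding sites (uniform background)."""
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        counts = {b: pseudocount for b in ALPHABET}
        for site in sites:
            counts[site[pos].upper()] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / 0.25) for b in ALPHABET})
    return pwm


def scan(pwm: List[Dict[str, float]], sequence: str, threshold: float) -> List[int]:
    """0-based start positions whose window score reaches the threshold."""
    w = len(pwm)
    hits = []
    for i in range(len(sequence) - w + 1):
        window = sequence[i:i + w].upper()
        if any(base not in ALPHABET for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(col[base] for col, base in zip(pwm, window))
        if score >= threshold:
            hits.append(i)
    return hits
```

Because every window scoring above the threshold is reported, lowering the threshold raises sensitivity at the cost of false positives, which is the weakness the abstract says SiteSleuth improves on.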

  9. Plasma long-chain free fatty acids predict mammalian longevity.

    PubMed

    Jové, Mariona; Naudí, Alba; Aledo, Juan Carlos; Cabré, Rosanna; Ayala, Victoria; Portero-Otin, Manuel; Barja, Gustavo; Pamplona, Reinald

    2013-11-28

    Membrane lipid composition is an important correlate of the rate of aging of animals and, therefore, of their longevity. In the present work, the use of high-throughput technologies allowed us to determine the plasma lipidomic profile of 11 mammalian species ranging in maximum longevity from 3.5 to 120 years. The non-targeted approach revealed a species-specific lipidomic profile that accurately predicts animal longevity. Regression analysis between lipid species and longevity demonstrated that the greater the longevity of a species, the lower its plasma long-chain free fatty acid (LC-FFA) concentration, peroxidizability index, and content of lipid peroxidation-derived products. The inverse association between longevity and LC-FFA persisted after correction for body mass and phylogenetic interdependence. These results indicate that the lipidomic signature is an optimized feature associated with animal longevity, with LC-FFA emerging as a potential biomarker of longevity.

  10. Nucleic and Amino Acid Sequences Support Structure-Based Viral Classification

    PubMed Central

    Sinclair, Robert M.; Ravantti, Janne J.

    2017-01-01

    ABSTRACT Viral capsids ensure viral genome integrity by protecting the enclosed nucleic acids. Interactions between the genome and capsid and between individual capsid proteins (i.e., capsid architecture) are intimate and are expected to be characterized by strong evolutionary conservation. For this reason, a capsid structure-based viral classification has been proposed as a way to bring order to the viral universe. The seeming lack of sufficient sequence similarity to reproduce this classification has made it difficult to reject structural convergence as the basis for the classification. We reinvestigate whether the structure-based classification for viral coat proteins making icosahedral virus capsids is in fact supported by previously undetected sequence similarity. Since codon choices can influence nascent protein folding cotranslationally, we searched for both amino acid and nucleotide sequence similarity. To demonstrate the sensitivity of the approach, we identify a candidate gene for the pandoravirus capsid protein. We show that the structure-based classification is strongly supported by amino acid and also nucleotide sequence similarities, suggesting that the similarities are due to common descent. The correspondence between structure-based and sequence-based analyses of the same proteins shown here allows them to be used in future analyses of the relationship between linear sequence information and macromolecular function, as well as between linear sequence and protein folds. IMPORTANCE Viral capsids protect nucleic acid genomes, which in turn encode capsid proteins. This tight coupling of protein shell and nucleic acids, together with strong functional constraints on capsid protein folding and architecture, leads to the hypothesis that capsid protein-coding nucleotide sequences may retain signatures of ancient viral evolution. We have been able to show that this is indeed the case, using the major capsid proteins of viruses forming icosahedral capsids.

  11. Neural network and SVM classifiers accurately predict lipid binding proteins, irrespective of sequence homology.

    PubMed

    Bakhtiarizadeh, Mohammad Reza; Moradi-Shahrbabak, Mohammad; Ebrahimi, Mansour; Ebrahimie, Esmaeil

    2014-09-07

    Due to the central roles of lipid binding proteins (LBPs) in many biological processes, sequence-based identification of LBPs is of great interest. The major challenge is that LBPs are diverse in sequence, structure, and function, which results in low accuracy of sequence homology-based methods. Therefore, there is a need to develop alternative functional prediction methods that do not depend on sequence similarity. To distinguish LBPs from non-LBPs, the performance of a support vector machine (SVM) and a neural network was compared in this study. Comprehensive protein features and various techniques were employed to create datasets. Five-fold cross-validation (CV) and independent evaluation (IE) tests were used to assess the validity of the two methods. The results indicated that SVM outperforms the neural network. SVM achieved 89.28% (CV) and 89.55% (IE) overall accuracy in distinguishing LBPs from non-LBPs and 92.06% (CV) and 92.90% (IE) (on average) for classification of different LBP classes. Increasing the number and the range of extracted protein features as well as optimization of the SVM parameters significantly increased the efficiency of LBP class prediction in comparison to the only previous report in this field. Altogether, the results showed that the SVM algorithm can be run on broad, computationally calculated protein features and offers a promising tool for the detection of LBP classes. The proposed approach has the potential to integrate with and improve the common sequence alignment-based methods.
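The five-fold cross-validation mentioned above partitions the dataset into five folds, trains on four, and tests on the held-out fold, rotating through all five. A minimal standard-library sketch of the index splitting only; the SVM and neural network training steps are left out, and the function name and seed are hypothetical.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)          # reproducible shuffle
    folds = [idx[i::k] for i in range(k)]     # k folds whose sizes differ by at most 1
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


for train, test in k_fold_indices(23, k=5):
    print(len(train), len(test))
```

Each sample appears in exactly one test fold, so the reported CV accuracy is an average over predictions made on data the classifier did not see during training; an independent evaluation set, as used in the study, adds a further check.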

  12. Discriminative Prediction of A-To-I RNA Editing Events from DNA Sequence

    PubMed Central

    Sun, Jiangming; Singh, Pratibha; Bagge, Annika; Valtat, Bérengère; Vikman, Petter; Spégel, Peter; Mulder, Hindrik

    2016-01-01

    RNA editing is a post-transcriptional alteration of RNA sequences that, via insertions, deletions or base substitutions, can affect protein structure as well as RNA and protein expression. Recently, it has been suggested that RNA editing may be more frequent than previously thought. A great impediment, however, to a deeper understanding of this process is the substantial sequencing effort needed to identify RNA editing events. Here, we describe an in silico approach, based on machine learning, that ameliorates this problem. Using 41-nucleotide-long DNA sequences, we show that novel A-to-I RNA editing events can be predicted from known A-to-I RNA editing events, both within and across species. The validity of the proposed method was verified in an independent experimental dataset. Using our approach, 203,202 putative A-to-I RNA editing events were predicted in the whole human genome. Of these, 9% were previously reported. The remaining sites require further validation, e.g., by targeted deep sequencing. In conclusion, the approach described here is a useful tool to identify potential A-to-I RNA editing events without the requirement of extensive RNA sequencing. PMID:27764195
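To turn a candidate site into input for a classifier, the 41-nucleotide DNA context has to be extracted and encoded. A minimal sketch, assuming the window is centred on the candidate adenosine with 20 nt on each side and encoded one-hot over A, C, G, T; the abstract does not specify the exact placement or encoding, so both are assumptions, and the function names are hypothetical.

```python
from typing import List, Optional


def window(seq: str, center: int, half: int = 20) -> Optional[str]:
    """Return the (2*half + 1)-nt window around a 0-based position, or None at the edges."""
    start, end = center - half, center + half + 1
    if start < 0 or end > len(seq):
        return None
    return seq[start:end].upper()


def one_hot(seq: str) -> List[List[int]]:
    """Encode each base as [A, C, G, T] indicators; ambiguous bases become all zeros."""
    return [[1 if base == b else 0 for b in "ACGT"] for base in seq.upper()]


genome = "C" * 20 + "A" + "G" * 20   # toy sequence with a candidate A at index 20
w = window(genome, 20)
features = one_hot(w)                # 41 x 4 matrix, flattened to 164 values if needed
```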

  13. Classification of mouse VK groups based on the partial amino acid sequence to the first invariant tryptophan: impact of 14 new sequences from IgG myeloma proteins.

    PubMed

    Potter, M; Newell, J B; Rudikoff, S; Haber, E

    1982-12-01

    Fourteen new VK sequences derived from BALB/c IgG myeloma proteins were determined to the first invariant tryptophan (Trp 35). These partial sequences were compared with 65 other published VK sequences using a computer program. The 79 sequences were organized according to the length of the sequence from the amino terminus to the first invariant tryptophan (Trp 35) into seven groups (33, 34, 35, 36, 39, 40 and 41 aa). A distance matrix of all 79 sequences was then computed, i.e., the number of amino acid substitutions necessary to convert one sequence to another was determined. From these data a dendrogram was constructed. Most of the VK sequences fell into clusters or closely related groups. The definition of a sequence group is arbitrary but facilitates the classification of VK proteins. We used 12 substitutions as the basis for defining a sequence group, based on the known number of substitutions found in the VK21 proteins. By this criterion there were 18 groups in the Trp 35 dendrogram. Twelve of the 14 new sequences fell into one of these sequence groups; two formed new sequence groups. Collective amino acid sequencing is still encountering new VK structures, indicating that more sequences will be required to attain an accurate estimate of the total number of VK groups. Updated dendrograms can be quickly generated to include newly generated sequences.
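The procedure above (count substitutions between sequences, then define groups at a 12-substitution cutoff) can be sketched as follows. This assumes the sequences are already aligned to equal length and uses single-linkage grouping with "distance ≤ threshold" as the joining rule; the abstract does not say which linkage or inequality was used, so both are assumptions, and the helper names are hypothetical.

```python
from typing import Dict, List


def substitutions(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def group_sequences(seqs: List[str], threshold: int = 12) -> List[List[int]]:
    """Single-linkage groups: join any pair within `threshold` substitutions."""
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if substitutions(seqs[i], seqs[j]) <= threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Cutting a single-linkage dendrogram at a fixed height gives the same groups as this union-find pass, which is why newly determined sequences can be added and regrouped cheaply.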

  14. Predicting the Viscosity of Low VOC Vinyl Ester and Fatty Acid-Based Resins

    DTIC Science & Technology

    2005-12-01

    The sample was titrated with the perchloric acid / peracetic acid solution (Aldrich) until the indicator, 0.1% crystal violet in acetic acid (Aldrich... Predicting the Viscosity of Low VOC Vinyl Ester and Fatty Acid-Based Resins by John J. La Scala, Amutha Jeyarajasingam, Cherise Winston... Aberdeen Proving Ground, MD 21005-5069 ARL-TR-3681 December 2005 Predicting the Viscosity of Low VOC Vinyl Ester and Fatty Acid-Based

  15. Detection and isolation of nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, J.N.; Straume, T.; Bogen, K.T.

    1997-04-01

    A method for detecting a target nucleic acid sequence in a sample is provided using hybridization probes which competitively hybridize to a target nucleic acid. According to the method, a target nucleic acid sequence is hybridized to first and second hybridization probes which are complementary to overlapping portions of the target nucleic acid sequence, the first hybridization probe including a first complexing agent capable of forming a binding pair with a second complexing agent and the second hybridization probe including a detectable marker. The first complexing agent attached to the first hybridization probe is contacted with a second complexing agent, the second complexing agent being attached to a solid support such that when the first and second complexing agents are attached, target nucleic acid sequences hybridized to the first hybridization probe become immobilized onto the solid support. The immobilized target nucleic acids are then separated and detected by detecting the detectable marker attached to the second hybridization probe. A kit for performing the method is also provided. 7 figs.

  16. Detection and isolation of nucleic acid sequences using competitive hybridization probes

    DOEpatents

    Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

    1997-01-01

    A method for detecting a target nucleic acid sequence in a sample is provided using hybridization probes which competitively hybridize to a target nucleic acid. According to the method, a target nucleic acid sequence is hybridized to first and second hybridization probes which are complementary to overlapping portions of the target nucleic acid sequence, the first hybridization probe including a first complexing agent capable of forming a binding pair with a second complexing agent and the second hybridization probe including a detectable marker. The first complexing agent attached to the first hybridization probe is contacted with a second complexing agent, the second complexing agent being attached to a solid support such that when the first and second complexing agents are attached, target nucleic acid sequences hybridized to the first hybridization probe become immobilized onto the solid support. The immobilized target nucleic acids are then separated and detected by detecting the detectable marker attached to the second hybridization probe. A kit for performing the method is also provided.

  17. Noise occlusion in discrete tone sequences as a tool towards auditory predictive processing?

    PubMed

    Bendixen, Alexandra; Duwe, Susann; Reiche, Martin

    2015-11-11

    The notion of predictive coding is a common feature of many theories of auditory information processing. Experimental demonstrations of predictive auditory processing often rest on omitting predictable input in order to uncover the prediction made by the brain. Findings show that auditory cortical activity elicited by the omission of a predictable tone resembles the activity elicited by the actual tone. Here we attempted to extend this approach towards using noises instead of omissions in order to capture a more prevalent case of degraded sensory input. By applying a subtraction approach to remove ERP effects of the noise itself, auditory cortical activity elicited "behind" the noise was uncovered. We hypothesized that ERPs elicited behind noise stimuli covering predictable tones should be more similar to ERPs elicited by the actual tones than when the same comparison is made for unpredictable tones. ERP results during passive listening partly confirm this hypothesis, but also point towards some methodological caveats in this particular approach towards studying neural correlates of predictive auditory processing due to contributions from predictability-unrelated factors. A follow-up active listening condition indicated that participants were not more likely to perceive the tone sequence as continuous when a predictable tone was covered with noise than when this pertained to an unpredictable tone. Overall, the noise-based paradigm in its present form was not shown to be successful in revealing predictive processing in perceptual judgments or early neural correlates of sound processing. We discuss these findings in the contexts of predictive processing and illusory auditory continuity. This article is part of a Special Issue entitled SI: Prediction and Attention.

  18. Amino acid sequence and some properties of phytolacain G, a cysteine protease from growing fruit of pokeweed, Phytolacca americana.

    PubMed

    Uchikoba, T; Arima, K; Yonezawa, H; Shimada, M; Kaneda, M

    2000-10-18

    A protease, phytolacain G, has been found to appear on CM-Sepharose ion-exchange chromatography of greenish small-size fruits of pokeweed, Phytolacca americana L., from ca. 2 weeks after flowering, and to increase during fruit enlargement. Reddish ripe fruit of the pokeweed contained both phytolacain G and R. The molecular mass of phytolacain G was estimated to be 25.5 kDa by SDS-PAGE. Its amino acid sequence was reconstructed by automated sequence analysis of the peptides obtained after cleavage with Achromobacter protease I, chymotrypsin, and cyanogen bromide. The enzyme is composed of 216 amino acid residues and shares 152 identical amino acid residues (70%) with phytolacain R, 126 (58%) with melain G, 108 (50%) with papain, 106 (49%) with actinidain, and 96 (44%) with stem bromelain. The amino acid residues forming the substrate binding S(2) pocket of papain, Tyr67, Pro68, Trp69, Val133, and Phe207, were predicted to be replaced by Trp, Met, His, Ala, and Ser, respectively, in phytolacain G. As a consequence of these substitutions, the S(2) pocket is expected to be less hydrophobic in phytolacain G than in papain.
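The identity percentages quoted above are consistent with counting identical residues over the 216 residues of phytolacain G. A short arithmetic check, assuming 216 is the denominator in every case (the rounded results match all five stated values):

```python
LENGTH = 216  # residues in phytolacain G

shared = {
    "phytolacain R": 152,
    "melain G": 126,
    "papain": 108,
    "actinidain": 106,
    "stem bromelain": 96,
}

# Percent identity rounded to the nearest whole number, as in the abstract.
percent = {name: round(100 * n / LENGTH) for name, n in shared.items()}
print(percent)
```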

  19. Predicting the tolerated sequences for proteins and protein interfaces using RosettaBackrub flexible backbone design.

    PubMed

    Smith, Colin A; Kortemme, Tanja

    2011-01-01

    Predicting the set of sequences that are tolerated by a protein or protein interface, while maintaining a desired function, is useful for characterizing protein interaction specificity and for computationally designing sequence libraries to engineer proteins with new functions. Here we provide a general method, a detailed set of protocols, and several benchmarks and analyses for estimating tolerated sequences using flexible backbone protein design implemented in the Rosetta molecular modeling software suite. The input to the method is at least one experimentally determined three-dimensional protein structure or high-quality model. The starting structure(s) are expanded or refined into a conformational ensemble using Monte Carlo simulations consisting of backrub backbone and side chain moves in Rosetta. The method then uses a combination of simulated annealing and genetic algorithm optimization methods to enrich for low-energy sequences for the individual members of the ensemble. To emphasize certain functional requirements (e.g. forming a binding interface), interactions between and within parts of the structure (e.g. domains) can be reweighted in the scoring function. Results from each backbone structure are merged together to create a single estimate for the tolerated sequence space. We provide an extensive description of the protocol and its parameters, all source code, example analysis scripts and three tests applying this method to finding sequences predicted to stabilize proteins or protein interfaces. The generality of this method makes many other applications possible, for example stabilizing interactions with small molecules, DNA, or RNA. Through the use of within-domain reweighting and/or multistate design, it may also be possible to use this method to find sequences that stabilize particular protein conformations or binding interactions over others.
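The final step described above merges the designed sequences from each backbone in the ensemble into a single estimate of tolerated sequence space. The published protocol's exact weighting is not given in the abstract, so the following is only a minimal sketch under the assumption that each backbone contributes equally and the output is a per-position amino acid frequency profile. It does not call Rosetta; the function name and toy sequences are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def merge_ensemble(per_backbone: List[List[str]]) -> List[Dict[str, float]]:
    """Per-position amino acid frequencies from equal-length designed sequences.

    `per_backbone` holds one list of designed sequences per backbone structure.
    Each backbone is given equal total weight (assumption), regardless of how
    many sequences it contributed.
    """
    length = len(per_backbone[0][0])
    counts = [Counter() for _ in range(length)]
    for seqs in per_backbone:
        weight = 1.0 / len(seqs)
        for s in seqs:
            for pos, aa in enumerate(s):
                counts[pos][aa] += weight
    n = len(per_backbone)
    return [{aa: c / n for aa, c in pos_counts.items()} for pos_counts in counts]


profile = merge_ensemble([["AK", "AR"], ["GK"]])  # two hypothetical backbones
print(profile)
```

Equal per-backbone weighting keeps one backbone that happens to yield many low-energy sequences from dominating the profile; energy-based (e.g. Boltzmann) weighting would be an alternative design choice.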

  20. Amino acid sequence around the active-site serine residue in the acyltransferase domain of goat mammary fatty acid synthetase.

    PubMed Central

    Mikkelsen, J; Højrup, P; Rasmussen, M M; Roepstorff, P; Knudsen, J

    1985-01-01

    Goat mammary fatty acid synthetase was labelled in the acyltransferase domain by formation of O-ester intermediates by incubation with [1-14C]acetyl-CoA and [2-14C]malonyl-CoA. Tryptic-digest and CNBr-cleavage peptides were isolated and purified by high-performance reverse-phase and ion-exchange liquid chromatography. The sequences of the malonyl- and acetyl-labelled peptides were shown to be identical. The results confirm the hypothesis that both acetyl and malonyl groups are transferred to the mammalian fatty acid synthetase complex by the same transferase. The sequence is compared with those of other fatty acid synthetase transferases. PMID:3922356

  1. Ligation with nucleic acid sequence-based amplification.

    PubMed

    Ong, Carmichael; Tai, Warren; Sarma, Aartik; Opal, Steven M; Artenstein, Andrew W; Tripathi, Anubhav

    2012-01-01

    This work presents a novel method for detecting nucleic acid targets using a ligation step along with an isothermal, exponential amplification step. We use an engineered ssDNA with two variable regions on the ends, allowing us to design the probe for optimal reaction kinetics and primer binding. This two-part probe is ligated by T4 DNA Ligase only when both parts bind adjacently to the target. The assay demonstrates that the expected 72-nt RNA product appears only when the synthetic target, T4 ligase, and both probe fragments are present during the ligation step. An extraneous 38-nt RNA product also appears due to linear amplification of unligated probe (P3), but its presence does not cause a false-positive result. In addition, 40 mmol/L KCl in the final amplification mix was found to be optimal. It was also found that increasing P5 in excess of P3 helped with ligation and reduced the extraneous 38-nt RNA product. The assay was also tested with a single nucleotide polymorphism target, changing one base at the ligation site. The assay was able to yield a negative signal despite only a single-base change. Finally, using P3 and P5 with longer binding sites results in increased overall sensitivity of the reaction, showing that increasing ligation efficiency can improve the assay overall. We believe that this method can be used effectively for a number of diagnostic assays.

  2. Structure-templated predictions of novel protein interactions from sequence information.

    PubMed

    Betel, Doron; Breitkreuz, Kevin E; Isserlin, Ruth; Dewar-Darch, Danielle; Tyers, Mike; Hogue, Christopher W V

    2007-09-01

    The multitude of functions performed in the cell is largely controlled by a set of carefully orchestrated protein interactions, often facilitated by specific binding of conserved domains in the interacting proteins. Interacting domains commonly exhibit distinct binding specificity to short and conserved recognition peptides called binding profiles. Although many conserved domains are known in nature, only a few have well-characterized binding profiles. Here, we describe a novel predictive method known as domain-motif interactions from structural topology (D-MIST) for elucidating the binding profiles of interacting domains. A set of domains and their corresponding binding profiles were derived from extant protein structures and protein interaction data and then used to predict novel protein interactions in yeast. A number of the predicted interactions were verified experimentally, including new interactions of the mitotic exit network, RNA polymerases, nucleotide metabolism enzymes, and the chaperone complex. These results demonstrate that new protein interactions can be predicted exclusively from sequence information.

  3. The Motif Tool Assessment Platform (MTAP) for sequence-based transcription factor binding site prediction tools.

    PubMed

    Quest, Daniel; Ali, Hesham

    2010-01-01

    Predicting transcription factor binding sites (TFBS) from sequence is one of the most challenging problems in computational biology. (Semi-)automated computer-assisted prediction methods are needed to find TFBS across an entire genome, which is a first step in reconstructing the mechanisms that control gene activity. Bioinformatics journals continue to publish diverse methods for predicting TFBS every month. To help practitioners decide which method to use for a particular TFBS, we provide a platform to assess the quality and applicability of the available methods. Assessment tools allow researchers to determine how well methods can be expected to perform on specific organisms or on specific transcription factor families. This chapter introduces the TFBS detection problem, reviews current strategies for evaluating algorithm effectiveness, and introduces and discusses a novel and robust assessment tool, the Motif Tool Assessment Platform (MTAP).

  4. Thin-film technology for direct visual detection of nucleic acid sequences: applications in clinical research.

    PubMed

    Jenison, Robert D; Bucala, Richard; Maul, Diana; Ward, David C

    2006-01-01

    Certain optical conditions permit the unaided eye to detect thickness changes on surfaces on the order of 20 Å, which are of similar dimensions to monomolecular interactions between proteins or hybridization of complementary nucleic acid sequences. Such detection exploits specific interference of reflected white light, wherein thickness changes are perceived as surface color changes. This technology, termed thin-film detection, allows for the visualization of subattomole amounts of nucleic acid targets, even in complex clinical samples. Thin-film technology has been applied to a broad range of clinically relevant indications, including the detection of pathogenic bacterial and viral nucleic acid sequences and the discrimination of sequence variations in human genes causally related to susceptibility or severity of disease.

  5. Conservation of Shannon's redundancy for proteins. [information theory applied to amino acid sequences]

    NASA Technical Reports Server (NTRS)

    Gatlin, L. L.

    1974-01-01

    Concepts of information theory are applied to examine the redundancy of various proteins from source organisms such as animals and plants. The Monte Carlo method is used to derive information parameters for random protein sequences. Real protein sequence parameters are compared with the standard parameters of protein sequences of a given length. The tendency of a chain to contain some amino acids more frequently than others, and the tendency of a chain to contain certain amino acid pairs more frequently than other pairs, are used as randomness measures of individual protein sequences. Non-periodic proteins are generally found to have random Shannon redundancies, except where short chain length and the genetic code impose constraints. Redundant characteristics of highly periodic proteins are discussed. A degree-of-periodicity parameter is derived.
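    The two randomness measures above (uneven single-residue usage and uneven residue-pair usage) can be made concrete with a small sketch. It assumes Gatlin-style definitions, D1 = log2(20) - H1 and D2 = H1 - H_M, where H_M is the conditional entropy of a residue given its predecessor; the function names and the shuffle-based Monte Carlo baseline are illustrative assumptions, not the procedure from the report.

```python
import math
import random
from collections import Counter


def entropy(counts):
    """Shannon entropy (bits) of a frequency table."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)


def redundancy_terms(seq, alphabet_size=20):
    """Return (D1, D2) for one sequence (assumed Gatlin-style definitions).

    D1: divergence of residue composition from equiprobability,
        log2(alphabet_size) - H1.
    D2: divergence from independence, H1 - H_M, where the conditional
        entropy H_M is estimated as H(adjacent pairs) - H1.
    """
    h1 = entropy(Counter(seq))
    h_pairs = entropy(Counter(seq[i:i + 2] for i in range(len(seq) - 1)))
    h_markov = h_pairs - h1
    return math.log2(alphabet_size) - h1, h1 - h_markov


def shuffled_d2_baseline(seq, trials=200, seed=0):
    """Monte Carlo baseline (assumption): mean and SD of D2 over random
    shuffles, which keep composition and length but destroy pair order.
    Requires trials > 1."""
    rng = random.Random(seed)
    residues = list(seq)
    values = []
    for _ in range(trials):
        rng.shuffle(residues)
        values.append(redundancy_terms("".join(residues))[1])
    mean = sum(values) / trials
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (trials - 1))
    return mean, sd
```

    Comparing a real chain's D2 with the shuffled mean and SD (for example as a Z-score) separates genuine pair preferences from the apparent structure that short chains show by chance.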

  6. RNA internal standard synthesis by nucleic acid sequence-based amplification for competitive quantitative amplification reactions.

    PubMed

    Lo, Wan-Yu; Baeumner, Antje J

    2007-02-15

    Nucleic acid sequence-based amplification (NASBA) reactions have been shown to synthesize new sequences successfully through deletion and insertion reactions. Two RNA internal standards were synthesized for use in competitive amplification reactions, in which quantitative analysis is achieved by coamplifying the internal standard with the wild-type sample. The sequences were created in two consecutive NASBA reactions using the E. coli clpB mRNA sequence as the model analyte. The primer sequences of the wild-type sequence were maintained, and a 20-nt-long segment inside the amplicon region was exchanged for a new segment of similar GC content and melting temperature. The new RNA sequence was thus amplifiable with the wild-type primers and detectable via the newly inserted sequence. In the first reaction, the forward primer and an additional 20-nt-long sequence were deleted and replaced by a new 20-nt-long sequence. In the second reaction, a forward primer carrying the wild-type primer sequence as a 5' overhang was used. The presence of pure internal standard was verified using electrochemiluminescence and RNA lateral-flow biosensor analysis. Further sequence deletion to shorten the internal standard amplicons, and thus generate higher detection signals, was found to be unnecessary. Finally, a competitive NASBA reaction between one internal standard and the wild-type sequence was carried out, demonstrating its functionality. This rapid construction method via NASBA offers advantages over traditional techniques: it requires no conventional cloning procedures and no thermocyclers, and it can be completed in less than 4 h.

  7. Rat androgen-binding protein: evidence for identical subunits and amino acid sequence homology with human sex hormone-binding globulin.

    PubMed

    Joseph, D R; Hall, S H; French, F S

    1987-01-01

    The cDNA for rat androgen-binding protein (ABP) was previously isolated from a bacteriophage lambda gt11 rat testis cDNA library and its identity was confirmed by epitope selection. Hybrid-arrested translation studies have now demonstrated the identity of the isolates. The nucleotide sequence of a near full-length cDNA encodes a 403-amino acid precursor (Mr = 44,539), which agrees in size with the cell-free translation product (Mr = 45,000) of ABP mRNA. Putative sites of N-glycosylation and signal peptide cleavage were identified. Comparison of the predicted amino acid sequence of rat ABP with the amino-terminal amino acid sequence of human sex hormone-binding globulin revealed that 17 of 25 residues are identical. On the basis of the predicted amino acid sequence the molecular weight of the primary translation product, lacking the signal peptide, was 41,183. Hybridization analyses indicated that the two subunits of ABP are coded for by a single gene and a single mRNA species. Our results suggest that ABP consists of two subunits with identical primary sequences and that differences in post-translational processing result in the production of 47,000 and 41,000 molecular weight monomers.

  8. Functional Divergence in the Genus Oenococcus as Predicted by Genome Sequencing of the Newly-Described Species, Oenococcus kitaharae

    PubMed Central

    Borneman, Anthony R.; McCarthy, Jane M.; Chambers, Paul J.; Bartowsky, Eveline J.

    2012-01-01

    Oenococcus kitaharae is only the second member of the genus Oenococcus to be identified and is the closest relative of the industrially important wine bacterium Oenococcus oeni. To provide insight into this new species, the genome of the type strain of O. kitaharae, DSM 17330, was sequenced. Comparison of the sequenced genomes of both species shows that the genome of O. kitaharae DSM 17330 contains many genes with predicted functions in cellular defence (bacteriocins, antimicrobials, restriction-modification systems and a CRISPR locus) that are lacking in O. oeni. The two genomes also appear to differentially encode several metabolic pathways associated with amino acid biosynthesis and carbohydrate utilization, with direct phenotypic consequences. This suggests that the two species have evolved different survival strategies to suit their particular environmental niches. O. oeni has adapted to survive in the harsh, but predictable, environment of wine, which harbours very few competing species. However, O. kitaharae appears to have adapted to a growth environment in which biological competition exerts significant selective pressure, accumulating biological defence molecules, such as bacteriocins and restriction-modification systems, throughout its genome. PMID:22235313

  9. Solubility Challenges in High Concentration Monoclonal Antibody Formulations: Relationship with Amino Acid Sequence and Intermolecular Interactions.

    PubMed

    Pindrus, Mariya; Shire, Steven J; Kelley, Robert F; Demeule, Barthélemy; Wong, Rita; Xu, Yiren; Yadav, Sandeep

    2015-11-02

    The purpose of this work was to elucidate the molecular interactions leading to monoclonal antibody self-association and precipitation, and to use biophysical measurements to predict solubility behavior at high protein concentration. Two monoclonal antibodies (mAb-G and mAb-R) binding to overlapping epitopes were investigated. Precipitation of mAb-G solutions was most prominent at high ionic strength and depended strongly on ionic strength and slightly on solution pH. Under similar conditions, no precipitation was observed for mAb-R solutions. Intermolecular interactions (interaction parameter, kD) correlated well with the high-concentration solubility behavior of both antibodies. As buffer ionic strength increased, interactions of mAb-R tended to weaken, while those of mAb-G became more attractive. To investigate the role of amino acid sequence in precipitation behavior, mutants were designed by substituting the CDR of mAb-R into the mAb-G framework (GM-1) or by deleting two hydrophobic residues in the CDR of mAb-G (GM-2). No precipitation was observed at high ionic strength for either mutant. The molecular interactions of the mutants were similar in magnitude to those of mAb-R. The results suggest that the presence of hydrophobic groups in the CDR of mAb-G may be responsible for compromising its solubility at high ionic strength, since deleting these residues mitigated the solubility issue.

  10. Automated methods of predicting the function of biological sequences using GO and BLAST

    PubMed Central

    Jones, Craig E; Baumann, Ute; Brown, Alfred L

    2005-01-01

    Background With the exponential increase in genomic sequence data, there is a need for automated approaches that deduce the biological functions of novel sequences with high accuracy. Our aim is to demonstrate how accuracy benchmarking can be used in a decision-making process that evaluates competing designs of biological function predictors. We use the Gene Ontology (GO), a directed acyclic graph of functional terms, to annotate sequences with functional information describing their biological context. First, we examine how accuracy scores change as the allowed distance between predicted terms and a test set of curator-assigned terms increases. Next, we evaluate several annotator methods using accuracy benchmarking. Given an unannotated sequence, we use the Basic Local Alignment Search Tool (BLAST) to find similar sequences that curators have already assigned GO terms to. A number of methods were developed that use the terms associated with the five best-matching sequences. These methods were compared against a benchmark method that simply uses the terms associated with the best BLAST-matched sequence (best BLAST approach). Results The precision and recall of estimates increase rapidly as the distance permitted between a predicted term and a correct term assignment increases. Accuracy benchmarking allows annotation methods to be compared. A covering graph approach performs poorly, except where the term assignment rate is high. A term distance concordance approach has accuracy similar to the best BLAST approach, with lower precision but higher recall. However, a discriminant function method has higher precision and recall than the best BLAST approach and the other methods shown here. Conclusion Allowing term predictions to be counted as correct if they are closely related to a correct term decreases the reliability of the accuracy score. As such we recommend using accuracy measures that require exact matching of predicted terms with curator assigned
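    The best BLAST benchmark and the exact-match scoring recommended above can be sketched in a few lines. This is a hedged illustration only: hits are assumed to arrive as (e-value, set of GO terms) pairs, and `top_k_vote_terms` is a hypothetical five-hit voting rule for comparison, not the term distance concordance or discriminant function methods from the paper.

```python
from collections import Counter


def best_blast_terms(hits):
    """Benchmark: copy the GO terms of the single best (lowest e-value) hit.

    hits: list of (e_value, go_terms) tuples for already-annotated matches.
    """
    if not hits:
        return set()
    _, terms = min(hits, key=lambda h: h[0])
    return set(terms)


def top_k_vote_terms(hits, k=5, min_votes=2):
    """Hypothetical alternative: keep terms shared by >= min_votes of the k best hits."""
    counts = Counter()
    for _, terms in sorted(hits, key=lambda h: h[0])[:k]:
        counts.update(set(terms))
    return {term for term, n in counts.items() if n >= min_votes}


def precision_recall(predicted, curated):
    """Exact-match precision and recall of predicted vs curator-assigned terms."""
    true_positives = len(predicted & curated)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(curated) if curated else 0.0
    return precision, recall
```

    Because only identical GO identifiers count as correct, a prediction of a closely related parent term earns no credit here, which is the stricter scoring the conclusion argues for.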

  11. Linguistic and spatial skills predict early arithmetic development via counting sequence knowledge.

    PubMed

    Zhang, Xiao; Koponen, Tuire; Räsänen, Pekka; Aunola, Kaisa; Lerkkanen, Marja-Kristiina; Nurmi, Jari-Erik

    2014-01-01

    Utilizing a longitudinal sample of Finnish children (ages 6-10), two studies examined how early linguistic (spoken vs. written) and spatial skills predict later development of arithmetic, and whether counting sequence knowledge mediates these associations. In Study 1 (N = 1,880), letter knowledge and spatial visualization, measured in kindergarten, predicted the level of arithmetic in first grade, and later growth through third grade. Study 2 (n = 378) further showed that these associations were mediated by counting sequence knowledge measured in first grade. These studies add to the literature by demonstrating the importance of written language for arithmetic development. The findings are consistent with the hypothesis that linguistic and spatial skills can improve arithmetic development by enhancing children's number-related knowledge.

  12. Fast and Accurate Accessible Surface Area Prediction Without a Sequence Profile.

    PubMed

    Faraggi, Eshel; Kouza, Maksim; Zhou, Yaoqi; Kloczkowski, Andrzej

    2017-01-01

    A fast accessible surface area (ASA) predictor is presented. In this new approach, no residue mutation profiles generated by multiple sequence alignments are used as inputs. Instead, we use only single-sequence information and global features such as the single-residue and two-residue compositions of the chain. The resulting predictor is far more efficient than sequence-alignment-based predictors and of comparable accuracy. Introducing the global inputs contributes significantly to this comparable accuracy. The predictor, termed ASAquick, performs similarly well on so-called easy and hard cases, indicating generalizability and possible usability for de novo protein structure prediction. The source code and Linux executables for ASAquick are available from Research and Information Systems at http://mamiris.com and from the Battelle Center for Mathematical Medicine at http://mathmed.org .
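    The global inputs mentioned above, single-residue and two-residue compositions of the whole chain, can be computed as follows. This sketch assumes 20 standard amino acids, frequencies normalized by chain length (or number of adjacent pairs), and silently ignores non-standard letters; ASAquick's actual encoding and normalization may differ.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def global_composition_features(seq):
    """Return 20 single-residue + 400 adjacent-pair frequencies (420 values)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    single = [seq.count(a) / n for a in AMINO_ACIDS]
    pair_counts = Counter(seq[i:i + 2] for i in range(n - 1))
    n_pairs = max(n - 1, 1)  # avoid division by zero for length-1 chains
    pair = [pair_counts[p] / n_pairs for p in PAIRS]
    return single + pair
```

    The same 420 values are appended to every residue's window features, so each prediction sees a summary of the whole chain without any alignment step.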

  13. CATH: an expanded resource to predict protein function through structure and sequence

    PubMed Central

    Dawson, Natalie L.; Lewis, Tony E.; Das, Sayoni; Lees, Jonathan G.; Lee, David; Ashford, Paul; Orengo, Christine A.; Sillitoe, Ian

    2017-01-01

    The latest version of the CATH-Gene3D protein structure classification database has recently been released (version 4.1, http://www.cathdb.info). The resource comprises over 300 000 domain structures and over 53 million protein domains classified into 2737 homologous superfamilies, doubling the number of predicted protein domains in the previous version. The daily-updated CATH-B, which contains our very latest domain assignment data, provides putative classifications for over 100 000 additional protein domains. This article describes developments to the CATH-Gene3D resource over the two years since our previous publication in 2015, including: significant increases to our structural and sequence coverage; expansion of the functional families in CATH; a support vector machine (SVM) built to automatically assign domains to superfamilies; improved search facilities that return alignments of query sequences against multiple sequence alignments; and a redesign of the web pages and download site. PMID:27899584

  14. Structured prediction models for RNN based sequence labeling in clinical text

    PubMed Central

    Jagannatha, Abhyuday N; Yu, Hong

    2016-01-01

    Sequence labeling is a widely used method for named entity recognition and information extraction from unstructured natural language data. In the clinical domain, one major application of sequence labeling is the extraction of medical entities such as medication, indication, and side effects from Electronic Health Record narratives. Sequence labeling in this domain presents its own set of challenges and objectives. In this work we experimented with various CRF-based structured learning models with Recurrent Neural Networks. We extend the previously studied LSTM-CRF models with explicit modeling of pairwise potentials. We also propose an approximate version of skip-chain CRF inference with RNN potentials. We use these methodologies for structured prediction in order to improve the exact phrase detection of various medical entities. PMID:28004040

  15. Genome-Wide Prediction of DNA Methylation Using DNA Composition and Sequence Complexity in Human

    PubMed Central

    Wu, Chengchao; Yao, Shixin; Li, Xinghao; Chen, Chujia; Hu, Xuehai

    2017-01-01

    DNA methylation plays a significant role in transcriptional regulation by repressing transcriptional activity. Changes in DNA methylation level are an important factor affecting the expression of target genes and downstream phenotypes. Because current experimental technologies can assay only a small proportion of CpG sites in the human genome, there is an urgent need for reliable computational models that predict genome-wide DNA methylation. Here, we propose a novel algorithm that accurately extracts sequence complexity features (seven features), and we develop a support-vector-machine-based prediction model that integrates them with previously reported DNA composition features (trinucleotide frequency and GC content, 65 features), using the methylation profiles of human embryonic stem cells. Prediction results on 22 human chromosomes with windows of varying size showed that the 600-bp window achieved the best average accuracy, 94.7%. Moreover, comparisons with two existing methods showed the superiority of our model, and cross-species predictions on mouse data demonstrated that our model has some generalization ability. Finally, a statistical test comparing the experimental and predicted data on functional regions annotated by ChromHMM found that six of 10 regions were consistent, which implies reliable prediction of unassayed CpG sites. Accordingly, we believe that our novel model will be useful and reliable for predicting DNA methylation. PMID:28212312
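    The 65 DNA composition features are 64 trinucleotide frequencies plus one GC-content value. A minimal sketch for one window around a CpG site follows; it assumes overlapping trinucleotides, skips triplets containing ambiguous bases such as N, and leaves out the seven sequence complexity features, whose definitions are not given here.

```python
from collections import Counter
from itertools import product

TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]  # 64 triplets


def composition_features(window):
    """Return 64 trinucleotide frequencies + GC content (65 values) for a window."""
    window = window.upper()
    tri_counts = Counter(window[i:i + 3] for i in range(len(window) - 2))
    # Count only unambiguous triplets so the 64 frequencies sum to 1.
    n_tri = sum(tri_counts[t] for t in TRINUCLEOTIDES)
    tri_freq = [tri_counts[t] / n_tri if n_tri else 0.0 for t in TRINUCLEOTIDES]
    n_acgt = sum(window.count(b) for b in "ACGT")
    gc = (window.count("G") + window.count("C")) / n_acgt if n_acgt else 0.0
    return tri_freq + [gc]
```

    In the setting described above, the window would be, for example, 600 bp centered on each CpG site, and the resulting vector would be one input row for the SVM.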

  16. Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure.

    PubMed

    Capra, John A; Laskowski, Roman A; Thornton, Janet M; Singh, Mona; Funkhouser, Thomas A

    2009-12-01

    Identifying a protein's functional sites is an important step towards characterizing its molecular function. Numerous structure- and sequence-based methods have been developed for this problem. Here we introduce ConCavity, a small molecule binding site prediction algorithm that integrates evolutionary sequence conservation estimates with structure-based methods for identifying protein surface cavities. In large-scale testing on a diverse set of single- and multi-chain protein structures, we show that ConCavity substantially outperforms existing methods for identifying both 3D ligand binding pockets and individual ligand binding residues. As part of our testing, we perform one of the first direct comparisons of conservation-based and structure-based methods. We find that the two approaches provide largely complementary information, which can be combined to improve upon either approach alone. We also demonstrate that ConCavity has state-of-the-art performance in predicting catalytic sites and drug binding pockets. Overall, the algorithms and analysis presented here significantly improve our ability to identify ligand binding sites and further advance our understanding of the relationship between evolutionary sequence conservation and structural and functional attributes of proteins. Data, source code, and prediction visualizations are available on the ConCavity web site (http://compbio.cs.princeton.edu/concavity/).

  17. Amino acid sequences of two nonspecific lipid-transfer proteins from germinated castor bean.

    PubMed

    Takishima, K; Watanabe, S; Yamada, M; Suga, T; Mamiya, G

    1988-11-01

    The amino acid sequences of two nonspecific lipid-transfer proteins (nsLTP), B and C, from germinated castor bean seeds have been determined. Both proteins consist of 92 residues, like previously reported nsLTPs, and their calculated Mr values are 9847 and 9593 for nsLTP-B and nsLTP-C, respectively. Compared with the known sequence of nsLTP-A from the same source, the sequences of nsLTP-B and nsLTP-C are 68% and 35% similar, respectively. No variation was found at the positions of the cysteine residues, indicating that they might be involved in disulfide bridges.

  18. Blind Prediction of Deleterious Amino Acid Variations with SNPs&GO.

    PubMed

    Capriotti, Emidio; Martelli, Pier Luigi; Fariselli, Piero; Casadio, Rita

    2017-01-19

    SNPs&GO is a machine learning method for predicting the association of single amino acid variations (SAVs) with disease, taking protein functional annotation into account. The method is a binary classifier that implements a Support Vector Machine algorithm to discriminate between disease-related and neutral SAVs. SNPs&GO combines information from the protein sequence with functional annotation encoded by Gene Ontology terms. Tested in sequence mode on more than 38,000 SAVs from the SwissVar dataset, our method reached 81% overall accuracy and an area under the receiver operating characteristic curve (AUC) of 0.88, with a low false positive rate. In almost all editions of the Critical Assessment of Genome Interpretation (CAGI) experiments, SNPs&GO ranked among the most accurate algorithms for predicting the effect of SAVs. In this paper we summarize the best results obtained by SNPs&GO on disease-related variations in four CAGI challenges involving the following genes: CHEK2 (CAGI 2010), RAD50 (CAGI 2011), p16-INK (CAGI 2013) and NAGLU (CAGI 2016). The evaluation of the results provides insights into the accuracy of our algorithm and the relevance of GO terms in annotating the effect of the variants. It also helps to define good practices for the detection of deleterious SAVs.
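    The AUC quoted above can be read as the probability that a randomly chosen disease-related SAV receives a higher classifier score than a randomly chosen neutral one. The sketch below computes it in that pairwise (Mann-Whitney) form, counting ties as one half; the scores and labels are hypothetical, and this is not SNPs&GO code.

```python
def roc_auc(scores, labels):
    """AUC = P(score of random positive > score of random negative); ties count 1/2.

    scores: classifier outputs; labels: truthy for disease-related, falsy for neutral.
    O(P * N) pairwise form, fine for illustration but slow for very large sets.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

    An AUC of 0.88 therefore means that about 88% of disease/neutral pairs are ranked in the correct order, independently of any single decision threshold.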

  19. A classification of glycosyl hydrolases based on amino acid sequence similarities.

    PubMed Central

    Henrissat, B

    1991-01-01

    The amino acid sequences of 301 glycosyl hydrolases and related enzymes have been compared. A total of 291 sequences corresponding to 39 EC entries could be classified into 35 families. Only ten sequences (less than 5% of the sample) could not be assigned to any family. With the sequences available for this analysis, 18 families were found to be monospecific (containing only one EC number) and 17 were found to be polyspecific (containing at least two EC numbers). Implications on the folding characteristics and mechanism of action of these enzymes and on the evolution of carbohydrate metabolism are discussed. With the steady increase in sequence and structural data, it is suggested that the enzyme classification system should perhaps be revised. PMID:1747104

  20. In silico comparative analysis of DNA and amino acid sequences for prion protein gene.

    PubMed

    Kim, Y; Lee, J; Lee, C

    2008-01-01

    Genetic variability might contribute to the species specificity of prion diseases in various organisms. In this study, the structure of the prion protein gene (PRNP) and its amino acid sequence were compared among species for which sequence data were available. Comparisons of PRNP DNA sequences among 12 species (human, chimpanzee, monkey, bovine, ovine, dog, mouse, rat, wallaby, opossum, chicken and zebrafish) allowed us to identify candidate regulatory regions in intron 1 and the 3'-untranslated region (UTR), in addition to the coding region. Highly conserved putative binding sites for transcription factors, such as heat shock factor 2 (HSF2) and myocyte enhancer factor 2 (MEF2), were discovered in intron 1. In the 3'-UTR, the functional sequence (ATTAAA) for nucleus-specific polyadenylation was found in all the analysed species. The functional sequence (TTTTTAT) for maturation-specific polyadenylation was matched exactly only in ovine, with one or two nucleotide mismatches in the other species. A comparison of the amino acid sequences of 53 species revealed high sequence identity. Notably, the octapeptide repeat region was observed in all species except frog and zebrafish. Functional changes and susceptibility to prion diseases with various isoforms of prion protein could be caused by the numeric variability and conformational changes discovered in the repeat sequences.
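    The polyadenylation-signal comparison above (an exact ATTAAA match, and TTTTTAT matched exactly or with one or two mismatches) amounts to scanning a 3'-UTR for windows within a small Hamming distance of a motif. A minimal sketch, with a hypothetical function name and toy sequence, not the authors' pipeline:

```python
def find_motif(seq, motif, max_mismatches=0):
    """Return (position, mismatches) for every window of seq that lies within
    max_mismatches Hamming distance of motif (0-based positions, same strand)."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        mismatches = sum(1 for a, b in zip(seq[i:i + k], motif) if a != b)
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


# Toy 3'-UTR fragment (hypothetical) containing both signals.
utr = "GCATTAAAGTTTTTATCC"
print(find_motif(utr, "ATTAAA"))      # nucleus-specific signal, exact match only
print(find_motif(utr, "TTTTTAT", 2))  # maturation-specific signal, up to 2 mismatches
```

    Raising `max_mismatches` from 0 to 2 is what turns "found only in ovine" into "present with one or two mismatches" in the other species; note that loose thresholds on short motifs also admit chance hits.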

  1. Complete amino acid sequence of the N-terminal extension of calf skin type III procollagen.

    PubMed Central

    Brandt, A; Glanville, R W; Hörlein, D; Bruckner, P; Timpl, R; Fietzek, P P; Kühn, K

    1984-01-01

    The N-terminal extension peptide of type III procollagen, isolated from foetal-calf skin, contains 130 amino acid residues. To determine its amino acid sequence, the peptide was reduced and carboxymethylated or aminoethylated and fragmented with trypsin, Staphylococcus aureus V8 proteinase and bacterial collagenase. Pyroglutamate aminopeptidase was used to deblock the N-terminal collagenase fragment to enable amino acid sequencing. The type III collagen extension peptide is homologous to that of the alpha 1 chain of type I procollagen with respect to a three-domain structure. The N-terminal 79 amino acids, which contain ten of the 12 cysteine residues, form a compact globular domain. The next 39 amino acids are in a collagenous triplet sequence (Gly-Xaa-Yaa)n with a high hydroxyproline content. Finally, another short non-collagenous domain of 12 amino acids ends at the cleavage site for procollagen aminopeptidase, which cleaves a proline-glutamine bond. In contrast with type I procollagen, the type III procollagen extension peptides contain interchain disulphide bridges located at the C-terminus of the triple-helical domain. PMID:6331392

  2. Quantitative analysis and prediction of G-quadruplex forming sequences in double-stranded DNA

    PubMed Central

    Kim, Minji; Kreig, Alex; Lee, Chun-Ying; Rube, H. Tomas; Calvert, Jacob; Song, Jun S.; Myong, Sua

    2016-01-01

    G-quadruplex (GQ) is a four-stranded DNA structure that can be formed in guanine-rich sequences. GQ structures have been proposed to regulate diverse biological processes including transcription, replication, translation and telomere maintenance. Recent studies have demonstrated the existence of GQ DNA in live mammalian cells and a significant number of potential GQ forming sequences in the human genome. We present a systematic and quantitative analysis of GQ folding propensity on a large set of 438 GQ forming sequences in double-stranded DNA by integrating fluorescence measurement, single-molecule imaging and computational modeling. We find that short minimum loop length and the thymine base are two main factors that lead to high GQ folding propensity. Linear and Gaussian process regression models further validate that the GQ folding potential can be predicted with high accuracy based on the loop length distribution and the nucleotide content of the loop sequences. Our study provides important new parameters that can inform the evaluation and classification of putative GQ sequences in the human genome. PMID:27095201
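    The two main predictors named above, minimum loop length and thymine content of the loops, can be extracted from a candidate sequence by splitting it on G-tracts. This sketch assumes a G-tract is any run of three or more G's and that loops contain no such runs; it only builds descriptors and is not the authors' linear or Gaussian process regression model.

```python
import re

G_TRACT = re.compile(r"G{3,}")  # assumption: a G-tract is a run of >= 3 guanines


def loop_features(seq):
    """Describe the loops between G-tracts of a putative GQ-forming sequence."""
    pieces = G_TRACT.split(seq.upper())
    loops = pieces[1:-1]  # drop the 5' and 3' flanks outside the outer G-tracts
    if len(loops) < 3:
        raise ValueError("need at least four G-tracts")
    loop_nt = "".join(loops)
    return {
        "loop_lengths": [len(loop) for loop in loops],
        "min_loop_length": min(len(loop) for loop in loops),
        "thymine_fraction": loop_nt.count("T") / len(loop_nt) if loop_nt else 0.0,
    }


# Human telomeric repeat: three TTA loops.
print(loop_features("GGGTTAGGGTTAGGGTTAGGG"))
```

    Per the finding above, sequences with a shorter minimum loop and a higher thymine fraction would be expected to show higher folding propensity; a regression model would take vectors like these as input.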

  3. Structure- and Sequence-Based Function Prediction for Non-Homologous Proteins

    PubMed Central

    Sael, Lee; Chitale, Meghana; Kihara, Daisuke

    2012-01-01

    The structural genomics projects have been accumulating an increasing number of protein structures, many of which remain functionally uncharacterized. In parallel with experimental efforts, computational methods are expected to contribute significantly to the functional elucidation of such proteins. However, conventional computational methods that transfer functions from homologous proteins are of little help for these uncharacterized protein structures, because they have no apparent structural or sequence similarity to known proteins. Here, we briefly review two avenues of computational function prediction: structure-based methods and sequence-based methods. The focus is on our recent developments in local structure-based methods and sequence-based methods, which can effectively extract function information from distantly related proteins. Two structure-based methods, Pocket-Surfer and Patch-Surfer, identify similar known ligand binding sites for pocket regions in a query protein without using global protein fold similarity information. Two sequence-based methods, PFP and ESG, make use of weakly similar sequences that are conventionally discarded in homology-based function annotation. We hope that computational methods, combined with experimental methods, will make a leading contribution to the functional elucidation of protein structures. PMID:22270458

  4. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... nucleotide and/or amino acid sequence submissions in computer readable form. 1.824 Section 1.824 Patents... And/or Amino Acid Sequences § 1.824 Form and format for nucleotide and/or amino acid sequence... readable form may be created by any means, such as word processors, nucleotide/amino acid sequence...

  5. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... nucleotide and/or amino acid sequence submissions in computer readable form. 1.824 Section 1.824 Patents... And/or Amino Acid Sequences § 1.824 Form and format for nucleotide and/or amino acid sequence... readable form may be created by any means, such as word processors, nucleotide/amino acid sequence...

  6. 37 CFR 1.824 - Form and format for nucleotide and/or amino acid sequence submissions in computer readable form.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... nucleotide and/or amino acid sequence submissions in computer readable form. 1.824 Section 1.824 Patents... And/or Amino Acid Sequences § 1.824 Form and format for nucleotide and/or amino acid sequence... readable form may be created by any means, such as word processors, nucleotide/amino acid sequence...

  7. Using machine learning to predict gene expression and discover sequence motifs

    NASA Astrophysics Data System (ADS)

    Li, Xuejing

    Recently, large amounts of experimental data for complex biological systems have become available. We use tools and algorithms from machine learning to build data-driven predictive models. We first present a novel algorithm to discover gene sequence motifs associated with temporal expression patterns of genes. Our algorithm, which is based on partial least squares (PLS) regression, directly models the flow of information from gene sequence to gene expression to learn cis-regulatory motifs and characterize the associated gene expression patterns. Our algorithm outperforms traditional computational methods, such as clustering, in motif discovery. We then extend to Caenorhabditis elegans a machine learning model of transcriptional regulation that predicts genetic regulatory response. We show meaningful results both in terms of prediction accuracy on the test experiments and in the biological information extracted from the regulatory program. The model discovers DNA binding sites ab initio. We also present a case study in which we detect a signal of lineage-specific regulation. Finally, we present a comparative study of learning predictive models for motif discovery based on different boosting algorithms: Adaptive Boosting (AdaBoost), Linear Programming Boosting (LPBoost) and Totally Corrective Boosting (TotalBoost). We evaluate and compare the performance of the three boosting algorithms via both statistical and biological validation, for hypoxia response in Saccharomyces cerevisiae.

  8. Prediction of human miRNA target genes using computationally reconstructed ancestral mammalian sequences

    PubMed Central

    Leclercq, Mickael; Diallo, Abdoulaye Baniré; Blanchette, Mathieu

    2017-01-01

    MicroRNAs (miRNA) are short single-stranded RNA molecules derived from hairpin-forming precursors that play a crucial role as post-transcriptional regulators in eukaryotes and viruses. In recent years, many microRNA target genes (MTGs) have been identified experimentally. However, because of the high costs of experimental approaches, target gene databases remain incomplete. Although several target prediction programs have been developed in recent years to identify MTGs in silico, their specificity and sensitivity remain low. Here, we propose a new approach called MirAncesTar, which uses ancestral genome reconstruction to boost the accuracy of existing MTG prediction tools for human miRNAs. For each miRNA and each putative human target UTR, our algorithm makes use of existing prediction tools to identify putative target sites in the human UTR, as well as in its mammalian orthologs and inferred ancestral sequences. It then evaluates evidence in support of selective pressure to maintain target site counts (rather than sequences), accounting for the possibility of target site turnover. Finally, it integrates this measure with several simpler ones using a logistic regression predictor. MirAncesTar improves the accuracy of existing MTG predictors by 26% to 157%. Source code and prediction results for human miRNAs, as well as supporting evolutionary data, are available at http://cs.mcgill.ca/∼blanchem/mirancestar. PMID:27899600
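The final integration step can be sketched as a logistic regression combiner trained by batch gradient descent. The two features in the usage example (a hypothetical conservation-based score and a hypothetical predicted-site count) are illustrative assumptions. So are the learning rate and epoch count. None of these are MirAncesTar's actual features or fitting procedure.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss.

    X: list of feature vectors (hypothetical per-target evidence scores).
    y: list of 0/1 labels (1 = assumed validated target).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - t).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - t
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features: [conservation-based score, number of predicted sites]
X = [[0.1, 0.0], [0.2, 1.0], [0.8, 2.0], [0.9, 3.0]]
y = [0, 0, 1, 1]
w, b = fit_logistic(X, y)
```

A fitted combiner returns a probability per miRNA–UTR pair. Thresholding that probability or ranking by it turns it into a prediction.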

  9. Complete amino acid sequence of branched-chain amino acid aminotransferase (transaminase B) of Salmonella typhimurium, identification of the coenzyme-binding site and sequence comparison analysis

    SciTech Connect

    Feild, M.J.

    1988-01-01

    The complete amino acid sequence of the subunit of branched-chain amino acid aminotransferase of Salmonella typhimurium was determined by automated Edman degradation of peptide fragments generated by chemical and enzymatic digestion of S-carboxymethylated and S-pyridylethylated transaminase B. Peptide fragments of transaminase B were generated by treating the enzyme with trypsin, Staphylococcus aureus V8 protease, endoproteinase Lys-C, and cyanogen bromide. Protocols were developed for separating the peptide fragments by reverse-phase high performance liquid chromatography (HPLC), ion-exchange HPLC, and SDS-urea gel electrophoresis. The enzyme subunit contains 308 amino acid residues and has a molecular weight of 33,920 daltons. The coenzyme-binding site was determined by treating the enzyme, containing bound pyridoxal 5'-phosphate, with tritiated sodium borohydride prior to trypsin digestion. Monitoring radioactivity incorporation and comparing peptide maps with an apoenzyme tryptic digest allowed identification of the pyridoxylated peptide, which was isolated by reverse-phase HPLC and sequenced. The coenzyme-binding site is a lysyl residue at position 159. Some peptides were further characterized by fast atom bombardment mass spectrometry.
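As a consistency check on the reported figures, 33,920 Da over 308 residues is about 110 Da per residue, a typical average. The sketch below shows how an average molecular weight is computed from a sequence: sum the average residue masses and add one water for the free termini. The mass table holds commonly tabulated average residue masses. The function name is illustrative, and the sketch ignores the bound coenzyme and any modifications.

```python
# Commonly tabulated average residue masses in daltons (amino acid minus H2O).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # one water restores the free N- and C-termini


def average_mass(seq):
    """Average molecular weight (Da) of an unmodified linear polypeptide."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


print(round(average_mass("GA"), 2))  # 146.15 (glycylalanine)
```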

  10. Analyses of mitochondrial amino acid sequence datasets support the proposal that specimens of Hypodontus macropi from three species of macropodid hosts represent distinct species

    PubMed Central

    2013-01-01

    Background Hypodontus macropi is a common intestinal nematode of a range of kangaroos and wallabies (macropodid marsupials). Based on previous multilocus enzyme electrophoresis (MEE) and nuclear ribosomal DNA sequence data sets, H. macropi has been proposed to be a complex of species. To test this proposal using independent molecular data, we sequenced the whole mitochondrial (mt) genomes of individuals of H. macropi from three different species of hosts (Macropus robustus robustus, Thylogale billardierii and Macropus [Wallabia] bicolor) as well as that of Macropicola ocydromi (a related nematode), and undertook a comparative analysis of the amino acid sequence datasets derived from these genomes. Results The mt genomes of H. macropi from the three host species, sequenced by next-generation (454) technology, varied from 13,634 bp to 13,699 bp in size. Pairwise comparisons of the amino acid sequences predicted from these three mt genomes revealed differences of 5.8% to 18%. Phylogenetic analysis of the amino acid sequence data sets using Bayesian Inference (BI) showed that H. macropi from the three different host species formed distinct, well-supported clades. In addition, sliding window analysis of the mt genomes defined variable regions for future population genetic studies of H. macropi in different macropodid hosts and geographical regions around Australia. Conclusions The present analyses of inferred mt protein sequence datasets clearly supported the hypothesis that H. macropi from M. robustus robustus, M. bicolor and T. billardierii represent distinct species. PMID:24261823
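The pairwise comparisons and the sliding-window analysis can be sketched with the uncorrected p-distance, which is the proportion of differing sites between two aligned sequences. This sketch makes three assumptions: gap columns are skipped, and both the window size and the step size are free parameters. The study's exact settings are not given here, and the function names are illustrative.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences.
    Columns with a gap ('-') in either sequence are skipped (an assumption)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        return 0.0
    return sum(x != y for x, y in sites) / len(sites)


def sliding_window(a, b, window, step):
    """p-distance in successive windows along an alignment: (start, distance) pairs."""
    return [(start, p_distance(a[start:start + window], b[start:start + window]))
            for start in range(0, len(a) - window + 1, step)]


print(p_distance("MKVLA", "MKILA"))           # 0.2
print(sliding_window("AAAA", "AATT", 2, 2))   # [(0, 0.0), (2, 1.0)]
```

Windows with high p-distance mark the variable regions that a population-genetic study would target.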

  11. Predicting RNA-binding residues from evolutionary information and sequence conservation

    PubMed Central

    2010-01-01

    Background RNA-binding proteins (RBPs) play crucial roles in the post-transcriptional control of RNA. RBPs efficiently recognize specific RNA sequences once the RNA has been transcribed from DNA. To satisfy diverse functional requirements, RBPs are composed of multiple blocks of RNA-binding domains (RBDs) presented in various structural arrangements to provide versatile functions. The ability to computationally predict RNA-binding residues in an RNA-binding protein can help biologists identify important sites for site-directed mutagenesis in wet-lab experiments. Results The proposed prediction framework, named “ProteRNA”, combines an SVM-based classifier with conserved residue discovery by WildSpan to identify the residues that interact with RNA in an RNA-binding protein. Although these conserved residues can be either functionally or structurally conserved, they provide clues to the important residues in a protein sequence. On the independent test dataset, ProteRNA delivered an overall accuracy of 89.78%, an MCC of 0.2628, an F-score of 0.3075, and an F0.5-score of 0.3546. Conclusions This article presents the design of a sequence-based predictor that aims to identify the RNA-binding residues in an RNA-binding protein by combining machine learning and pattern mining approaches. RNA-binding proteins have diverse functions when interacting with different categories of RNAs because they are composed of multiple copies of RNA-binding domains presented in various structural arrangements that expand their functional repertoire. Furthermore, predicting RNA-binding residues in an RNA-binding protein can help biologists identify important sites for site-directed mutagenesis in wet-lab experiments. PMID:21143803
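An accuracy near 90% alongside an MCC of about 0.26 can arise when the positive class (here, RNA-binding residues) is a small minority. In that situation, predicting "non-binding" almost everywhere already scores well on accuracy. The sketch below computes these metrics from confusion-matrix counts. The function name is illustrative, and the counts in the usage example are made up to show the effect.

```python
import math


def binary_metrics(tp, fp, tn, fn, beta=1.0):
    """Accuracy, precision, recall, F-beta and Matthews correlation coefficient."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    # beta < 1 (e.g. F0.5) weights precision more heavily than recall.
    f_beta = (1 + b2) * precision * recall / denom if denom else 0.0
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f_beta": f_beta, "mcc": mcc}


# Made-up counts: 20 binding residues among 200.
m = binary_metrics(tp=10, fp=10, tn=170, fn=10)
print(round(m["accuracy"], 3), round(m["mcc"], 3))  # 0.9 0.444
```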

  12. Nucleotide sequence and spatial expression pattern of a drought- and abscisic acid-induced gene of tomato.

    PubMed

    Plant, A L; Cohen, A; Moses, M S; Bray, E A

    1991-11-01

    The nucleotide sequence of le16, a tomato (Lycopersicon esculentum Mill.) gene induced by drought stress and regulated by abscisic acid specifically in aerial vegetative tissue, is presented. The single open reading frame contained within the gene has the capacity to encode a polypeptide of 12.7 kilodaltons and is interrupted by a small intron. The predicted polypeptide is rich in leucine, glycine, and alanine and has an isoelectric point of 8.7. The amino terminus is hydrophobic and characteristic of signal sequences that target polypeptides for export from the cytoplasm. There is homology (47.2% identity) between the amino terminus of the LE 16 polypeptide and the corresponding amino terminal domain of the maize phospholipid transfer protein. le16 was expressed in drought-stressed leaf, petiole, and stem tissue and to a much lower extent in the pericarp of mature green tomato fruit and developing seeds. No expression was detected in the pericarp of red fruit or in drought-stressed roots. Expression of le16 was also induced in leaf tissue by a variety of other abiotic stresses including polyethylene glycol-mediated water deficit, salinity, cold stress, and heat stress. None of these stresses or direct applications of abscisic acid induced the expression of le16 in the roots of the same plants. The unique expression characteristics of this gene indicate that novel regulatory mechanisms, in addition to endogenous abscisic acid, are involved in controlling its expression.
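A Kyte-Doolittle hydropathy profile is one way to see a hydrophobic amino terminus like the one described above. The sketch below averages the Kyte-Doolittle values over a sliding window. The 9-residue default window is an assumption (a common but arbitrary choice), and the function name is illustrative. The le16 sequence itself is not reproduced here, so the test uses a made-up peptide.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def hydropathy_profile(seq, window=9):
    """Mean hydropathy of every full window; element i covers residues i..i+window-1."""
    vals = [KD[aa] for aa in seq.upper()]
    return [sum(vals[i:i + window]) / window
            for i in range(len(vals) - window + 1)]
```

A run of high values over the first 15 to 25 residues is the pattern expected of a signal sequence.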

  13. The amino acid sequence of cytochromes c-551 from three species of Pseudomonas

    PubMed Central

    Ambler, R. P.; Wynn, Margaret

    1973-01-01

    The amino acid sequences of the cytochromes c-551 from three species of Pseudomonas have been determined. Each resembles the protein from Pseudomonas strain P6009 (now known to be Pseudomonas aeruginosa, not Pseudomonas fluorescens) in containing 82 amino acids in a single peptide chain, with a haem group covalently attached to cysteine residues 12 and 15. In all four sequences 43 residues are identical. Although by bacteriological criteria the organisms are closely related, the differences between pairs of sequences range from 22% to 39%. These values should be compared with the differences in the sequence of mitochondrial cytochrome c between mammals and amphibians (about 18%) or between mammals and insects (about 33%). Detailed evidence for the amino acid sequences of the proteins has been deposited as Supplementary Publication SUP 50015 at the National Lending Library for Science and Technology, Boston Spa, Yorks. LS23 7BQ, U.K., from whom copies can be obtained on the terms indicated in Biochem. J. (1973), 131, 5. PMID:4352718
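The count of residues identical in all four sequences (43 of 82) corresponds to the number of invariant columns in a multiple alignment. The sketch below assumes the sequences are already aligned to equal length and treats gap columns as variable. The function name is illustrative.

```python
def invariant_positions(aligned):
    """1-based alignment positions where every sequence carries the same residue."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must be aligned to the same length")
    return [i + 1 for i in range(length)
            if aligned[0][i] != "-" and all(s[i] == aligned[0][i] for s in aligned)]


print(invariant_positions(["ACDE", "ACFE", "GCDE"]))  # [2, 4]
```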

  14. Draft Genome Sequence of Sorghum Grain Mold Fungus Epicoccum sorghinum, a Producer of Tenuazonic Acid

    PubMed Central

    Oliveira, Rodrigo C.; Davenport, Karen W.; Hovde, Blake; Silva, Danielle; Chain, Patrick S. G.; Correa, Benedito

    2017-01-01

    The facultative plant pathogen Epicoccum sorghinum is associated with grain mold of sorghum and produces the mycotoxin tenuazonic acid. This fungus can have serious economic impact on sorghum production. Here, we report the draft genome sequence of E. sorghinum (USPMTOX48). PMID:28126937

  15. Snake venom. The amino acid sequence of protein A from Dendroaspis polylepis polylepis (black mamba) venom.

    PubMed

    Joubert, F J; Strydom, D J

    1980-12-01

    Protein A from Dendroaspis polylepis polylepis venom comprises 81 amino acids, including ten half-cystine residues. The complete primary structures of protein A and its variant A' were elucidated. The sequences of proteins A and A', which differ in a single position, show no homology with various neurotoxins and non-neurotoxic proteins and represent a new type of elapid venom protein.

  16. Amino acid sequences of heterotrophic and photosynthetic ferredoxins from the tomato plant (Lycopersicon esculentum Mill.).

    PubMed

    Kamide, K; Sakai, H; Aoki, K; Sanada, Y; Wada, K; Green, L S; Yee, B C; Buchanan, B B

    1995-11-01

    Several forms (isoproteins) of ferredoxin in roots, leaves, and green and red pericarps in tomato plants (Lycopersicon esculentum Mill.) were earlier identified on the basis of N-terminal amino acid sequence and chromatographic behavior (Green et al. 1991). In the present study, a large-scale preparation made possible the determination of the full-length amino acid sequences of the two ferredoxins from leaves. The ferredoxins characteristic of fruit and root were sequenced from the amino terminus to the 30th residue or beyond. The leaf ferredoxins were confirmed to be expressed in the pericarp of both green and red fruit. The ferredoxins characteristic of fruit and root appeared to be restricted to those tissues. The results extend earlier findings in demonstrating that ferredoxin occurs in the major organs of the tomato plant, where it appears to function irrespective of photosynthetic competence.

  17. Amino acid sequence of myoglobin from white-tailed deer (Odocoileus virginianus).

    PubMed

    Joseph, Poulson; Suman, Surendranath P; Li, Shuting; Fontaine, Michele; Steinke, Laurey

    2012-10-01

    Our objective was to determine the primary structure of white-tailed deer myoglobin (Mb). White-tailed deer Mb was isolated from cardiac muscles employing ammonium sulfate precipitation and gel-filtration chromatography. The amino acid sequence was determined by Edman degradation. Sequence analyses of intact Mb as well as tryptic- and cyanogen bromide-peptides yielded the complete primary structure of white-tailed deer Mb, which shared 100% similarity with red deer Mb. White-tailed deer Mb consists of 153 amino acid residues and shares more than 96% sequence similarity with myoglobins from meat-producing ruminants, such as cattle, buffalo, sheep, and goat. Similar to sheep and goat myoglobins, white-tailed deer Mb contains 12 histidine residues. Proximal (position 93) and distal (position 64) histidine residues responsible for maintaining the stability of heme are conserved in white-tailed deer Mb.

  18. Nucleotide sequence and the encoded amino acids of human apolipoprotein A-I mRNA.

    PubMed Central

    Law, S W; Brewer, H B

    1984-01-01

    The cDNA clones encoding the precursor form of human liver apolipoprotein A-I (apoA-I), preproapoA-I, have been isolated from a cDNA library. A 17-base synthetic oligonucleotide based on residues 108-113 of apoA-I and a 26-base primer-extended, dideoxynucleotide-terminated cDNA were used as hybridization probes to select for recombinant plasmids bearing the apoA-I sequence. The complete nucleic acid sequence of human liver preproapoA-I has been determined by analysis of the cloned cDNA. The sequence is composed of 801 nucleotides encoding 267 amino acid residues. PreproapoA-I contains an 18-amino-acid prepeptide and a 6-amino-acid propeptide connected to the amino terminus of the 243-amino acid mature apoA-I. Southern blotting analysis of chromosomal DNA obtained from peripheral blood indicated the apoA-I gene is contained in a 2.1-kilobase-pair Pst I fragment and there is no gross difference in structural organization between the normal apoA-I gene and the Tangier disease apoA-I gene. PMID:6198645
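The reported lengths are mutually consistent: 18 + 6 + 243 = 267 residues, and 267 codons × 3 = 801 nucleotides. The 801 nucleotides therefore cover the coding codons without a stop codon. The sketch below translates a coding sequence with the standard genetic code and splits the product into prepeptide, propeptide and mature chain. The split lengths come from the abstract; the function names and the test sequence are illustrative.

```python
# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(cds):
    """Translate codon by codon from the first base, stopping at a stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def split_preproprotein(protein, pre_len=18, pro_len=6):
    """Split into (prepeptide, propeptide, mature chain); default lengths as for preproapoA-I."""
    return (protein[:pre_len],
            protein[pre_len:pre_len + pro_len],
            protein[pre_len + pro_len:])


print(translate("ATGGCCTAA"))  # MA
```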

  19. Efficient pairwise RNA structure prediction and alignment using sequence alignment constraints

    PubMed Central

    Dowell, Robin D; Eddy, Sean R

    2006-01-01

    Background We are interested in the problem of predicting secondary structure for small sets of homologous RNAs, by incorporating limited comparative sequence information into an RNA folding model. The Sankoff algorithm for simultaneous RNA folding and alignment is a basis for approaches to this problem. There are two open problems in applying a Sankoff algorithm: developing a good unified scoring system for alignment and folding, and developing practical heuristics for dealing with the computational complexity of the algorithm. Results We use probabilistic models (pair stochastic context-free grammars, pairSCFGs) as a unifying framework for scoring pairwise alignment and folding. A constrained version of the pairSCFG structural alignment algorithm was developed that assumes knowledge of a few confidently aligned positions (pins). These pins are selected based on the posterior probabilities of a probabilistic pairwise sequence alignment. Conclusion Pairwise RNA structural alignment improves structure prediction accuracy relative to single-sequence folding. Constraining on alignment is a straightforward method of reducing the runtime and memory requirements of the algorithm. Five practical implementations of the pairwise Sankoff algorithm – this work (Consan), David Mathews' Dynalign, Ian Holmes' Stemloc, Ivo Hofacker's PMcomp, and Jan Gorodkin's FOLDALIGN – have comparable overall performance with different strengths and weaknesses. PMID:16952317
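The effect of pins can be illustrated on ordinary pairwise sequence alignment, a much simpler problem than the pairSCFG structural alignment described above. Each pin (p, q) forbids every dynamic-programming cell that would place residue p of one sequence and residue q of the other on different sides of the alignment path. This forces the two residues to be aligned and shrinks the number of cells that must be filled. The function name and the scoring values are arbitrary assumptions, and the pins are supplied by hand rather than taken from posterior probabilities.

```python
def pinned_alignment_score(a, b, pins, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score constrained by pins.

    pins: list of (p, q) 1-based positions that must be aligned to each other.
    Cell (i, j) (prefix lengths) is allowed only if, for every pin, both
    i >= p and j >= q, or both i < p and j < q; the only way across a pin
    boundary is then the diagonal step that aligns a[p-1] with b[q-1].
    Returns (score or None if infeasible, number of allowed cells).
    """
    NEG = float("-inf")
    n, m = len(a), len(b)

    def allowed(i, j):
        return all((i >= p) == (j >= q) for p, q in pins)

    dp = [[NEG] * (m + 1) for _ in range(n + 1)]
    cells = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if not allowed(i, j):
                continue  # forbidden cells stay at -inf
            cells += 1
            if i == 0 and j == 0:
                dp[i][j] = 0
                continue
            best = NEG
            if i > 0 and j > 0:
                s = match if a[i - 1] == b[j - 1] else mismatch
                best = max(best, dp[i - 1][j - 1] + s)
            if i > 0:
                best = max(best, dp[i - 1][j] + gap)
            if j > 0:
                best = max(best, dp[i][j - 1] + gap)
            dp[i][j] = best
    score = dp[n][m]
    return (None if score == NEG else score), cells


print(pinned_alignment_score("AT", "TA", []))        # (-2, 9)
print(pinned_alignment_score("AT", "TA", [(1, 2)]))  # (-3, 4)
```

The pin lowers the score here because it forces a worse alignment, but it also cuts the filled cells from 9 to 4. That trade-off is the reason confidently chosen pins save time and memory.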

  20. Mathematical Characterization of Protein Sequences Using Patterns as Chemical Group Combinations of Amino Acids.

    PubMed

    Das, Jayanta Kumar; Das, Provas; Ray, Korak Kumar; Choudhury, Pabitra Pal; Jana, Siddhartha Sankar

    2016-01-01

    Comparison of amino acid sequence similarity is the fundamental concept behind protein phylogenetic tree construction. This approach can explain evolutionary relationships, but further explanations are not possible unless sequences are studied through the chemical nature of individual amino acids. Here we develop a new methodology to characterize protein sequences on the basis of the chemical nature of their amino acids. We design various algorithms for studying the variation of chemical group transitions and various chemical group combinations as patterns in the protein sequences. The amino acid sequences of the conventional myosin II head domains of 14 family members are used to illustrate this new approach. We find two blocks of maximum length 6 aa, 'FPKATD' and 'Y/FTNEKL', without repetition of the same chemical nature, and one block of maximum length 20 aa with repetition of chemical nature, all of which are common to the 14 members. We also check for commonality with another motor protein subfamily, kinesin (KIF1A). Based on our analysis, we find a common block of length 8 aa in both myosin II and KIF1A. This motif is located in the neck linker region, which could be responsible for generating mechanical force. The approach thus enables us to find the unique blocks that remain chemically conserved across the family. We also validate our methodology with different protein families such as MYOI, myosin light chain kinase (MLCK), Rho-associated protein kinase (ROCK), Na+/K+-ATPase and Ca2+-ATPase. Altogether, our studies provide a new methodology for investigating conserved amino acid patterns in different proteins.
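The sketch below illustrates comparing sequences through chemical groups. Each residue is replaced by a group letter, and blocks of group letters shared by every sequence are reported. The five-group classification and the function names are assumptions for illustration; the paper's own grouping and block definitions may differ.

```python
# Hypothetical five-group classification (an assumption, not the paper's):
# N = nonpolar aliphatic, R = aromatic, P = polar uncharged, B = basic, A = acidic.
GROUP_MEMBERS = {"N": "GAVLIMP", "R": "FWY", "P": "STCNQ", "B": "KRH", "A": "DE"}
GROUP_OF = {aa: g for g, members in GROUP_MEMBERS.items() for aa in members}


def encode(seq):
    """Rewrite a protein sequence in the chemical-group alphabet."""
    return "".join(GROUP_OF[aa] for aa in seq.upper())


def shared_group_blocks(seqs, k):
    """Group-level blocks of length k present in every sequence."""
    block_sets = [{enc[i:i + k] for i in range(len(enc) - k + 1)}
                  for enc in map(encode, seqs)]
    return set.intersection(*block_sets)


print(encode("FPKATD"))                               # RNBNPA
print(shared_group_blocks(["FPKATD", "YPRGSE"], 6))   # {'RNBNPA'}
```

With this grouping F and Y fall in the same aromatic group. That is one way a block written as 'Y/FTNEKL' can count as shared despite a residue difference.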

  1. De novo prediction of RNA-protein interactions from sequence information.

    PubMed

    Wang, Ying; Chen, Xiaowei; Liu, Zhi-Ping; Huang, Qiang; Wang, Yong; Xu, Derong; Zhang, Xiang-Sun; Chen, Runsheng; Chen, Luonan

    2013-01-27

    Protein-RNA interactions are fundamentally important in understanding cellular processes. In particular, non-coding RNA-protein interactions play an important role in facilitating biological functions in signalling, transcriptional regulation, and even the progression of complex diseases. However, experimental determination of protein-RNA interactions remains time-consuming and labour-intensive. Here, we develop a novel extended naïve Bayes classifier for de novo prediction of protein-RNA interactions using only protein and RNA sequence information. Specifically, we first collect a set of known protein-RNA interactions as gold-standard positives and extract sequence-based features to represent each protein-RNA pair. To bridge the gap between high-dimensional features and the scarcity of gold-standard positives, we select effective features by applying a cutoff to a likelihood ratio score, which not only reduces the computational complexity but also allows transparent feature integration during prediction. An extended naïve Bayes classifier is then constructed using these effective features to train a protein-RNA interaction prediction model. Numerical experiments show that our method can achieve a prediction accuracy of 0.77 even though only a small number of protein-RNA interactions are available. In particular, we demonstrate that the extended naïve Bayes classifier is superior to the naïve Bayes classifier because it fully considers the dependences among features. Importantly, we conduct ncRNA pull-down experiments to validate the predicted novel protein-RNA interactions and identify the interacting proteins of sbRNA CeN72 in C. elegans, which further demonstrates the effectiveness of our method.
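The feature-selection step can be sketched for binary sequence features in three steps:

1. Estimate a likelihood ratio P(feature | interacting) / P(feature | non-interacting) with pseudocounts.
2. Keep the features above a cutoff.
3. Sum the log ratios of the selected features that a pair carries.

This is a plain naive Bayes score that assumes feature independence. The extended classifier in the paper additionally models dependencies among features, which is not shown. Feature names, the cutoff, the pseudocount and the function names are hypothetical.

```python
import math


def likelihood_ratios(pos, neg, pseudo=1.0):
    """pos/neg: lists of sets of (hypothetical) feature names present in each pair."""
    features = set().union(*pos, *neg)
    lr = {}
    for f in features:
        # Laplace-smoothed presence frequencies in each class.
        p = (sum(f in s for s in pos) + pseudo) / (len(pos) + 2 * pseudo)
        q = (sum(f in s for s in neg) + pseudo) / (len(neg) + 2 * pseudo)
        lr[f] = p / q
    return lr


def select_features(lr, cutoff):
    """Keep only features whose likelihood ratio reaches the cutoff."""
    return {f: r for f, r in lr.items() if r >= cutoff}


def log_odds(selected, features, prior_odds=1.0):
    """Naive Bayes log posterior odds from the selected features a pair carries."""
    return math.log(prior_odds) + sum(math.log(r) for f, r in selected.items()
                                      if f in features)


pos = [{"a", "b"}, {"a"}, {"a", "c"}]
neg = [{"b"}, {"c"}, {"b", "c"}]
selected = select_features(likelihood_ratios(pos, neg), cutoff=2.0)
print(sorted(selected))  # ['a']
```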

  2. A Novel Method for Accurate Operon Predictions in All Sequenced Prokaryotes

    SciTech Connect

    Price, Morgan N.; Huang, Katherine H.; Alm, Eric J.; Arkin, Adam P.

    2004-12-01

    We combine comparative genomic measures and the distance separating adjacent genes to predict operons in 124 completely sequenced prokaryotic genomes. Our method automatically tailors itself to each genome using sequence information alone, and thus can be applied to any prokaryote. For Escherichia coli K12 and Bacillus subtilis, our method is 85% and 83% accurate, respectively, which is similar to the accuracy of methods that use the same features but are trained on experimentally characterized transcripts. In Halobacterium NRC-1 and in Helicobacter pylori, our method correctly infers that genes in operons are separated by shorter distances than they are in E. coli, and its predictions using distance alone are more accurate than distance-only predictions trained on a database of E. coli transcripts. We use microarray data from six phylogenetically diverse prokaryotes to show that combining intergenic distance with comparative genomic measures further improves accuracy and that our method is broadly effective. Finally, we survey operon structure across 124 genomes, and find several surprises: H. pylori has many operons, contrary to previous reports; Bacillus anthracis has an unusual number of pseudogenes within conserved operons; and Synechocystis PCC6803 has many operons even though it has unusually wide spacings between conserved adjacent genes.
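The sketch below covers only the distance feature. For each pair of adjacent genes on the same strand, it computes the intergenic distance and calls the pair co-operonic when the distance is at or below a cutoff. The fixed 50 bp cutoff, the gene-tuple layout and the function names are hypothetical. The method described above instead tailors itself to each genome and combines distance with comparative genomic measures.

```python
def adjacent_pairs(genes):
    """genes: list of (name, start, end, strand), sorted by start, 1-based inclusive.
    Returns (gene1, gene2, intergenic_distance) for adjacent same-strand genes."""
    pairs = []
    for g1, g2 in zip(genes, genes[1:]):
        if g1[3] == g2[3]:
            distance = g2[1] - g1[2] - 1  # negative values mean the genes overlap
            pairs.append((g1[0], g2[0], distance))
    return pairs


def predict_same_operon(pairs, max_distance=50):
    """Hypothetical fixed cutoff; short spacings suggest co-transcription."""
    return {(a, b): d <= max_distance for a, b, d in pairs}


genes = [("g1", 1, 900, "+"), ("g2", 905, 1500, "+"),
         ("g3", 1800, 2400, "+"), ("g4", 2410, 3000, "-")]
pairs = adjacent_pairs(genes)
print(pairs)  # [('g1', 'g2', 4), ('g2', 'g3', 299)]
```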

  3. Development of a protein-ligand-binding site prediction method based on interaction energy and sequence conservation.

    PubMed

    Tsujikawa, Hiroto; Sato, Kenta; Wei, Cao; Saad, Gul; Sumikoshi, Kazuya; Nakamura, Shugo; Terada, Tohru; Shimizu, Kentaro

    2016-09-01

    We present a new method for predicting protein-ligand-binding sites based on protein three-dimensional structure and amino acid conservation. This method involves calculation of the van der Waals interaction energy between a protein and many probes placed on the protein surface and subsequent clustering of the probes with low interaction energies to identify the most energetically favorable locus. In addition, it uses amino acid conservation among homologous proteins. Ligand-binding sites were predicted by combining the interaction energy and the amino acid conservation score. The performance of our prediction method was evaluated using a non-redundant dataset of 348 ligand-bound and ligand-unbound protein structure pairs, constructed by filtering entries in a ligand-binding site structure database, LigASite. Ligand-bound structure prediction (bound prediction) indicated that 74.0 % of predicted ligand-binding sites overlapped with real ligand-binding sites by over 25 % of their volume. Ligand-unbound structure prediction (unbound prediction) indicated that 73.9 % of predicted ligand-binding residues overlapped with real ligand-binding residues. The amino acid conservation score improved the average prediction accuracy by 17.0 and 17.6 points for the bound and unbound predictions, respectively. These results demonstrate the effectiveness of the combined use of the interaction energy and amino acid conservation in the ligand-binding site prediction.
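The sketch below shows one way the two signals might be combined: a weighted sum of z-scored values. The energy sign is flipped so that more favourable (more negative) energies score higher. The normalisation, the equal weighting, the function names and the toy values are assumptions; the abstract does not give the paper's actual combination formula.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardise to mean 0 and (population) standard deviation 1."""
    mu, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]


def combined_scores(energies, conservation, weight=0.5):
    """Higher = more likely binding; energies are flipped so favourable = high."""
    e = zscores([-x for x in energies])
    c = zscores(conservation)
    return [weight * ei + (1 - weight) * ci for ei, ci in zip(e, c)]


def top_k(scores, k):
    """Indices of the k highest-scoring residues or probe clusters."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Toy values (hypothetical) for four candidate sites.
scores = combined_scores([-10, -2, -8, -1], [0.9, 0.1, 0.2, 0.3])
print(top_k(scores, 2))  # [0, 2]
```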

  4. Predicting candidate genomic sequences that correspond to synthetic functional RNA motifs

    PubMed Central

    Laserson, Uri; Gan, Hin Hark; Schlick, Tamar

    2005-01-01

    Riboswitches and RNA interference are important emerging mechanisms by which many organisms control gene expression. To enhance our understanding of such RNA roles, finding small regulatory motifs across whole genomes presents a challenge. Many simple functional RNA motifs have been found by in vitro selection experiments, which produce synthetic target-binding aptamers as well as catalytic RNAs, including the hammerhead ribozyme. Motivated by the prediction of Piganeau and Schroeder [(2003) Chem. Biol., 10, 103–104] that synthetic RNAs may have natural counterparts, we develop and apply an efficient computational protocol for identifying aptamer-like motifs in genomes. We define motifs from the sequence and structural information of synthetic aptamers, search for sequences in genomes that will produce motif matches, and then evaluate the structural stability and statistical significance of the potential hits. Our application to aptamers for streptomycin, chloramphenicol, neomycin B and ATP identifies 37 candidate sequences (in coding and non-coding regions) that fold to the target aptamer structures in bacterial and archaeal genomes. Further energetic screening reveals that several candidates exhibit energetic properties and sequence conservation patterns that are characteristic of functional motifs. Besides providing candidates for experimental testing, our computational protocol offers an avenue for expanding natural RNA's functional repertoire. PMID:16254081
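The structural part of a genome search for aptamer-like motifs can be sketched as a scan for hairpins. A hairpin here is a stem of fixed length whose two arms pair (Watson-Crick or G·U wobble) around a loop of bounded length. This sketch is a deliberate simplification with a hypothetical function name. It requires perfect stems and does not evaluate folding energy or statistical significance, both of which the protocol above includes.

```python
# Allowed base pairs: Watson-Crick plus G·U wobble.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def find_hairpins(seq, stem_len, min_loop, max_loop):
    """Return (start, loop_len) wherever a perfect stem of stem_len base pairs
    closes a loop of min_loop..max_loop nucleotides (0-based start)."""
    seq = seq.upper().replace("T", "U")  # accept DNA input
    hits = []
    for start in range(len(seq)):
        for loop in range(min_loop, max_loop + 1):
            end = start + 2 * stem_len + loop  # exclusive end of the hairpin
            if end > len(seq):
                break
            left = seq[start:start + stem_len]
            right = seq[start + stem_len + loop:end]
            # The first base of the left arm pairs with the last base of the right arm.
            if all((x, y) in PAIRS for x, y in zip(left, reversed(right))):
                hits.append((start, loop))
    return hits


print(find_hairpins("GGGAAAACCC", 3, 4, 4))  # [(0, 4)]
```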

  5. Many amino acid substitution variants identified in DNA repair genes during human population screenings are predicted to impact protein function

    SciTech Connect

    Xi, T; Jones, I M; Mohrenweiser, H W

    2003-11-03

    Over 520 different amino acid substitution variants have been previously identified in the systematic screening of 91 human DNA repair genes for sequence variation. Two algorithms were employed to predict the impact of these amino acid substitutions on protein activity. Sorting Intolerant From Tolerant (SIFT) classified 226 of 508 variants (44%) as "Intolerant". Polymorphism Phenotyping (PolyPhen) classed 165 of 489 amino acid substitutions (34%) as "Probably or Possibly Damaging". Another 9-15% of the variants were classed as "Potentially Intolerant or Damaging". The results from the two algorithms are highly associated, with concordance in predicted impact observed for ~62% of the variants. Twenty-one to thirty-one percent of the variant proteins are predicted to exhibit reduced activity by both algorithms. These variants occur at slightly lower individual allele frequencies than do the variants classified as "Tolerant" or "Benign". Both algorithms correctly predicted the impact of 26 functionally characterized amino acid substitutions in the APE1 protein on biochemical activity, with one exception. It is concluded that a substantial fraction of the missense variants observed in the general human population are functionally relevant. These variants are expected to be the molecular genetic and biochemical basis for the associations of reduced DNA repair capacity phenotypes with elevated cancer risk.
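The two headline fractions follow from the counts: 226/508 ≈ 44.5% and 165/489 ≈ 33.7%. Concordance between two predictors can be sketched as the fraction of commonly scored variants on which their calls agree. This sketch assumes each tool's output is collapsed to a binary damaging/not-damaging call. The function name and the variant IDs are hypothetical.

```python
def concordance(calls_a, calls_b):
    """calls_*: dict variant -> bool (True = predicted intolerant/damaging).
    Returns (fraction agreeing, fraction called damaging by both) over the
    variants scored by both tools."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        return 0.0, 0.0
    agree = sum(calls_a[v] == calls_b[v] for v in shared)
    both = sum(calls_a[v] and calls_b[v] for v in shared)
    return agree / len(shared), both / len(shared)


sift = {"v1": True, "v2": False, "v3": True, "v4": False}
polyphen = {"v1": True, "v2": True, "v3": True, "v4": False, "v5": True}
print(concordance(sift, polyphen))  # (0.75, 0.5)
```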

  6. Software scripts for quality checking of high-throughput nucleic acid sequencers.

    PubMed

    Lazo, G R; Tong, J; Miller, R; Hsia, C; Rausch, C; Kang, Y; Anderson, O D

    2001-06-01

    We have developed a graphical interface that allows researchers to view and assess the quality of sequencing results using a series of program scripts developed to process data generated by automated sequencers. The scripts are written in the Perl programming language and are executable under the cgi-bin directory of a Web server environment. The scripts pass nucleic acid sequencing trace-file data from automated sequencers to the phred program for analysis, and the results are displayed as graphical hypertext markup language (HTML) pages. The scripts are mainly designed to handle 96-well microtiter dish samples, but they can also read data from 384-well microtiter dishes, 96 samples at a time. The scripts may be customized for different laboratory environments and computer configurations. Web links to the sources and discussion page are provided.
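The quality values that phred assigns are Phred scores, Q = -10·log10(P), where P is the estimated probability that the base call is wrong. Q20 therefore corresponds to a 1% error rate. The sketch below, in Python, shows the kind of per-read summary such scripts could report; the original scripts are written in Perl. The Q20 threshold and the function names are assumptions, and parsing phred's output files is not shown.

```python
import math


def phred_to_error(q):
    """Error probability for a Phred quality score."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred quality score for an error probability."""
    return -10 * math.log10(p)


def quality_summary(quals, threshold=20):
    """(bases at or above the threshold, expected number of miscalled bases)."""
    high = sum(q >= threshold for q in quals)
    expected_errors = sum(phred_to_error(q) for q in quals)
    return high, expected_errors


print(quality_summary([10, 20, 30])[0])  # 2
```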

  7. Rigorous assessment and integration of the sequence and structure based features to predict hot spots

    PubMed Central

    2011-01-01

    Background Systematic mutagenesis studies have shown that only a few interface residues, termed hot spots, contribute significantly to the binding free energy of protein-protein interactions. Therefore, hot spot prediction is increasingly important for understanding the essence of protein interactions and for narrowing the search space in drug design. Many computational methods have been developed, each proposing different features. However, a comparative assessment of these features, as well as effective and accurate methods, is still urgently needed. Results In this study, we first comprehensively collect the features that discriminate hot spots from non-hot spots and analyze their distributions. We find that hot spots have lower relASA and a larger relative change in ASA, suggesting that hot spots tend to be protected from bulk solvent. In addition, hot spots have more contacts, including hydrogen bonds, salt bridges, and atomic contacts, which favor complex formation. Interestingly, we find that conservation score and sequence entropy are not significantly different between hot spots and non-hot spots in the Ab+ dataset (all complexes), whereas in the Ab- dataset (antigen-antibody complexes excluded) these two features differ significantly between hot spots and non-hot spots. Secondly, we explore the predictive ability of each feature and of combinations of features using support vector machines (SVMs). The results indicate that the sequence-based features outperform other combinations of features with reasonable accuracy, with a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on the independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method to predicting hot spots of two protein complexes. Conclusion Experimental results show that support vector machine classifiers are quite
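The two solvent-accessibility features can be sketched as follows. The definitions are assumptions of this sketch. Relative ASA is taken as a residue's ASA divided by a reference maximum for that residue type. Relative change is taken as the fraction of the unbound ASA that is buried on binding. Per-residue ASA values would come from a structure-based calculation not shown here, and the function names are illustrative.

```python
def relative_asa(asa, max_asa):
    """Residue ASA as a fraction of a reference maximum for its residue type."""
    return asa / max_asa


def relative_asa_change(asa_unbound, asa_bound):
    """Fraction of the residue's unbound ASA that becomes buried in the complex."""
    if asa_unbound == 0:
        return 0.0
    return (asa_unbound - asa_bound) / asa_unbound


print(relative_asa(50, 200), relative_asa_change(80, 20))  # 0.25 0.75
```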

  8. Depositional sequence analysis and sedimentologic modeling for improved prediction of Pennsylvanian reservoirs (Annex 1)

    SciTech Connect

    Watney, W.L.

    1992-01-01

    Interdisciplinary studies of the Upper Pennsylvanian Lansing and Kansas City groups have been undertaken in order to improve the geologic characterization of petroleum reservoirs and to develop a quantitative understanding of the processes responsible for formation of associated depositional sequences. To this end, concepts and methods of sequence stratigraphy are being used to define and interpret the three-dimensional depositional framework of the Kansas City Group. The investigation includes characterization of reservoir rocks in oil fields in western Kansas, description of analog equivalents in near-surface and surface sites in southeastern Kansas, and construction of regional structural and stratigraphic framework to link the site specific studies. Geologic inverse and simulation models are being developed to integrate quantitative estimates of controls on sedimentation to produce reconstructions of reservoir-bearing strata in an attempt to enhance our ability to predict reservoir characteristics.

  9. Amino acid sequence of band-3 protein from rainbow trout erythrocytes derived from cDNA.

    PubMed Central

    Hübner, S; Michel, F; Rudloff, V; Appelhans, H

    1992-01-01

    In this report we present the first complete band-3 cDNA sequence of a poikilothermic lower vertebrate. The primary structure of the anion-exchange protein band 3 (AE1) from rainbow trout erythrocytes was determined by nucleotide sequencing of cDNA clones. The overlapping clones have a total length of 3827 bp with a 5'-terminal untranslated region of 150 bp, a 2754 bp open reading frame and a 3'-untranslated region of 924 bp. Band-3 protein from trout erythrocytes consists of 918 amino acid residues with a calculated molecular mass of 101 827 Da. Comparison of its amino acid sequence revealed a 60-65% identity within the transmembrane spanning sequence of band-3 proteins published so far. An additional insertion of 24 amino acid residues within the membrane-associated domain of trout band-3 protein was identified, which until now was thought to be a general feature only of mammalian band-3-related proteins. PMID:1637296
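The reported lengths can be checked against each other: 2754 bp / 3 = 918 codons, matching the 918 residues. The sketch below finds the longest ATG-initiated open reading frame on the forward strand. Its coordinates include the stop codon (conventions differ on whether the stop is counted), and it ignores reading frames left open at the end of the sequence. The function name and test sequence are illustrative.

```python
STOPS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Longest ATG...stop ORF on the forward strand as (start, end_exclusive),
    0-based, with the stop codon included; None if no complete ORF exists."""
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOPS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


print(longest_orf("CCATGAAATTTTAAGG"))  # (2, 14)
```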

  10. Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Patel, Kamlesh D [Ken]; SNL

    2016-07-12

    Kamlesh (Ken) Patel from Sandia National Laboratories (Livermore, California) presents "Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting, held in June 2012 in Santa Fe, NM.

  11. Prediction of HIV-1 coreceptor usage (tropism) by sequence analysis using a genotypic approach.

    PubMed

    Sierra, Saleta; Kaiser, Rolf; Lübke, Nadine; Thielen, Alexander; Schuelter, Eugen; Heger, Eva; Däumer, Martin; Reuter, Stefan; Esser, Stefan; Fätkenheuer, Gerd; Pfister, Herbert; Oette, Mark; Lengauer, Thomas

    2011-12-01

    Maraviroc (MVC) is the first licensed antiretroviral drug from the class of coreceptor antagonists. It binds to the host coreceptor CCR5, which the majority of HIV strains use to infect human immune cells (Fig. 1). Other HIV isolates use a different coreceptor, CXCR4. The coreceptor used is determined by the viral Env protein (Fig. 2). Depending on the coreceptor used, viruses are classified as R5 or X4, respectively. MVC binds to the CCR5 receptor, inhibiting the entry of R5 viruses into the target cell. During the course of disease, X4 viruses may emerge and outgrow the R5 viruses. Determination of coreceptor usage (also called tropism) is therefore mandatory before MVC is administered, as required by the EMA and FDA. The MVC efficacy studies MOTIVATE, MERIT and 1029 were performed with the Trofile assay from Monogram, San Francisco, U.S.A. This is a high-quality assay based on sophisticated recombinant tests. Outside the U.S.A., however, acceptance of this test for daily routine use is rather low. European physicians tend to work with decentralized expert laboratories, which also provide concomitant resistance testing. These laboratories have undergone several quality assurance evaluations, the most recent presented in 2011. For several years now, we have performed tropism determinations based on sequence analysis of the HIV env V3 region (V3). This region carries enough information to support a reliable prediction. Genotypic determination of coreceptor usage offers several advantages:
    - a shorter turnaround time (equivalent to resistance testing);
    - lower costs;
    - the possibility of adapting the results to the patient's needs;
    - the possibility of analysing clinical samples with very low or even undetectable viral load (VL).

    The last advantage matters because the number of samples analysed with VL < 1000 copies/ml has increased in recent years (Fig. 3). The main steps for tropism testing (Fig. 4) demonstrated in

  12. Role of the two-component leader sequence and mature amino acid sequences in extracellular export of endoglucanase EGL from Pseudomonas solanacearum.

    PubMed Central

    Huang, J Z; Schell, M A

    1992-01-01

    The egl gene of Pseudomonas solanacearum encodes a 43-kDa extracellular endoglucanase (mEGL) involved in the wilt disease caused by this phytopathogen. Egl is initially translated with a 45-residue, two-part leader sequence. The first 19 residues are apparently removed by signal peptidase II during export of Egl across the inner membrane (IM). The remaining residues of the leader sequence (modified with palmitate) are removed during export across the outer membrane (OM). Localization of Egl-PhoA fusion proteins showed that the first 26 residues of the Egl leader sequence are necessary and sufficient to direct lipid modification, processing, and export of Egl or PhoA across the IM, but not the OM. Fusing either the complete 45-residue leader sequence, or the leader plus increasing portions of the mEgl sequence, to PhoA did not cause its export across the OM. In-frame deletion of portions of mEGL-coding sequences blocked export of the truncated polypeptides across the OM without affecting export across the IM. These results indicate that the first part of the leader sequence functions independently to direct export of Egl across the IM. In contrast, export across the OM involves the second part of the leader together with sequences and structures in mEGL. Computer analysis of the mEgl amino acid sequence deduced from its nucleotide sequence identified a region of mEGL whose amino acid sequence is similar to regions in other prokaryotic endoglucanases. PMID:1735723

  13. Studies on adenosine triphosphate transphosphorylases. Amino acid sequence of rabbit muscle ATP-AMP transphosphorylase.

    PubMed

    Kuby, S A; Palmieri, R H; Frischat, A; Fischer, A H; Wu, L H; Maland, L; Manship, M

    1984-05-22

    The complete amino acid sequence of rabbit muscle adenylate kinase has been determined. The single polypeptide chain of 194 amino acid residues starts with N-acetylmethionine and ends with leucyllysine at its carboxyl terminus. This agrees with the earlier data on its amino acid composition [Mahowald, T. A., Noltmann, E. A., & Kuby, S. A. (1962) J. Biol. Chem. 237, 1138-1145] and its carboxyl-terminal sequence [Olson, O. E., & Kuby, S. A. (1964) J. Biol. Chem. 239, 460-467]. Elucidation of the primary structure was based on three sets of cleavages:
    - tryptic and chymotryptic cleavages of the performic acid-oxidized protein;
    - cyanogen bromide cleavages of the 14C-labeled S-carboxymethylated protein at its five methionine sites, followed by maleylation of the peptide fragments;
    - tryptic cleavages at the 12 arginine sites of the maleylated 14C-labeled S-carboxymethylated protein.

    Calf muscle myokinase, whose sequence has also been established, differs from the rabbit muscle enzyme mainly at the following positions: His-30 is replaced by Gln-30; Lys-56 by Met-56; and Ala-84 and Asp-85 by Val-84 and Asn-85. Four muscle-type adenylate kinases now have determined covalent structures: rabbit, calf, porcine, and human [for the latter two sequences see Heil, A., Müller, G., Noda, L., Pinder, T., Schirmer, H., Schirmer, I., & Von Zabern, I. (1974) Eur. J. Biochem. 43, 131-144, and Von Zabern, I., Wittmann-Liebold, B., Untucht-Grau, R., Schirmer, R. H., & Pai, E. F. (1976) Eur. J. Biochem. 68, 281-290]. A comparison of the four demonstrates an extraordinary degree of homology. (ABSTRACT TRUNCATED AT 250 WORDS)
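
    The fragmentation strategy can be illustrated with a simple in silico digest. Cyanogen bromide cleaves C-terminal to methionine. Once lysine side chains are blocked by maleylation, trypsin cleaves essentially only after arginine. With complete cleavage, n internal sites therefore give n + 1 fragments. The sketch below assumes complete digestion. It ignores the proline rule, missed cleavages and homoserine lactone formation, and its test peptide is hypothetical.

```python
def cleave_after(sequence: str, residues: set[str]) -> list[str]:
    """Split a sequence C-terminal to every residue in `residues` (complete digestion)."""
    fragments, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in residues:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments


def cnbr_digest(sequence: str) -> list[str]:
    # Cyanogen bromide: cleavage after Met
    return cleave_after(sequence, {"M"})


def trypsin_maleylated_digest(sequence: str) -> list[str]:
    # Maleylation blocks Lys, so trypsin is restricted to Arg
    return cleave_after(sequence, {"R"})


peptide = "MAKRGLMEKRPL"  # hypothetical example, not the myokinase sequence
print(cnbr_digest(peptide))                # ['M', 'AKRGLM', 'EKRPL']
print(trypsin_maleylated_digest(peptide))  # ['MAKR', 'GLMEKR', 'PL']
```

    Overlapping fragments from the two digests are what allow the pieces to be ordered into a single chain.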

  14. Mathematical Characterization of Protein Sequences Using Patterns as Chemical Group Combinations of Amino Acids

    PubMed Central

    Choudhury, Pabitra Pal; Jana, Siddhartha Sankar

    2016-01-01

    Comparison of amino acid sequence similarity is the fundamental concept behind the construction of protein phylogenetic trees. This approach can explain evolutionary relationships, but further explanation is not possible unless sequences are studied in terms of the chemical nature of the individual amino acids. Here we develop a new methodology to characterize protein sequences on the basis of the chemical nature of their amino acids. We design various algorithms for studying the variation of chemical group transitions and various chemical group combinations as patterns in the protein sequences. The amino acid sequences of the conventional myosin II head domains of 14 family members are used to illustrate this new approach. All 14 members share three blocks. Two, of maximum length 6 aa, contain no repetition of the same chemical nature: ‘FPKATD’ and ‘Y/FTNEKL’. The third, of maximum length 20 aa, does contain such repetition. We also check for commonality with KIF1A, a member of another motor protein sub-family, kinesin, and find a common block of length 8 aa in both myosin II and KIF1A. This motif is located in the neck linker region, which could be responsible for generating mechanical force. The method thus enables us to find unique blocks that remain chemically conserved across the family. We also validate our methodology with other protein families: MYOI, myosin light chain kinase (MLCK), Rho-associated protein kinase (ROCK), Na+/K+-ATPase and Ca2+-ATPase. Altogether, our studies provide a new methodology for investigating patterns of conserved amino acids in different proteins. PMID:27930687
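
    One way to read "blocks without repeating the same chemical nature" is as the longest stretch in which no chemical group occurs twice. The sketch below maps residues to chemical groups and finds that stretch with a sliding window. The eight-class grouping is a hypothetical choice, not necessarily the classes used by the authors, and inputs must contain only the 20 standard residues. Under this grouping, both ‘FPKATD’ and ‘YTNEKL’ qualify, since each uses six distinct groups.

```python
# Hypothetical chemical grouping; the paper's exact classes may differ.
CHEMICAL_GROUP = {
    **dict.fromkeys("GAVLI", "aliphatic"),
    **dict.fromkeys("FWY", "aromatic"),
    **dict.fromkeys("CM", "sulfur"),
    **dict.fromkeys("ST", "hydroxyl"),
    **dict.fromkeys("KRH", "basic"),
    **dict.fromkeys("DE", "acidic"),
    **dict.fromkeys("NQ", "amide"),
    "P": "imino",
}


def to_groups(sequence: str) -> list[str]:
    """Translate an amino acid sequence into its chemical-group pattern."""
    return [CHEMICAL_GROUP[aa] for aa in sequence]


def longest_non_repeating_block(sequence: str) -> str:
    """Longest contiguous block in which no chemical group occurs twice."""
    last_seen: dict[str, int] = {}
    best_start, best_len, start = 0, 0, 0
    for i, group in enumerate(to_groups(sequence)):
        # If this group already occurs inside the window, move the window past it
        if group in last_seen and last_seen[group] >= start:
            start = last_seen[group] + 1
        last_seen[group] = i
        if i - start + 1 > best_len:
            best_start, best_len = start, i - start + 1
    return sequence[best_start:best_start + best_len]


print(longest_non_repeating_block("GGFPKATDAA"))  # FPKATD
```

    Finding blocks common to a whole family would additionally require aligning the sequences and intersecting the blocks, which this sketch does not attempt.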

  15. The complete amino acid sequence of a trypsin inhibitor from Bauhinia variegata var. candida seeds.

    PubMed

    Di Ciero, L; Oliva, M L; Torquato, R; Köhler, P; Weder, J K; Camillo Novello, J; Sampaio, C A; Oliveira, B; Marangoni, S

    1998-11-01

    Trypsin inhibitors from the seeds of two varieties of Bauhinia variegata have been isolated and characterized. Bauhinia variegata candida trypsin inhibitor (BvcTI) and B. variegata lilac trypsin inhibitor (BvlTI) are proteins with an Mr of about 20,000 and no free sulfhydryl groups. Amino acid analysis of both inhibitors shows a high content of aspartic acid, glutamic acid, serine, and glycine, and a low content of histidine, tyrosine, methionine, and lysine. Isoelectric focusing detected three isoforms in both varieties (pI 4.85, 5.00, and 5.15), which were resolved by HPLC. The trypsin inhibitors show Ki values of 6.9 and 1.2 nM for BvcTI and BvlTI, respectively. This report presents two sets of sequences:
    - the N-terminal sequences of the three trypsin inhibitor isoforms from both varieties of Bauhinia variegata;
    - the complete amino acid sequence of B. variegata var. candida L. trypsin inhibitor isoform 3 (BvcTI-3).

    The sequences were determined by automated Edman degradation of the reduced and carboxymethylated proteins and of the peptides resulting from Staphylococcus aureus protease and trypsin digestion. BvcTI-3 is composed of 167 residues and has a calculated molecular mass of 18,529 Da. Homology studies with other trypsin inhibitors show that BvcTI-3 belongs to the Kunitz family. The putative active site encompasses Arg63-Ile64.

  16. Multiple site-selective insertions of non-canonical amino acids into sequence-repetitive polypeptides

    PubMed Central

    Wu, I-Lin; Patterson, Melissa A.; Carpenter Desai, Holly E.; Mehl, Ryan A.; Giorgi, Gianluca

    2013-01-01

    A simple and efficient method is described for introducing non-canonical amino acids at multiple, structurally defined sites within recombinant polypeptide sequences. E. coli MRA30 is a bacterial host strain with attenuated activity for release factor 1 (RF1). It was assessed for its ability to support the incorporation of a diverse range of non-canonical amino acids in response to multiple encoded amber (TAG) codons. The genetic templates were derived from superfolder GFP and an elastin-mimetic protein polymer. Suppression efficiency and isolated protein yield were observed to depend on two factors: the identity of the orthogonal aminoacyl-tRNA synthetase/tRNACUA pair and the non-canonical amino acid substrate. This approach afforded elastin-mimetic protein polymers containing non-canonical amino acid derivatives at up to twenty-two positions within the repeat sequence, with high levels of substitution. The identity and position of the variant residues were confirmed by mass spectrometric analysis of the full-length polypeptides and of the proteolytic cleavage fragments resulting from thermolysin digestion. The accumulated data suggest that this multi-site suppression approach permits the preparation of protein-based materials in which novel chemical functionality can be introduced at precisely defined positions within the polypeptide sequence. PMID:23625817

  17. SUBGROUPS OF AMINO ACID SEQUENCES IN THE VARIABLE REGIONS OF IMMUNOGLOBULIN HEAVY CHAINS*

    PubMed Central

    Cunningham, Bruce A.; Pflumm, Mollie N.; Rutishauser, Urs; Edelman, Gerald M.

    1969-01-01

    The amino acid sequence of the first 133 residues of the heavy (γ) chain from a human γG immunoglobulin (He) has been determined. This γ-chain is identical in Gm type to that of protein Eu, the complete sequence of which has been reported. Comparison of the two sequences substantiates the previous suggestion that there are subgroups of variable regions of heavy chains. The variable region of Eu has been assigned to subgroup I and that of He to subgroup II; on the other hand, the constant regions of the two proteins appear to be identical. Comparison of the sequence of the heavy chain of He with the heavy chain sequences determined in other laboratories suggests that the variable region of subgroup II is at least 118 residues long. The nature and distribution of amino acid variations in this heavy chain subgroup resemble those observed in light chain subgroups. These studies provide evidence that the translocation hypothesis applies to heavy as well as to light chains, viz., genes for variable regions (V) are somatically translocated to genes for constant regions (C) to form complete VC structural genes. PMID:5264153

  18. Complete nucleic acid sequence of Penaeus stylirostris densovirus (PstDNV) from India.

    PubMed

    Rai, Praveen; Safeena, Muhammed P; Karunasagar, Iddya; Karunasagar, Indrani

    2011-06-01

    Infectious hypodermal and hematopoietic necrosis virus (IHHNV) of shrimp has recently been classified as Penaeus stylirostris densovirus (PstDNV). The complete nucleic acid sequence of PstDNV from India was obtained by cloning and sequencing different DNA fragments of the virus. The genome organisation of PstDNV revealed three major coding domains:
    - a left ORF (NS1) of 2001 bp;
    - a middle ORF (NS2) of 1092 bp;
    - a right ORF (VP) of 990 bp.

    The complete genome and the amino acid sequences of the three proteins (NS1, NS2 and VP) were compared with the genomes of the virus reported from Hawaii, China and Mexico, and with partial sequences available from isolates from different regions. Phylogenetic analysis of shrimp, insect and vertebrate parvovirus sequences showed that the Indian PstDNV isolate is more closely related to one of the three isolates from Taiwan (AY355307) and to two isolates from Thailand (AY362547 and AY102034).

  19. PiRaNhA: a server for the computational prediction of RNA-binding residues in protein sequences

    PubMed Central

    Murakami, Yoichi; Spriggs, Ruth V.; Nakamura, Haruki; Jones, Susan

    2010-01-01

    The PiRaNhA web server is a publicly available online resource that automatically predicts the location of RNA-binding residues (RBRs) in protein sequences. The goal of functional annotation of sequences in the field of RNA binding is to provide predictions accurate enough that only a small number of targeted mutations are needed for verification. The PiRaNhA server uses a support vector machine (SVM) with four features: position-specific scoring matrices, residue interface propensity, predicted residue accessibility and residue hydrophobicity. Users can submit up to 10 protein sequences, and the predictions for each sequence are provided on a web page and via email. The prediction results come in three formats:
    - sequence format, with predicted RBRs highlighted;
    - text format, with the SVM threshold score indicated;
    - a graph that lets users quickly identify the residues above any chosen SVM threshold.

    By choosing the threshold on this graph, users can raise or lower the false positive rate. The server was tested on a non-redundant data set of 42 protein sequences not used in training. On this set it achieved an accuracy of 85%, a specificity of 90% and a Matthews correlation coefficient of 0.41, and it outperformed other publicly available servers. The PiRaNhA prediction server is freely available at http://www.bioinformatics.sussex.ac.uk/PIRANHA. PMID:20507911
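
    The reported figures are standard confusion-matrix statistics. The sketch below shows how a score threshold turns per-residue SVM outputs into RBR calls, and how accuracy, specificity and the Matthews correlation coefficient (MCC) then follow. The scores and labels are made up, and the thresholds 0.0 and 0.5 are arbitrary choices, not the server's defaults. Raising the threshold removes the single false positive here, which is the trade-off the graph exposes.

```python
import math


def confusion(scores, labels, threshold):
    """Count TP, FP, TN, FN when residues scoring >= threshold are called RNA-binding."""
    tp = fp = tn = fn = 0
    for score, is_binding in zip(scores, labels):
        predicted = score >= threshold
        if predicted and is_binding:
            tp += 1
        elif predicted:
            fp += 1
        elif is_binding:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Return (accuracy, specificity, MCC); undefined ratios are reported as 0.0."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, specificity, mcc


scores = [1.2, 0.4, -0.3, -1.1, 0.8, -0.6]         # hypothetical SVM outputs
labels = [True, False, True, False, True, False]  # hypothetical true RBRs
for t in (0.0, 0.5):
    print(t, metrics(*confusion(scores, labels, t)))
```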

  20. Evolutionary conservation analysis increases the colocalization of predicted exonic splicing enhancers in the BRCA1 gene with missense sequence changes and in-frame deletions, but not polymorphisms

    PubMed Central

    Pettigrew, Christopher; Wayte, Nicola; Lovelock, Paul K; Tavtigian, Sean V; Chenevix-Trench, Georgia; Spurdle, Amanda B; Brown, Melissa A

    2005-01-01

    Introduction Aberrant pre-mRNA splicing can be more detrimental to the function of a gene than changes in the length or nature of the encoded amino acid sequence. Although predicting the effects of changes in consensus 5' and 3' splice sites near intron:exon boundaries is relatively straightforward, predicting the possible effects of changes in exonic splicing enhancers (ESEs) remains a challenge. Methods As an initial step toward determining which ESEs predicted by the web-based tool ESEfinder in the breast cancer susceptibility gene BRCA1 are likely to be functional, we have determined their evolutionary conservation and compared their location with known BRCA1 sequence variants. Results Using the default settings of ESEfinder, we initially detected 669 potential ESEs in the coding region of the BRCA1 gene. Increasing the threshold score reduced the total number to 464, while taking into consideration the proximity to splice donor and acceptor sites reduced the number to 211. Approximately 11% of these ESEs (23/211) either are identical at the nucleotide level in human, primates, mouse, cow, dog and opossum Brca1 (conserved) or are detectable by ESEfinder in the same position in the Brca1 sequence (shared). The frequency of conserved and shared predicted ESEs between human and mouse is higher in BRCA1 exons (2.8 per 100 nucleotides) than in introns (0.6 per 100 nucleotides). Of conserved or shared putative ESEs, 61% (14/23) were predicted to be affected by sequence variants reported in the Breast Cancer Information Core database. Applying the filters described above increased the colocalization of predicted ESEs with missense changes, in-frame deletions and unclassified variants predicted to be deleterious to protein function, whereas they decreased the colocalization with known polymorphisms or unclassified variants predicted to be neutral. 
Conclusion In this report we show that evolutionary conservation analysis may be used to improve the specificity of an ESE
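
    The two filtering steps described in the Results can be sketched as below: first a stricter score threshold, then proximity to splice donor and acceptor sites. The `PredictedESE` record, the coordinates, the threshold of 3.0 and the 100-nt window are all hypothetical. They are not ESEfinder's output format or the settings used in the study.

```python
from dataclasses import dataclass


@dataclass
class PredictedESE:
    position: int  # start coordinate of the motif (hypothetical coordinate system)
    score: float   # motif score, as reported by a predictor such as ESEfinder


def filter_eses(eses, threshold, splice_sites, window):
    """Return ESEs passing the score threshold, and the subset within `window` nt of a splice site."""
    above = [e for e in eses if e.score >= threshold]
    near = [e for e in above
            if any(abs(e.position - site) <= window for site in splice_sites)]
    return above, near


eses = [PredictedESE(10, 2.5), PredictedESE(150, 4.1),
        PredictedESE(480, 3.9), PredictedESE(300, 1.2)]
above, near = filter_eses(eses, threshold=3.0, splice_sites=[0, 500], window=100)
print(len(eses), len(above), len(near))  # 4 2 1
```

    Each stage only removes candidates, mirroring the 669 → 464 → 211 narrowing reported above. The proportions quoted in the Results are simple ratios: 23/211 ≈ 11% and 14/23 ≈ 61%.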

  1. Foreshock Sequences and Short-Term Earthquake Predictability on East Pacific Rise Transform Faults

    NASA Astrophysics Data System (ADS)

    McGuire, J. J.; Boettcher, M. S.; Jordan, T. H.

    2004-12-01

    A predominant view of continental seismicity postulates that all earthquakes initiate in a similar manner regardless of their eventual size. It also holds that earthquake triggering can be described by an Epidemic-Type Aftershock Sequence (ETAS) model [e.g., Ogata, 1988; Helmstetter and Sornette, 2002]. These null hypotheses cannot be rejected as an explanation for the relative abundances of foreshocks and aftershocks to large earthquakes in California [Helmstetter et al., 2003]. An alternative location for testing this hypothesis is mid-ocean ridge transform faults (RTFs). These differ from continental transform faults in several ways:
    - most plate motion is accommodated aseismically;
    - many large earthquakes are slow events enriched in low-frequency radiation;
    - the seismicity shows depleted aftershock sequences and high foreshock activity.

    Here we use the 1996-2001 NOAA-PMEL hydroacoustic seismicity catalog for equatorial East Pacific Rise transform faults. We show that the foreshock/aftershock ratio is two orders of magnitude greater than the ETAS prediction based on global RTF aftershock abundances. We can thus reject the null hypothesis that there is no fundamental distinction between foreshocks, mainshocks, and aftershocks on RTFs. We further demonstrate (retrospectively) that foreshock sequences on East Pacific Rise transform faults can support statistically significant short-term prediction of large earthquakes (magnitude ≥ 5.4) using the NOAA-PMEL catalogs. The predictions have good spatial (15-km) and temporal (1-hr) resolution. Our very simplistic approach produces a large number of false alarms. Nevertheless, it successfully predicts the majority (70%) of M≥5.4 earthquakes while covering only a tiny fraction (0.15%) of the total potential space-time volume with alarms. It therefore achieves a large probability gain (about a factor of 500) over random guessing, despite not using any near-field data. The predictability of large EPR transform earthquakes suggests
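
    The quoted probability gain can be checked under the usual definition for alarm-based forecasts. Here the gain is assumed to be the fraction of target events preceded by an alarm, ν, divided by the fraction of the space-time volume covered by alarms, τ. With the numbers above, this gives roughly 467, consistent with "about a factor of 500":

```latex
G = \frac{\nu}{\tau} \approx \frac{0.70}{0.0015} \approx 467
```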

  2. DNA Cloning of Plasmodium falciparum Circumsporozoite Gene: Amino Acid Sequence of Repetitive Epitope

    NASA Astrophysics Data System (ADS)

    Enea, Vincenzo; Ellis, Joan; Zavala, Fidel; Arnot, David E.; Asavanich, Achara; Masuda, Aoi; Quakyi, Isabella; Nussenzweig, Ruth S.

    1984-08-01

    A clone of complementary DNA encoding the circumsporozoite (CS) protein of the human malaria parasite Plasmodium falciparum has been isolated by screening an Escherichia coli complementary DNA library with a monoclonal antibody to the CS protein. The DNA sequence of the complementary DNA insert encodes a four-amino acid sequence, proline-asparagine-alanine-asparagine, tandemly repeated 23 times. The CS-β-lactamase fusion protein specifically binds monoclonal antibodies to the CS protein and inhibits the binding of these antibodies to native Plasmodium falciparum CS protein. These findings provide a basis for the development of a vaccine against Plasmodium falciparum malaria.

  3. Amino-Acid Sequence of NADP-Specific Glutamate Dehydrogenase of Neurospora crassa

    PubMed Central

    Wootton, John C.; Chambers, Geoffrey K.; Holder, Anthony A.; Baron, Andrew J.; Taylor, John G.; Fincham, John R. S.; Blumenthal, Kenneth M.; Moon, Kenneth; Smith, Emil L.

    1974-01-01

    A tentative primary structure of the NADP-specific glutamate dehydrogenase [L-glutamate: NADP oxidoreductase (deaminating), EC 1.4.1.4] from Neurospora crassa has been determined. The proposed sequence contains 452 amino-acid residues in each of the identical subunits of the hexameric enzyme. Comparison of the sequence with that of the bovine liver enzyme reveals considerable homology in the amino-terminal portion of the chain, including the vicinity of the reactive lysine, with only shorter stretches of homology within the carboxyl-terminal regions. The significance of this distribution of homologous regions is discussed. PMID:4155068

  4. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion

    PubMed Central

    Thomsen, Martin Christen Frølund; Nielsen, Morten

    2012-01-01

    Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA). They provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and a low number of observations. Moreover, most methods available for sequence logo generation display only the position-specific enrichment of amino acids and discard the equally valuable information on amino acid depletion. Seq2Logo aims to resolve these issues with three user options:
    - sequence weighting, to correct for data redundancy;
    - pseudo counts, to correct for a low number of observations;
    - different logotype representations, each capturing different aspects of amino acid enrichment and depletion.

    Besides input in the form of peptides and MSAs, Seq2Logo accepts Blast sequence profiles. This gives non-expert end-users easy access to characterizing and identifying functionally conserved or variable amino acids in any given protein of interest. The output from the server is a sequence logo and a PSSM. Seq2Logo is available at http://www.cbs.dtu.dk/biotools/Seq2Logo (14 May 2012, date last accessed). PMID:22638583
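
    The quantities a sequence logo displays can be sketched as follows. The sketch computes per-position amino acid frequencies blended with pseudo counts, the information content of each column relative to a background, and signed log-odds values that separate enrichment from depletion. It assumes a uniform background, equal sequence weights and a simple additive pseudo-count scheme. Seq2Logo's own weighting and pseudo-count methods are more elaborate, so this is an illustration of the general idea only. The peptide alignment is hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = {aa: 1 / 20 for aa in AMINO_ACIDS}  # uniform background (assumption)


def column_frequencies(column: str, pseudocount_weight: float = 1.0) -> dict[str, float]:
    """Observed counts blended with background pseudo counts (equal sequence weights).

    A weight > 0 keeps every frequency positive, so the log-odds below stay defined.
    """
    counts = Counter(column)
    n = len(column)
    return {
        aa: (counts[aa] + pseudocount_weight * BACKGROUND[aa]) / (n + pseudocount_weight)
        for aa in AMINO_ACIDS
    }


def information_content(freqs: dict[str, float]) -> float:
    """Kullback-Leibler divergence (bits) of the column from the background."""
    return sum(p * math.log2(p / BACKGROUND[aa]) for aa, p in freqs.items() if p > 0)


def enrichment(freqs: dict[str, float]) -> dict[str, float]:
    """log2 odds per residue: positive means enriched, negative means depleted."""
    return {aa: math.log2(p / BACKGROUND[aa]) for aa, p in freqs.items()}


alignment = ["LLAV", "LMAV", "LLGV", "ILAV"]  # hypothetical aligned peptides
for pos in range(len(alignment[0])):
    column = "".join(seq[pos] for seq in alignment)
    print(pos, round(information_content(column_frequencies(column)), 2))
```

    A two-sided logo would draw the positive `enrichment` values above the axis and the negative ones below it.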

  5. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion.

    PubMed

    Thomsen, Martin Christen Frølund; Nielsen, Morten

    2012-07-01

    Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA). They provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and a low number of observations. Moreover, most methods available for sequence logo generation display only the position-specific enrichment of amino acids and discard the equally valuable information on amino acid depletion. Seq2Logo aims to resolve these issues with three user options:
    - sequence weighting, to correct for data redundancy;
    - pseudo counts, to correct for a low number of observations;
    - different logotype representations, each capturing different aspects of amino acid enrichment and depletion.

    Besides input in the form of peptides and MSAs, Seq2Logo accepts Blast sequence profiles. This gives non-expert end-users easy access to characterizing and identifying functionally conserved or variable amino acids in any given protein of interest. The output from the server is a sequence logo and a PSSM. Seq2Logo is available at http://www.cbs.dtu.dk/biotools/Seq2Logo (14 May 2012, date last accessed).

  6. Ligand Similarity Complements Sequence, Physical Interaction, and Co-Expression for Gene Function Prediction

    PubMed Central

    Shoichet, Brian K.; Gillis, Jesse

    2016-01-01

    The expansion of protein-ligand annotation databases has enabled large-scale networking of proteins by ligand similarity. These ligand-based protein networks, which implicitly predict the ability of neighboring proteins to bind related ligands, may complement biologically-oriented gene networks, which are used to predict functional or disease relevance. To quantify the degree to which such ligand-based protein associations might complement functional genomic associations, including sequence similarity, physical protein-protein interactions, co-expression, and disease gene annotations, we calculated a network based on the Similarity Ensemble Approach (SEA: sea.docking.org), where protein neighbors reflect the similarity of their ligands. We also measured the similarity with functional genomic networks over a common set of 1,131 genes, and found that the networks had only small overlaps, which were significant only due to the large scale of the data. Consistent with the view that the networks contain different information, combining them substantially improved Molecular Function prediction within GO (from AUROC~0.63–0.75 for the individual data modalities to AUROC~0.8 in the aggregate). We investigated the boost in guilt-by-association gene function prediction when the networks are combined and describe underlying properties that can be further exploited. PMID:27467773
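
    Guilt-by-association prediction is commonly done by neighbour voting, and its performance is summarised by AUROC. Each gene is scored by the fraction of its network neighbours that carry a GO term. AUROC is then the probability that an annotated gene outscores an unannotated one. The sketch below uses an unweighted toy network and the rank-based (Mann-Whitney) form of AUROC. The network, gene names and labels are hypothetical, and the authors' exact procedure (edge weighting, cross-validation scheme) is not reproduced. A gene's own label is never used in its score, which makes this a leave-one-out evaluation.

```python
def neighbor_vote(network: dict[str, set[str]], positives: set[str]) -> dict[str, float]:
    """Score each gene by the fraction of its neighbours annotated with the term."""
    return {
        gene: len(nbrs & positives) / len(nbrs) if nbrs else 0.0
        for gene, nbrs in network.items()
    }


def auroc(scores: dict[str, float], positives: set[str]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for g, s in scores.items() if g in positives]
    neg = [s for g, s in scores.items() if g not in positives]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


network = {  # hypothetical undirected network as adjacency sets
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e"}, "e": {"d"},
}
positives = {"a", "b"}  # genes annotated with some GO term
scores = neighbor_vote(network, positives)
print(round(auroc(scores, positives), 3))
```

    Aggregating networks, in the spirit of the combined result above, could be as simple as averaging each gene's scores across networks. This is an assumption for illustration, not necessarily the method used in the study.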

  7. Tools for Sequence-Based miRNA Target Prediction: What to Choose?

    PubMed Central

    Riffo-Campos, Ángela L.; Riquelme, Ismael; Brebi-Mieville, Priscilla

    2016-01-01

    MicroRNAs (miRNAs) are small non-coding RNAs ~22 nt in length. They regulate gene expression at the post-transcriptional level through complementary base pairing with target mRNAs, leading to mRNA degradation or translational repression. In the last decade, miRNA dysfunction has been linked to the development and progression of many diseases. Researchers currently need a way to identify miRNA targets precisely before applying experimental approaches. Such approaches allow better functional characterization of miRNAs in biological processes and can thus predict their effects. Computational prediction tools provide a rapid way to identify putative miRNA targets. However, many tools for predicting miRNA:mRNA interactions have been developed, each with a different algorithm. As a result, biological researchers often do not know which is the best choice for their study and frequently do not understand the bioinformatic basis of these tools. This review describes the biological fundamentals of these prediction tools, characterizes the main sequence-based algorithms, and offers some insights into their use by biologists. PMID:27941681
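
    A common starting point for sequence-based target prediction is complementarity between the miRNA "seed" (nucleotides 2-8 from the 5' end) and the target's 3' UTR. The sketch below scans a UTR for 7mer-m8 sites, that is, exact Watson-Crick matches to the reverse complement of seed nucleotides 2-8. It deliberately ignores G:U wobble, site context, accessibility and conservation, which real tools weigh in different ways. The miRNA string is a let-7a sequence used only as an example, and the UTR fragment is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string (A, C, G, U only)."""
    return rna.translate(COMPLEMENT)[::-1]


def seed_sites(mirna: str, utr: str) -> list[int]:
    """0-based start positions in `utr` of 7mer-m8 sites (perfect match to miRNA nt 2-8)."""
    site = reverse_complement(mirna[1:8])  # nucleotides 2-8, counted 1-based
    k = len(site)
    return [i for i in range(len(utr) - k + 1) if utr[i:i + k] == site]


mirna = "UGAGGUAGUAGGUUGUAUAGUU"     # example miRNA sequence
utr = "AAACUACCUCAGGGCUACCUCUU"     # hypothetical 3' UTR fragment
print(reverse_complement(mirna[1:8]))  # CUACCUC
print(seed_sites(mirna, utr))          # [3, 14]
```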

  8. Tools for Sequence-Based miRNA Target Prediction: What to Choose?

    PubMed

    Riffo-Campos, Ángela L; Riquelme, Ismael; Brebi-Mieville, Priscilla

    2016-12-09

    MicroRNAs (miRNAs) are small non-coding RNAs ~22 nt in length. They regulate gene expression at the post-transcriptional level through complementary base pairing with target mRNAs, leading to mRNA degradation or translational repression. In the last decade, miRNA dysfunction has been linked to the development and progression of many diseases. Researchers currently need a way to identify miRNA targets precisely before applying experimental approaches. Such approaches allow better functional characterization of miRNAs in biological processes and can thus predict their effects. Computational prediction tools provide a rapid way to identify putative miRNA targets. However, many tools for predicting miRNA:mRNA interactions have been developed, each with a different algorithm. As a result, biological researchers often do not know which is the best choice for their study and frequently do not understand the bioinformatic basis of these tools. This review describes the biological fundamentals of these prediction tools, characterizes the main sequence-based algorithms, and offers some insights into their use by biologists.

  9. Method for high-volume sequencing of nucleic acids: random and directed priming with libraries of oligonucleotides

    DOEpatents

    Studier, F.W.

    1995-04-18

    Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient. 2 figs.
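
    The library sizes can be put in context by comparing them with the number of possible k-mers, 4^k. The sketch below computes what fraction of all octamers, nonamers and decamers each quoted library represents. It also includes a hypothetical illustration of directed priming: choosing the library oligonucleotide that occurs closest to the 3' end of the sequence determined so far, so that extension proceeds into unknown sequence. The `pick_directed_primer` helper and its `window` parameter are illustrative inventions. They ignore strand choice, melting temperature and secondary structure, and they are not the patented procedure.

```python
def library_fraction(library_size: int, k: int) -> float:
    """Fraction of all 4**k possible k-mers present in a primer library."""
    return library_size / 4 ** k


def pick_directed_primer(known: str, library: set[str], k: int, window: int):
    """Return (position, primer) for the library k-mer lying closest to the 3' end
    within the last `window` bases of the known sequence, or None if none occurs."""
    start = max(0, len(known) - window)
    for i in range(len(known) - k, start - 1, -1):
        candidate = known[i:i + k]
        if candidate in library:
            return i, candidate
    return None


for size, k in ((10_000, 8), (14_200, 9), (44_000, 10)):
    print(k, f"{library_fraction(size, k):.1%}")  # about 15.3%, 5.4% and 4.2%

print(pick_directed_primer("ACGTACGTTTGCAAGG", {"TTGCAAGG", "ACGTACGT"}, k=8, window=10))
```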

  10. Method for high-volume sequencing of nucleic acids: random and directed priming with libraries of oligonucleotides

    DOEpatents

    Studier, F. William

    1995-04-18

    Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient.

  11. Lower serum uric acid level predicts mortality in dialysis patients

    PubMed Central

    Bae, Eunjin; Cho, Hyun-Jeong; Shin, Nara; Kim, Sun Moon; Yang, Seung Hee; Kim, Dong Ki; Kim, Yong-Lim; Kang, Shin-Wook; Yang, Chul Woo; Kim, Nam Ho; Kim, Yon Su; Lee, Hajeong

    2016-01-01

    We evaluated the impact of serum uric acid (SUA) on mortality in patients on chronic dialysis. A total of 4132 adult dialysis patients were enrolled prospectively between August 2008 and September 2014. Of these, we included 1738 patients who had been on dialysis for at least 3 months and had SUA data available in the database. We categorized time-averaged SUA (TA-SUA) into 5 groups: <5.5, 5.5–6.4, 6.5–7.4, 7.5–8.4, and ≥8.5 mg/dL. Cox regression analysis was used to calculate the hazard ratio (HR) of all-cause mortality according to SUA group. The mean TA-SUA level was slightly higher in men than in women. Patients with lower TA-SUA levels tended to have lower body mass index (BMI), phosphorus, and serum albumin levels, a higher proportion of diabetes mellitus (DM), and a higher proportion of malnourishment on the subjective global assessment (SGA). During a median follow-up of 43.9 months, 206 patients died. Patients in the highest TA-SUA group had a risk similar to that of the middle 3 groups, but the lowest TA-SUA group had a significantly elevated HR for mortality. The lowest TA-SUA group was significantly associated with increased all-cause mortality (adjusted HR, 1.720; 95% confidence interval, 1.007–2.937; P = 0.047) even after adjustment for demographic, comorbid, and nutritional covariables and for medication use that could affect SUA levels. This association was prominent in patients who were well nourished on the SGA and in those with a preserved serum albumin level, a higher BMI, or concomitant DM, although none of these parameters except DM showed a significant interaction in the relationship between TA-SUA and mortality. In conclusion, a TA-SUA level <5.5 mg/dL predicted all-cause mortality in patients on chronic dialysis. PMID:27310949
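As a small illustration of the grouping used in this study, the sketch below maps a TA-SUA value to one of the five categories. It assumes values are reported to one decimal place, so the cut points fall at 5.5, 6.5, 7.5 and 8.5 mg/dL. How the time-averaged value itself was computed is not described in the abstract and is not modelled; the function and label names are hypothetical.

```python
from bisect import bisect_right

# Lower bounds (mg/dL) of groups 2 to 5, and the labels of all five groups.
CUTOFFS = [5.5, 6.5, 7.5, 8.5]
LABELS = ["<5.5", "5.5-6.4", "6.5-7.4", "7.5-8.4", ">=8.5"]


def tasua_group(tasua_mg_dl: float) -> str:
    """Return the TA-SUA group label for a value in mg/dL.

    bisect_right counts how many cut points are <= the value, so a value lying
    exactly on a cut point (e.g. 5.5) falls into the higher group.
    """
    return LABELS[bisect_right(CUTOFFS, tasua_mg_dl)]


print([tasua_group(v) for v in (4.9, 5.5, 7.0, 8.4, 9.2)])
# ['<5.5', '5.5-6.4', '6.5-7.4', '7.5-8.4', '>=8.5']
```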

  12. Predicting protein-protein interactions from sequence using correlation coefficient and high-quality interaction dataset.

    PubMed

    Shi, Ming-Guang; Xia, Jun-Feng; Li, Xue-Ling; Huang, De-Shuang

    2010-03-01

    Identifying protein-protein interactions (PPIs) is critical for understanding the cellular functions of proteins and the machinery of a proteome. PPI data derived from high-throughput technologies are often incomplete and noisy. It is therefore important to develop computational methods and high-quality interaction datasets for predicting PPIs. A sequence-based method is proposed that combines a correlation coefficient (CC) transformation with a support vector machine (SVM). The CC transformation not only accounts for the neighboring effect within a protein sequence but also describes the level of correlation between two protein sequences. A gold-standard positive (interacting) dataset, MIPS Core, and a gold-standard negative (non-interacting) dataset, GO-NEG, for the yeast Saccharomyces cerevisiae were mined to evaluate the method objectively and to reduce bias. The SVM model combined with the CC transformation gave the best performance, with an accuracy of 87.94% on the gold-standard positive and negative datasets. The MATLAB source code and the datasets are available on request from smgsmg@mail.ustc.edu.cn.

  13. Using the concept of Chou's pseudo amino acid composition to predict apoptosis proteins subcellular location: an approach by approximate entropy.

    PubMed

    Jiang, Xiaoying; Wei, Rong; Zhang, Tongliang; Gu, Quan

    2008-01-01

    The function of a protein is closely correlated with its subcellular location. Predicting the subcellular location of apoptosis proteins is an important research area in the post-genomic era, because knowledge of apoptosis proteins helps in understanding the mechanism of programmed cell death. Compared with conventional amino acid composition (AAC), the pseudo amino acid (PseAA) composition originally introduced by Chou can incorporate much more information about a protein sequence and thereby substantially enhance the power of discrete models to predict various attributes of a protein. In this study, a novel approach based on the concept of Chou's PseAA composition is presented to predict the subcellular location of apoptosis proteins solely from sequence. Approximate entropy (ApEn), a parameter that quantifies the complexity of a time series, is used to construct additional PseAA composition features. A fuzzy K-nearest neighbor (FKNN) classifier is used as the prediction engine, and a particle swarm optimization (PSO) algorithm is adopted to optimize the weight factors that are important in the PseAA composition. Two datasets, covering six and four subcellular locations respectively, are used to validate the performance of the proposed approach. The results obtained by the jackknife test are quite encouraging, indicating that the ApEn of a protein sequence can effectively represent information about the subcellular locations of apoptosis proteins. The approach can at least play a complementary role to many existing methods and may become a useful tool for protein function prediction. The MATLAB software is freely available from the corresponding author.
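Approximate entropy itself is a standard statistic (Pincus). The minimal Python sketch below computes it for a numeric series; the defaults m = 2 and r = 0.2 are illustrative only. How a protein sequence is turned into a numeric series (for example through a physicochemical scale) and how the resulting ApEn values are weighted into the PseAA vector are not specified in this abstract, so both steps are deliberately left out.

```python
import math
from typing import Sequence


def approximate_entropy(series: Sequence[float], m: int = 2, r: float = 0.2) -> float:
    """Approximate entropy ApEn(m, r) of a numeric series.

    m is the template length and r the tolerance on the Chebyshev (max-abs)
    distance between templates. Self-matches are counted, so every match
    count is at least 1 and the logarithm is always defined.
    """
    n = len(series)
    if n < m + 2:
        raise ValueError("series is too short for the chosen template length m")

    def phi(length: int) -> float:
        # All overlapping templates of the given length.
        templates = [series[i:i + length] for i in range(n - length + 1)]
        total = 0.0
        for a in templates:
            # Count templates within tolerance r of template a (a itself included).
            matches = sum(
                1 for b in templates
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(matches / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)


print(approximate_entropy([1.0] * 20))  # 0.0 for a constant series
```

A perfectly regular series scores 0, and less regular series score higher, which is what makes ApEn usable as a sequence-complexity feature.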

  14. Respiratory syncytial virus fusion glycoprotein: nucleotide sequence of mRNA, identification of cleavage activation site and amino acid sequence of N-terminus of F1 subunit.

    PubMed Central

    Elango, N; Satake, M; Coligan, J E; Norrby, E; Camargo, E; Venkatesan, S

    1985-01-01

    The amino acid sequence of the respiratory syncytial virus fusion protein (Fo) was deduced from the sequence of a partial cDNA clone of the mRNA and from the 5' mRNA sequence obtained by primer extension and dideoxy sequencing. The encoded protein of 574 amino acids is extremely hydrophobic and has a molecular weight of 63,371 daltons. The site of proteolytic cleavage within this protein was accurately mapped by determining a partial amino acid sequence of the N-terminus of the larger subunit (F1), purified by radioimmunoprecipitation using monoclonal antibodies. Alignment of the N-terminus of the F1 subunit within the deduced amino acid sequence of Fo allowed us to identify a lys-lys-arg-lys-arg-arg sequence at the C-terminus of the smaller, N-terminal F2 subunit that appears to represent the cleavage/activation domain. Five potential glycosylation sites, four of them within the F2 subunit, were also identified. Three extremely hydrophobic domains are present in the protein: a) the N-terminal signal sequence; b) the N-terminus of the F1 subunit, which is analogous to the N-terminus of the paramyxovirus F1 subunit and to the HA2 subunit of influenza virus hemagglutinin; and c) the putative membrane anchorage domain near the C-terminus of F1. PMID:2987829

  15. The Complete Genome Sequence of the Lactic Acid Bacterium Lactococcus lactis ssp. lactis IL1403

    PubMed Central

    Bolotin, Alexander; Wincker, Patrick; Mauger, Stéphane; Jaillon, Olivier; Malarme, Karine; Weissenbach, Jean; Ehrlich, S. Dusko; Sorokin, Alexei

    2001-01-01

    Lactococcus lactis is a nonpathogenic AT-rich gram-positive bacterium closely related to the genus Streptococcus and is the most commonly used cheese starter. It is also the best-characterized lactic acid bacterium. We sequenced the genome of the laboratory strain IL1403, using a novel two-step strategy that comprises diagnostic sequencing of the entire genome and a shotgun polishing step. The genome contains 2,365,589 base pairs and encodes 2310 proteins, including 293 protein-coding genes belonging to six prophages and 43 insertion sequence (IS) elements. Nonrandom distribution of IS elements indicates that the chromosome of the sequenced strain may be a product of recent recombination between two closely related genomes. A complete set of late competence genes is present, indicating the ability of L. lactis to undergo DNA transformation. Genomic sequence revealed new possibilities for fermentation pathways and for aerobic respiration. It also indicated a horizontal transfer of genetic information from Lactococcus to gram-negative enteric bacteria of Salmonella-Escherichia group. [The sequence data described in this paper has been submitted to the GenBank data library under accession no. AE005176.] PMID:11337471

  16. Amino acid sequence of myoglobin from emu (Dromaius novaehollandiae) skeletal muscle.

    PubMed

    Suman, S P; Joseph, P; Li, S; Beach, C M; Fontaine, M; Steinke, L

    2010-11-01

    The objective of the present study was to characterize the primary structure of emu myoglobin (Mb). Emu Mb was isolated from the iliofibularis muscle by gel-filtration chromatography. Matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry was used to determine the exact molecular mass of emu Mb in comparison with horse Mb, and Edman degradation was used to characterize the amino acid sequence. The molecular mass of emu Mb was 17,380 Da, close to the values reported for ratite and poultry myoglobins. Like myoglobins from meat-producing livestock and birds, emu Mb has 153 amino acids. Emu Mb contains 9 histidines. The proximal and distal histidines, which are responsible for the oxygen-binding properties of Mb, are conserved in emu. Emu Mb shared more than 90% homology with ratite and chicken myoglobins, whereas it showed less than 70% sequence similarity with ruminant myoglobins.

  17. In Vitro and In Vivo Activities of Antimicrobial Peptides Developed Using an Amino Acid-Based Activity Prediction Method

    PubMed Central

    Wu, Xiaozhe; Wang, Zhenling; Li, Xiaolu; Fan, Yingzi; He, Gu; Wan, Yang; Yu, Chaoheng; Tang, Jianying; Li, Meng; Zhang, Xian; Zhang, Hailong; Xiang, Rong; Pan, Ying; Liu, Yan; Lu, Lian

    2014-01-01

    To design and discover new antimicrobial peptides (AMPs) with high levels of antimicrobial activity, a number of machine-learning and prediction methods have been developed. Here, we present a new prediction method that can identify novel AMPs that are highly similar in sequence to known peptides but offer improved antimicrobial activity along with lower host cytotoxicity. Using previously generated AMP amino acid substitution data, we developed an amino acid activity contribution matrix that contained an activity contribution value for each amino acid at each position of the model peptide. A series of AMPs were designed with this method. After the antimicrobial activities of these novel AMPs were evaluated against both Gram-positive and Gram-negative bacterial strains, DP7 was chosen for further analysis. Compared with the parent peptide HH2, this novel AMP showed improved, broad-spectrum antimicrobial activity, and in a cytotoxicity assay it showed lower toxicity against human cells. The in vivo antimicrobial activity of DP7 was tested in a murine Staphylococcus aureus infection model. When mice were inoculated and treated via intraperitoneal injection, DP7 reduced the bacterial load in the peritoneal lavage solution. Electron microscopy imaging indicated disruption of the S. aureus outer membrane by DP7. Our new prediction method can therefore be used to identify AMPs with minor amino acid differences and improved antimicrobial activity, potentially expanding the range of therapeutic agents available to combat multidrug-resistant infections. PMID:24982064

  18. In vitro and in vivo activities of antimicrobial peptides developed using an amino acid-based activity prediction method.

    PubMed

    Wu, Xiaozhe; Wang, Zhenling; Li, Xiaolu; Fan, Yingzi; He, Gu; Wan, Yang; Yu, Chaoheng; Tang, Jianying; Li, Meng; Zhang, Xian; Zhang, Hailong; Xiang, Rong; Pan, Ying; Liu, Yan; Lu, Lian; Yang, Li

    2014-09-01

    To design and discover new antimicrobial peptides (AMPs) with high levels of antimicrobial activity, a number of machine-learning and prediction methods have been developed. Here, we present a new prediction method that can identify novel AMPs that are highly similar in sequence to known peptides but offer improved antimicrobial activity along with lower host cytotoxicity. Using previously generated AMP amino acid substitution data, we developed an amino acid activity contribution matrix that contained an activity contribution value for each amino acid at each position of the model peptide. A series of AMPs were designed with this method. After the antimicrobial activities of these novel AMPs were evaluated against both Gram-positive and Gram-negative bacterial strains, DP7 was chosen for further analysis. Compared with the parent peptide HH2, this novel AMP showed improved, broad-spectrum antimicrobial activity, and in a cytotoxicity assay it showed lower toxicity against human cells. The in vivo antimicrobial activity of DP7 was tested in a murine Staphylococcus aureus infection model. When mice were inoculated and treated via intraperitoneal injection, DP7 reduced the bacterial load in the peritoneal lavage solution. Electron microscopy imaging indicated disruption of the S. aureus outer membrane by DP7. Our new prediction method can therefore be used to identify AMPs with minor amino acid differences and improved antimicrobial activity, potentially expanding the range of therapeutic agents available to combat multidrug-resistant infections.
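The abstract describes a matrix holding one activity contribution value per amino acid per position of the model peptide, but not how those values are combined. The sketch below assumes, purely as a simplification, that contributions add across positions, and uses that to score a peptide and rank single-residue substitutions. The matrix values, the toy peptide and all function names are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical contribution matrix: one dict per peptide position, mapping a
# residue to its (made-up) activity contribution at that position.
Matrix = List[Dict[str, float]]


def activity_score(peptide: str, matrix: Matrix) -> float:
    """Additive score: sum of each residue's contribution at its position."""
    if len(peptide) != len(matrix):
        raise ValueError("peptide length must equal the number of matrix positions")
    return sum(matrix[i][aa] for i, aa in enumerate(peptide))


def rank_single_substitutions(peptide: str, matrix: Matrix) -> List[Tuple[float, int, str, str]]:
    """List (score change, 1-based position, old residue, new residue), best first."""
    options = []
    for i, current in enumerate(peptide):
        for aa, value in matrix[i].items():
            if aa != current:
                options.append((value - matrix[i][current], i + 1, current, aa))
    return sorted(options, reverse=True)


toy = [{"K": 1.0, "R": 1.5}, {"W": 2.0, "L": 0.5}, {"K": 0.8, "R": 0.2}]
print(activity_score("KWK", toy))
print(rank_single_substitutions("KWK", toy)[0])  # (0.5, 1, 'K', 'R')
```

Under the additive assumption, the best candidate is simply the substitution with the largest positive change in score; a real design loop would still need experimental testing, as in the study.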

  19. Self-sequencing of amino acids and origins of polyfunctional protocells

    NASA Technical Reports Server (NTRS)

    Fox, S. W.

    1984-01-01

    The role of proteins in the origin of living things is discussed. It has been experimentally established that amino acids can sequence themselves under simulated geological conditions with highly nonrandom products which accordingly contain diverse information. Multiple copies of each type of macromolecule are formed, resulting in greater power for any protoenzymic molecule than would accrue from a single copy of each type. Thermal proteins are readily incorporated into laboratory protocells. The experimental evidence for original polyfunctional protocells is discussed.

  20. Amino acid sequence of atrial natriuretic peptides in human coronary sinus plasma.

    PubMed

    Yandle, T; Crozier, I; Nicholls, G; Espiner, E; Carne, A; Brennan, S

    1987-07-31

    Two atrial natriuretic peptides were purified from pooled human coronary sinus plasma by Sep-Pak extraction, immunoaffinity chromatography and reverse phase HPLC. The amino acid sequences of the two peptides were homologous with 99-126 human atrial natriuretic peptide (hANP) and 106-126 hANP, the latter being most probably linked to 99-105 ANP by the disulphide bond. The molar ratio of the peptides in plasma, as assessed by radioimmunoassay was 10:3.

  1. Amino Acid Sequences Mediating Vascular Cell Adhesion Molecule 1 Binding to Integrin Alpha 4: Homologous DSP Sequence Found for JC Polyoma VP1 Coat Protein

    PubMed Central

    Meyer, Michael Andrew

    2013-01-01

    The JC polyoma viral coat protein VP1 was analyzed for amino acid sequence homologies to the IDSP sequence, which mediates binding of VLA-4 (integrin alpha 4) to vascular cell adhesion molecule 1. Although the full sequence was not found, a DSP sequence was located near the critical arginine residue linked to infectivity of the virus and to binding of sialic acid-containing molecules such as integrins (3). For the JC polyoma virus, the DSP sequence was found at residues 70, 71 and 72, with homology also noted for the mouse polyoma virus and SV40 virus. Three-dimensional modeling of the VP1 molecule suggests that the DSP loop provides an accessible site for interaction on the external side of the assembled viral capsid pentamer. PMID:24147211

  2. Amino Acid Sequences Mediating Vascular Cell Adhesion Molecule 1 Binding to Integrin Alpha 4: Homologous DSP Sequence Found for JC Polyoma VP1 Coat Protein.

    PubMed

    Meyer, Michael Andrew

    2013-01-01

    The JC polyoma viral coat protein VP1 was analyzed for amino acid sequence homologies to the IDSP sequence, which mediates binding of VLA-4 (integrin alpha 4) to vascular cell adhesion molecule 1. Although the full sequence was not found, a DSP sequence was located near the critical arginine residue linked to infectivity of the virus and to binding of sialic acid-containing molecules such as integrins (3). For the JC polyoma virus, the DSP sequence was found at residues 70, 71 and 72, with homology also noted for the mouse polyoma virus and SV40 virus. Three-dimensional modeling of the VP1 molecule suggests that the DSP loop provides an accessible site for interaction on the external side of the assembled viral capsid pentamer.
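Locating a short motif such as DSP and reporting its residue numbers (as in "residues 70, 71 and 72") can be sketched as an exact-match search. The Python example below is a minimal illustration only: the VP1 sequence is not reproduced in the abstract, so the toy sequence is hypothetical, and the paper's homology analysis may have involved more than exact matching.

```python
import re
from typing import List


def find_motif(sequence: str, motif: str) -> List[int]:
    """Return 1-based start positions of every occurrence of motif in sequence.

    The zero-width lookahead lets overlapping occurrences be reported too.
    """
    pattern = re.compile(f"(?={re.escape(motif.upper())})")
    return [match.start() + 1 for match in pattern.finditer(sequence.upper())]


# Hypothetical toy sequence, not the real VP1 sequence.
toy_vp1 = "MAGKRDSPQLEDSPG"
print(find_motif(toy_vp1, "DSP"))   # [6, 12]
print(find_motif(toy_vp1, "IDSP"))  # []
```

A DSP hit starting at position p covers residues p, p + 1 and p + 2, which is how a start position maps to a three-residue range.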

  3. Complete amino acid sequence of BSP-A3 from bovine seminal plasma. Homology to PDC-109 and to the collagen-binding domain of fibronectin.

    PubMed Central

    Seidah, N G; Manjunath, P; Rochemont, J; Sairam, M R; Chrétien, M

    1987-01-01

    Bovine seminal plasma was shown to contain three similar proteins, called BSP-A1, BSP-A2 and BSP-A3. Both BSP-A1 and BSP-A2 were shown to be molecular variants of a recently characterized peptide called PDC-109. They seem to differ only in their degree of glycosylation and otherwise seem to possess an identical amino acid composition. The work in the present paper deals with the complete characterization of the third member of this series, namely BSP-A3. The complete amino acid sequence revealed that it is composed of 115 amino acids and predicts a Mr of 13,403. An analysis of the primary structure of BSP-A3 revealed a high degree of internal homology, with two homologous domains composed of 39 (residues 28-66) and 43 (residues 73-115) amino acids. An exhaustive computer-bank search for the similarity of this sequence to any known protein, or segment thereof, revealed two significant homologies. The first is between PDC-109 and BSP-A3, which is so high that we can confidently predict that both proteins evolved from a single ancestral gene. The collagen-binding domain of bovine fibronectin (type II sequence) was also found to be highly homologous to both BSP-A3 and PDC-109. PMID:3606570

  4. Predicting the functional consequences of cancer-associated amino acid substitutions

    PubMed Central

    Shihab, Hashem A.; Gough, Julian; Cooper, David N.; Day, Ian N. M.; Gaunt, Tom R.

    2013-01-01

    Motivation: The number of missense mutations being identified in cancer genomes has greatly increased as a consequence of technological advances and the reduced cost of whole-genome/whole-exome sequencing methods. However, a high proportion of the amino acid substitutions detected in cancer genomes have little or no effect on tumour progression (passenger mutations). Therefore, accurate automated methods capable of discriminating between driver (cancer-promoting) and passenger mutations are becoming increasingly important. In our previous work, we developed the Functional Analysis through Hidden Markov Models (FATHMM) software and, using a model weighted for inherited disease mutations, observed improved performances over alternative computational prediction algorithms. Here, we describe an adaptation of our original algorithm that incorporates a cancer-specific model to potentiate the functional analysis of driver mutations. Results: The performance of our algorithm was evaluated using two separate benchmarks. In our analysis, we observed improved performances when distinguishing between driver mutations and other germ line variants (both disease-causing and putatively neutral mutations). In addition, when discriminating between somatic driver and passenger mutations, we observed performances comparable with the leading computational prediction algorithms: SPF-Cancer and TransFIC. Availability and implementation: A web-based implementation of our cancer-specific model, including a downloadable stand-alone package, is available at http://fathmm.biocompute.org.uk. Contact: fathmm@biocompute.org.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23620363

  5. Prediction of conformational states of amino acids using a Ramachandran plot.

    PubMed

    Kolaskar, A S; Sawant, S

    1996-01-01

    (phi, psi) data from the crystal structures of 221 high-resolution proteins, selected with a sequence-similarity cut-off of 25%, were analysed by dividing the Ramachandran plot into three regions representing three conformational states: (i) conformational state 1: conformations with (phi, psi) in the range (-140°, -100°) to (0°, 0°); (ii) conformational state 2: conformations with (phi, psi) from (-180°, 80°) to (0°, 180°); and (iii) conformational state 3: all remaining conformations in the (phi, psi) plane that are not included in the first two states. Normalized probability values for the occurrence of single amino acid residues in conformational regions 1-3, and similar values for dipeptides, were calculated. Comparison of single-residue and dipeptide normalized probability values showed that short-range interactions, although strong, destabilize the conformational states of only 44 dipeptides out of the 400 × 9 possible states. However, dipeptide frequency values provide better resolving power than single-residue potentials when used to predict the conformational states of residues in a protein from its primary structure. The simple approach used in the present study yields an accuracy of > 70% for 14 proteins and of 50-70% for 247 proteins. These studies thus point to yet another use of the Ramachandran plot and highlight the role of tertiary interactions in protein folding.
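The three conformational states defined in this abstract are rectangular regions of the (phi, psi) plane, so assigning a residue to a state is a pair of range checks. The minimal Python sketch below treats the region edges as inclusive, which is an assumption: the abstract does not say how boundary values are handled.

```python
def conformational_state(phi: float, psi: float) -> int:
    """Assign a (phi, psi) pair, in degrees, to conformational state 1, 2 or 3.

    State 1: phi in [-140, 0] and psi in [-100, 0].
    State 2: phi in [-180, 0] and psi in [80, 180].
    State 3: everything else.
    Edges are treated as inclusive (an assumption).
    """
    if -140.0 <= phi <= 0.0 and -100.0 <= psi <= 0.0:
        return 1  # region containing the right-handed alpha-helix
    if -180.0 <= phi <= 0.0 and 80.0 <= psi <= 180.0:
        return 2  # region containing extended (beta-strand) conformations
    return 3


print([conformational_state(p, s) for p, s in [(-57, -47), (-120, 130), (60, 40)]])
# [1, 2, 3]
```

Counting states per residue type (or per dipeptide) over a structure set and normalizing by the overall state frequencies would then give probability tables of the kind described, though the exact normalization used in the paper is not stated here.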

  6. SucStruct: Prediction of succinylated lysine residues by using structural properties of amino acids.

    PubMed

    López, Yosvany; Dehzangi, Abdollah; Lal, Sunil Pranit; Taherzadeh, Ghazaleh; Michaelson, Jacob; Sattar, Abdul; Tsunoda, Tatsuhiko; Sharma, Alok

    2017-03-28

    Post-translational modification (PTM) is a biological process that contributes to the diversity of the proteome. Among the many modifications with important roles in cellular activity, lysine succinylation has recently emerged as an important PTM. It alters the chemical structure of lysines, leading to substantial changes in the structure and function of proteins. Given the huge number of proteins being sequenced in the post-genome era, experimental detection of succinylated residues remains expensive, inefficient and time-consuming. The development of computational tools that accurately predict succinylated lysines is therefore urgently needed. Several approaches have been proposed to date, but their sensitivity has reportedly been poor. In this paper, we propose an approach that uses structural features of amino acids to improve lysine succinylation prediction. Succinylated and non-succinylated lysines were first retrieved from 670 proteins, and characteristics such as accessible surface area, backbone torsion angles and local structure conformations were incorporated. We used k-nearest neighbors cleaning to deal with class imbalance and designed a pruned decision tree for classification. Our predictor, referred to as SucStruct (Succinylation using Structural features), significantly improved performance compared with previous predictors, with sensitivity, accuracy and Matthews correlation coefficient of 0.7334-0.7946, 0.7444-0.7608 and 0.4884-0.5240, respectively.
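Sensitivity, accuracy and the Matthews correlation coefficient (MCC) quoted above are standard confusion-matrix statistics. The minimal Python sketch below shows how they are computed from true/false positive and negative counts; the counts in the example are hypothetical and are not taken from the SucStruct evaluation.

```python
import math


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Sensitivity, accuracy and Matthews correlation coefficient from a confusion matrix."""
    sensitivity = tp / (tp + fn)  # fraction of positives (e.g. succinylated lysines) recovered
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "accuracy": accuracy, "mcc": mcc}


# Hypothetical counts for illustration only.
print(binary_metrics(tp=80, tn=70, fp=30, fn=20))
```

Unlike accuracy, MCC stays informative when the classes are imbalanced, which is why it is usually reported alongside accuracy for site-prediction tasks like this one.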

  7. Predictability in the Epidemic-Type Aftershock Sequence model of interacting triggered seismicity

    NASA Astrophysics Data System (ADS)

    Helmstetter, Agnès; Sornette, Didier

    2003-10-01

    As part of an effort to develop a systematic methodology for earthquake forecasting, we use a simple model of seismicity based on interacting events that may trigger a cascade of earthquakes, known as the Epidemic-Type Aftershock Sequence (ETAS) model. The ETAS model is built on a bare (unrenormalized) Omori law, the Gutenberg-Richter law, and the idea that large events trigger more numerous aftershocks. For simplicity, we do not use information on the spatial location of earthquakes and work only in the time domain. We demonstrate the essential role played by the cascade of triggered seismicity in controlling the rate of aftershock decay as well as the overall level of seismicity in the presence of a constant external seismicity source. We offer an analytical approach that accounts for the as-yet unobserved triggered seismicity and is adapted to the problem of forecasting future seismic rates at varying horizons from the present. Tests on synthetic catalogs strongly validate the importance of taking into account all the cascades of still-unobserved triggered events in order to correctly predict the future level of seismicity beyond a few minutes. We find strong predictability if one is willing to predict only a small fraction of the large-magnitude targets. Specifically, we find a prediction gain (defined as the ratio of the fraction of predicted events to the fraction of time in alarm) of 21 for an alarm fraction of 1%, a target magnitude M ≥ 6, an update time of 0.5 days between two predictions, and realistic parameters of the ETAS model. However, the probability gains degrade quickly when one attempts to predict a larger fraction of the targets, because a significant fraction of events remain uncorrelated with past seismicity. This delineates the fundamental limits of forecasting skill, which stem from an intrinsic stochastic component in these interacting triggered seismicity models. Quantitatively, the fundamental

  8. Functional metagenomics reveals novel β-galactosidases not predictable from gene sequences

    PubMed Central

    Cheng, Jiujun; Romantsov, Tatyana; Engel, Katja; Doxey, Andrew C.; Rose, David R.; Neufeld, Josh D.

    2017-01-01

    The techniques of metagenomics have allowed researchers to access the genomic potential of uncultivated microbes, but there remain significant barriers to determination of gene function based on DNA sequence alone. Functional metagenomics, in which DNA is cloned and expressed in surrogate hosts, can overcome these barriers, and make important contributions to the discovery of novel enzymes. In this study, a soil metagenomic library carried in an IncP cosmid was used for functional complementation for β-galactosidase activity in both Sinorhizobium meliloti (α-Proteobacteria) and Escherichia coli (γ-Proteobacteria) backgrounds. One β-galactosidase, encoded by six overlapping clones that were selected in both hosts, was identified as a member of glycoside hydrolase family 2. We could not identify ORFs obviously encoding possible β-galactosidases in 19 other sequenced clones that were only able to complement S. meliloti. Based on low sequence identity to other known glycoside hydrolases, yet not β-galactosidases, three of these ORFs were examined further. Biochemical analysis confirmed that all three encoded β-galactosidase activity. Lac36W_ORF11 and Lac161_ORF7 had conserved domains, but lacked similarities to known glycoside hydrolases. Lac161_ORF10 had neither conserved domains nor similarity to known glycoside hydrolases. Bioinformatic and structural modeling implied that Lac161_ORF10 protein represented a novel enzyme family with a five-bladed propeller glycoside hydrolase domain. By discovering founding members of three novel β-galactosidase families, we have reinforced the value of functional metagenomics for isolating novel genes that could not have been predicted from DNA sequence analysis alone. PMID:28273103

  9. Dispositional Optimism and Perceived Risk Interact to Predict Intentions to Learn Genome Sequencing Results

    PubMed Central

    Taber, Jennifer M.; Klein, William M. P.; Ferrer, Rebecca A.; Lewis, Katie L.; Biesecker, Leslie G.; Biesecker, Barbara B.

    2015-01-01

    Objective Dispositional optimism and risk perceptions are each associated with health-related behaviors and decisions and other outcomes, but little research has examined how these constructs interact, particularly in consequential health contexts. The predictive validity of risk perceptions for health-related information seeking and intentions may be improved by examining dispositional optimism as a moderator, and by testing alternate types of risk perceptions, such as comparative and experiential risk. Method Participants (n = 496) had their genomes sequenced as part of a National Institutes of Health pilot cohort study (ClinSeq®). Participants completed a cross-sectional baseline survey of various types of risk perceptions and intentions to learn genome sequencing results for differing disease risks (e.g., medically actionable, nonmedically actionable, carrier status) and to use this information to change their lifestyle/health behaviors. Results Risk perceptions (absolute, comparative, and experiential) were largely unassociated with intentions to learn sequencing results. Dispositional optimism and comparative risk perceptions interacted, however, such that individuals higher in optimism reported greater intentions to learn all 3 types of sequencing results when comparative risk was perceived to be higher than when it was perceived to be lower. This interaction was inconsistent for experiential risk and absent for absolute risk. Independent of perceived risk, participants high in dispositional optimism reported greater interest in learning risks for nonmedically actionable disease and carrier status, and greater intentions to use genome information to change their lifestyle/health behaviors. Conclusions The relationship between risk perceptions and intentions may depend on how risk perceptions are assessed and on degree of optimism. PMID:25313897

  10. Amino acid sequence similarity between rabies virus glycoprotein and snake venom curaremimetic neurotoxins.

    PubMed

    Lentz, T L; Wilson, P T; Hawrot, E; Speicher, D W

    1984-11-16

    Evidence was presented earlier that a host-cell receptor for the highly neurotropic rabies virus might be the acetylcholine receptor. The amino acid sequence of the glycoprotein of rabies virus was compared by computer analysis with that of snake venom curaremimetic neurotoxins, potent ligands of the acetylcholine receptor. A statistically significant sequence relation was found between a segment of the rabies glycoprotein and the entire sequence of long neurotoxins. The greatest identity occurs with residues considered most important in neurotoxicity, including those interacting with the acetylcholine binding site of the acetylcholine receptor. Because of the similarity between the glycoprotein and the receptor-binding region of the neurotoxins, this region of the viral glycoprotein may function as a recognition site for the acetylcholine receptor. Direct binding of the rabies virus glycoprotein to the acetylcholine receptor could contribute to the neurotropism of this virus.

  11. Partial amino acid sequence of human pancreatic stone protein, a novel pancreatic secretory protein.

    PubMed Central

    Montalto, G; Bonicel, J; Multigner, L; Rovery, M; Sarles, H; De Caro, A

    1986-01-01

    Pancreatic stone protein (PSP) is the major organic component of human pancreatic stones. With the use of monoclonal antibody immunoadsorbents, five immunoreactive forms (PSP-S) with close Mr values (14,000-19,000) were isolated from normal pancreatic juice. By CM-Trisacryl M chromatography the lowest-Mr form (PSP-S1) was separated from the others and some of its molecular characteristics were investigated. The Mr of the PSP-S1 polypeptide chain calculated from the amino acid composition was about 16,100. The N-terminal sequences (40 residues) of PSP and PSP-S1 are identical, which suggests that the peptide backbone is the same for both of these polypeptides. The PSP-S1 sequence was determined up to residue 65 and was found to be different from all other known protein sequences. PMID:3541906

  12. Characterization of the microbial acid mine drainage microbial community using culturing and direct sequencing techniques.

    PubMed

    Auld, Ryan R; Myre, Maxine; Mykytczuk, Nadia C S; Leduc, Leo G; Merritt, Thomas J S

    2013-05-01

    We characterized the bacterial community from an acid mine drainage (AMD) tailings pond using both classical culturing and modern direct sequencing techniques and compared the two methods. AMD is produced by the environmental and microbial oxidation of minerals dissolved from mining waste. Surprisingly, little is known about the microbial communities associated with AMD, despite the fundamental ecological roles of these organisms and the large-scale economic impact of these waste sites. AMD microbial communities have classically been characterized by laboratory culturing-based techniques and, more recently, by direct sequencing of marker genes, primarily the 16S rRNA gene. In our comparison, the two techniques gave complementary results: overall they indicated very similar community structure with similar dominant species, but each method identified some species that the other missed. We were able to culture the majority of species that our direct sequencing results indicated were present, primarily species within the Acidithiobacillus and Acidiphilium genera, although estimates of relative species abundance were obtained only from direct sequencing. Interestingly, our culture-based methods recovered four species that had been overlooked in our sequencing results because of the rarity of their marker gene sequences; these are likely members of the rare biosphere. Further, direct sequencing indicated that a single genus, Legionella, which was completely missed in our culture-based study, was a dominant member of the microbial community. Our results suggest that while either method does a reasonable job of identifying the dominant members of the AMD microbial community, together the methods give a more complete picture of the true diversity of this environment.

  13. [MOLECULAR EVOLUTION OF ION CHANNELS: AMINO ACID SEQUENCES AND 3D STRUCTURES].

    PubMed

    Korkosh, V S; Zhorov, B S; Tikhonov, D B

    2016-01-01

    An integral part of modern evolutionary biology is comparative analysis of the structure and function of macromolecules such as proteins. The first and critical step in understanding the evolution of homologous proteins is the alignment of their amino acid sequences. However, standard algorithms do not provide unambiguous sequence alignments for proteins of poor homology. More reliable results can be obtained by comparing experimental 3D structures obtained at atomic resolution, for instance with the aid of X-ray structural analysis. If such structures are lacking, homology modeling is used, which may take into account indirect experimental data on the functional roles of individual amino acid residues. An important problem is that the sequence alignment, which reflects genetic modifications, does not necessarily correspond to the functional homology. The latter depends on three-dimensional structures, which are critical for natural selection. Since alignment techniques relying only on the analysis of primary structures carry no information on the functional properties of proteins, taking 3D structures into account is very important. Here we consider several examples involving ion channels and demonstrate that alignment of their three-dimensional structures can significantly improve sequence alignments obtained by traditional methods.
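    As a concrete point of reference for the "standard algorithms" mentioned above, the sketch below implements textbook Needleman-Wunsch global alignment in Python. It is illustrative only and is not the method used by the authors. The scoring scheme (match +1, mismatch -1, linear gap -1) is an assumption chosen for readability; real protein alignments normally use a substitution matrix such as BLOSUM62 with affine gap penalties, and the peptide strings in the example are hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> tuple[int, str, str]:
    """Global alignment by dynamic programming; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        # Illustrative match/mismatch scoring (a real tool would use BLOSUM62 or similar)
        return match if x == y else mismatch

    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("MKTAYIA", "MKTYIA"))  # (5, 'MKTAYIA', 'MKT-YIA')
```

    The score of 5 reflects six matched residues and one gap. For distantly related proteins, many alignments can score nearly the same, which is the ambiguity the abstract argues 3D structures can help resolve.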

  14. The amino acid sequence of the aspartate aminotransferase from baker's yeast (Saccharomyces cerevisiae).

    PubMed Central

    Cronin, V B; Maras, B; Barra, D; Doonan, S

    1991-01-01

    1. The single (cytosolic) aspartate aminotransferase was purified in high yield from baker's yeast (Saccharomyces cerevisiae). 2. Amino acid sequence analysis was carried out by digestion of the protein with trypsin and with CNBr; some of the peptides produced were further subdigested with Staphylococcus aureus V8 proteinase or with pepsin. Peptides were sequenced by the dansyl-Edman method and/or by automated gas-phase methods. The amino acid sequence obtained was complete except for a probable gap of two residues, as indicated by comparison with the structures of counterpart proteins in other species. 3. The N-terminus of the enzyme is blocked. Fast-atom-bombardment mass spectrometry was used to identify the blocking group as an acetyl group. 4. Alignment of the sequence of the enzyme with those of vertebrate cytosolic and mitochondrial aspartate aminotransferases and with the enzyme from Escherichia coli showed that about 25% of residues are conserved between these distantly related forms. 5. Experimental details and confirmatory data for the results presented here are given in a Supplementary Publication (SUP 50164, 25 pages) that has been deposited at the British Library Document Supply Centre, Boston Spa, Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies can be obtained on the terms indicated in Biochem. J. (1991) 273, 5. PMID:1859361

  15. Analysis of amino acid sequence variations and immunoglobulin E-binding epitopes of German cockroach tropomyosin.

    PubMed

    Jeong, Kyoung Yong; Lee, Jongweon; Lee, In-Yong; Ree, Han-Il; Hong, Chein-Soo; Yong, Tai-Soon

    2004-09-01

    The allergenicities of tropomyosins from different organisms have been reported to vary. The cDNA encoding German cockroach tropomyosin (Bla g 7) was isolated, expressed, and characterized previously. In the present study, the amino acid sequence variations in German cockroach tropomyosin were analyzed in order to investigate their influence on allergenicity. We also undertook the identification of immunodominant peptides containing immunoglobulin E (IgE) epitopes, which may facilitate the development of diagnostic and immunotherapeutic strategies based on recombinant proteins. Two-dimensional gel electrophoresis and immunoblot analysis with mouse anti-recombinant German cockroach tropomyosin serum were performed to investigate the isoforms at the protein level. Reverse transcriptase PCR (RT-PCR) was applied to examine the sequence diversity. Eleven different variants of the deduced amino acid sequences were identified by RT-PCR. German cockroach tropomyosin has only minor sequence variations, which did not appear to affect its allergenicity significantly. These results support the molecular basis underlying the cross-reactivities of arthropod tropomyosins. Recombinant fragments were also generated by PCR, and IgE-binding epitopes were assessed by enzyme-linked immunosorbent assay. Sera from seven patients revealed heterogeneous IgE-binding responses. This study demonstrates multiple IgE-binding epitope regions in a single molecule, suggesting that full-length tropomyosin should be used for the development of diagnostic and therapeutic reagents.

  16. Prediction of enzyme function based on 3D templates of evolutionarily important amino acids

    PubMed Central

    Kristensen, David M; Ward, R Matthew; Lisewski, Andreas Martin; Erdin, Serkan; Chen, Brian Y; Fofanov, Viacheslav Y; Kimmel, Marek; Kavraki, Lydia E; Lichtarge, Olivier

    2008-01-01

    Background Structural genomics projects such as the Protein Structure Initiative (PSI) yield many new structures, but often these have no known molecular functions. One approach to recover this information is to use 3D templates: structure-function motifs that consist of a few functionally critical amino acids and may suggest functional similarity when geometrically matched to other structures. Since experimentally determined functional sites are not common enough to define 3D templates on a large scale, this work tests a computational strategy to select relevant residues for 3D templates. Results Based on evolutionary information and heuristics, an Evolutionary Trace Annotation (ETA) pipeline built templates for 98 enzymes, half taken from the PSI, and sought matches in a non-redundant structure database. On average each template matched 2.7 distinct proteins, of which 2.0 share the first three Enzyme Commission (EC) digits with the template's enzyme of origin. In many cases (61%) a single most likely function could be predicted as the annotation with the most matches, and in these cases such a plurality vote identified the correct function with 87% accuracy. ETA was also found to be complementary to sequence homology-based annotations. When matches are required both to geometrically match the 3D template and to be sequence homologs found by BLAST or PSI-BLAST, the annotation accuracy is greater than that of either method alone, especially in the region of lower sequence identity where homology-based annotations are least reliable. Conclusion These data suggest that knowledge of evolutionarily important residues improves functional annotation among distant enzyme homologs. Since, unlike other 3D template approaches, the ETA method bypasses the need for experimental knowledge of the catalytic mechanism, it should prove a useful, large-scale, and general adjunct to combine with other methods to decipher protein function in the structural proteome. PMID:18190718
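    The plurality vote described in this abstract can be sketched as follows. This is a minimal illustration, not the ETA pipeline: it assumes each geometric match is already labelled with a full EC number string, compares annotations at the first three EC digits as the abstract does, and treats a tie for first place as "no single prediction", which is an assumption of this sketch rather than a rule stated in the source. The EC numbers in the example are hypothetical.

```python
from collections import Counter
from typing import Optional


def ec_prefix(ec: str, depth: int = 3) -> str:
    """Truncate an EC number such as '2.6.1.1' to its first `depth` digits ('2.6.1')."""
    return ".".join(ec.split(".")[:depth])


def plurality_vote(match_ecs: list[str], depth: int = 3) -> Optional[str]:
    """Return the EC prefix carried by the most template matches, or None if no unique winner."""
    if not match_ecs:
        return None
    ranked = Counter(ec_prefix(ec, depth) for ec in match_ecs).most_common()
    # A tie at the top is treated as "no single most likely function" (assumption)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Hypothetical matches for one template: two transaminases and one hydrolase
print(plurality_vote(["2.6.1.1", "2.6.1.2", "3.1.1.1"]))  # 2.6.1
```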

  17. Cut-off net acid generation pH in predicting acid-forming potential in mine spoils.

    PubMed

    Liao, B; Huang, L N; Ye, Z H; Lan, C Y; Shu, W S

    2007-01-01

    Acidification of mine wastes can lead to a series of environmental problems, such as acid drainage, heavy metal mobilization, and ecosystem degradation. Prediction of acid-forming potential is one of the key steps in the management of sulfide-bearing mine wastes. In this paper, the acid-forming potential of 180 mine waste samples collected from 17 mine sites in China was studied using a net acid generation (NAG) method. The samples varied in total sulfur (0.6 to 200 g kg(-1)), pyritic sulfur (0 to 100 g kg(-1)), and acid neutralization capacity (ANC, -41 to 274 kg H2SO4 t(-1)). High acid-forming potential was generally due to high sulfur content or low acid neutralization capacity. After the samples were oxidized by H2O2, the amount of acid generated and the final NAG pH were measured. Results indicated that the final NAG pH gave a well-defined demarcation between acid-forming and non-acid-forming materials. Samples with final NAG pH ≥ 5 could be classified as non-acid-forming materials, while those with NAG pH < 2.5 were acid-forming materials. Materials with NAG pH > 2.5 but < 5 had a low risk of being acid-forming. The confirmed cut-off NAG pH values will be used as a rapid and cost-effective operational monitoring tool for the in-pit prediction of the acid-forming potential of mine wastes and the classification of waste types.
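    The reported cut-offs translate into a simple decision rule, sketched below in Python as a hedged illustration. Final NAG pH values of 5 or above are non-acid-forming and values above 2.5 but below 5 are low risk, as the abstract states; values at or below 2.5 are treated as acid-forming. Assigning exactly 2.5, which the stated ranges leave open, to the acid-forming class is an assumption of this sketch.

```python
def classify_nag_ph(nag_ph: float) -> str:
    """Classify a mine-waste sample by its final NAG pH using the cut-offs in the abstract."""
    if nag_ph >= 5.0:
        return "non-acid-forming"
    if nag_ph > 2.5:
        return "low risk"
    # pH <= 2.5; a value of exactly 2.5 is grouped here as a conservative assumption
    return "acid-forming"


for ph in (6.1, 3.4, 2.1):
    print(ph, classify_nag_ph(ph))
```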

  18. GPS-PAIL: prediction of lysine acetyltransferase-specific modification sites from protein sequences

    PubMed Central

    Deng, Wankun; Wang, Chenwei; Zhang, Ying; Xu, Yang; Zhang, Shuang; Liu, Zexian; Xue, Yu

    2016-01-01

    Protein acetylation catalyzed by specific histone acetyltransferases (HATs) is an essential post-translational modification (PTM) involved in the regulation of a broad spectrum of biological processes in eukaryotes. Although tens of thousands of acetylation sites have been experimentally identified, the upstream HATs for most of these sites are unclear. Thus, the identification of HAT-specific acetylation sites is fundamental for understanding the regulatory mechanisms of protein acetylation. In this work, we first collected 702 known HAT-specific acetylation sites of 205 proteins from the literature and public data resources, and a motif-based analysis demonstrated that different types of HATs exhibit similar but considerably distinct sequence preferences for substrate recognition. Using 544 human HAT-specific sites for training, we constructed GPS-PAIL, a tool for predicting HAT-specific sites for up to seven HATs: CREBBP, EP300, HAT1, KAT2A, KAT2B, KAT5 and KAT8. The prediction accuracy of GPS-PAIL was critically evaluated and found to be satisfactory. Using GPS-PAIL, we also performed a large-scale prediction of potential HATs for known acetylation sites identified from high-throughput experiments in nine eukaryotes. Both an online service and local packages were implemented, and GPS-PAIL is freely available at: http://pail.biocuckoo.org. PMID:28004786

  19. A Novel Data Assimilation Methodology for Predicting Lithology Based on Sequence Labeling Algorithms

    NASA Astrophysics Data System (ADS)

    Park, E.; Jeong, J.; Han, W. S.; Kim, K. Y.

    2014-12-01

    A hidden Markov model (HMM) and a conditional random fields (CRFs) model for lithological predictions based on multiple geophysical well-logging data are derived for dealing with directional non-stationarity through bi-directional training and conditioning. The developed models were benchmarked against their conventional counterparts, and hypothetical boreholes with the corresponding synthetic geophysical data including artificial errors were employed. In the three test scenarios devised, the average fitness and unfitness values of the developed CRFs model and HMM are 0.84 and 0.071, and 0.81 and 0.084, respectively, while those of the conventional CRFs model and HMM are 0.78 and 0.091, and 0.77 and 0.099, respectively. Comparisons of their predictabilities show that the models designed for directional non-stationarity clearly perform better than the conventional models for all tested examples. Among them, the developed linear-chain CRFs model showed the best or close to the best performance with high predictability and a low training data requirement. Keywords: one-dimensional lithological characterization, sequence labeling algorithm, conditional random fields, hidden Markov model, borehole, geophysical well-logging data.
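    For readers unfamiliar with how an HMM assigns lithologies along a borehole, the sketch below shows standard Viterbi decoding for a conventional one-directional HMM. It does not implement the bi-directional training and conditioning developed in the study, nor a CRF. The two lithology states, the discretized "low"/"high" log response, and all probabilities are hypothetical values chosen for illustration.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for `obs` (Viterbi, in log space).

    All probabilities must be > 0 in this sketch, since log(0) is undefined.
    """
    # best[t][s] = (log-prob of the best path ending in state s at step t, previous state)
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][obs[0]]), None) for s in states}]
    for t in range(1, len(obs)):
        layer = {}
        for s in states:
            prev, lp = max(
                ((p, best[t - 1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            layer[s] = (lp + math.log(emit_p[s][obs[t]]), prev)
        best.append(layer)
    # Backtrack from the most probable final state
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        state = best[t][state][1]
        path.append(state)
    return path[::-1]


# Hypothetical two-lithology model; all probabilities are illustrative
states = ("sand", "shale")
start_p = {"sand": 0.5, "shale": 0.5}
trans_p = {"sand": {"sand": 0.8, "shale": 0.2}, "shale": {"sand": 0.2, "shale": 0.8}}
emit_p = {"sand": {"low": 0.9, "high": 0.1}, "shale": {"low": 0.2, "high": 0.8}}
print(viterbi(["low", "low", "high", "high"], states, start_p, trans_p, emit_p))
# ['sand', 'sand', 'shale', 'shale']
```

    Log-probabilities are used so that long well logs do not underflow to zero when many small probabilities are multiplied.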

  20. GPS-PAIL: prediction of lysine acetyltransferase-specific modification sites from protein sequences.

    PubMed

    Deng, Wankun; Wang, Chenwei; Zhang, Ying; Xu, Yang; Zhang, Shuang; Liu, Zexian; Xue, Yu

    2016-12-22

    Protein acetylation catalyzed by specific histone acetyltransferases (HATs) is an essential post-translational modification (PTM) involved in the regulation of a broad spectrum of biological processes in eukaryotes. Although tens of thousands of acetylation sites have been experimentally identified, the upstream HATs for most of these sites are unclear. Thus, the identification of HAT-specific acetylation sites is fundamental for understanding the regulatory mechanisms of protein acetylation. In this work, we first collected 702 known HAT-specific acetylation sites of 205 proteins from the literature and public data resources, and a motif-based analysis demonstrated that different types of HATs exhibit similar but considerably distinct sequence preferences for substrate recognition. Using 544 human HAT-specific sites for training, we constructed GPS-PAIL, a tool for predicting HAT-specific sites for up to seven HATs: CREBBP, EP300, HAT1, KAT2A, KAT2B, KAT5 and KAT8. The prediction accuracy of GPS-PAIL was critically evaluated and found to be satisfactory. Using GPS-PAIL, we also performed a large-scale prediction of potential HATs for known acetylation sites identified from high-throughput experiments in nine eukaryotes. Both an online service and local packages were implemented, and GPS-PAIL is freely available at: http://pail.biocuckoo.org.

  1. Purification to homogeneity and partial amino acid sequence of a fragment which includes the methyl acceptor site of the human DNA repair protein for O6-methylguanine.

    PubMed

    Major, G N; Gardner, E J; Carne, A F; Lawley, P D

    1990-03-25

    O6-methylguanine-DNA methyltransferase (O6-MT) repairs DNA by removing the methyl group from premutagenic O6-methylguanine in DNA, thereby restoring native guanine. The methyl group is transferred to an acceptor-site cysteine thiol group in the enzyme, which irreversibly inactivates O6-MT. In crude extracts of human spleen we detected a variety of forms of the methylated, inactivated enzyme with molecular weights both higher and lower than the 21-24 kDa usually observed for human O6-MT. Several apparent fragments of the methylated protein were purified to homogeneity after reacting partially purified extract enzyme with O6-[3H-CH3]methylguanine-DNA substrate. One of these fragments yielded amino acid sequence information spanning fifteen residues, which was identified as probably belonging to the human methyltransferase on two grounds: its significant sequence homology to three procaryote forms of O6-MT encoded by the ada, ogt (both from E. coli) and dat (B. subtilis) genes, and the sequence position of the radiolabelled methyl group, which matched the position of the conserved procaryote methyl acceptor site cysteine residue. Statistical prediction of secondary structure indicated good homology between the human fragment and corresponding regions of the constitutive forms of O6-MT in procaryotes (ogt and dat gene products), but not the inducible ada protein, suggesting that we had obtained partial amino acid sequence for a non-inducible form of the human enzyme. The identity of the fragment sequence as belonging to human methyltransferase was more recently confirmed by comparison with the cDNA-derived amino acid sequence from the cloned human O6-MT gene from HeLa cells (1). The two sequences compared well, with only three of the fifteen amino acids differing (two of them by only one nucleotide in each codon).
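    The final comparison amounts to a percent-identity calculation on two aligned 15-residue stretches: 3 differences out of 15 positions gives (15 - 3)/15 = 80% identity. The Python sketch below makes that arithmetic explicit. The two peptide strings and the codon pair are hypothetical placeholders, not the published human O6-MT sequences, and the sequences are assumed to be aligned already.

```python
def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length, pre-aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two aligned sequences."""
    return 100.0 * (len(a) - mismatches(a, b)) / len(a)


# Hypothetical 15-residue aligned stretches differing at 3 positions
protein_fragment = "ACDEFGHIKLMNPQR"
cdna_derived = "ACDEYGHIKLWNPQS"
print(percent_identity(protein_fragment, cdna_derived))  # 80.0
# The same count on a hypothetical codon pair shows a single-nucleotide difference
print(mismatches("TGT", "TGC"))  # 1
```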

  2. Predictable conformational diversity in foldamers of sugar amino acids.

    PubMed

    Menyhard, Dora K; Hudaky, Ilona; Jákli, Imre; Juhász, György; Perczel, András

    2017-03-27

    A systematic conformational search was carried out for monomers and homohexamers of furanoid β-amino acids: cis-(S,R) and trans-(S,S) stereoisomers of aminocyclopentane carboxylic acid (ACPC), two different aminofuranuronic acids (AFU(α) and AFU(β)), their isopropylidene derivatives (AFU(ip)), as well as the key intermediate β-aminotetrahydrofurancarboxylic acid (ATFC). The stereochemistry of the building blocks was chosen to match that of natural sugar amino acid (xylose and ribose) precursors. Results show that hexamers of cis furanoid β-amino acids are highly variable: while the hydrophobic cyclopentane (cis(ACPC)6) and hydrophilic (cisXylAFU(α/β))6 foldamers favor two different zigzag conformations, the backbone fold turns into a helix in the case of (cisATFC)6 (10-helix) and (cisAFU(ip))6 (14-helix). Trans stereochemistry resulted exclusively in right-handed helical hexamers, (H12(P))6, regardless of their polarity. We found that the preferred oligomeric structure of cis/(S,R)AFU(α/β) is conformationally compatible with β-pleated sheets, while that of the trans/(S,S) units matches α-helices of α-proteins.

  3. Processing and amino acid sequence analysis of the mouse mammary tumor virus env gene product.

    PubMed Central

    Arthur, L O; Copeland, T D; Oroszlan, S; Schochetman, G

    1982-01-01

    The envelope proteins of mouse mammary tumor virus (MMTV) are synthesized from a subgenomic 24S mRNA as a 75,000-dalton glycosylated precursor polyprotein, which is eventually processed to the mature glycoproteins gp52 and gp36. In vivo synthesis of this env precursor in the presence of the core glycosylation inhibitor tunicamycin yielded a precursor of approximately 61,000 daltons (P61env). However, a 67,000-dalton protein (P67env) was obtained from cell-free translation with the MMTV 24S mRNA as the template. To determine whether the portion of the protein cleaved from P67env to give P61env was removed from the NH2-terminal end of P67env and would therefore represent a leader sequence, the NH2-terminal amino acid sequence of the terminal peptide gp52 was determined. Glutamic acid, and not methionine, was found to be the amino-terminal residue of gp52, indicating that the cleaved portion was derived from the NH2-terminal end of P67env. The NH2-terminal amino acid sequences of gp52's from endogenous and exogenous C3H MMTVs were determined through 46 residues and found to be identical. However, amino acid composition and type-specific gp52 radioimmunoassays of MMTVs grown in heterologous cells indicated primary structure differences between the gp52's of the two viruses. The nucleic acid sequence of cloned MMTV DNA fragments (J. Majors and H. E. Varmus, personal communication), in conjunction with the NH2-terminal sequence of gp52, allowed localization of the env gene in the MMTV genome. Nucleotides coding for the NH2 terminus of gp52 begin approximately 0.8 kilobase to the 3' side of the single EcoRI cleavage site. Localization of the env gene at that point agrees with the proposed gene order -gag-pol-env- and also allows sufficient coding potential for the glycoprotein precursor without extending into the long terminal repeat. PMID:6281457

  4. In-depth performance evaluation of PFP and ESG sequence-based function prediction methods in CAFA 2011 experiment

    PubMed Central

    2013-01-01

    Background Many Automatic Function Prediction (AFP) methods have been developed to cope with the rapid growth in the number of gene sequences available from high-throughput sequencing experiments. To support the development of AFP methods, it is essential to have community-wide experiments for evaluating the performance of existing AFP methods. Critical Assessment of Function Annotation (CAFA) is one such community experiment. The CAFA meeting was held as a Special Interest Group (SIG) meeting at the Intelligent Systems in Molecular Biology (ISMB) conference in 2011. Here, we perform a detailed analysis of two sequence-based function prediction methods developed in our lab, PFP and ESG, using the predictions submitted to CAFA. Results We evaluate PFP and ESG using four different measures in comparison with BLAST, Prior, and GOtcha. In addition to the predictions submitted to CAFA, we further investigate the performance of a different scoring function for rank-ordering PFP predictions, as well as PFP/ESG predictions enriched with Prior, which simply adds frequently occurring Gene Ontology terms to the predictions. The prediction accuracy of each method was also evaluated separately for different functional categories. Successful and unsuccessful predictions by PFP and ESG are also discussed in comparison with BLAST. Conclusion The in-depth analysis discussed here will complement the overall assessment by the CAFA organizers. Since PFP and ESG are based on sequence database search results, our analyses are useful not only for PFP and ESG users but also shed light on the relationship between sequence similarity space and the functions that can be inferred from sequences. PMID:23514353

  5. Complete Genome Sequence of a thermotolerant sporogenic lactic acid bacterium, Bacillus coagulans strain 36D1

    PubMed Central

    Rhee, Mun Su; Moritz, Brélan E.; Xie, Gary; Glavina del Rio, T.; Dalin, E.; Tice, H.; Bruce, D.; Goodwin, L.; Chertkov, O.; Brettin, T.; Han, C.; Detter, C.; Pitluck, S.; Land, Miriam L.; Patel, Milind; Ou, Mark; Harbrucker, Roberta; Ingram, Lonnie O.; Shanmugam, K. T.

    2011-01-01

    Bacillus coagulans is a ubiquitous soil bacterium that grows at 50-55 °C and pH 5.0 and ferments various sugars that constitute plant biomass to L(+)-lactic acid. The ability of this sporogenic lactic acid bacterium to grow at 50-55 °C and pH 5.0 makes this organism an attractive microbial biocatalyst for production of optically pure lactic acid at industrial scale, not only from glucose derived from cellulose but also from xylose, a major constituent of hemicellulose. This bacterium is also considered a potential probiotic. The complete genome sequence of a representative strain, B. coagulans strain 36D1, is presented and discussed. PMID:22675583

  6. BeadCons: detection of nucleic acid sequences by flow cytometry.

    PubMed

    Horejsh, Douglas; Martini, Federico; Capobianchi, Maria Rosaria

    2005-11-01

    Molecular beacons are single-stranded nucleic acid structures with a terminal fluorophore and a distal, terminal quencher. These molecules are typically used in real-time PCR assays, but have also been conjugated with solid matrices. This unit describes protocols related to molecular beacon-conjugated beads (BeadCons), whose specific hybridization with complementary target sequences can be resolved by cytometry. Assay sensitivity is achieved through the concentration of fluorescence signal on discrete particles. By using molecular beacons with different fluorophores and microspheres of different sizes, it is possible to construct a fluid array system with each bead corresponding to a specific target nucleic acid. Methods are presented for the design, construction, and use of BeadCons for the specific, multiplexed detection of unlabeled nucleic acids in solution. The use of bead-based detection methods will likely lead to the design of new multiplex molecular diagnostic tools.

  7. Measuring nanometer distances in nucleic acids using a sequence-independent nitroxide probe

    PubMed Central

    Qin, Peter Z; Haworth, Ian S; Cai, Qi; Kusnetzow, Ana K; Grant, Gian Paola G; Price, Eric A; Sowa, Glenna Z; Popova, Anna; Herreros, Bruno; He, Honghang

    2008-01-01

    This protocol describes the procedures for measuring nanometer distances in nucleic acids using a nitroxide probe that can be attached to any nucleotide within a given sequence. Two nitroxides are attached to phosphorothioates that are chemically substituted at specific sites of DNA or RNA. Inter-nitroxide distances are measured using a four-pulse double electron–electron resonance technique, and the measured distances are correlated to the parent structures using a Web-accessible computer program. Four to five days are needed for sample labeling, purification and distance measurement. The procedures described herein provide a method for probing global structures and studying conformational changes of nucleic acids and protein/nucleic acid complexes. PMID:17947978

  8. Complete Genome Sequence of a thermotolerant sporogenic lactic acid bacterium, Bacillus coagulans strain 36D1.

    PubMed

    Rhee, Mun Su; Moritz, Brélan E; Xie, Gary; Glavina Del Rio, T; Dalin, E; Tice, H; Bruce, D; Goodwin, L; Chertkov, O; Brettin, T; Han, C; Detter, C; Pitluck, S; Land, Miriam L; Patel, Milind; Ou, Mark; Harbrucker, Roberta; Ingram, Lonnie O; Shanmugam, K T

    2011-12-31

    Bacillus coagulans is a ubiquitous soil bacterium that grows at 50-55 °C and pH 5.0 and ferments various sugars that constitute plant biomass to L(+)-lactic acid. The ability of this sporogenic lactic acid bacterium to grow at 50-55 °C and pH 5.0 makes this organism an attractive microbial biocatalyst for production of optically pure lactic acid at industrial scale, not only from glucose derived from cellulose but also from xylose, a major constituent of hemicellulose. This bacterium is also considered a potential probiotic. The complete genome sequence of a representative strain, B. coagulans strain 36D1, is presented and discussed.

  9. Prediction of multi-drug resistance transporters using a novel sequence analysis method [version 2; referees: 2 approved]

    DOE PAGES

    McDermott, Jason E.; Bruillard, Paul; Overall, Christopher C.; ...

    2015-03-09

    There are many examples of groups of proteins that have similar function, but the determinants of functional specificity may be hidden by lack of sequence similarity, or by large groups of similar sequences with different functions. Transporters are one such protein group in that the general function, transport, can be easily inferred from the sequence, but the substrate specificity can be impossible to predict from sequence with current methods. In this paper we describe a linguistic-based approach to identify functional patterns from groups of unaligned protein sequences and its application to predict multi-drug resistance transporters (MDRs) from bacteria. We first show that our method can recreate known patterns from PROSITE for several motifs from unaligned sequences. We then show that the method, MDRpred, can predict MDRs with greater accuracy and positive predictive value than a collection of currently available family-based models from the Pfam database. Finally, we apply MDRpred to a large collection of protein sequences from an environmental microbiome study to make novel predictions about drug resistance in a potential environmental reservoir.

  10. Prediction of multi-drug resistance transporters using a novel sequence analysis method [version 2; referees: 2 approved]

    SciTech Connect

    McDermott, Jason E.; Bruillard, Paul; Overall, Christopher C.; Gosink, Luke; Lindemann, Stephen R.

    2015-03-09

    There are many examples of groups of proteins that have similar function, but the determinants of functional specificity may be hidden by lack of sequence similarity, or by large groups of similar sequences with different functions. Transporters are one such protein group in that the general function, transport, can be easily inferred from the sequence, but the substrate specificity can be impossible to predict from sequence with current methods. In this paper we describe a linguistic-based approach to identify functional patterns from groups of unaligned protein sequences and its application to predict multi-drug resistance transporters (MDRs) from bacteria. We first show that our method can recreate known patterns from PROSITE for several motifs from unaligned sequences. We then show that the method, MDRpred, can predict MDRs with greater accuracy and positive predictive value than a collection of currently available family-based models from the Pfam database. Finally, we apply MDRpred to a large collection of protein sequences from an environmental microbiome study to make novel predictions about drug resistance in a potential environmental reservoir.
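    Accuracy and positive predictive value (PPV), the two measures used to compare MDRpred with the Pfam models, follow the standard confusion-matrix definitions: accuracy = (TP + TN)/(TP + TN + FP + FN) and PPV = TP/(TP + FP). The short Python sketch below computes both; the counts in the example are hypothetical and do not come from the paper.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions (positive and negative) that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def positive_predictive_value(tp: int, fp: int) -> float:
    """Fraction of predicted positives (e.g. predicted MDRs) that are true positives."""
    return tp / (tp + fp)


# Hypothetical confusion-matrix counts, for illustration only
print(accuracy(tp=40, tn=45, fp=10, fn=5))      # 0.85
print(positive_predictive_value(tp=40, fp=10))  # 0.8
```

    PPV is worth reporting alongside accuracy when, as in environmental screening, true MDRs are a small minority: a predictor can reach high accuracy simply by calling most sequences negative while still producing many false positives among its few positive calls.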

  11. Using principal component analysis and support vector machine to predict protein structural class for low-similarity sequences via PSSM.

    PubMed

    Zhang, Shengli; Ye, Feng; Yuan, Xiguo

    2012-01-01

    Accurately identifying a protein's structural class solely from information extracted from its sequence is a complicated task in computational biology, and predicting the structural class of low-similarity sequences remains particularly challenging. In this study, a new computational method was developed to predict protein structural class by fusing sequence information and evolutionary information to represent a protein sample. To evaluate the performance of the proposed method, jackknife cross-validation tests were performed on two widely used benchmark datasets, 1189 and 25PDB, with sequence similarity lower than 40% and 25%, respectively. Comparison with other methods shows that the proposed method is very promising and may provide a cost-effective alternative for predicting protein structural class, particularly for low-similarity datasets.
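    Jackknife testing, as used in this abstract, is leave-one-out cross-validation: each protein is held out once, a model is trained on the rest, and the held-out prediction is scored. The sketch below shows only that evaluation loop. To keep it dependency-free it substitutes a simple nearest-centroid classifier for the paper's PCA-plus-SVM model, and the two-dimensional feature vectors and class labels are hypothetical stand-ins for PSSM-derived features.

```python
import math


def nearest_centroid_predict(train, query):
    """Predict the label whose class centroid is closest (Euclidean) to `query`."""
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}
    return min(centroids, key=lambda y: math.dist(query, centroids[y]))


def jackknife_accuracy(data):
    """Leave-one-out: train on all samples but one, test on the held-out one, repeat for each."""
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        correct += nearest_centroid_predict(train, x) == y
    return correct / len(data)


# Hypothetical feature vectors for two structural classes
data = [((0.9, 0.1), "alpha"), ((0.8, 0.2), "alpha"), ((0.85, 0.15), "alpha"),
        ((0.1, 0.9), "beta"), ((0.2, 0.8), "beta"), ((0.15, 0.85), "beta")]
print(jackknife_accuracy(data))  # 1.0
```

    Any classifier with a train-then-predict interface could be dropped into the loop in place of the nearest-centroid stand-in.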

  12. The amino acid sequence of Lady Amherst's pheasant (Chrysolophus amherstiae) and golden pheasant (Chrysolophus pictus) egg-white lysozymes.

    PubMed

    Araki, T; Kuramoto, M; Torikata, T

    1990-09-01

    The amino acid sequences of Lady Amherst's pheasant and golden pheasant egg-white lysozymes have been determined. The carboxymethylated lysozymes were digested with trypsin, and the tryptic peptides were sequenced. Lady Amherst's pheasant lysozyme proved to consist of 129 amino acid residues, with a calculated relative molecular mass of 14,423 Da. This lysozyme had six amino acid substitutions compared with hen egg-white lysozyme: Phe3 to Tyr, His15 to Leu, Gln41 to His, Asn77 to His, Gln121 to Asn, and a newly found substitution of Ile124 to Thr. The amino acid sequence of golden pheasant lysozyme was identical to that of Lady Amherst's pheasant lysozyme. A phylogenetic tree constructed by comparing the amino acid sequences of phasianoid bird lysozymes revealed a minimum genetic distance between these pheasants and the turkey-peafowl group.

  13. Accurate ab initio prediction of NMR chemical shifts of nucleic acids and nucleic acids/protein complexes

    PubMed Central

    Victora, Andrea; Möller, Heiko M.; Exner, Thomas E.

    2014-01-01

    NMR chemical shift predictions based on empirical methods are nowadays indispensable tools during resonance assignment and 3D structure calculation of proteins. However, owing to the very limited statistical data basis, such methods are still in their infancy in the field of nucleic acids, especially when non-canonical structures and nucleic acid complexes are considered. Here, we present an ab initio approach for predicting proton chemical shifts of arbitrary nucleic acid structures based on state-of-the-art fragment-based quantum chemical calculations. We tested our prediction method on a diverse set of nucleic acid structures including double-stranded DNA, hairpins, DNA/protein complexes and chemically-modified DNA. Overall, our quantum chemical calculations yield highly accurate predictions with mean absolute deviations of 0.3–0.6 ppm and correlation coefficients (r²) usually above 0.9. This will allow for identifying misassignments and validating 3D structures. Furthermore, our calculations reveal that chemical shifts of protons involved in hydrogen bonding are predicted significantly less accurately. This is in part caused by insufficient inclusion of solvation effects. However, it also points toward shortcomings of current force fields used for structure determination of nucleic acids. Our quantum chemical calculations could therefore provide input for force field optimization. PMID:25404135

  14. Whole-Genome Sequencing Analysis Accurately Predicts Antimicrobial Resistance Phenotypes in Campylobacter spp.

    PubMed

    Zhao, S; Tyson, G H; Chen, Y; Li, C; Mukherjee, S; Young, S; Lam, C; Folster, J P; Whichard, J M; McDermott, P F

    2015-10-30

    The objectives of this study were to identify antimicrobial resistance genotypes for Campylobacter and to evaluate the correlation between resistance phenotypes and genotypes using in vitro antimicrobial susceptibility testing and whole-genome sequencing (WGS). A total of 114 Campylobacter species isolates (82 C. coli and 32 C. jejuni) obtained from 2000 to 2013 from humans, retail meats, and cecal samples from food production animals in the United States as part of the National Antimicrobial Resistance Monitoring System were selected for study. Resistance phenotypes were determined using broth microdilution of nine antimicrobials. Genomic DNA was sequenced using the Illumina MiSeq platform, and resistance genotypes were identified using assembled WGS sequences through blastx analysis. Eighteen resistance genes, including tet(O), blaOXA-61, catA, lnu(C), aph(2″)-Ib, aph(2″)-Ic, aph(2')-If, aph(2″)-Ig, aph(2″)-Ih, aac(6')-Ie-aph(2″)-Ia, aac(6')-Ie-aph(2″)-If, aac(6')-Im, aadE, sat4, ant(6'), aad9, aph(3')-Ic, and aph(3')-IIIa, and mutations in two housekeeping genes (gyrA and 23S rRNA) were identified. There was a high degree of correlation between phenotypic resistance to a given drug and the presence of one or more corresponding resistance genes. Phenotypic and genotypic correlation was 100% for tetracycline, ciprofloxacin/nalidixic acid, and erythromycin, and correlations ranged from 95.4% to 98.7% for gentamicin, azithromycin, clindamycin, and telithromycin. All isolates were susceptible to florfenicol, and no genes associated with florfenicol resistance were detected. There was a strong correlation (99.2%) between resistance genotypes and phenotypes, suggesting that WGS is a reliable indicator of resistance to the nine antimicrobial agents assayed in this study. WGS has the potential to be a powerful tool for antimicrobial resistance surveillance programs.

  15. Electromyographic Patterns during Golf Swing: Activation Sequence Profiling and Prediction of Shot Effectiveness

    PubMed Central

    Verikas, Antanas; Vaiciukynas, Evaldas; Gelzinis, Adas; Parker, James; Olsson, M. Charlotte

    2016-01-01

    This study analyzes muscle activity, recorded in an eight-channel electromyographic (EMG) signal stream, during the golf swing using a 7-iron club and exploits information extracted from EMG dynamics to predict the success of the resulting shot. Muscles of the arm and shoulder on both the left and right sides, namely flexor carpi radialis, extensor digitorum communis, rhomboideus and trapezius, are considered for 15 golf players (∼5 shots each). A method based on Gaussian filtering is outlined for EMG onset time estimation in each channel and for activation sequence profiling. Shots of each player revealed a persistent pattern of muscle activation. Profiles were plotted and insights with respect to player effectiveness were provided. Inspection of EMG dynamics revealed a pair of highest peaks in each channel as the hallmark of the golf swing, and a custom application of peak detection for automatic extraction of the swing segment was introduced. Various EMG features, encompassing 22 feature sets, were constructed. Feature sets were used individually and also in decision-level fusion for the prediction of shot effectiveness. The prediction of the target attribute, such as club head speed or ball carry distance, was investigated using random forest as the learner in detection and regression tasks. Detection evaluates the personal effectiveness of a shot with respect to the player-specific average, whereas regression estimates the value of the target attribute, using EMG features as predictors. Fusion after decision optimization provided the best results: the equal error rate in detection was 24.3% for the speed and 31.7% for the distance; the mean absolute percentage error in regression was 3.2% for the speed and 6.4% for the distance. Proposed EMG feature sets were found to be useful, especially when used in combination. Rankings of feature sets indicated statistics for muscle activity in both the left and right body sides, correlation-based analysis of EMG dynamics and features
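    Detection performance above is reported as an equal error rate (EER), the operating point at which the false positive and false negative rates coincide. The sketch below approximates it by trying every observed score as a decision threshold; the scores stand in for hypothetical random-forest outputs, and because the abstract does not describe how the authors computed the EER (for example, whether they interpolated between thresholds), this is only the simplest discrete approximation.

```python
from typing import List, Tuple


def equal_error_rate(scores: List[float], labels: List[int]) -> Tuple[float, float]:
    """Approximate the equal error rate (EER) by trying each observed score as a threshold.

    labels: 1 = positive (e.g. a shot better than the player's average), 0 = negative.
    A sample is called positive when its score >= threshold. Returns (eer, threshold)
    for the threshold where the false positive and false negative rates are closest,
    taking the EER as the mean of those two rates.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_gap, best_eer, best_t = float("inf"), 1.0, 0.0
    for t in sorted(set(scores)):
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        fpr, fnr = fp / n_neg, fn / n_pos
        if abs(fpr - fnr) < best_gap:
            best_gap, best_eer, best_t = abs(fpr - fnr), (fpr + fnr) / 2, t
    return best_eer, best_t


print(equal_error_rate([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # (0.5, 0.4)
```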

  16. Prediction of nucleic acid binding probability in proteins: a neighboring residue network based score.

    PubMed

    Miao, Zhichao; Westhof, Eric

    2015-06-23

    We describe a general binding score for predicting the nucleic acid binding probability in proteins. The score is directly derived from physicochemical and evolutionary features and integrates a residue neighboring network approach. Our process achieves stable and high accuracies on both DNA- and RNA-binding proteins and illustrates how the main driving forces for nucleic acid binding are common. Because of the effective integration of the synergetic effects of the network of neighboring residues and the fact that the prediction yields a hierarchical scoring on the protein surface, energy funnels for nucleic acid binding appear on protein surfaces, pointing to the dynamic process occurring in the binding of nucleic acids to proteins.

  17. Sequence features accurately predict genome-wide MeCP2 binding in vivo

    PubMed Central

    Rube, H. Tomas; Lee, Wooje; Hejna, Miroslav; Chen, Huaiyang; Yasui, Dag H.; Hess, John F.; LaSalle, Janine M.; Song, Jun S.; Gong, Qizhi

    2016-01-01

    Methyl-CpG binding protein 2 (MeCP2) is critical for proper brain development and expressed at near-histone levels in neurons, but the mechanism of its genomic localization remains poorly understood. Using high-resolution MeCP2-binding data, we show that DNA sequence features alone can predict binding with 88% accuracy. Integrating MeCP2 binding and DNA methylation in a probabilistic graphical model, we demonstrate that previously reported genome-wide association with methylation is in part due to MeCP2's affinity to GC-rich chromatin, a result replicated using published data. Furthermore, MeCP2 co-localizes with nucleosomes. Finally, MeCP2 binding downstream of promoters correlates with increased expression in Mecp2-deficient neurons. PMID:27008915
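    The abstract attributes part of the methylation association to MeCP2's affinity for GC-rich chromatin, which suggests simple sequence-composition features. The sketch below computes two standard ones, GC fraction and the CpG observed/expected ratio; choosing these two is an assumption for illustration, since the study's actual feature set and model are not described here.

```python
from typing import Dict


def gc_features(seq: str) -> Dict[str, float]:
    """GC fraction and CpG observed/expected ratio of a DNA sequence.

    CpG o/e = (#CG dinucleotides * length) / (#C * #G), the usual CpG-island statistic.
    """
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so str.count finds every occurrence
    return {
        "gc_fraction": (c + g) / n if n else 0.0,
        "cpg_obs_exp": cpg * n / (c * g) if c and g else 0.0,
    }


print(gc_features("ACGT"))  # {'gc_fraction': 0.5, 'cpg_obs_exp': 4.0}
```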

  18. A 25-Amino Acid Sequence of the Arabidopsis TGD2 Protein Is Sufficient for Specific Binding of Phosphatidic Acid

    PubMed Central

    Lu, Binbin; Benning, Christoph

    2009-01-01

    Genetic analysis suggests that the TGD2 protein of Arabidopsis is required for the biosynthesis of endoplasmic reticulum-derived thylakoid lipids. TGD2 is proposed to be the substrate-binding protein of a presumed lipid transporter consisting of the TGD1 (permease) and TGD3 (ATPase) proteins. The TGD1, -2, and -3 proteins are localized in the inner chloroplast envelope membrane. TGD2 appears to be anchored with an N-terminal membrane-spanning domain into the inner envelope membrane, whereas the C-terminal domain faces the intermembrane space. It was previously shown that the C-terminal domain of TGD2 binds phosphatidic acid (PtdOH). To investigate the PtdOH binding site of TGD2 in detail, the C-terminal domain of the TGD2 sequence lacking the transit peptide and transmembrane sequences was fused to the C terminus of the Discosoma sp. red fluorescent protein (DR). This greatly improved the solubility of the resulting DR-TGD2C fusion protein following production in Escherichia coli. The DR-TGD2C protein bound PtdOH with high specificity, as demonstrated by membrane lipid-protein overlay and liposome association assays. Internal deletion and truncation mutagenesis identified a previously undescribed minimal 25-amino acid fragment in the C-terminal domain of TGD2 that is sufficient for PtdOH binding. Binding characteristics of this 25-mer were distinctly different from those of TGD2C, suggesting that additional sequences of TGD2 providing the proper context for this 25-mer are needed for wild type-like PtdOH binding. PMID:19416982

  19. Applications of the predictability of the Coherent Noise Model to aftershock sequences

    NASA Astrophysics Data System (ADS)

    Christopoulos, Stavros-Richard; Sarlis, Nicholas

    2014-05-01

    A study [1] of the coherent noise model [2-4] in natural time [5-7] has shown that it exhibits predictability. Interestingly, one of the predictors suggested in [1] for the coherent noise model can be generalized and applied to the case of (real) aftershock sequences. The results obtained so far [8] are beyond chance. Here, we apply this approach to several aftershock sequences of strong earthquakes with magnitudes Mw ≥ 6.9 in Indonesia, California and Greece, including the Mw 9.2 earthquake that occurred on 26 December 2004 in Sumatra.
    References:
    [1] N. V. Sarlis and S.-R. G. Christopoulos, Predictability of the coherent-noise model and its applications, Phys. Rev. E 85, 051136 (2012).
    [2] M. E. J. Newman, Self-organized criticality, evolution and the fossil extinction record, Proc. R. Soc. London B 263, 1605-1610 (1996).
    [3] M. E. J. Newman and K. Sneppen, Avalanches, scaling, and coherent noise, Phys. Rev. E 54, 6226-6231 (1996).
    [4] K. Sneppen and M. Newman, Coherent noise, scale invariance and intermittency in large systems, Physica D 110, 209-222.
    [5] P. Varotsos, N. Sarlis, and E. Skordas, Spatiotemporal complexity aspects on the interrelation between Seismic Electric Signals and seismicity, Practica of Athens Academy 76, 294-321 (2001).
    [6] P. A. Varotsos, N. V. Sarlis, and E. S. Skordas, Long-range correlations in the electric signals that precede rupture, Phys. Rev. E 66, 011902 (2002).
    [7] P. A. Varotsos, N. V. Sarlis, and E. S. Skordas, Natural Time Analysis: The New View of Time. Precursory Seismic Electric Signals, Earthquakes and Other Complex Time-Series (Springer-Verlag, Berlin Heidelberg, 2011).
    [8] N. V. Sarlis and S.-R. G. Christopoulos, Visualization of the significance of Receiver Operating Characteristics based on confidence ellipses, Computer Physics Communications, http://dx.doi.org/10.1016/j.cpc.2013.12.009
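    In natural time analysis [5-7], the k-th of N events is assigned the natural time χ_k = k/N and a weight p_k = Q_k / Σ Q_n proportional to its energy, and a central quantity is the variance κ1 = Σ p_k χ_k² − (Σ p_k χ_k)². The sketch below computes κ1 for a list of magnitudes, assuming Q_k ∝ 10^(1.5 M_k) (seismic-moment scaling); the specific predictor of [1] that is applied to aftershocks here is not described in this abstract and is not reproduced.

```python
from typing import List


def natural_time_kappa1(magnitudes: List[float]) -> float:
    """Variance kappa_1 of natural time chi_k = k/N weighted by normalized energies p_k.

    Assumes Q_k is proportional to 10**(1.5 * M_k), i.e. the seismic-moment scaling.
    """
    n = len(magnitudes)
    q = [10 ** (1.5 * m) for m in magnitudes]
    total = sum(q)
    p = [qk / total for qk in q]
    chi = [(k + 1) / n for k in range(n)]  # natural time of the k-th event
    mean_chi = sum(pk * ck for pk, ck in zip(p, chi))
    mean_chi2 = sum(pk * ck * ck for pk, ck in zip(p, chi))
    return mean_chi2 - mean_chi ** 2


print(natural_time_kappa1([5.0, 5.0]))  # 0.0625
```

    For N events of equal size this reduces to κ1 = (N² − 1)/(12N²), which approaches the uniform-distribution value 1/12 ≈ 0.083 as N grows.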

  20. Temporal and Spatial Predictability of an Irrelevant Event Differently Affect Detection and Memory of Items in a Visual Sequence.

    PubMed

    Ohyama, Junji; Watanabe, Katsumi

    2016-01-01

    We examined how the temporal and spatial predictability of a task-irrelevant visual event affects the detection and memory of a visual item embedded in a continuously changing sequence. Participants observed 11 sequentially presented letters, during which a task-irrelevant visual event was either present or absent. Predictabilities of spatial location and temporal position of the event were controlled in 2 × 2 conditions. In the spatially predictable conditions, the event occurred at the same location within the stimulus sequence or at another location, while, in the spatially unpredictable conditions, it occurred at random locations. In the temporally predictable conditions, the event timing was fixed relative to the order of the letters, while in the temporally unpredictable conditions, it could not be predicted from the letter order. Participants performed a working memory task and a target detection reaction time (RT) task. Memory accuracy was higher for a letter simultaneously presented at the same location as the event in the temporally unpredictable conditions, irrespective of the spatial predictability of the event. On the other hand, the detection RTs were only faster for a letter simultaneously presented at the same location as the event when the event was both temporally and spatially predictable. Thus, to facilitate ongoing detection processes, an event must be predictable both in space and time, while memory processes are enhanced by temporally unpredictable (i.e., surprising) events. Evidently, temporal predictability has differential effects on detection and memory of a visual item embedded in a sequence of images.

  1. Rapid antibiotic-resistance predictions from genome sequence data for Staphylococcus aureus and Mycobacterium tuberculosis.

    PubMed

    Bradley, Phelim; Gordon, N Claire; Walker, Timothy M; Dunn, Laura; Heys, Simon; Huang, Bill; Earle, Sarah; Pankhurst, Louise J; Anson, Luke; de Cesare, Mariateresa; Piazza, Paolo; Votintseva, Antonina A; Golubchik, Tanya; Wilson, Daniel J; Wyllie, David H; Diel, Roland; Niemann, Stefan; Feuerriegel, Silke; Kohl, Thomas A; Ismail, Nazir; Omar, Shaheed V; Smith, E Grace; Buck, David; McVean, Gil; Walker, A Sarah; Peto, Tim E A; Crook, Derrick W; Iqbal, Zamin

    2015-12-21

    The rise of antibiotic-resistant bacteria has led to an urgent need for rapid detection of drug resistance in clinical samples, and improvements in global surveillance. Here we show how de Bruijn graph representation of bacterial diversity can be used to identify species and resistance profiles of clinical isolates. We implement this method for Staphylococcus aureus and Mycobacterium tuberculosis in a software package ('Mykrobe predictor') that takes raw sequence data as input, and generates a clinician-friendly report within 3 minutes on a laptop. For S. aureus, the error rates of our method are comparable to gold-standard phenotypic methods, with sensitivity/specificity of 99.1%/99.6% across 12 antibiotics (using an independent validation set, n=470). For M. tuberculosis, our method predicts resistance with sensitivity/specificity of 82.6%/98.5% (independent validation set, n=1,609); sensitivity is lower here, probably because of limited understanding of the underlying genetic mechanisms. We give evidence that minor alleles improve detection of extremely drug-resistant strains, and demonstrate feasibility of the use of emerging single-molecule nanopore sequencing techniques for these purposes.
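    Mykrobe predictor works on a de Bruijn graph, i.e. on k-mers drawn from the raw reads. The toy sketch below illustrates only the underlying idea of comparing k-mer support for a susceptible versus a resistant allele; the alleles, reads and k = 5 are hypothetical, and it is not Mykrobe's algorithm (no graph, no coverage model, no minor-allele thresholds, forward strand only).

```python
from collections import Counter
from typing import Iterable


def kmer_counts(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer in the reads (forward strand only, for brevity)."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def mean_kmer_coverage(allele: str, counts: Counter, k: int) -> float:
    """Average read support over all k-mers of an allele sequence."""
    kmers = [allele[i:i + k] for i in range(len(allele) - k + 1)]
    return sum(counts[km] for km in kmers) / len(kmers)


def call_allele(reads: Iterable[str], susceptible: str, resistant: str, k: int = 5) -> str:
    """Return 'R' if the resistant allele has more k-mer support than the susceptible one."""
    counts = kmer_counts(reads, k)
    r = mean_kmer_coverage(resistant, counts, k)
    s = mean_kmer_coverage(susceptible, counts, k)
    return "R" if r > s else "S"


# Hypothetical alleles differing by one SNP (C -> G at index 5).
susceptible_allele = "ACGTACGTTA"
resistant_allele = "ACGTAGGTTA"
print(call_allele(["TTACGTAGGTTAC", "CGTAGGTTA"], susceptible_allele, resistant_allele))  # R
```

    A real caller would count reverse-complement k-mers too and would weight k-mers unique to each allele, since shared k-mers (here ACGTA) support both alleles equally.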

  2. Genome Sequence Variability Predicts Drug Precautions and Withdrawals from the Market

    PubMed Central

    Baik, Su Youn; Lee, Soo Youn; Park, Chan Hee; Park, Paul J.; Kim, Ju Han

    2016-01-01

    Despite substantial premarket efforts, a significant portion of approved drugs has been withdrawn from the market for safety reasons. The deleterious impact of nonsynonymous substitutions predicted by the SIFT algorithm on structure and function of drug-related proteins was evaluated for 2504 personal genomes. Both withdrawn (n = 154) and precautionary (Beers criteria (n = 90), and US FDA pharmacogenomic biomarkers (n = 96)) drugs showed significantly lower genomic deleteriousness scores (P < 0.001) compared to others (n = 752). Furthermore, the rates of drug withdrawals and precautions correlated significantly with the deleteriousness scores of the drugs (P < 0.01); this trend was confirmed for all drugs included in the withdrawal and precaution lists by the United Nations, European Medicines Agency, DrugBank, Beers criteria, and US FDA. Our findings suggest that the person-to-person genome sequence variability is a strong independent predictor of drug withdrawals and precautions. We propose novel measures of drug safety based on personal genome sequence analysis. PMID:27690231

  3. Rapid antibiotic-resistance predictions from genome sequence data for Staphylococcus aureus and Mycobacterium tuberculosis

    PubMed Central

    Bradley, Phelim; Gordon, N. Claire; Walker, Timothy M.; Dunn, Laura; Heys, Simon; Huang, Bill; Earle, Sarah; Pankhurst, Louise J.; Anson, Luke; de Cesare, Mariateresa; Piazza, Paolo; Votintseva, Antonina A.; Golubchik, Tanya; Wilson, Daniel J.; Wyllie, David H.; Diel, Roland; Niemann, Stefan; Feuerriegel, Silke; Kohl, Thomas A.; Ismail, Nazir; Omar, Shaheed V.; Smith, E. Grace; Buck, David; McVean, Gil; Walker, A. Sarah; Peto, Tim E. A.; Crook, Derrick W.; Iqbal, Zamin

    2015-01-01

    The rise of antibiotic-resistant bacteria has led to an urgent need for rapid detection of drug resistance in clinical samples, and improvements in global surveillance. Here we show how de Bruijn graph representation of bacterial diversity can be used to identify species and resistance profiles of clinical isolates. We implement this method for Staphylococcus aureus and Mycobacterium tuberculosis in a software package ('Mykrobe predictor') that takes raw sequence data as input, and generates a clinician-friendly report within 3 minutes on a laptop. For S. aureus, the error rates of our method are comparable to gold-standard phenotypic methods, with sensitivity/specificity of 99.1%/99.6% across 12 antibiotics (using an independent validation set, n=470). For M. tuberculosis, our method predicts resistance with sensitivity/specificity of 82.6%/98.5% (independent validation set, n=1,609); sensitivity is lower here, probably because of limited understanding of the underlying genetic mechanisms. We give evidence that minor alleles improve detection of extremely drug-resistant strains, and demonstrate feasibility of the use of emerging single-molecule nanopore sequencing techniques for these purposes. PMID:26686880

  4. Nucleotide sequence of the luxC gene encoding fatty acid reductase of the lux operon from Photobacterium leiognathi.

    PubMed

    Lin, J W; Chao, Y F; Weng, S F

    1993-02-26

    The nucleotide sequence of the luxC gene (EMBL Accession No. 65156) encoding fatty acid reductase (FAR) of the lux operon from Photobacterium leiognathi PL741 was determined and the encoded amino acid sequence deduced. The fatty acid reductase is a component of the fatty acid reductase complex. The complex is responsible for converting fatty acids to aldehydes, which serve as the substrate in the luciferase-catalyzed bioluminescent reaction. The protein comprises 478 amino acid residues and has a calculated M(r) of 53,858. Alignment and comparison of the fatty acid reductase of P. leiognathi with those of Vibrio harveyi B392 and Vibrio fischeri ATCC 7744 show 70% and 59% amino acid identity, respectively.

  5. JRC GMO-Amplicons: a collection of nucleic acid sequences related to genetically modified organisms

    PubMed Central

    Petrillo, Mauro; Angers-Loustau, Alexandre; Henriksson, Peter; Bonfini, Laura; Patak, Alex; Kreysa, Joachim

    2015-01-01

    The DNA target sequence is the key element in designing detection methods for genetically modified organisms (GMOs). Unfortunately, this information is frequently lacking, especially for unauthorized GMOs. In addition, patent sequences are generally poorly annotated, buried in complex and extensive documentation and hard to link to the corresponding GM event. Here, we present the JRC GMO-Amplicons, a database of amplicons collected by screening public nucleotide sequence databanks by in silico determination of PCR amplification with reference methods for GMO analysis. The European Union Reference Laboratory for Genetically Modified Food and Feed (EU-RL GMFF) provides these methods in the GMOMETHODS database to support enforcement of EU legislation and GM food/feed control. The JRC GMO-Amplicons database is composed of more than 240 000 amplicons, which can be easily accessed and screened through a web interface. To our knowledge, this is the first attempt at pooling and collecting publicly available sequences related to GMOs in food and feed. The JRC GMO-Amplicons supports control laboratories in the design and assessment of GMO methods, providing, inter alia, in silico prediction of primer specificity and GM target coverage. The new tool can assist laboratories in the analysis of complex issues, such as the detection and identification of unauthorized GMOs. Notably, the JRC GMO-Amplicons database allows the retrieval and characterization of GMO-related sequences included in patent documentation. Finally, it can help annotate poorly described GM sequences and identify new relevant GMO-related sequences in public databases. The JRC GMO-Amplicons is freely accessible through a web-based portal that is hosted on the EU-RL GMFF website. Database URL: http://gmo-crl.jrc.ec.europa.eu/jrcgmoamplicons/ PMID:26424080
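    The database is built by in silico determination of PCR amplification. As a rough illustration of that step, the sketch below finds amplicons for a primer pair by exact string matching on one strand; the template and primers are hypothetical, and real in silico PCR tools (including whatever the JRC pipeline uses, which the abstract does not specify) typically tolerate primer mismatches.

```python
from typing import List

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_amplicons(template: str, forward: str, reverse: str, max_len: int = 1000) -> List[str]:
    """Exact-match in silico PCR on one strand of `template`.

    The reverse primer anneals to the opposite strand, so on the given strand
    its binding site reads as reverse_complement(reverse). Run the function on
    reverse_complement(template) as well to scan both strands.
    """
    template, fwd = template.upper(), forward.upper()
    rev_site = reverse_complement(reverse)
    amplicons = []
    start = template.find(fwd)
    while start != -1:
        end = template.find(rev_site, start + len(fwd))
        while end != -1 and end + len(rev_site) - start <= max_len:
            amplicons.append(template[start:end + len(rev_site)])
            end = template.find(rev_site, end + 1)
        start = template.find(fwd, start + 1)
    return amplicons


# Hypothetical template and primers.
print(find_amplicons("GGATGCAAACCCTTTGGGCATTCC", "ATGCA", "AATGC"))
# ['ATGCAAACCCTTTGGGCATT']
```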

  6. JRC GMO-Amplicons: a collection of nucleic acid sequences related to genetically modified organisms.

    PubMed

    Petrillo, Mauro; Angers-Loustau, Alexandre; Henriksson, Peter; Bonfini, Laura; Patak, Alex; Kreysa, Joachim

    2015-01-01

    The DNA target sequence is the key element in designing detection methods for genetically modified organisms (GMOs). Unfortunately, this information is frequently lacking, especially for unauthorized GMOs. In addition, patent sequences are generally poorly annotated, buried in complex and extensive documentation and hard to link to the corresponding GM event. Here, we present the JRC GMO-Amplicons, a database of amplicons collected by screening public nucleotide sequence databanks by in silico determination of PCR amplification with reference methods for GMO analysis. The European Union Reference Laboratory for Genetically Modified Food and Feed (EU-RL GMFF) provides these methods in the GMOMETHODS database to support enforcement of EU legislation and GM food/feed control. The JRC GMO-Amplicons database is composed of more than 240 000 amplicons, which can be easily accessed and screened through a web interface. To our knowledge, this is the first attempt at pooling and collecting publicly available sequences related to GMOs in food and feed. The JRC GMO-Amplicons supports control laboratories in the design and assessment of GMO methods, providing, inter alia, in silico prediction of primer specificity and GM target coverage. The new tool can assist laboratories in the analysis of complex issues, such as the detection and identification of unauthorized GMOs. Notably, the JRC GMO-Amplicons database allows the retrieval and characterization of GMO-related sequences included in patent documentation. Finally, it can help annotate poorly described GM sequences and identify new relevant GMO-related sequences in public databases. The JRC GMO-Amplicons is freely accessible through a web-based portal that is hosted on the EU-RL GMFF website. Database URL: http://gmo-crl.jrc.ec.europa.eu/jrcgmoamplicons/.

  7. Determination of the complete amino acid sequence for the coat protein of brome mosaic virus by time-of-flight mass spectrometry. Evidence for mutations associated with change of propagation host.

    PubMed

    She, Y M; Haber, S; Seifers, D L; Loboda, A; Chernushevich, I; Perreault, H; Ens, W; Standing, K G

    2001-06-08

    Time-of-flight mass spectrometry (TOFMS) has been applied to determine the complete coat protein amino acid sequences of a number of distinct brome mosaic virus (BMV) isolates. Ionization was carried out by both electrospray ionization and matrix-assisted laser desorption/ionization (MALDI). After determining overall coat protein masses, the proteins were digested with trypsin or Lys-C proteinases, and the digestion products were analyzed in a MALDI QqTOF mass spectrometer. The N terminus of the coat protein was found to be acetylated in each BMV isolate analyzed. In one isolate (BMV-Valverde), the amino acid sequence was identical to that predicted from the cDNA sequence of the "type" isolate, but deviations from the predicted amino acid sequence were observed for all the other isolates analyzed. When isolates were propagated in different host taxa, modified coat protein sequences were observed in some cases, along with the original sequence. Sequencing by TOFMS may therefore provide a basis for monitoring the effects of host passaging on a virus at the molecular level. Such TOFMS-based analyses assess the complete profiles of coat protein sequences actually present in infected tissues. They are therefore not subject to the selection biases inherent in deducing such sequences from reverse-transcribed viral RNA and cloning the resulting cDNA.
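    Mass-spectrometric sequencing of this kind starts from the peptides an enzyme is expected to produce. The sketch below applies the classic simplified trypsin rule (cleave after K or R unless the next residue is P) to a hypothetical sequence; it assumes no missed cleavages, and the software actually used in the study is not described in the abstract. For Lys-C, which cleaves after lysine, the lookbehind would be restricted to K.

```python
import re
from typing import List


def tryptic_digest(protein: str) -> List[str]:
    """In silico trypsin digest: cut after K or R unless the next residue is P.

    This is the classic simplified cleavage rule and assumes no missed cleavages.
    """
    # Zero-width split points: preceded by K/R and not followed by P (Python 3.7+).
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", protein.upper()) if pep]


print(tryptic_digest("AKPLRGKR"))  # ['AKPLR', 'GK', 'R']
```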

  8. Nucleic Acid Amplification Testing and Sequencing Combined with Acid-Fast Staining in Needle Biopsy Lung Tissues for the Diagnosis of Smear-Negative Pulmonary Tuberculosis

    PubMed Central

    Tian, Panwen; Chen, Xuerong; Liang, Zongan

    2016-01-01

    Background Smear-negative pulmonary tuberculosis (PTB) is common and difficult to diagnose. In this study, we investigated the diagnostic value of nucleic acid amplification testing and sequencing combined with acid-fast bacteria (AFB) staining of needle biopsy lung tissues for patients with suspected smear-negative PTB. Methods Patients with suspected smear-negative PTB who underwent percutaneous transthoracic needle biopsy between May 1, 2012, and June 30, 2015, were enrolled in this retrospective study. Patients with AFB in sputum smears were excluded. All lung biopsy specimens were fixed in formalin, embedded in paraffin, and subjected to acid-fast staining and tuberculous polymerase chain reaction (TB-PCR). For patients with positive AFB and negative TB-PCR results in lung tissues, probe assays and 16S rRNA sequencing were used for identification of nontuberculous mycobacteria (NTM). The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and diagnostic accuracy of PCR and AFB staining were calculated separately and in combination. Results Among the 220 eligible patients, 133 were diagnosed with TB (men/women: 76/57; age range: 17–80 years, confirmed TB: 9, probable TB: 124). Forty-eight patients who were diagnosed with other specific diseases were assigned as negative controls, and 39 patients with indeterminate final diagnosis were excluded from statistical analysis. The sensitivity, specificity, PPV, NPV, and accuracy of histological AFB (HAFB) for the diagnosis of smear-negative PTB were 61.7% (82/133), 100% (48/48), 100% (82/82), 48.5% (48/99), and 71.8% (130/181), respectively. The sensitivity, specificity, PPV, and NPV of histological PCR were 89.5% (119/133), 95.8% (46/48), 98.3% (119/121), and 76.7% (46/60), respectively, demonstrating that histological PCR had significantly higher accuracy (91.2% [165/181]) than histological acid-fast staining (71.8% [130/181]), P < 0.001. Parallel testing of histological AFB
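    The five figures reported for each test follow from a 2×2 table. The sketch below recomputes them from the counts implied by the abstract: true positives and false negatives among the 133 TB patients, true negatives and false positives among the 48 negative controls. Note that the NPV denominator is TN + FN, which for histological AFB is 48 + 51 = 99.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard 2x2 diagnostic metrics, returned as fractions."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Counts implied by the abstract: 133 TB patients and 48 negative controls.
hafb = diagnostic_metrics(tp=82, fp=0, fn=133 - 82, tn=48)
pcr = diagnostic_metrics(tp=119, fp=2, fn=133 - 119, tn=46)
for name, m in (("HAFB", hafb), ("PCR", pcr)):
    print(name, {k: round(100 * v, 1) for k, v in m.items()})
```

    Rounded to one decimal place, these values match the percentages reported for both histological AFB and histological PCR.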

  9. Computational sequence analysis of predicted long dsRNA transcriptomes of major crops reveals sequence complementarity with human genes.

    PubMed

    Jensen, Peter D; Zhang, Yuanji; Wiggins, B Elizabeth; Petrick, Jay S; Zhu, Jin; Kerstetter, Randall A; Heck, Gregory R; Ivashuta, Sergey I

    2013-01-01

    Long double-stranded RNAs (long dsRNAs) are precursors for the effector molecules of sequence-specific RNA-based gene silencing in eukaryotes. Plant cells can contain numerous endogenous long dsRNAs. This study demonstrates that such endogenous long dsRNAs in plants have sequence complementarity to human genes. Many of these complementary long dsRNAs have perfect sequence complementarity of at least 21 nucleotides to human genes, which is enough complementarity to potentially trigger gene silencing in targeted human cells if delivered in functional form. However, the number and diversity of long dsRNA molecules with complementarity to human genes in plant tissue from crops that have a long history of safe consumption, such as lettuce, tomato, corn, soy and rice, support the conclusion that long dsRNAs do not present a significant dietary risk.
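    The 21-nucleotide criterion above can be checked by exact k-mer matching: one strand of a dsRNA is perfectly complementary to a gene segment exactly when the other strand carries that segment. The sketch below is a minimal illustration under that assumption, with hypothetical sequences; it is not the authors' pipeline for analysing predicted dsRNA transcriptomes.

```python
from typing import Set

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_dna(seq: str) -> str:
    """Upper-case and write U as T so RNA and DNA sequences can be compared."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    return to_dna(seq).translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def complementary_kmers(dsrna: str, gene: str, k: int = 21) -> Set[str]:
    """Gene k-mers to which one strand of the dsRNA is perfectly complementary.

    Checking both dsRNA strands for k-mers shared with the gene's sense
    sequence is sufficient, because a strand pairs with a segment exactly
    when the opposite strand has the same sequence as that segment.
    """
    dsrna, gene = to_dna(dsrna), to_dna(gene)
    both_strands = kmers(dsrna, k) | kmers(reverse_complement(dsrna), k)
    return kmers(gene, k) & both_strands


# Hypothetical 21-nt target embedded in a gene, and an RNA strand pairing with it.
target = "GATTACA" * 3
gene = "CCCC" + target + "GGGG"
dsrna = "AAAA" + reverse_complement(target).replace("T", "U") + "AAAA"
print(complementary_kmers(dsrna, gene))  # {'GATTACAGATTACAGATTACA'}
```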

  10. Complete amino acid sequence of the A chain of human complement-classical-pathway enzyme C1r.

    PubMed Central

    Arlaud, G J; Willis, A C; Gagnon, J

    1987-01-01

    The amino acid sequence of human C1r A chain was determined from sequence analysis of fragments obtained by C1r autolytic cleavage, cleavage of methionyl bonds, tryptic cleavage at arginine and lysine residues, and cleavage by staphylococcal proteinase. The polypeptide chain has an N-terminal serine residue and contains 446 amino acid residues (Mr 51,200). The sequence data allow chemical characterization of fragments alpha (positions 1-211), beta (positions 212-279) and gamma (positions 280-446) generated by C1r autolytic cleavage, and identification of the two major cleavage sites generating these fragments. Position 150 of C1r A chain is occupied by a modified amino acid residue that, upon acid hydrolysis, yields erythro-beta-hydroxyaspartic acid, and that is located in a sequence homologous to the beta-hydroxyaspartic acid-containing regions of Factor IX, Factor X, protein C and protein Z. Sequence comparison reveals internal homology between two segments (positions 10-78 and 186-257). Two carbohydrate moieties are attached to the polypeptide chain, both via asparagine residues at positions 108 and 204. Combined with the previously determined sequence of C1r B chain [Arlaud & Gagnon (1983) Biochemistry 22, 1758-1764], these data give the complete sequence of human C1r. PMID:3036070

  11. Spatiotemporal prediction applying fuzzy logic in a sequence of satellite images

    NASA Astrophysics Data System (ADS)

    Mezzadri-Centeno, Tania; Selleron, Gilles

    2002-01-01

    Spatial evolutions of anthropized ecosystems and the progressive transformation of spaces over time are increasingly a topic of special interest in environmental research. This evolution constitutes one of the major concerns in the domain of environmental space management. The evolution of a region's landscape and the outlook for its future state raise a particularly important question: what will be the state of the region in 15, 30 or 50 years? Over time, a region can undergo transformations such as the emergence, disappearance or union of spatial entities; these transformations are called temporal phenomena. We propose to predict the forestry evolution in the forthcoming years on an experimental area that exhibits these spatial transformations. The proposed method is based on the analysis of the terrain landscape given a sequence of n satellite images, which represent the state of the region in different years. For these purposes, we have developed a specific spatio-temporal prediction approach linking results of forestry evolution analysis and fuzzy logic. The method is supported by the analysis of the landscape dynamics of a test site located in the humid tropics: the eastern piedmont of the Andes in Venezuela. This large area, at the scale of a SPOT satellite image, is typical of tropical deforestation in a pioneer front. The presented approach allows geographers interested in environmental prospective problems to obtain cartographic documents showing future conditions of a landscape. The experimental tests have shown promising results.

  12. Position-specific prediction of methylation sites from sequence conservation based on information theory.

    PubMed

    Shi, Yinan; Guo, Yanzhi; Hu, Yayun; Li, Menglong

    2015-07-23

    Protein methylation plays vital roles in many biological processes and has been implicated in various human diseases. To fully understand the mechanisms underlying methylation, for use in drug design and in research on methylation-related diseases, an initial but crucial step is to identify methylation sites. High-throughput bioinformatics methods have therefore become essential for predicting methylation sites. In this study, we developed a novel method that predicts protein methylation sites based solely on sequence conservation. Conservation difference profiles between methylated and non-methylated peptides were constructed from the information entropy (IE) over a wide window of neighboring residues around the methylation sites, so that the surrounding sequence environment was fully incorporated. The distinctive neighboring residues were then identified by their information gain (IG) importance scores. The most representative models were constructed with a support vector machine (SVM) for arginine and lysine methylation, respectively. These models yielded promising results on both the benchmark dataset and an independent test set. The models were used to screen the entire human proteome, and many previously unknown substrates were identified. These results indicate that our method can serve as a useful supplement for elucidating the mechanism of protein methylation and can facilitate hypothesis-driven experimental design and validation.
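    The entropy-based conservation profile and the information-gain score described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names (`position_entropies`, `conservation_difference`, `information_gain`) and the three-residue toy windows are hypothetical, and the subsequent SVM training step is not shown.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Shannon entropy, in bits, of a sequence of symbols."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def position_entropies(peptides):
    """Entropy of the residue distribution at each position of equal-length, aligned windows."""
    return [entropy([p[i] for p in peptides]) for i in range(len(peptides[0]))]


def conservation_difference(methylated, unmethylated):
    """Per-position entropy of unmethylated windows minus that of methylated windows.

    Larger values mark positions that are more conserved around methylated sites.
    """
    pos = position_entropies(methylated)
    neg = position_entropies(unmethylated)
    return [hn - hp for hp, hn in zip(pos, neg)]


def information_gain(peptides, labels, position):
    """Reduction in label entropy obtained by knowing the residue at one window position."""
    n = len(labels)
    conditional = 0.0
    for residue in set(p[position] for p in peptides):
        subset = [lab for p, lab in zip(peptides, labels) if p[position] == residue]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


# Hypothetical toy windows centred on an arginine (a real study would use much wider windows).
methylated = ["GRG", "GRG"]
unmethylated = ["ARS", "DRG"]
peptides = methylated + unmethylated
labels = [1, 1, 0, 0]  # 1 = methylated, 0 = not methylated

print(conservation_difference(methylated, unmethylated))
print([round(information_gain(peptides, labels, i), 3) for i in range(3)])
```

    Positions with high IG would then be kept as features for a classifier; the window width and feature cut-off are choices the sketch leaves open.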

  13. Improving amino-acid identification, fit and C(alpha) prediction using the Simplex method in automated model building.

    PubMed

    Romo, Tod D; Sacchettini, James C; Ioerger, Thomas R

    2006-11-01

    Automated methods for protein model building in X-ray crystallography typically use a two-phased approach that involves first modeling the protein backbone followed by building in the side chains. The latter phase requires the identification of the amino-acid side-chain type as well as fitting of the side-chain model into the observed electron density. While mistakes in identification of individual side chains are common for a number of reasons, sequence alignment can sometimes be used to correct errors by mapping fragments into the true (expected) amino-acid sequence and exploiting contiguity constraints among neighbors. However, side chains cannot always be confidently aligned; this depends on having sufficient accuracy in the initial calls. The recognition of amino-acid side-chains based on the surrounding pattern of electron density, whether by features, density correlation or free atoms, can be sensitive to inaccuracies in the coordinates of the predicted backbone C(alpha) atoms to which they are anchored. By incorporating a Nelder-Mead Simplex search into the side-chain identification and model-building routines of TEXTAL, it is demonstrated that this form of residue-by-residue rigid-body real-space refinement (in which the C(alpha) itself is allowed to shift) can improve the initial accuracy of side-chain selection by over 25% on average (from 25% average identity to 32% on a test set of five representative proteins, without corrections by sequence alignment). This improvement in amino-acid selection accuracy in TEXTAL is often sufficient to bring the pairwise amino-acid identity of chains in the model out of the so-called 'twilight zone' for sequence-alignment methods. When coupled with sequence alignment, use of the Simplex search yielded improvements in side-chain accuracy on average by over 13 percentage points (from 64 to 77%) and up to 38 percentage points (from 40 to 78%) in one case compared with using sequence alignment alone.
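    The Nelder-Mead Simplex search mentioned above is a derivative-free minimizer. A minimal pure-Python sketch follows, assuming a toy, translation-only objective: a hypothetical `density_misfit` that scores the squared distance of a shifted C(alpha) from a hypothetical density peak `PEAK`. TEXTAL's real objective (fit of side-chain density, including rotations) is not reproduced here.

```python
def nelder_mead(f, x0, step=0.5, tol=1e-10, max_iter=5000):
    """Minimize f starting from x0 with the Nelder-Mead simplex method (no derivatives)."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex displaced along each coordinate axis.
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    values = [f(v) for v in simplex]
    for _ in range(max_iter):
        # Sort vertices from best (lowest f) to worst.
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        if values[-1] - values[0] < tol:
            break
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        f_reflected = f(reflected)
        if f_reflected < values[0]:
            # Reflection is the new best point: try expanding further.
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            f_expanded = f(expanded)
            if f_expanded < f_reflected:
                simplex[-1], values[-1] = expanded, f_expanded
            else:
                simplex[-1], values[-1] = reflected, f_reflected
        elif f_reflected < values[-2]:
            simplex[-1], values[-1] = reflected, f_reflected
        else:
            # Inside contraction toward the centroid.
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            f_contracted = f(contracted)
            if f_contracted < values[-1]:
                simplex[-1], values[-1] = contracted, f_contracted
            else:
                # Shrink every vertex halfway toward the best vertex.
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, v)] for v in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]
    k = min(range(n + 1), key=lambda i: values[i])
    return simplex[k], values[k]


PEAK = (1.2, -0.4, 0.7)  # hypothetical density peak position, in Angstrom


def density_misfit(shift):
    """Hypothetical score: squared distance of the shifted C(alpha) from the density peak."""
    return sum((s - p) ** 2 for s, p in zip(shift, PEAK))


best_shift, misfit = nelder_mead(density_misfit, [0.0, 0.0, 0.0])
print([round(x, 3) for x in best_shift], misfit)
```

    In a real refinement, the parameter vector would also carry rotation angles and the objective would be a density-correlation score, but the simplex moves (reflect, expand, contract, shrink) stay the same.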

  14. DNA methylation-based forensic age prediction using artificial neural networks and next generation sequencing.

    PubMed

    Vidaki, Athina; Ballard, David; Aliferi, Anastasia; Miller, Thomas H; Barron, Leon P; Syndercombe Court, Denise

    2017-05-01

    generation sequencing (NGS)-based method capable of quantifying the methylation status of the 16 selected CpG sites was developed using the Illumina MiSeq(®) platform. The method was validated using DNA standards of known methylation levels, and its age prediction accuracy was initially assessed in a set of 46 whole blood samples. Although the prediction accuracy obtained with the NGS data was lower than that of the original model (MAE = 7.5 years), future optimization of our strategy to account for technical variation, together with a larger sample size, is expected to improve both prediction accuracy and reproducibility.
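    The mean absolute error (MAE) quoted above is the average absolute difference, in years, between predicted and chronological age. A minimal sketch of that metric, with hypothetical ages (not data from the study):

```python
def mean_absolute_error(predicted, actual):
    """Mean absolute difference between predicted and chronological ages (years)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical predicted vs. chronological ages for three samples.
print(mean_absolute_error([30, 45, 62], [35, 40, 70]))
```

    Because errors are not squared, a few badly mispredicted samples raise MAE less sharply than they would raise a root-mean-square error.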

  15. Nucleic and amino acid sequences relating to a novel transketolase, and methods for the expression thereof

    DOEpatents

    Croteau, Rodney Bruce; Wildung, Mark Raymond; Lange, Bernd Markus; McCaskill, David G.

    2001-01-01

    cDNAs encoding 1-deoxyxylulose-5-phosphate synthase (DXPS) from peppermint (Mentha piperita) have been isolated and sequenced, and the corresponding amino acid sequences have been determined. Accordingly, isolated DNA sequences (SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7) are provided which code for the expression of 1-deoxyxylulose-5-phosphate synthase from plants. In another aspect the present invention provides for isolated, recombinant DXPS proteins, such as the proteins having the sequences set forth in SEQ ID NO:4, SEQ ID NO:6 and SEQ ID NO:8. In other aspects, replicable recombinant cloning vehicles are provided which code for plant 1-deoxyxylulose-5-phosphate synthases, or for a base sequence sufficiently complementary to at least a portion of 1-deoxyxylulose-5-phosphate synthase DNA or RNA to enable hybridization therewith. In yet other aspects, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding a plant 1-deoxyxylulose-5-phosphate synthase. Thus, systems and methods are provided for the recombinant expression of the aforementioned recombinant 1-deoxyxylulose-5-phosphate synthase that may be used to facilitate its production, isolation and purification in significant amounts. Recombinant 1-deoxyxylulose-5-phosphate synthase may be used to obtain expression or enhanced expression of 1-deoxyxylulose-5-phosphate synthase in plants in order to enhance the production of 1-deoxyxylulose-5-phosphate, or its derivatives such as isopentenyl diphosphate (BP), or may be otherwise employed for the regulation or expression of 1-deoxyxylulose-5-phosphate synthase, or the production of its products.

  16. Prediction of Scylla olivacea (Crustacea; Brachyura) peptide hormones using publicly accessible transcriptome shotgun assembly (TSA) sequences.

    PubMed

    Christie, Andrew E

    2016-05-01

    The aquaculture of crabs from the genus Scylla is of increasing economic importance for many Southeast Asian countries. Expansion of Scylla farming has led to increased efforts to understand the physiology and behavior of these crabs, and as such, there are growing molecular resources for them. Here, publicly accessible Scylla olivacea transcriptomic data were mined for putative peptide-encoding transcripts; the proteins deduced from the identified sequences were then used to predict the structures of mature peptide hormones. Forty-nine pre/preprohormone-encoding transcripts were identified, allowing for the prediction of 187 distinct mature peptides. The identified peptides included isoforms of adipokinetic hormone-corazonin-like peptide, allatostatin A, allatostatin B, allatostatin C, bursicon β, CCHamide, corazonin, crustacean cardioactive peptide, crustacean hyperglycemic hormone/molt-inhibiting hormone, diuretic hormone 31, eclosion hormone, FMRFamide-like peptide, HIGSLYRamide, insulin-like peptide, intocin, leucokinin, myosuppressin, neuroparsin, neuropeptide F, orcokinin, pigment dispersing hormone, pyrokinin, red pigment concentrating hormone, RYamide, short neuropeptide F, SIFamide and tachykinin-related peptide, all well-known neuropeptide families. Surprisingly, the tissue used to generate the transcriptome mined here is reported to be testis. Whether or not the testis samples had neural contamination is unknown. However, if the peptides are truly produced by this reproductive organ, it could have far reaching consequences for the study of crustacean endocrinology, particularly in the area of reproductive control. Regardless, this peptidome is the largest thus far predicted for any brachyuran (true crab) species, and will serve as a foundation for future studies of peptidergic control in members of the commercially important genus Scylla.
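    Predicting mature peptides from a deduced prepro-hormone typically involves removing the signal peptide, cleaving at prohormone-convertase sites, and applying post-translational modifications. The sketch below uses a deliberately simplified rule set that is an assumption, not the author's workflow: the signal-peptide length is supplied by hand (in practice it is usually predicted with a dedicated tool), cleavage occurs only at the dibasic pairs KR, RR, RK and KK, and a C-terminal glycine is converted to an amide. The function names and the prepro-sequence are hypothetical.

```python
DIBASIC_SITES = {"KR", "RR", "RK", "KK"}  # assumed convertase cleavage pairs


def split_at_dibasic(body):
    """Split a pro-hormone at dibasic pairs, dropping the pair residues themselves."""
    fragments, current, i = [], [], 0
    while i < len(body):
        if body[i:i + 2] in DIBASIC_SITES:
            if current:
                fragments.append("".join(current))
            current = []
            i += 2
        else:
            current.append(body[i])
            i += 1
    if current:
        fragments.append("".join(current))
    return fragments


def mature_peptides(prepro, signal_length):
    """Remove the signal peptide, cleave at dibasic sites, and amidate C-terminal Gly."""
    result = []
    for fragment in split_at_dibasic(prepro[signal_length:]):
        if len(fragment) > 1 and fragment.endswith("G"):
            # The C-terminal Gly is consumed to form a C-terminal amide.
            result.append(fragment[:-1] + "-NH2")
        else:
            result.append(fragment)
    return result


# Hypothetical prepro-sequence with an assumed 6-residue signal peptide.
print(mature_peptides("MAAAAGQLSFKRPNWGKRAAA", 6))
```

    Real processing also uses monobasic sites and other modifications (for example pyroglutamate formation, sulfation, or disulfide bridges), none of which this sketch models.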

  17. Deep sequencing reveals microRNAs predictive of antiangiogenic drug response

    PubMed Central

    García-Donas, Jesús; Beuselinck, Benoit; Inglada-Pérez, Lucía; Graña, Osvaldo; Schöffski, Patrick; Wozniak, Agnieszka; Bechter, Oliver; Apellániz-Ruiz, Maria; Leandro-García, Luis Javier; Esteban, Emilio; Castellano, Daniel E.; González del Alba, Aranzazu; Climent, Miguel Angel; Hernando, Susana; Arranz, José Angel; Morente, Manuel; Pisano, David G.; Robledo, Mercedes

    2016-01-01

    The majority of metastatic renal cell carcinoma (RCC) patients are treated with tyrosine kinase inhibitors (TKI) in first-line treatment; however, a fraction are refractory to these antiangiogenic drugs. MicroRNAs (miRNAs) are regulatory molecules proven to be accurate biomarkers in cancer. Here, we identified miRNAs predictive of progressive disease under TKI treatment through deep sequencing of 74 metastatic clear cell RCC cases uniformly treated with these drugs. Twenty-nine miRNAs were differentially expressed in the tumors of patients who progressed under TKI therapy (P values from 6 × 10⁻⁹ to 3 × 10⁻³). Among 6 miRNAs selected for validation in an independent series, the most relevant associations corresponded to miR-1307-3p, miR-155-5p, and miR-221-3p (P = 4.6 × 10⁻³, 6.5 × 10⁻³, and 3.4 × 10⁻², respectively). Furthermore, a 2-miRNA-based classifier discriminated individuals with progressive disease upon TKI treatment (AUC = 0.75; 95% CI, 0.64–0.85; P = 1.3 × 10⁻⁴) with better predictive value than commonly used clinicopathological risk factors. We also identified miRNAs significantly associated with progression-free survival and overall survival (P = 6.8 × 10⁻⁸ and 7.8 × 10⁻⁷ for top hits, respectively), 7 of which overlapped with early progressive disease. In conclusion, this is, to our knowledge, the first comprehensive miRNome study to demonstrate a predictive value of miRNAs for TKI response, and it provides a new set of relevant markers that can help rationalize metastatic RCC treatment. PMID:27699216
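    The AUC reported for the classifier can be read as the probability that a randomly chosen patient with progressive disease receives a higher score than a randomly chosen patient without it. A minimal sketch of that rank-based (Mann-Whitney) estimate follows; the linear two-miRNA score, its weights, and the expression values are hypothetical and are not the published classifier.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def two_mirna_score(expr_a, expr_b, weight_a, weight_b):
    """Hypothetical linear risk score combining two miRNA expression values."""
    return weight_a * expr_a + weight_b * expr_b


# Hypothetical expression pairs; label 1 = progressive disease, 0 = no progression.
expression = [(2.0, 1.0), (1.5, 1.5), (0.5, 1.0), (0.2, 0.5)]
labels = [1, 0, 1, 0]
scores = [two_mirna_score(a, b, 0.4, 0.1) for a, b in expression]
print(roc_auc(scores, labels))
```

    The pairwise loop is O(P·N), which is fine for cohorts of this size; a sort-based rank formula gives the same value more efficiently for large data sets.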

  18. Repeat sequence chromosome specific nucleic acid probes and methods of preparing and using

    DOEpatents

    Weier, Heinz-Ulrich G.; Gray, Joe W.

    1995-01-01

    A primer-directed DNA amplification method to efficiently isolate chromosome-specific repeated DNA wherein degenerate oligonucleotide primers are used is disclosed. The probes produced are a heterogeneous mixture that can be used with blocking DNA as a chromosome-specific staining reagent, and/or the elements of the mixture can be screened for high specificity, size and/or high degree of repetition among other parameters. The degenerate primers are sets of primers that vary in sequence but are substantially complementary to highly repeated nucleic acid sequences, preferably clustered within the template DNA, for example, pericentromeric alpha satellite repeat sequences. The template DNA is preferably chromosome-specific. Exemplary primers and probes are disclosed. The probes of this invention can be used to determine the number of chromosomes of a specific type in metaphase spreads, in germ line and/or somatic cell interphase nuclei, micronuclei and/or in tissue sections. Also provided is a method to select arbitrarily repeat sequence probes that can be screened for chromosome-specificity.

  19. Unconventional amino acid sequence of the sun anemone (Stoichactis helianthus) polypeptide neurotoxin

    SciTech Connect

    Kem, W.; Dunn, B.; Parten, B.; Pennington, M.; Price, D.

    1986-05-01

    A 5000 dalton polypeptide neurotoxin (Sh-NI) purified by G50 Sephadex, P-cellulose, and SP-Sephadex chromatography was homogeneous by isoelectric focusing. Sh-NI was highly toxic to crayfish (LD50 0.6 µg/kg) but without effect upon mice at 15,000 µg/kg (i.p. injection). The reduced, ³H-carboxymethylated toxin and its fragments were subjected to automatic Edman degradation and the resulting PTH-amino acids were identified by HPLC, back hydrolysis, and scintillation counting. Peptides resulting from proteolytic (clostripain, staphylococcal protease) and chemical (tryptophan) cleavage were sequenced. The sequence is: AACKCDDEGPDIRTAPLTGTVDLGSCNAGWEKCASYYTIIADCCRKKK. This sequence differs considerably from the homologous Anemonia and Anthopleura toxins; many of the identical residues (6 half-cystines, G9, P10, R13, G19, G29, W30) are probably critical for folding rather than receptor recognition. However, the Sh-NI sequence closely resembles Radioanthus macrodactylus neurotoxin III and R. paumotensis II. The authors propose that Sh-NI and related Radioanthus toxins act upon a different site on the sodium channel.
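    The residue positions cited above use 1-based numbering on the reported Sh-NI sequence. A small sketch can check those labels and locate the six half-cystines directly from the sequence string; the helper names are illustrative only.

```python
SH_NI = "AACKCDDEGPDIRTAPLTGTVDLGSCNAGWEKCASYYTIIADCCRKKK"


def residue_at(sequence, position):
    """Residue at a 1-based position, matching labels such as 'G9'."""
    return sequence[position - 1]


def mismatched_labels(sequence, labels):
    """Return the labels (e.g. 'W30') whose residue letter does not match the sequence."""
    mismatches = []
    for label in labels:
        residue, position = label[0], int(label[1:])
        if residue_at(sequence, position) != residue:
            mismatches.append(label)
    return mismatches


cysteine_positions = [i + 1 for i, aa in enumerate(SH_NI) if aa == "C"]
print(len(SH_NI), cysteine_positions)
print(mismatched_labels(SH_NI, ["G9", "P10", "R13", "G19", "G29", "W30"]))
```

    Every cited residue matches the sequence, and it contains exactly six cysteines, consistent with the "6 half-cystines" in the text.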

  20. Repeat sequence chromosome specific nucleic acid probes and methods of preparing and using

    DOEpatents

    Weier, H.U.G.; Gray, J.W.

    1995-06-27

    A primer-directed DNA amplification method to efficiently isolate chromosome-specific repeated DNA wherein degenerate oligonucleotide primers are used is disclosed. The probes produced are a heterogeneous mixture that can be used with blocking DNA as a chromosome-specific staining reagent, and/or the elements of the mixture can be screened for high specificity, size and/or high degree of repetition among other parameters. The degenerate primers are sets of primers that vary in sequence but are substantially complementary to highly repeated nucleic acid sequences, preferably clustered within the template DNA, for example, pericentromeric alpha satellite repeat sequences. The template DNA is preferably chromosome-specific. Exemplary primers and probes are disclosed. The probes of this invention can be used to determine the number of chromosomes of a specific type in metaphase spreads, in germ line and/or somatic cell interphase nuclei, micronuclei and/or in tissue sections. Also provided is a method to select arbitrarily repeat sequence probes that can be screened for chromosome-specificity. 18 figs.

  1. Sequence-defined bioactive macrocycles via an acid-catalysed cascade reaction

    NASA Astrophysics Data System (ADS)

    Porel, Mintu; Thornlow, Dana N.; Phan, Ngoc N.; Alabi, Christopher A.

    2016-06-01

    Synthetic macrocycles derived from sequence-defined oligomers are a unique structural class whose ring size, sequence and structure can be tuned via precise organization of the primary sequence. Similar to peptides and other peptidomimetics, these well-defined synthetic macromolecules become pharmacologically relevant when bioactive side chains are incorporated into their primary sequence. In this article, we report the synthesis of oligothioetheramide (oligoTEA) macrocycles via a one-pot acid-catalysed cascade reaction. The versatility of the cyclization chemistry and modularity of the assembly process was demonstrated via the synthesis of >20 diverse oligoTEA macrocycles. Structural characterization via NMR spectroscopy revealed the presence of conformational isomers, which enabled the determination of local chain dynamics within the macromolecular structure. Finally, we demonstrate the biological activity of oligoTEA macrocycles designed to mimic facially amphiphilic antimicrobial peptides. The preliminary results indicate that macrocyclic oligoTEAs with just two-to-three cationic charge centres can elicit potent antibacterial activity against Gram-positive and Gram-negative bacteria.

  2. Complete amino acid sequence of ananain and a comparison with stem bromelain and other plant cysteine proteases.

    PubMed Central

    Lee, K L; Albee, K L; Bernasconi, R J; Edmunds, T

    1997-01-01

    The amino acid sequences of ananain (EC 3.4.22.31) and stem bromelain (EC 3.4.22.32), two cysteine proteases from pineapple stem, are similar, yet ananain and stem bromelain possess distinct specificities towards synthetic peptide substrates and different reactivities towards the cysteine protease inhibitors E-64 and chicken egg white cystatin. We present here the complete amino acid sequence of ananain and compare it with the reported sequences of pineapple stem bromelain, papain and chymopapain from papaya, and actinidin from kiwifruit. Ananain comprises 216 residues with a theoretical mass of 23464 Da. This primary structure includes a sequence insert between residues 170 and 174 not present in stem bromelain or papain, and a hydrophobic series of amino acids adjacent to His-157. It is possible that these sequence differences contribute to the different substrate and inhibitor specificities exhibited by ananain and stem bromelain. PMID:9355753

  3. Microbial community dynamics in bioaugmented sequencing batch reactors for bromoamine acid removal.

    PubMed

    Qu, Yuanyuan; Zhou, Jiti; Wang, Jing; Fu, Xiang; Xing, Linlin

    2005-05-01

    Sphingomonas xenophaga QYY with the ability to degrade bromoamine acid (BAA) was previously isolated from sludge samples. The enhancement of BAA removal by strain QYY in sequencing batch reactors (SBRs) was investigated in this study. The results showed that augmented SBRs exhibited stronger abilities to degrade BAA than the non-augmented control. In order to estimate the relationship between community dynamics and function of augmented SBRs, a combined method based on fingerprints (ribosomal intergenic spacer analysis, RISA) and 16S rRNA gene sequencing was used. The results indicated that the microbial community dynamics were substantially changed, and the introduced strain QYY was persistent in the augmented systems. This study suggests that it is feasible and potentially useful to enhance BAA removal using BAA-degrading bacteria, such as S. xenophaga QYY.

  4. [Measurement of the amino acid sequence for the fusion protein FP3 with LC-MS/MS].

    PubMed

    Li, Xiang; Gao, Xiang-Dong; Tao, Lei; Pei, De-Ning; Guo, Ying; Rao, Chun-Ming; Wang, Jun-Zhi

    2012-02-01

    The amino acid sequence of the fusion protein FP3 was measured by two types of LC-MS/MS, confirming its primary structure. After reduction and alkylation, the protein was digested with trypsin, and glycosyl groups in glycopeptides were removed with PNGase F. The peptide mixture was separated by LC, and Q-TOF and ion trap tandem mass spectrometry were then used to measure the b and y fragment ions of each peptide to determine the amino acid sequence of FP3. LC-ESI-Q-TOF covered 76% of the full amino acid sequence, and LC-ESI-Trap covered the remaining 24%. Because LC-MS and tandem mass spectrometry measure protein amino acid sequences rapidly, sensitively and accurately, they are important approaches for the structural analysis and identification of recombinant proteins.
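    Sequencing from b and y ions relies on the fact that each ion series gives cumulative residue masses from one end of the peptide. A minimal sketch of singly charged b/y ion m/z values follows, assuming standard monoisotopic residue masses and unmodified residues; a carbamidomethylated cysteine (after iodoacetamide alkylation, which is an assumption about the protocol) would need +57.02146 Da added. The function name `b_y_ions` is illustrative.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def b_y_ions(peptide):
    """Singly charged b_i and y_i m/z values for i = 1 .. n-1."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    # b_i: first i residues plus a proton.
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    # y_i: last i residues plus water and a proton.
    y = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b, y


b_ions, y_ions = b_y_ions("PEPTIDE")
print([round(m, 4) for m in b_ions])
print([round(m, 4) for m in y_ions])
```

    Complementary ions satisfy b_i + y_(n-i) = M + 2 × proton mass, where M is the neutral peptide mass; this is a useful consistency check when assigning spectra. Leucine and isoleucine are isobaric, so these ions alone cannot distinguish them.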

  5. NullSeq: A Tool for Generating Random Coding Sequences with Desired Amino Acid and GC Contents

    PubMed Central

    Liu, Sophia S.; Hockenberry, Adam J.; Lancichinetti, Andrea; Jewett, Michael C.

    2016-01-01

    The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. In order to accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to generate accurate null models that are appropriate for the sequences under study. While many tools have been developed to create random nucleotide sequences, protein coding sequences are subject to a unique set of constraints that complicates the process of generating appropriate null models. There are currently no tools available that allow users to create random coding sequences with specified amino acid composition and GC content for the purpose of hypothesis testing. Using the principle of maximum entropy, we developed a method that generates unbiased random sequences with pre-specified amino acid and GC content, which we have developed into a python package. Our method is the simplest way to obtain maximally unbiased random sequences that are subject to GC usage and primary amino acid sequence constraints. Furthermore, this approach can easily be expanded to create unbiased random sequences that incorporate more complicated constraints such as individual nucleotide usage or even di-nucleotide frequencies. The ability to generate correctly specified null models will allow researchers to accurately identify sequence motifs which will lead to a better understanding of biological processes as well as more effective engineering of biological systems. PMID:27835644
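    One way to apply the maximum-entropy principle to this problem is to choose, at each position, among the synonymous codons for the fixed amino acid with probabilities proportional to exp(λ · GC count), and to tune the single multiplier λ so that the expected GC content equals the target. The sketch below follows that idea under stated assumptions: it constrains the expected (not exact) GC content, takes a fixed protein sequence rather than an amino acid composition, and may differ from NullSeq's actual formulation. All function names are hypothetical.

```python
import math
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def gc_count(codon):
    return sum(base in "GC" for base in codon)


def codon_weights(aa, lam):
    """Maximum-entropy (exponentially tilted) weights over the synonymous codons of aa."""
    return [math.exp(lam * gc_count(c)) for c in SYNONYMS[aa]]


def expected_gc(protein, lam):
    """Expected GC fraction of the coding sequence under multiplier lam."""
    total = 0.0
    for aa in protein:
        weights = codon_weights(aa, lam)
        total += sum(w * gc_count(c) for w, c in zip(weights, SYNONYMS[aa])) / sum(weights)
    return total / (3 * len(protein))


def solve_lambda(protein, target_gc, lo=-20.0, hi=20.0, iterations=100):
    """Bisection on lam; expected GC increases monotonically with lam."""
    if not expected_gc(protein, lo) <= target_gc <= expected_gc(protein, hi):
        raise ValueError("target GC content is not reachable for this protein")
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if expected_gc(protein, mid) < target_gc:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def random_coding_sequence(protein, target_gc, rng=None):
    """Sample one coding sequence for protein with expected GC fraction target_gc."""
    rng = rng or random.Random()
    lam = solve_lambda(protein, target_gc)
    return "".join(rng.choices(SYNONYMS[aa], weights=codon_weights(aa, lam))[0] for aa in protein)


protein = "MAGKLRSTVE"  # hypothetical example protein
print(random_coding_sequence(protein, 0.5, random.Random(0)))
```

    With λ = 0 every synonymous codon is equally likely; positive λ tilts the choice toward GC-rich codons and negative λ toward AT-rich ones, which is the least-biased adjustment that meets the GC constraint.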

  6. Morphological transformation of calcite crystal growth by prismatic "acidic" polypeptide sequences.

    SciTech Connect

    Kim, I; Giocondi, J L; Orme, C A; Collino, J; Evans, J S

    2007-02-13

    Many of the interesting mechanical and materials properties of the mollusk shell are thought to stem from the prismatic calcite crystal assemblies within this composite structure. It is now evident that proteins play a major role in the formation of these assemblies. Recently, a superfamily of 7 conserved prismatic layer-specific mollusk shell proteins, Asprich, were sequenced, and the 42 AA C-terminal sequence region of this protein superfamily was found to introduce surface voids or porosities on calcite crystals in vitro. Using AFM imaging techniques, we further investigate the effect that this 42 AA domain (Fragment-2) and its constituent subdomains, DEAD-17 and Acidic-2, have on the morphology and growth kinetics of calcite dislocation hillocks. We find that Fragment-2 adsorbs on terrace surfaces and pins acute steps, accelerates then decelerates the growth of obtuse steps, forms clusters and voids on terrace surfaces, and transforms calcite hillock morphology from a rhombohedral form to a rounded one. These results mirror yet are distinct from some of the earlier findings obtained for nacreous polypeptides. The subdomains Acidic-2 and DEAD-17 were found to accelerate then decelerate obtuse steps and induce oval rather than rounded hillock morphologies. Unlike DEAD-17, Acidic-2 does form clusters on terrace surfaces and exhibits stronger obtuse velocity inhibition effects than either DEAD-17 or Fragment-2. Interestingly, a 1:1 mixture of both subdomains induces an irregular polygonal morphology to hillocks, and exhibits the highest degree of acute step pinning and obtuse step velocity inhibition. This suggests that there is some interplay between subdomains within an intra (Fragment-2) or intermolecular (1:1 mixture) context, and sequence interplay phenomena may be employed by biomineralization proteins to exert net effects on crystal growth and morphology.

  7. Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast.

    PubMed

    Tsai, Zing Tsung-Yeh; Shiu, Shin-Han; Tsai, Huai-Kuang

    2015-08-01

    Transcription factor (TF) binding is determined by the presence of specific sequence motifs (SM) and chromatin accessibility, where the latter is influenced by both chromatin state (CS) and DNA structure (DS) properties. Although SM, CS, and DS have been used to predict TF binding sites, a predictive model that jointly considers CS and DS has not been developed to predict either TF-specific binding or general binding properties of TFs. Using budding yeast as model, we found that machine learning classifiers trained with either CS or DS features alone perform better in predicting TF-specific binding compared to SM-based classifiers. In addition, simultaneously considering CS and DS further improves the accuracy of the TF binding predictions, indicating the highly complementary nature of these two properties. The contributions of SM, CS, and DS features to binding site predictions differ greatly between TFs, allowing TF-specific predictions and potentially reflecting different TF binding mechanisms. In addition, a "TF-agnostic" predictive model based on three DNA "intrinsic properties" (in silico predicted nucleosome occupancy, major groove geometry, and dinucleotide free energy) that can be calculated from genomic sequences alone has performance that rivals the model incorporating experiment-derived data. This intrinsic property model allows prediction of binding regions not only across TFs, but also across DNA-binding domain families with distinct structural folds. Furthermore, these predicted binding regions can help identify TF binding sites that have a significant impact on target gene expression. Because the intrinsic property model allows prediction of binding regions across DNA-binding domain families, it is TF agnostic and likely describes general binding potential of TFs. Thus, our findings suggest that it is feasible to establish a TF agnostic model for identifying functional regulatory regions in potentially any sequenced genome.

  8. Contribution of Sequence Motif, Chromatin State, and DNA Structure Features to Predictive Models of Transcription Factor Binding in Yeast

    PubMed Central

    Tsai, Zing Tsung-Yeh; Shiu, Shin-Han; Tsai, Huai-Kuang

    2015-01-01

    Transcription factor (TF) binding is determined by the presence of specific sequence motifs (SM) and chromatin accessibility, where the latter is influenced by both chromatin state (CS) and DNA structure (DS) properties. Although SM, CS, and DS have been used to predict TF binding sites, a predictive model that jointly considers CS and DS has not been developed to predict either TF-specific binding or general binding properties of TFs. Using budding yeast as model, we found that machine learning classifiers trained with either CS or DS features alone perform better in predicting TF-specific binding compared to SM-based classifiers. In addition, simultaneously considering CS and DS further improves the accuracy of the TF binding predictions, indicating the highly complementary nature of these two properties. The contributions of SM, CS, and DS features to binding site predictions differ greatly between TFs, allowing TF-specific predictions and potentially reflecting different TF binding mechanisms. In addition, a "TF-agnostic" predictive model based on three DNA “intrinsic properties” (in silico predicted nucleosome occupancy, major groove geometry, and dinucleotide free energy) that can be calculated from genomic sequences alone has performance that rivals the model incorporating experiment-derived data. This intrinsic property model allows prediction of binding regions not only across TFs, but also across DNA-binding domain families with distinct structural folds. Furthermore, these predicted binding regions can help identify TF binding sites that have a significant impact on target gene expression. Because the intrinsic property model allows prediction of binding regions across DNA-binding domain families, it is TF agnostic and likely describes general binding potential of TFs. Thus, our findings suggest that it is feasible to establish a TF agnostic model for identifying functional regulatory regions in potentially any sequenced genome. PMID:26291518

  9. Sequence of a cDNA clone encoding the polysialic acid-rich and cytoplasmic domains of the neural cell adhesion molecule N-CAM.

    PubMed Central

    Hemperly, J J; Murray, B A; Edelman, G M; Cunningham, B A

    1986-01-01

    Purified fractions of the neural cell-adhesion molecule N-CAM from embryonic chicken brain contain two similar polypeptides (Mr, 160,000 and 130,000), each containing an amino-terminal external binding region, a carbohydrate-rich central region, and a carboxyl-terminal region that is associated with the cell. Previous studies indicate that the two polypeptides arise by alternative splicing of mRNAs transcribed from a single gene. We report here the 3556-nucleotide sequence of a cDNA clone (pEC208) that encodes 964 amino acids from the carbohydrate and cell-associated domains of the larger N-CAM polypeptide followed by 664 nucleotides of 3' untranslated sequence. The predicted protein sequence contains attachment sites for polysialic acid-containing oligosaccharides, four tandem homologous regions of polypeptide resembling those seen in the immunoglobulin superfamily, and a single hydrophobic sequence that appears to be the membrane-spanning segment. The cytoplasmic domain carboxyl terminal to this segment includes a block of approximately 250 amino acids present in the larger but not in the smaller N-CAM polypeptide. We designate these the ld (large domain) polypeptide and the sd (small domain) polypeptide. The intracellular domains of the ld and sd polypeptides are likely to be critical for cell-surface modulation of N-CAM by interacting in a differential fashion with other intrinsic proteins or with the cytoskeleton. PMID:3458261

  10. Prediction of phase equilibrium and hydration free energy of carboxylic acids by Monte Carlo simulations.

    PubMed

    Ferrando, Nicolas; Gedik, Ibrahim; Lachet, Véronique; Pigeon, Laurent; Lugo, Rafael

    2013-06-13

    In this work, a new transferable united-atom force field has been developed to predict the phase equilibrium and hydration free energy of carboxylic acids. To take advantage of the transferability of the AUA4 force field, all Lennard-Jones parameters of groups involved in the carboxylic acid chemical function are reused from previous parametrizations of this force field. Only a single new set of partial electrostatic charges is proposed, fitted to reproduce the experimental gas-phase dipole moment, saturated liquid densities and vapor pressures. Phase equilibrium properties of various pure carboxylic acids (acetic, propanoic, butanoic, pentanoic and hexanoic acid) and one diacid (1,5-pentanedioic acid) are studied through Monte Carlo simulations in the Gibbs ensemble. Good accuracy is obtained for pure-compound saturated liquid densities and vapor pressures (average deviations of 2% and 6%, respectively), as well as for critical points. The vaporization enthalpy is, however, poorly predicted for short acids, probably because the force field cannot correctly describe the significant dimerization in the vapor phase. Pressure-composition diagrams for two binary mixtures (acetic acid + n-butane and propanoic acid + pentanoic acid) are also computed with good accuracy, showing the transferability of the proposed force field to mixtures. Hydration free energies are calculated for three carboxylic acids using thermodynamic integration. A systematic overestimation of around 10 kJ/mol is observed compared to experimental data. This new force field, parametrized only on saturated equilibrium properties, appears insufficient to reach acceptable precision for this property, and only relative hydration free energies between two carboxylic acids can be correctly predicted. This highlights the limited transferability of force fields to properties not included in the parametrization database.
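    Thermodynamic integration estimates a free energy difference as ΔG = ∫₀¹ ⟨∂U/∂λ⟩_λ dλ, where λ couples the solute to the solvent and each ensemble average comes from a separate simulation. A minimal sketch of the final quadrature step with the trapezoidal rule follows; the λ grid and the ⟨∂U/∂λ⟩ values are hypothetical, and the coupling scheme and sign convention (insertion vs. removal) are assumptions left to the reader.

```python
def trapezoid(xs, ys):
    """Trapezoidal-rule integral of ys sampled at increasing xs."""
    return sum((x1 - x0) * (y0 + y1) / 2 for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


def ti_free_energy(lambdas, mean_du_dlambda):
    """Delta G = integral over lambda in [0, 1] of <dU/dlambda>, by the trapezoidal rule."""
    if len(lambdas) != len(mean_du_dlambda) or len(lambdas) < 2:
        raise ValueError("need matching lambda and <dU/dlambda> samples")
    if lambdas[0] != 0.0 or lambdas[-1] != 1.0 or any(b <= a for a, b in zip(lambdas, lambdas[1:])):
        raise ValueError("lambdas must increase strictly from 0 to 1")
    return trapezoid(lambdas, mean_du_dlambda)


# Hypothetical ensemble averages (kJ/mol) at five coupling states.
lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]
averages = [-30.0, -25.0, -20.0, -15.0, -10.0]
print(ti_free_energy(lambdas, averages))
```

    A relative hydration free energy between two acids is then the difference of two such integrals, so a systematic offset common to both (such as the roughly 10 kJ/mol overestimation noted above) largely cancels.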

  11. PTC725, an NS4B-Targeting Compound, Inhibits a Hepatitis C Virus Genotype 3 Replicon, as Predicted by Genome Sequence Analysis and Determined Experimentally

    PubMed Central

    Graci, Jason D.; Jung, Stephen P.; Pichardo, John; Tong, Xiao; Gu, Zhengxian

    2016-01-01

    PTC725 is a small-molecule NS4B-targeting inhibitor of hepatitis C virus (HCV) genotype (gt) 1 RNA replication that lacks activity against HCV gt2. We analyzed the Los Alamos HCV sequence database to predict susceptible and resistant HCV genotypes according to the prevalence of known resistance-conferring amino acids in the NS4B protein. Our analysis predicted that HCV gt3 would be highly susceptible to the activity of PTC725. Indeed, PTC725 was shown to be active against a gt3 subgenomic replicon with a 50% effective concentration of ∼5 nM. De novo resistance selection identified mutations encoding amino acid substitutions mapping to the first predicted transmembrane region of NS4B, a finding consistent with results for PTC725 and other NS4B-targeting compounds against HCV gt1. This is the first report of the activity of an NS4B-targeting compound against HCV gt3. In addition, we have identified previously unreported amino acid substitutions selected by PTC725 treatment, which further demonstrate that these compounds target the first transmembrane region of NS4B. PMID:27620477

  12. Sequence selective recognition of double-stranded RNA using triple helix-forming peptide nucleic acids.

    PubMed

    Zengeya, Thomas; Gupta, Pankaj; Rozners, Eriks

    2014-01-01

    Noncoding RNAs are attractive targets for molecular recognition because of the central role they play in gene expression. Since most noncoding RNAs adopt a double-helical conformation, recognition of such structures is a formidable problem. Herein, we describe a method for sequence-selective recognition of biologically relevant double-helical RNA (illustrated on ribosomal A-site RNA) using peptide nucleic acids (PNA) that form a triple helix in the major groove of RNA under physiologically relevant conditions. Protocols for PNA preparation and binding studies using isothermal titration calorimetry are described in detail.

  13. Predicting lipase types by improved Chou's pseudo-amino acid composition.

    PubMed

    Zhang, Guang-Ya; Li, Hong-Chun; Gao, Jia-Qiang; Fang, Bai-Shan

    2008-01-01

    By proposing an improved Chou's pseudo amino acid composition approach to extract sequence features, a powerful predictor based on the k-nearest neighbor algorithm was introduced to identify the types of lipases from their sequences. To avoid redundancy and bias, demonstrations were performed on a dataset in which no protein has ≥25% sequence identity to any other. The overall success rate obtained in the 10-fold cross-validation test was over 90%, indicating that the improved Chou's pseudo amino acid composition might be a useful tool for extracting features of protein sequences, or at least can play a complementary role to many other existing approaches.
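    The abstract does not describe what the "improved" variant changes, so the sketch below shows only the standard type-1 pseudo amino acid composition (20 composition terms plus lam sequence-order correlation terms, weighted by w) feeding a k-nearest-neighbor classifier. Chou's original formulation averages several physicochemical properties; for brevity this sketch assumes a single property, the Kyte-Doolittle hydrophobicity scale. The training sequences and class labels are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydrophobicity, used here as a single stand-in property scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standardize(scale: Dict[str, float]) -> Dict[str, float]:
    """Zero-mean, unit-variance version of a property scale over the 20 residues."""
    values = [scale[a] for a in AMINO_ACIDS]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {a: (scale[a] - mean) / std for a in AMINO_ACIDS}


H = _standardize(KD)


def pseaac(seq: str, lam: int = 3, w: float = 0.05) -> List[float]:
    """Type-1 pseudo amino acid composition: 20 composition terms + lam correlation terms."""
    seq = "".join(c for c in seq.upper() if c in H)  # drop non-standard letters
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    counts = Counter(seq)
    freqs = [counts[a] / len(seq) for a in AMINO_ACIDS]
    # theta_k: mean squared property difference between residues k positions apart.
    thetas = [
        sum((H[seq[i + k]] - H[seq[i]]) ** 2 for i in range(len(seq) - k)) / (len(seq) - k)
        for k in range(1, lam + 1)
    ]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


def knn_predict(train: Sequence[Tuple[List[float], str]], x: List[float], k: int = 1) -> str:
    """Majority label among the k training vectors closest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical toy training set (not lipase data).
train = [
    (pseaac("AAAAAAAAGG"), "class_x"),
    (pseaac("KKKKEEEEDD"), "class_y"),
]
label = knn_predict(train, pseaac("AAAAAAAGGG"))
```

    By construction the feature vector sums to 1, so sequences of different lengths are directly comparable; w controls how much the sequence-order terms count relative to plain composition.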

  14. Fluorescence energy transfer as a probe for nucleic acid structures and sequences.

    PubMed Central

    Mergny, J L; Boutorine, A S; Garestier, T; Belloc, F; Rougée, M; Bulychev, N V; Koshkin, A A; Bourson, J; Lebedev, A V; Valeur, B

    1994-01-01

    The primary or secondary structure of single-stranded nucleic acids has been investigated with fluorescent oligonucleotides, i.e., oligonucleotides covalently linked to a fluorescent dye. Five different chromophores were used: 2-methoxy-6-chloro-9-amino-acridine, coumarin 500, fluorescein, rhodamine and ethidium. The chemical synthesis of derivatized oligonucleotides is described. Hybridization of two fluorescent oligonucleotides to adjacent nucleic acid sequences led to fluorescence excitation energy transfer between the donor and the acceptor dyes. This phenomenon was used to probe primary and secondary structures of DNA fragments and the orientation of oligodeoxynucleotides synthesized with the alpha-anomers of nucleoside units. Fluorescence energy transfer can be used to reveal the formation of hairpin structures and the translocation of genes between two chromosomes. PMID:8152922

  15. Amino acid sequence of two neurotoxins from the venom of the Egyptian black snake (Walterinnesia aegyptia).

    PubMed

    Samejima, Y; Aoki-Tomomatsu, Y; Yanagisawa, M; Mebs, D

    1997-02-01

    The venom of the Egyptian black snake Walterinnesia aegyptia contains at least three toxins, which act postsynaptically to block neuromuscular transmission in isolated rat phrenic nerve-diaphragm and chicken biventer cervicis muscle preparations. The complete amino acid sequences of two of these toxins, W-III and W-IV, each consisting of 62 amino acid residues, were elucidated by Edman degradation of fragments obtained after Staphylococcus aureus protease and prolylpeptidase digestion. Although the toxins exhibit close structural homology to other short-chain postsynaptic neurotoxins from Elapidae venoms, toxin W-IV is unique in having a free SH group (cysteine) at position 16. At position 35 of W-III, which is located at the tip of the central loop, threonine is replaced by lysine; this may alter the interaction of the toxin with the acetylcholine receptor, since W-III is seven times less lethal than W-IV.

  16. Complete genome sequence of Lactococcus lactis IO-1, a lactic acid bacterium that utilizes xylose and produces high levels of L-lactic acid.

    PubMed

    Kato, Hiroaki; Shiwa, Yuh; Oshima, Kenshiro; Machii, Miki; Araya-Kojima, Tomoko; Zendo, Takeshi; Shimizu-Kadota, Mariko; Hattori, Masahira; Sonomoto, Kenji; Yoshikawa, Hirofumi

    2012-04-01

    We report the complete genome sequence of Lactococcus lactis IO-1 (= JCM7638). It is a nondairy lactic acid bacterium that produces nisin Z, ferments xylose, and produces predominantly L-lactic acid at high xylose concentrations. From ortholog analysis with five other L. lactis strains, IO-1 was identified as L. lactis subsp. lactis.

  17. Complete genome sequence of Bacillus amyloliquefaciens LL3, which exhibits glutamic acid-independent production of poly-γ-glutamic acid.

    PubMed

    Geng, Weitao; Cao, Mingfeng; Song, Cunjiang; Xie, Hui; Liu, Li; Yang, Chao; Feng, Jun; Zhang, Wei; Jin, Yinghong; Du, Yang; Wang, Shufang

    2011-07-01

    Bacillus amyloliquefaciens is one of the most prevalent Gram-positive aerobic spore-forming bacteria with the ability to synthesize polysaccharides and polypeptides. Here, we report the complete genome sequence of B. amyloliquefaciens LL3, which was isolated from fermented food and exhibits glutamic acid-independent production of poly-γ-glutamic acid.

  18. Accurate Prediction of the Statistics of Repetitions in Random Sequences: A Case Study in Archaea Genomes

    PubMed Central

    Régnier, Mireille; Chassignet, Philippe

    2016-01-01

    Repetitive patterns in genomic sequences have great biological significance as well as algorithmic implications. Analytic combinatorics allows the derivation of formulas for the expected length of repetitions in a random sequence. The asymptotic results, which generalize previous work on a binary alphabet, are easily computable. Simulations on random sequences show their accuracy. As an application, the sample case of Archaea genomes illustrates how biological sequences may differ from random sequences. PMID:27376057
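    The simulation side of such a study can be sketched as follows: generate i.i.d. random sequences with given letter probabilities and measure the longest substring that occurs at least twice. As a rough reference point, the sketch also computes the classical first-order estimate 2 log n / log(1/Σp²); this approximation is an assumption included for comparison and is not the refined formula derived in the paper. Function names and parameters are illustrative.

```python
import math
import random
from typing import Sequence


def has_repeat(seq: str, k: int) -> bool:
    """True if some substring of length k occurs at least twice (overlaps allowed)."""
    seen = set()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in seen:
            return True
        seen.add(word)
    return False


def longest_repeat(seq: str) -> int:
    """Length of the longest repeated substring, by binary search on k.

    Binary search is valid because a repeat of length k implies one of length k - 1.
    """
    lo, hi = 0, max(len(seq) - 1, 0)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if has_repeat(seq, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


def first_order_estimate(n: int, probs: Sequence[float]) -> float:
    """Rough asymptotic estimate 2*log(n)/log(1/sum p_i^2) for an i.i.d. sequence."""
    q = sum(p * p for p in probs)
    return 2 * math.log(n) / math.log(1 / q)


def simulate(n: int, probs: Sequence[float], alphabet: str, trials: int, seed: int = 0) -> float:
    """Average longest-repeat length over `trials` random sequences of length n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        seq = "".join(rng.choices(alphabet, weights=probs, k=n))
        total += longest_repeat(seq)
    return total / trials
```

    For example, `simulate(2000, [0.25] * 4, "ACGT", 20)` can be set against `first_order_estimate(2000, [0.25] * 4)`; a real genome with markedly longer repeats than either value is the kind of departure from randomness the abstract describes.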

  19. Formation Sequences of Iron Minerals in the Acidic Alteration Products and Variation of Hydrothermal Fluid Conditions

    NASA Astrophysics Data System (ADS)

    Isobe, H.; Yoshizawa, M.

    2008-12-01

    Iron minerals play an important role in environmental issues not only on Earth but also on other terrestrial planets. Iron mineral species formed as alteration products of primary minerals by surface or subsurface fluids are characterized by the temperature, acidity, and redox conditions of those fluids. Various iron-bearing minerals occur among the alteration products around fumaroles in geothermal/volcanic areas. In this study, zonal structures of iron minerals in alteration products of a geothermal area are observed to elucidate temporal and spatial variation of hydrothermal fluids. Alteration of the pyroxene-amphibole andesite of Garan-dake volcano, Oita, Japan, by acidic hydrothermal fluid forms cristobalite, leaching out elements other than Si. Hand specimens with an unaltered or weakly altered core and a cristobalite crust show various sequences of layers. XRD analysis revealed that the degree of alteration is reflected in the abundance of cristobalite. Intermediately altered layers are characterized by the occurrence of alunite, pyrite, kaolinite, goethite, and hematite. A specimen with a reddish brown core surrounded by a cristobalite-rich white crust has brown layers at the boundary between the core and the crust. XRD shows that the reddish core is characterized by crystalline hematite. Another hand specimen has a light gray core, which represents reduced conditions, and a white cristobalite crust with light brown and reddish brown layers of ferric iron minerals between the core and the crust. On the other hand, hornblende crystals, a typical ferrous iron-bearing mineral of the host rock, are well preserved in some samples with a strongly decolorized, cristobalite-rich groundmass. Hydrothermal alteration experiments on iron-rich basaltic material show that the iron mineral species depend on the acidity and temperature of the fluid. Oxidation states of the iron-bearing mineral species are strongly influenced by acidity and redox conditions. Variations of alteration

  20. Design, synthesis, and characterization of a protein sequencing reagent yielding amino acid derivatives with enhanced detectability by mass spectrometry.

    PubMed Central

    Aebersold, R.; Bures, E. J.; Namchuk, M.; Goghari, M. H.; Shushan, B.; Covey, T. C.

    1992-01-01

    We report the design, chemical synthesis, and structural and functional characterization of a novel reagent for protein sequence analysis by the Edman degradation, yielding amino acid derivatives rapidly detectable at high sensitivity by ion-evaporation mass spectrometry. We demonstrate that the reagent 3-[4'(ethylene-N,N,N-trimethylamino)phenyl]-2-isothiocyanate is chemically stable and shows coupling and cyclization/cleavage yields comparable to phenylisothiocyanate, the standard reagent in chemical sequence analysis, under conditions typically encountered in manual or automated sequence analysis. Amino acid derivatives generated with this reagent were detectable by ion-evaporation mass spectrometry at the subfemtomole sensitivity level at a pace of one sample per minute. Furthermore, derivatives were identified by their mass, thus permitting the rapid and highly sensitive determination of the molecular nature of modified amino acids. Derivatives of amino acids with acidic, basic, polar, or hydrophobic side chains were reproducibly detectable at comparable sensitivities. The polar nature of the reagent required covalent immobilization of polypeptides prior to automated sequence analysis. This reagent, used in automated sequence analysis, has the potential for overcoming the limitations in sensitivity, speed, and the ability to characterize modified amino acid residues inherent in the chemical sequencing methods that are currently used. PMID:1304351

  1. Whole-Genome Sequencing Analysis Accurately Predicts Antimicrobial Resistance Phenotypes in Campylobacter spp.

    PubMed Central

    Tyson, G. H.; Chen, Y.; Li, C.; Mukherjee, S.; Young, S.; Lam, C.; Folster, J. P.; Whichard, J. M.; McDermott, P. F.

    2015-01-01

    The objectives of this study were to identify antimicrobial resistance genotypes for Campylobacter and to evaluate the correlation between resistance phenotypes and genotypes using in vitro antimicrobial susceptibility testing and whole-genome sequencing (WGS). A total of 114 Campylobacter species isolates (82 C. coli and 32 C. jejuni) obtained from 2000 to 2013 from humans, retail meats, and cecal samples from food production animals in the United States as part of the National Antimicrobial Resistance Monitoring System were selected for study. Resistance phenotypes were determined using broth microdilution of nine antimicrobials. Genomic DNA was sequenced using the Illumina MiSeq platform, and resistance genotypes were identified using assembled WGS sequences through blastx analysis. Eighteen resistance genes, including tet(O), blaOXA-61, catA, lnu(C), aph(2″)-Ib, aph(2″)-Ic, aph(2′)-If, aph(2″)-Ig, aph(2″)-Ih, aac(6′)-Ie-aph(2″)-Ia, aac(6′)-Ie-aph(2″)-If, aac(6′)-Im, aadE, sat4, ant(6′), aad9, aph(3′)-Ic, and aph(3′)-IIIa, and mutations in two housekeeping genes (gyrA and 23S rRNA) were identified. There was a high degree of correlation between phenotypic resistance to a given drug and the presence of one or more corresponding resistance genes. Phenotypic and genotypic correlation was 100% for tetracycline, ciprofloxacin/nalidixic acid, and erythromycin, and correlations ranged from 95.4% to 98.7% for gentamicin, azithromycin, clindamycin, and telithromycin. All isolates were susceptible to florfenicol, and no genes associated with florfenicol resistance were detected. There was a strong correlation (99.2%) between resistance genotypes and phenotypes, suggesting that WGS is a reliable indicator of resistance to the nine antimicrobial agents assayed in this study. WGS has the potential to be a powerful tool for antimicrobial resistance surveillance programs. PMID:26519386
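    The phenotype/genotype agreement reported above is essentially a concordance rate: the fraction of isolates whose measured resistance matches whether a known resistance gene or mutation was detected. A minimal sketch, assuming one (phenotype resistant?, genotype found?) pair per isolate for a given drug; the example pairs are hypothetical.

```python
from typing import Iterable, Tuple


def concordance(pairs: Iterable[Tuple[bool, bool]]) -> float:
    """Fraction of isolates where resistant phenotype == resistance genotype detected."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no isolates")
    return sum(1 for pheno, geno in pairs if pheno == geno) / len(pairs)


def discordant_counts(pairs: Iterable[Tuple[bool, bool]]) -> Tuple[int, int]:
    """(resistant but no gene found, gene found but susceptible)."""
    pairs = list(pairs)
    missed = sum(1 for pheno, geno in pairs if pheno and not geno)
    silent = sum(1 for pheno, geno in pairs if geno and not pheno)
    return missed, silent


# Hypothetical isolates for one drug: (phenotypically resistant, gene/mutation detected).
isolates = [(True, True), (False, False), (True, False), (False, False)]
rate = concordance(isolates)
missed, silent = discordant_counts(isolates)
```

    Splitting the discordant cases matters in practice: "resistant without a known gene" points to unknown mechanisms, while "gene present but susceptible" points to non-expressed or non-functional determinants.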

  2. Exome Sequencing and Prediction of Long-Term Kidney Allograft Function

    PubMed Central

    Mesnard, Laurent; Muthukumar, Thangamani; Burbach, Maren; Li, Carol; Shang, Huimin; Dadhania, Darshana; Lee, John R.; Xiang, Jenny; Suberbielle, Caroline; Carmagnat, Maryvonnick; Ouali, Nacera; Rondeau, Eric; Abecassis, Michael M.; Suthanthiran, Manikkam

    2016-01-01

    Current strategies to improve graft outcome following kidney transplantation consider information at the human leukocyte antigen (HLA) loci. Cell surface antigens, in addition to HLA, may serve as the stimuli as well as the targets for the anti-allograft immune response and influence long-term graft outcomes. We therefore performed exome sequencing of DNA from kidney graft recipients and their living donors and estimated all possible cell surface antigen mismatches for a given donor/recipient pair by computing the number of amino acid mismatches in trans-membrane proteins. We designated this tally the allogenomics mismatch score (AMS). We examined the association between the AMS and post-transplant estimated glomerular filtration rate (eGFR) using mixed models, considering transplants from three independent cohorts (a total of 53 donor-recipient pairs, 106 exomes, and 239 eGFR measurements). We found that the AMS has a significant effect on eGFR (mixed model, effect size across the entire range of the score: -19.4 [-37.7, -1.1], P = 0.0042, χ2 = 8.1919, d.f. = 1) that is independent of HLA-A, B, DR matching, donor age, and time post-transplantation. The AMS effect is consistent across the three independent cohorts studied and similar to the strong effect size of donor age. Taken together, these results show that the AMS, a novel tool to quantify amino acid mismatches in trans-membrane proteins in individual donor/recipient pairs, is a strong, robust predictor of long-term graft function in kidney transplant recipients. PMID:27684477
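    The abstract defines the AMS only as a count of amino acid mismatches in trans-membrane proteins. The sketch below assumes a directional rule (each donor amino acid not carried by the recipient at the same position counts once, so a position contributes 0 to 2) and a hypothetical data layout keyed by (protein, position); both are illustrative assumptions, not the authors' pipeline, and the protein names are made up.

```python
from typing import Dict, Set, Tuple

Genotype = Tuple[str, str]   # amino acids encoded by the two alleles at one position
Site = Tuple[str, int]       # (protein identifier, residue position), hypothetical layout


def position_mismatches(donor: Genotype, recipient: Genotype) -> int:
    """Count donor amino acids at this position that the recipient does not carry."""
    return sum(1 for aa in donor if aa not in recipient)


def allogenomics_score(donor: Dict[Site, Genotype],
                       recipient: Dict[Site, Genotype],
                       transmembrane: Set[str]) -> int:
    """Sum donor-vs-recipient mismatches over positions in trans-membrane proteins."""
    total = 0
    for site, donor_gt in donor.items():
        protein, _position = site
        if protein not in transmembrane or site not in recipient:
            continue  # skip non-membrane proteins and positions not called in both exomes
        total += position_mismatches(donor_gt, recipient[site])
    return total


donor = {("P1", 10): ("A", "V"), ("P1", 20): ("L", "L"), ("P2", 5): ("K", "R")}
recipient = {("P1", 10): ("A", "A"), ("P1", 20): ("L", "L"), ("P2", 5): ("K", "K")}
score_p1_only = allogenomics_score(donor, recipient, {"P1"})
score_both = allogenomics_score(donor, recipient, {"P1", "P2"})
```

    The directional choice reflects the immunological intuition in the abstract: donor antigens that the recipient lacks are the ones that can be recognized as foreign.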

  3. Prediction of Thermostability from Amino Acid Attributes by Combination of Clustering with Attribute Weighting: A New Vista in Engineering Enzymes

    PubMed Central

    Ebrahimi, Mansour; Lakizadeh, Amir; Agha-Golzadeh, Parisa; Ebrahimie, Esmaeil; Ebrahimi, Mahdi

    2011-01-01

    The engineering of thermostable enzymes is receiving increased attention. The paper, detergent, and biofuel industries, in particular, seek to use environmentally friendly enzymes instead of toxic chlorine chemicals. Enzymes typically function at temperatures below 60°C and denature if exposed to higher temperatures. In contrast, a small portion of enzymes can withstand higher temperatures as a result of various structural adaptations. Understanding the protein attributes involved in this adaptation is the first step toward engineering thermostable enzymes. We employed various supervised and unsupervised machine learning algorithms as well as attribute weighting approaches to find amino acid composition attributes that contribute to enzyme thermostability. Specifically, we compared two groups of enzymes: mesostable and thermostable. Furthermore, a combination of attribute weighting with supervised and unsupervised clustering algorithms was used for prediction and modelling of protein thermostability from amino acid composition properties. Mining a large number of protein sequences (2090) with a variety of machine learning algorithms, based on the analysis of more than 800 amino acid attributes, increased the accuracy of this study. Moreover, these models were successful in predicting thermostability from the primary structure of proteins. The results showed that expectation maximization clustering in combination with uncertainty and correlation attribute weighting algorithms can effectively (100%) classify thermostable and mesostable proteins. Seventy per cent of the weighting methods selected Gln content and frequency of hydrophilic residues as the most important protein attributes. At the dipeptide level, the frequency of Asn-Glu was the key factor in distinguishing mesostable from thermostable enzymes. This study demonstrates the feasibility of predicting thermostability irrespective of sequence similarity and will serve as a
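    Attributes such as Gln content or the Asn-Glu dipeptide frequency are simple sequence statistics. A minimal sketch of how such features might be computed from a primary sequence, assuming frequencies normalized by the number of residues (or adjacent residue pairs); the attribute set used in the study is much larger and is not reproduced here.

```python
from collections import Counter
from itertools import product
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq: str) -> Dict[str, float]:
    """Fraction of each standard amino acid among the standard residues of seq."""
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    if not residues:
        raise ValueError("no standard residues")
    counts = Counter(residues)
    return {a: counts[a] / len(residues) for a in AMINO_ACIDS}


def dipeptide_frequencies(seq: str) -> Dict[str, float]:
    """Frequency of each of the 400 dipeptides among adjacent standard-residue pairs."""
    s = seq.upper()
    pairs = [s[i:i + 2] for i in range(len(s) - 1)
             if s[i] in AMINO_ACIDS and s[i + 1] in AMINO_ACIDS]
    if not pairs:
        raise ValueError("no adjacent standard residue pairs")
    counts = Counter(pairs)
    return {a + b: counts[a + b] / len(pairs) for a, b in product(AMINO_ACIDS, repeat=2)}


# Features named in the abstract, on a hypothetical sequence:
seq = "MNEQQLNEKAVQ"
gln_content = aa_composition(seq)["Q"]
asn_glu = dipeptide_frequencies(seq)["NE"]
```

    Feature vectors built this way (20 composition values plus 400 dipeptide values, and any further attributes) are what weighting and clustering algorithms would then rank or group.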

  4. Complete Genome Sequence of Enterobacter cloacae UW5, a Rhizobacterium Capable of High Levels of Indole-3-Acetic Acid Production.

    PubMed

    Coulson, Thomas J D; Patten, Cheryl L

    2015-08-06

    We report the complete genome sequence of Enterobacter cloacae UW5, an indole-3-acetic acid-producing rhizobacterium originally isolated from the rhizosphere of grass. The 4.9-Mbp genome has a G+C content of 54% and contains 4,496 protein-coding sequences.

  5. Complete Genome Sequence of Enterobacter cloacae UW5, a Rhizobacterium Capable of High Levels of Indole-3-Acetic Acid Production

    PubMed Central

    Coulson, Thomas J. D.

    2015-01-01

    We report the complete genome sequence of Enterobacter cloacae UW5, an indole-3-acetic acid-producing rhizobacterium originally isolated from the rhizosphere of grass. The 4.9-Mbp genome has a G+C content of 54% and contains 4,496 protein-coding sequences. PMID:26251488

  6. Genome Sequence of the Lactic Acid Bacterium Lactococcus lactis subsp. lactis TOMSC161, Isolated from a Nonscalded Curd Pressed Cheese

    PubMed Central

    Velly, H.; Abraham, A.-L.; Loux, V.; Delacroix-Buchet, A.; Fonseca, F.; Bouix, M.

    2014-01-01

    Lactococcus lactis is a lactic acid bacterium used in the production of many fermented foods, such as dairy products. Here, we report the genome sequence of L. lactis subsp. lactis TOMSC161, isolated from nonscalded curd pressed cheese. This genome sequence provides information in relation to dairy environment adaptation. PMID:25377704

  7. Deoxyribonucleic acid sequence of araBAD promoter mutants of Escherichia coli.

    PubMed

    Horwitz, A H; Morandi, C; Wilcox, G

    1980-05-01

    The controlling site region for the araBAD operon is defined, in part, by two classes of cis-acting constitutive mutations. The araIc mutations allow low-level constitutive expression of araBAD in the absence of the positive regulatory protein encoded by the araC gene, whereas the araXc mutations allow expression of araBAD in the absence of the cyclic adenosine monophosphate receptor protein. Six independently isolated araIc mutations and three independently isolated araXc mutations were cloned onto the plasmid pBR322 using in vitro recombinant deoxyribonucleic acid techniques and in vivo recombination between plasmid and chromosomal deoxyribonucleic acid. The location of these mutations was determined by deoxyribonucleic acid sequence analysis. All of the araIc mutations occurred at position -35 within the araBAD promoter (+1 = messenger ribonucleic acid start for araBAD) and resulted from an AT→GC transition. All of the araXc mutations occurred at position -10 within the araBAD promoter and resulted from a GC→AT transition. Models are presented to explain the mode of action of the araIc and araXc mutations.

  8. Inter-Protein Sequence Co-Evolution Predicts Known Physical Interactions in Bacterial Ribosomes and the Trp Operon.

    PubMed

    Feinauer, Christoph; Szurmant, Hendrik; Weigt, Martin; Pagnani, Andrea

    2016-01-01

    Interaction between proteins is a fundamental mechanism that underlies virtually all biological processes. Many important interactions are conserved across a large variety of species. The need to maintain interaction leads to a high degree of co-evolution between residues in the interface between partner proteins. The inference of protein-protein interaction networks from the rapidly growing sequence databases is one of the most formidable tasks in systems biology today. We propose here a novel approach based on Direct-Coupling Analysis of the co-evolution between inter-protein residue pairs. We use ribosomal and trp operon proteins as test cases: for the small and large ribosomal subunits, our approach predicts protein-interaction partners with true-positive rates of 70% and 90%, respectively, within the first 10 predictions, and with areas under the ROC curve of 0.69 and 0.81, respectively, over all predictions. In the trp operon, it assigns the two largest interaction scores to the only two experimentally known interactions. At the level of residue interactions, we show that for the small and large ribosomal subunits our approach predicts interacting residues with true-positive rates of 60% and 85%, respectively, in the first 20 predictions. We use artificial data to show that the performance of our approach depends crucially on the size of the joint multiple sequence alignments, and we analyze how many sequences would be necessary for a perfect prediction if the sequences were sampled from the same model that we use for prediction. Given the performance of our approach on the test data, we speculate that it can be used to detect new interactions, especially in light of the rapid growth of available sequence data.

  9. Inter-Protein Sequence Co-Evolution Predicts Known Physical Interactions in Bacterial Ribosomes and the Trp Operon

    PubMed Central

    Feinauer, Christoph; Szurmant, Hendrik; Weigt, Martin; Pagnani, Andrea

    2016-01-01

    Interaction between proteins is a fundamental mechanism that underlies virtually all biological processes. Many important interactions are conserved across a large variety of species. The need to maintain interaction leads to a high degree of co-evolution between residues in the interface between partner proteins. The inference of protein-protein interaction networks from the rapidly growing sequence databases is one of the most formidable tasks in systems biology today. We propose here a novel approach based on Direct-Coupling Analysis of the co-evolution between inter-protein residue pairs. We use ribosomal and trp operon proteins as test cases: for the small and large ribosomal subunits, our approach predicts protein-interaction partners with true-positive rates of 70% and 90%, respectively, within the first 10 predictions, and with areas under the ROC curve of 0.69 and 0.81, respectively, over all predictions. In the trp operon, it assigns the two largest interaction scores to the only two experimentally known interactions. At the level of residue interactions, we show that for the small and large ribosomal subunits our approach predicts interacting residues with true-positive rates of 60% and 85%, respectively, in the first 20 predictions. We use artificial data to show that the performance of our approach depends crucially on the size of the joint multiple sequence alignments, and we analyze how many sequences would be necessary for a perfect prediction if the sequences were sampled from the same model that we use for prediction. Given the performance of our approach on the test data, we speculate that it can be used to detect new interactions, especially in light of the rapid growth of available sequence data. PMID:26882169
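    The two figures of merit quoted above can be made concrete. A minimal sketch, assuming each candidate pair has a coupling score and a binary label (1 = known interaction): the "true-positive rate within the first k predictions" is taken here as the fraction of true interactions among the k highest-scoring pairs, and the ROC area is computed as the probability that a random true pair outscores a random false pair (ties count one half). The scores and labels are hypothetical, and the Direct-Coupling Analysis itself is not shown.

```python
from typing import Sequence


def top_k_true_positive_rate(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of true interactions among the k highest-scoring candidate pairs."""
    if k < 1 or not scores:
        raise ValueError("need k >= 1 and at least one prediction")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)[:k]
    return sum(1 for _, y in ranked if y) / len(ranked)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need both positive and negative examples")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.3, 0.1]   # hypothetical coupling scores
labels = [1, 0, 1, 0]           # hypothetical ground truth
tpr_top2 = top_k_true_positive_rate(scores, labels, 2)
auc = roc_auc(scores, labels)
```

    The pairwise AUC is quadratic in the number of candidates; for large residue-pair sets a rank-based formula with one sort is the usual choice, but the quadratic form keeps the definition explicit.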

  10. In the TTF-1 homeodomain the contribution of several amino acids to DNA recognition depends on the bound sequence.

    PubMed Central

    Fabbro, D; Tell, G; Leonardi, A; Pellizzari, L; Pucillo, C; Lonigro, R; Formisano, S; Damante, G

    1996-01-01

    The thyroid transcription factor-1 homeodomain (TTF-1HD) shows a peculiar DNA binding specificity, preferentially recognizing sequences containing the 5'-CAAG-3' core motif. Most other homeodomains instead recognize sites containing the 5'-TAAT-3' core motif. Here, we show that TTF-1HD efficiently recognizes another sequence, called D1, devoid of the 5'-CAAG-3' core motif. Different experimental approaches indicate that TTF-1HD contacts the D1 sequence in a manner different from that used to interact with sequences containing the 5'-CAAG-3' core motif. The binding activities of TTF-1HD mutants toward the D1 sequence and toward the sequence containing the 5'-CAAG-3' core motif indicate that several DNA-contacting amino acids play different roles in the two cases. In particular, during recognition of the D1 sequence, backbone-interacting amino acids that are not relevant for binding to sequences containing the 5'-CAAG-3' core motif play an important role. In the TTF-1HD, therefore, the contribution of several amino acids to DNA recognition depends on the bound sequence. These data indicate that although a common bonding network exists in all HD/DNA complexes, peculiarities important for DNA recognition may occur in single cases. PMID:8811078

  11. Integrated protein function prediction by mining function associations, sequences, and protein–protein and gene–gene interaction networks

    PubMed Central

    Cao, Renzhi; Cheng, Jianlin

    2016-01-01

    Motivations Protein function prediction is an important and challenging problem in bioinformatics and computational biology. Functionally relevant biological information such as protein sequences, gene expression, and protein–protein interactions has mostly been used separately for protein function prediction. One of the major challenges is how to effectively integrate multiple sources of both traditional and new information, such as spatial gene–gene interaction networks generated from chromosomal conformation data, to improve protein function prediction. Results In this work, we developed three different probabilistic scores (MIS, SEQ, and NET score) to combine protein sequence, function associations, and protein–protein interaction and spatial gene–gene interaction networks for protein function prediction. The MIS score is mainly generated from homologous proteins found by PSI-BLAST search, and also from association rules between Gene Ontology terms, which are learned by mining the Swiss-Prot database. The SEQ score is generated from protein sequences. The NET score is generated from protein–protein interaction and spatial gene–gene interaction networks. These three scores were combined in a new Statistical Multiple Integrative Scoring System (SMISS) to predict protein function. We tested SMISS on the data set of the 2011 Critical Assessment of Function Annotation (CAFA). The method performed substantially better than three baseline methods and an advanced method based on protein profile–sequence comparison, profile–profile comparison, and domain co-occurrence networks according to the maximum F-measure. PMID:26370280
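    The abstract does not say how the MIS, SEQ, and NET scores are combined, so the sketch below covers only the evaluation criterion it reports, the maximum F-measure. It assumes the protein-centric definition used in CAFA-style evaluations (precision averaged over proteins with at least one prediction at or above the threshold, recall averaged over all benchmark proteins) and skips propagation of GO terms to their ancestors. Protein names and GO identifiers are hypothetical.

```python
from typing import Dict, Iterable, Set


def f_max(predictions: Dict[str, Dict[str, float]],
          truth: Dict[str, Set[str]],
          thresholds: Iterable[float]) -> float:
    """Maximum over thresholds of the protein-centric F-measure."""
    best = 0.0
    for tau in thresholds:
        precisions = []
        recall_sum = 0.0
        for protein, true_terms in truth.items():
            predicted = {t for t, s in predictions.get(protein, {}).items() if s >= tau}
            hits = len(predicted & true_terms)
            if predicted:  # precision only over proteins with a prediction at this tau
                precisions.append(hits / len(predicted))
            if true_terms:
                recall_sum += hits / len(true_terms)
        if not precisions:
            continue
        pr = sum(precisions) / len(precisions)
        rc = recall_sum / len(truth)  # recall over all benchmark proteins
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best


predictions = {"p1": {"GO:1": 0.9, "GO:2": 0.4}, "p2": {"GO:3": 0.7}}
truth = {"p1": {"GO:1"}, "p2": {"GO:3", "GO:4"}}
score = f_max(predictions, truth, [0.3, 0.5])
```

    Here the threshold 0.5 gives precision 1.0 and recall 0.75, which beats threshold 0.3, so the maximum F-measure is 6/7.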

  12. Multipolar Electrostatic Energy Prediction for all 20 Natural Amino Acids Using Kriging Machine Learning.

    PubMed

    Fletcher, Timothy L; Popelier, Paul L A

    2016-06-14

    A machine learning method called kriging is applied to the set of all 20 naturally occurring amino acids. Kriging models are built that predict electrostatic multipole moments for all topological atoms in any amino acid based on molecular geometry only. These models then predict molecular electrostatic interaction energies. On the basis of 200 unseen test geometries for each amino acid, no amino acid shows a mean prediction error above 5.3 kJ mol⁻¹, while the lowest error observed is 2.8 kJ mol⁻¹. The mean error across the entire set is only 4.2 kJ mol⁻¹ (or 1 kcal mol⁻¹). Charged systems are created by protonating or deprotonating selected amino acids, and these show no significant deviation in prediction error from their neutral counterparts. Similarly, the proposed methodology can also handle amino acids with aromatic side chains without modification. Thus, we present a generic method capable of accurately capturing multipolar polarizable electrostatics in amino acids.
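    Kriging is interpolation by Gaussian-process regression: a prediction is a weighted sum of kernel similarities to the training inputs, with weights obtained by solving a linear system in the training kernel matrix. The dependency-free sketch below assumes a squared-exponential kernel with a fixed length scale, a constant mean taken as the training average, and toy 1-D inputs standing in for geometry features; the paper's kernel, hyperparameter fitting, and multipole targets are not reproduced.

```python
import math
from typing import List, Sequence


def gaussian_kernel(a: Sequence[float], b: Sequence[float], length_scale: float) -> float:
    """Squared-exponential covariance between two input vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2 * length_scale ** 2))


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


class SimpleKriging:
    """Kriging with a constant mean (training average) and a Gaussian kernel."""

    def __init__(self, X: Sequence[Sequence[float]], y: Sequence[float],
                 length_scale: float = 1.0, nugget: float = 1e-10):
        self.X = [list(x) for x in X]
        self.mean = sum(y) / len(y)
        self.length_scale = length_scale
        # Small nugget on the diagonal keeps the kernel matrix well conditioned.
        K = [[gaussian_kernel(a, b, length_scale) + (nugget if i == j else 0.0)
              for j, b in enumerate(self.X)] for i, a in enumerate(self.X)]
        self.weights = solve(K, [v - self.mean for v in y])

    def predict(self, x: Sequence[float]) -> float:
        return self.mean + sum(w * gaussian_kernel(x, xi, self.length_scale)
                               for w, xi in zip(self.weights, self.X))


# Toy data: one input feature per sample (hypothetical, not molecular geometries).
model = SimpleKriging([[0.0], [1.0], [2.0], [3.0]], [0.0, 1.0, 4.0, 9.0])
```

    The model reproduces the training targets (up to the tiny nugget) and relaxes toward the mean far from the data, which is why kriging predictions are reported on unseen geometries drawn from the same region of configuration space.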

  13. Molecular cloning, encoding sequence, and expression of vaccinia virus nucleic acid-dependent nucleoside triphosphatase gene.

    PubMed Central

    Rodriguez, J F; Kahn, J S; Esteban, M

    1986-01-01

    A rabbit poxvirus genomic library contained within the expression vector lambda gt11 was screened with polyclonal antiserum prepared against vaccinia virus nucleic acid-dependent nucleoside triphosphatase (NTPase)-I enzyme. Five positive phage clones containing from 0.72- to 2.5-kilobase-pair (kbp) inserts expressed a beta-galactosidase fusion protein that was reactive by immunoblotting with the NTPase-I antibody. Hybridization analysis allowed the location of this gene within the vaccinia HindIIID restriction fragment. From the known nucleotide sequence of the 16-kbp vaccinia HindIIID fragment, we identified a region that contains a 1896-base open reading frame coding for a 631-amino acid protein. Analysis of the complete sequence revealed a highly basic protein, with hydrophilic COOH and NH2 termini, various hydrophobic domains, and no significant homology to other known proteins. Translational studies demonstrate that NTPase-I belongs to a late class of viral genes. This protein is highly conserved among Orthopoxviruses. PMID:3025846

  14. The amino acid sequences and activities of synergistic hemolysins from Staphylococcus cohnii.

    PubMed

    Mak, Pawel; Maszewska, Agnieszka; Rozalska, Malgorzata

    2008-10-01

    Staphylococcus cohnii ssp. cohnii and S. cohnii ssp. urealyticus are coagulase-negative staphylococci long considered unable to cause infections. This situation changed recently, and pathogenic strains of these bacteria were isolated from hospital environments, patients, and medical staff. Most of the isolated strains were resistant to many antibiotics. The present work describes the isolation and characterization of several synergistic peptide hemolysins produced by these bacteria that act as virulence factors responsible for hemolytic and cytotoxic activities. The amino acid sequences of the respective hemolysins from S. cohnii ssp. cohnii (named H1C, H2C, and H3C) and S. cohnii ssp. urealyticus (H1U, H2U, and H3U) were identical. Peptides H1 and H3 possessed significant amino acid homology to three synergistic hemolysins secreted by Staphylococcus lugdunensis and to a putative antibacterial peptide produced by Staphylococcus saprophyticus ssp. saprophyticus. In contrast, hemolysin H2 had a unique sequence. All isolated peptides lysed red cells from different mammalian species and exerted a cytotoxic effect on human fibroblasts.

  15. The Sequence-Specific Cellular Uptake of Spherical Nucleic Acid Nanoparticle Conjugates

    PubMed Central

    Narayan, Suguna P.; Choi, Chung Hang J.; Hao, Liangliang; Calabrese, Colin M.; Auyeung, Evelyn; Zhang, Chuan; Goor, Olga J.G.M.

    2015-01-01

    We investigated the sequence-dependent cellular uptake of spherical nucleic acid nanoparticle conjugates (SNAs). Uptake occurs through interaction with class A scavenger receptors (SR-A) and caveolae-mediated endocytosis. It is known that linear poly(guanine) (poly G) is a natural ligand for SR-A, and it has been proposed that the interaction of poly G with SR-A depends on the formation of G-quadruplexes. Since G-rich oligonucleotides are known to interact strongly with SR-A, we hypothesized that SNAs with higher G content would enter cells in larger amounts than SNAs composed of other nucleotides, and we therefore measured cellular internalization of SNAs as a function of constituent oligonucleotide sequence. Indeed, SNAs with enriched G content show the highest cellular uptake. Building on this finding, we chemically conjugated a small molecule (camptothecin) to SNAs to create drug-SNA conjugates and observed that poly G SNAs deliver the most camptothecin to cells and have the highest cytotoxicity in cancer cells. Our data elucidate important design considerations for enhancing the intracellular delivery of spherical nucleic acids. PMID:26097111

  16. Partial amino acid sequences around sulfhydryl groups of soybean beta-amylase.

    PubMed

    Nomura, K; Mikami, B; Morita, Y

    1987-08-01

    Sulfhydryl (SH) groups of soybean beta-amylase were modified with 5-(iodoacetamidoethyl)aminonaphthalene-1-sulfonate (IAEDANS) and the SH-containing peptides exhibiting fluorescence were purified after chymotryptic digestion of the modified enzyme. The sequence analysis of the peptides derived from the modification of all SH groups in the denatured enzyme revealed the existence of six SH groups, in contrast to five reported previously. One of them was found to have extremely low reactivity toward SH-reagents without reduction. In the native state, IAEDANS reacted with 2 mol of SH groups per mol of the enzyme (SH1 and SH2), accompanied by inactivation of the enzyme owing to the modification of SH2 located near the active site of this enzyme. The selective modification of SH2 with IAEDANS was attained after the blocking of SH1 with 5,5'-dithiobis-(2-nitrobenzoic acid). The amino acid sequences of the peptides containing SH1 and SH2 were determined to be Cys-Ala-Asn-Pro-Gln and His-Gln-Cys-Gly-Gly-Asn-Val-Gly-Asp-Ile-Val-Asn-Ile-Pro-Ile-Pro-Gln-Trp, respectively.

  17. Genome Sequence of Lactobacillus rhamnosus Strain CASL, an Efficient L-Lactic Acid Producer from Cheap Substrate Cassava

    PubMed Central

    Yu, Bo; Su, Fei; Wang, Limin; Zhao, Bo; Qin, Jiayang; Ma, Cuiqing; Xu, Ping; Ma, Yanhe

    2011-01-01

    Lactobacillus rhamnosus is a type of probiotic bacterium with industrial potential for L-lactic acid production. We announce the draft genome sequence of L. rhamnosus CASL (2,855,156 bp with a G+C content of 46.6%), an efficient producer of L-lactic acid from cheap, nonfood cassava substrate with a high production titer. PMID:22123765
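    The G+C content quoted above is the share of G and C among the genome's bases. A minimal sketch of that calculation, assuming ambiguous symbols such as N are left out of the denominator (genome announcements may handle them differently); the function name gc_content is hypothetical:

```python
def gc_content(seq: str) -> float:
    """Return the G+C percentage of a DNA sequence.

    Assumption: only unambiguous bases (A, C, G, T) count toward the
    denominator, so symbols such as N are ignored.
    """
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (counts["G"] + counts["C"]) / total


if __name__ == "__main__":
    # Hypothetical toy fragment, not part of the CASL genome.
    print(round(gc_content("ATGCGCATTA"), 1))  # prints 40.0
```

    Applied to a full assembly, the same ratio over all 2,855,156 positions would give the genome-wide figure.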

  18. High Dietary Acid Load Predicts ESRD among Adults with CKD

    PubMed Central

    Crews, Deidra C.; Wesson, Donald E.; Tilea, Anca M.; Saran, Rajiv; Ríos-Burrows, Nilka; Williams, Desmond E.; Powe, Neil R.

    2015-01-01

    Small clinical trials have shown that a reduction in dietary acid load (DAL) improves kidney injury and slows kidney function decline; however, the relationship between DAL and risk of ESRD in a population-based cohort with CKD remains unexamined. We examined the association between DAL, quantified by net acid excretion (NAEes), and progression to ESRD in a nationally representative sample of adults in the United States. Among 1486 adults with CKD aged ≥20 years enrolled in the National Health and Nutrition Examination Survey III, DAL was determined by a 24-h dietary recall questionnaire. The development of ESRD was ascertained over a median 14.2 years of follow-up through linkage with the Medicare ESRD Registry. We used the Fine–Gray competing risks method to estimate the association of high, medium, and low DAL with ESRD after adjusting for demographics, nutritional factors, clinical factors, and kidney function/damage markers and accounting for intervening mortality events. In total, 311 (20.9%) participants developed ESRD. Higher levels of DAL were associated with increased risk of ESRD; relative hazards (95% confidence interval) were 3.04 (1.58 to 5.86) for the highest tertile and 1.81 (0.89 to 3.68) for the middle tertile compared with the lowest tertile in the fully adjusted model. The risk of ESRD associated with DAL tertiles increased as eGFR decreased (P trend=0.001). Among participants with albuminuria, high DAL was strongly associated with ESRD risk (P trend=0.03). In conclusion, high DAL in persons with CKD is independently associated with increased risk of ESRD in a nationally representative population. PMID:25677388

  19. High Dietary Acid Load Predicts ESRD among Adults with CKD.

    PubMed

    Banerjee, Tanushree; Crews, Deidra C; Wesson, Donald E; Tilea, Anca M; Saran, Rajiv; Ríos-Burrows, Nilka; Williams, Desmond E; Powe, Neil R

    2015-07-01

    Small clinical trials have shown that a reduction in dietary acid load (DAL) improves kidney injury and slows kidney function decline; however, the relationship between DAL and risk of ESRD in a population-based cohort with CKD remains unexamined. We examined the association between DAL, quantified by net acid excretion (NAEes), and progression to ESRD in a nationally representative sample of adults in the United States. Among 1486 adults with CKD aged ≥20 years enrolled in the National Health and Nutrition Examination Survey III, DAL was determined by a 24-h dietary recall questionnaire. The development of ESRD was ascertained over a median 14.2 years of follow-up through linkage with the Medicare ESRD Registry. We used the Fine-Gray competing risks method to estimate the association of high, medium, and low DAL with ESRD after adjusting for demographics, nutritional factors, clinical factors, and kidney function/damage markers and accounting for intervening mortality events. In total, 311 (20.9%) participants developed ESRD. Higher levels of DAL were associated with increased risk of ESRD; relative hazards (95% confidence interval) were 3.04 (1.58 to 5.86) for the highest tertile and 1.81 (0.89 to 3.68) for the middle tertile compared with the lowest tertile in the fully adjusted model. The risk of ESRD associated with DAL tertiles increased as eGFR decreased (P trend=0.001). Among participants with albuminuria, high DAL was strongly associated with ESRD risk (P trend=0.03). In conclusion, high DAL in persons with CKD is independently associated with increased risk of ESRD in a nationally representative population.

  20. Amino acid sequence of versutoxin, a lethal neurotoxin from the venom of the funnel-web spider Atrax versutus.

    PubMed

    Brown, M R; Sheumack, D D; Tyler, M I; Howden, M E

    1988-03-01

    The complete amino acid sequence of versutoxin, a lethal neurotoxic polypeptide isolated from the venom of male and female funnel-web spiders of the species Atrax versutus, was determined. Sequencing was performed in a gas-phase protein sequencer by automated Edman degradation of the S-carboxymethylated toxin and fragments of it produced by reaction with CNBr. Versutoxin consisted of a single chain of 42 amino acid residues. It was found to have a high proportion of basic residues and of cystine. The primary structure showed marked homology with that of robustoxin, a novel neurotoxin recently isolated from the venom of another funnel-web-spider species, Atrax robustus.

  1. Amino acid sequence of versutoxin, a lethal neurotoxin from the venom of the funnel-web spider Atrax versutus.

    PubMed Central

    Brown, M R; Sheumack, D D; Tyler, M I; Howden, M E

    1988-01-01

    The complete amino acid sequence of versutoxin, a lethal neurotoxic polypeptide isolated from the venom of male and female funnel-web spiders of the species Atrax versutus, was determined. Sequencing was performed in a gas-phase protein sequencer by automated Edman degradation of the S-carboxymethylated toxin and fragments of it produced by reaction with CNBr. Versutoxin consisted of a single chain of 42 amino acid residues. It was found to have a high proportion of basic residues and of cystine. The primary structure showed marked homology with that of robustoxin, a novel neurotoxin recently isolated from the venom of another funnel-web-spider species, Atrax robustus. PMID:3355530

  2. Predicting whole genome protein interaction networks from primary sequence data in model and non-model organisms using ENTS

    PubMed Central

    2013-01-01

    Background The large-scale identification of physical protein-protein interactions (PPIs) is an important step toward understanding how biological networks evolve and generate emergent phenotypes. However, experimental identification of PPIs is a laborious and error-prone process, and current methods of PPI prediction tend to be highly conservative or require large amounts of functional data that may not be available for newly-sequenced organisms. Results In this study we demonstrate a random-forest based technique, ENTS, for the computational prediction of protein-protein interactions based only on primary sequence data. Our approach is able to efficiently predict interactions on a whole-genome scale for any eukaryotic organism, using pairwise combinations of conserved domains and predicted subcellular localization of proteins as input features. We present the first predicted interactome for the forest tree Populus trichocarpa in addition to the predicted interactomes for Saccharomyces cerevisiae, Homo sapiens, Mus musculus, and Arabidopsis thaliana. Comparing our approach to other PPI predictors, we find that ENTS performs comparably to or better than a number of existing approaches, including several that utilize a variety of functional information for their predictions. We also find that the predicted interactions are biologically meaningful, as indicated by similarity in functional annotations and enrichment of co-expressed genes in public microarray datasets. Furthermore, we demonstrate some of the biological insights that can be gained from these predicted interaction networks. We show that the predicted interactions yield informative groupings of P. trichocarpa metabolic pathways, literature-supported associations among human disease states, and theory-supported insight into the evolutionary dynamics of duplicated genes in paleopolyploid plants. Conclusion We conclude that the ENTS classifier will be a valuable tool for the de novo annotation of genome

  3. Whole-exome sequencing predicted cancer epitope trees of 23 early cervical cancers in Chinese women.

    PubMed

    Li, Xia; Huang, Hailiang; Guan, Yanfang; Gong, Yuhua; He, Cheng-Yi; Yi, Xin; Qi, Ming; Chen, Zhi-Ying

    2017-01-01

    Emerging evidence suggests that the heterogeneity of cancer limits the efficacy of immunotherapy. To search for optimal therapeutic targets for enhancing this efficacy, we used whole-exome sequencing data of 23 early cervical tumors from Chinese women to investigate the hierarchical structure of the somatic mutations and the neo-epitopes. The putative neo-epitopes were predicted based on the strong binding of the mutant peptides to major histocompatibility complex class I molecules. We found that each tumor carried an average of 117 mutations and 61 putative neo-epitopes. Each patient displayed a unique phylogenetic tree in which almost all subclones harbored neo-epitopes, highlighting the importance of the individual neo-epitope tree in determining immunotherapeutic targets. The alterations in FBXW7 and PIK3CA, or in other members of the significantly altered ubiquitin-mediated proteolysis and extracellular matrix receptor interaction pathways, were proposed as the earliest changes triggering malignant progression. The neo-epitopes involved in these pathways and located at the top of the hierarchy tree might be optimal candidates for therapeutic targets, with the potential to mediate T-cell killing of the descendant cells. These findings expand our understanding of the early stage of cervical carcinogenesis and offer an important approach to assist in optimizing immunotherapeutic target selection.

  4. Equilibrium model prediction for the scatter in the star-forming main sequence

    NASA Astrophysics Data System (ADS)

    Mitra, Sourav; Davé, Romeel; Simha, Vimal; Finlator, Kristian

    2017-01-01

    The analytic 'equilibrium model' for galaxy evolution using a mass balance equation is able to reproduce mean observed galaxy scaling relations between stellar mass, halo mass, star formation rate (SFR), and metallicity across the majority of cosmic time with a small number of parameters related to feedback. Here, we aim to test this data-constrained model to quantify deviations from the mean relation between stellar mass and SFR, i.e. the star-forming galaxy main sequence (MS). We implement fluctuations in halo accretion rates parametrized from merger-based simulations, and quantify the intrinsic scatter introduced into the MS under the assumption that fluctuations in star formation follow baryonic inflow fluctuations. We predict the 1σ MS scatter to be ∼0.2-0.25 dex over the stellar mass range 10^8-10^11 M⊙ and a redshift range 0.5 ≲ z ≲ 3 for SFRs averaged over 100 Myr. The scatter increases modestly at z ≳ 3, as well as by averaging over shorter time-scales. The contribution from merger-induced star formation is generally small, around 5 per cent today and 10-15 per cent during the peak epoch of cosmic star formation. These results are generally consistent with available observations, suggesting that deviations from the MS primarily reflect stochasticity in the inflow rate owing to halo mergers.

  5. Clostridium sticklandii, a specialist in amino acid degradation: revisiting its metabolism through its genome sequence

    PubMed Central

    2010-01-01

    Background Clostridium sticklandii belongs to a cluster of non-pathogenic proteolytic clostridia which utilize amino acids as carbon and energy sources. Isolated by T.C. Stadtman in 1954, it has been generally regarded as a "gold mine" for novel biochemical reactions and is used as a model organism for studying metabolic aspects such as the Stickland reaction and coenzyme-B12- and selenium-dependent reactions of amino acids. With the goal of revisiting its carbon, nitrogen, and energy metabolism, and comparing it with other clostridia, its genome has been sequenced and analyzed. Results C. sticklandii is one of the best biochemically studied proteolytic clostridial species. Useful additional information has been obtained from the sequencing and annotation of its genome, which is presented in this paper. In addition, experiments reveal that C. sticklandii degrades amino acids in a preferential and sequential way. The organism prefers threonine, arginine, serine, cysteine, proline, and glycine, whereas glutamate, aspartate and alanine are excreted. Energy conservation is primarily achieved by substrate-level phosphorylation in fermentative pathways. The reactions catalyzed by different ferredoxin oxidoreductases and the exergonic NADH-dependent reduction of crotonyl-CoA point to a possible chemiosmotic energy conservation via the Rnf complex. C. sticklandii possesses both the F-type and V-type ATPases. The discovery of an as yet unrecognized selenoprotein in the D-proline reductase operon suggests a more detailed mechanism for NADH-dependent D-proline reduction. A rather unusual metabolic feature is the presence of genes for all the enzymes involved in two different CO2-fixation pathways: C. sticklandii harbours both the glycine synthase/glycine reductase and the Wood-Ljungdahl pathways. This unusual pathway combination has retrospectively been observed in only four other sequenced microorganisms. Conclusions Analysis of the C. sticklandii genome and

  6. Limited proteolysis and sequence analysis of the 2-oxo acid dehydrogenase complexes from Escherichia coli. Cleavage sites and domains in the dihydrolipoamide acyltransferase components.

    PubMed Central

    Packman, L C; Perham, R N

    1987-01-01

    The structures of the dihydrolipoamide acyltransferase (E2) components of the 2-oxo acid dehydrogenase complexes from Escherichia coli were investigated by limited proteolysis. Trypsin and Staphylococcus aureus V8 proteinase were used to excise the three lipoyl domains from the E2p component of the pyruvate dehydrogenase complex and the single lipoyl domain from the E2o component of the 2-oxoglutarate dehydrogenase complex. The principal sites of action of these enzymes on each E2 chain were determined by sequence analysis of the isolated lipoyl fragments and of the truncated E2p and E2o chains. Each of the numerous cleavage sites (12 in E2p, six in E2o) fell within similar segments of the E2 chains, namely stretches of polypeptide rich in alanine, proline and/or charged amino acids. These regions are clearly accessible to proteinases of Mr 24,000-28,000 and, on the basis of n.m.r. spectroscopy, some of them have previously been implicated in facilitating domain movements by virtue of their conformational flexibility. The limited proteolysis data suggest that E2p and E2o possess closer architectural similarities than would be predicted from inspection of their amino acid sequences. As a result of this work, an error was detected in the sequence of E2o inferred from the previously published sequence of the encoding gene, sucB. The relevant peptides from E2o were purified and sequenced by direct means; an amended sequence is presented. PMID:3297046

  7. Amino acid sequence of neurotoxin III of the scorpion Androctonus australis Hector.

    PubMed

    Kopeyan, C; Martinez, G; Rochat, H

    1979-03-01

    The amino acid sequence of neurotoxin III, purified from the venom of the North African scorpion Androctonus australis Hector, has been determined by Edman degradation using a liquid-phase sequencer. Carboxypeptidase A hydrolyses confirmed not only the sequence of the last five residues but also the presence of a free alpha-carboxylic group at the C-terminus. Edman degradation was conducted, on the one hand, with the Quadrol [N,N,N',N'-tetrakis(2-hydroxypropyl)ethylene diamine] program on S-alkylated protein before or after coupling with sulfophenylisothiocyanate (the first 34 residues were thus identified) and, on the other hand, on tryptic and chymotryptic peptides with a dimethylbenzylamine program (residues 1-23 and 31-34 were confirmed, and the positions of residues 35-64 were established). Neurotoxin III was found to belong to the same group of scorpion toxins active on mammals as neurotoxin I purified from the same venom (the two proteins share 50 homologous positions).

  8. Isolation and amino acid sequences of squirrel monkey (Saimiri sciurea) insulin and glucagon

    SciTech Connect

    Yu, Jinghua; Eng, J.; Yalow, R.S. (City Univ. of New York, NY)

    1990-12-01

    It was reported two decades ago that insulin was not detectable in the glucose-stimulated state in Saimiri sciurea, the New World squirrel monkey, by a radioimmunoassay system developed with guinea pig anti-pork insulin antibody and labeled pork insulin. With the same system, reasonable levels were observed in rhesus monkeys and chimpanzees. This suggested that New World monkeys, like the New World hystricomorph rodents such as the guinea pig and the coypu, might have insulins whose sequences differ markedly from those of Old World mammals. In this report the authors describe the purification and amino acid sequences of squirrel monkey insulin and glucagon. They demonstrate that the substitutions at B29, B27, A2, A4, and A17 of squirrel monkey insulin are identical with those previously found in another New World primate, the owl monkey (Aotus trivirgatus). The immunologic cross-reactivity of this insulin in their immunoassay system is only a few percent of that of human insulin. It appears that the peptides of the New World monkeys have diverged less from those of the Old World mammals than have those of the New World hystricomorph rodents. The striking improvements in peptide purification and sequencing have the potential for adding new information concerning the evolutionary divergence of species.

  9. Purification, amino acid sequence and characterisation of kangaroo IGF-I.

    PubMed

    Yandell, C A; Francis, G L; Wheldrake, J F; Upton, Z

    1998-01-01

    Insulin-like growth factor-I (IGF-I) and IGF-II have been purified to homogeneity from kangaroo (Macropus fuliginosus) serum; this represents the first report of the purification, sequencing and characterisation of marsupial IGFs. N-terminal protein sequencing reveals that there are six amino acid differences between kangaroo and human IGF-I. Kangaroo IGF-II has been partially sequenced and no differences were found between human and kangaroo IGF-II in the 53 residues identified. Thus the IGFs appear to be remarkably structurally conserved during mammalian radiation. In addition, in vitro characterisation of kangaroo IGF-I demonstrated that the functional properties of human, kangaroo and chicken IGF-I are very similar. In an assay measuring the ability of the proteins to stimulate protein synthesis in rat L6 myoblasts, all IGF-I proteins were found to be equally potent. The ability of all three proteins to compete for binding with radiolabelled human IGF-I to type-1 IGF receptors in L6 myoblasts and in Sminthopsis crassicaudata transformed lung fibroblasts, a marsupial cell line, was comparable. Furthermore, kangaroo and human IGF-I react equally in a human IGF-I RIA using a human reference standard, radiolabelled human IGF-I and a polyclonal antibody raised against recombinant human IGF-I. This study indicates that not only is the primary structure of eutherian and metatherian IGF-I conserved, but also the proteins appear to be functionally similar.

  10. Complete Genome Sequence of the Prototype Lactic Acid Bacterium Lactococcus lactis subsp. cremoris MG1363

    PubMed Central

    Wegmann, Udo; O'Connell-Motherway, Mary; Zomer, Aldert; Buist, Girbe; Shearman, Claire; Canchaya, Carlos; Ventura, Marco; Goesmann, Alexander; Gasson, Michael J.; Kuipers, Oscar P.; van Sinderen, Douwe; Kok, Jan

    2007-01-01

    Lactococcus lactis is of great importance for the nutrition of hundreds of millions of people worldwide. This paper describes the genome sequence of Lactococcus lactis subsp. cremoris MG1363, the lactococcal strain most intensively studied throughout the world. The 2,529,478-bp genome contains 81 pseudogenes and encodes 2,436 proteins. Of the 530 unique proteins, 47 belong to the COG (clusters of orthologous groups) functional category “carbohydrate metabolism and transport,” by far the largest category of novel proteins in comparison with L. lactis subsp. lactis IL1403. Nearly one-fifth of the 71 insertion elements are concentrated in a specific 56-kb region. This integration hot-spot region carries genes that are typically associated with lactococcal plasmids and a repeat sequence specifically found on plasmids and in the “lateral gene transfer hot spot” in the genome of Streptococcus thermophilus. Although the parent of L. lactis MG1363 was used to demonstrate lysogeny in Lactococcus, L. lactis MG1363 carries four remnant/satellite phages and two apparently complete prophages. The availability of the L. lactis MG1363 genome sequence will reinforce its status as the prototype among lactic acid bacteria through facilitation of further applied and fundamental research. PMID:17307855

  11. X-29 flight - Acid test for design predictions

    NASA Technical Reports Server (NTRS)

    Putnam, T. W.; Petersen, K. L.; Ishmael, S. D.; Sefic, W. J.

    1986-01-01

    The X-29 flight test data are being disseminated to interested industrial and military users as fast as they become available. The aircraft is extensively instrumented with accelerometers, pressure sensors, and optical sensors for measuring wing deflection. The thoroughness of preflight preparations permitted a rapid advance through initial test checkpoints, which have both confirmed many predictions and revealed several discrepancies. The flight envelope had been expanded to Mach 1.1 and an altitude of 40,000 ft by December 1985. Notably, the X-29 has provided in-flight data which could not be faithfully depicted in a simulator, e.g., flare procedures during landing, and has shown that the stability adjustments, although adequate for controlling the aircraft, are not rapid enough to offer a satisfactory margin of harmony. The tests are now being performed in the transonic regime, where supercritical airfoil and forward swept wing drag reduction become significant factors.

  12. Application of a target enrichment-based next-generation sequencing protocol for identification and sequence-based prediction of pneumococcal serotypes

    PubMed Central

    2014-01-01

    Background The use of whole-genome sequencing in microbiology at a diagnostic level, although feasible, is still limited by the associated costs and by the complex bioinformatics pipelines required for data analysis. We describe the use of target enrichment-based next-generation sequencing for pneumococcal identification and serotyping, as applied to the serotypes in the 23-valent polysaccharide vaccine, as an affordable alternative to whole-genome sequencing. Results Correct identification of Streptococcus pneumoniae and prediction of common vaccine serotypes (12 to serotype level and the rest to serogroup level) were achieved for all serotypes with >500 reads mapped against serotype sequences. A proportion-based criterion also enabled the identification of two serotypes present in the same sample, indicating that the method could be used to detect co-colonizing serotypes. The results obtained were comparable to, or an improvement on, currently existing molecular serotyping methods for S. pneumoniae with respect to the polysaccharide vaccine serotypes. Conclusion We propose that this method has the potential to become an affordable and adaptable alternative to whole-genome sequencing for pneumococcal identification and serotyping. PMID:24612771

  13. The ABRF Edman Sequencing Research Group 2008 Study: Investigation into Homopolymeric Amino Acid N-Terminal Sequence Tags and Their Effects on Automated Edman Degradation

    PubMed Central

    Thoma, R. S.; Smith, J. S.; Sandoval, W.; Leone, J. W.; Hunziker, P.; Hampton, B.; Linse, K. D.; Denslow, N. D.

    2009-01-01

    The Edman Sequencing Research Group (ESRG) of the Association of Biomolecular Resource Facilities (ABRF) designs and executes interlaboratory studies investigating the use of automated Edman degradation for protein and peptide analysis. In 2008, the ESRG enlisted the help of core sequencing facilities to investigate the effects of a repeating amino acid tag at the N-terminus of a protein. Commonly, to facilitate protein purification, an affinity tag containing a polyhistidine sequence is conjugated to the N-terminus of the protein. After expression, polyhistidine-tagged protein is readily purified via chelation with an immobilized metal affinity resin. The addition of the polyhistidine tag presents unique challenges for the determination of protein identity using Edman degradation chemistry. Participating laboratories were asked to sequence one protein engineered in three configurations: with an N-terminal polyhistidine tag; with an N-terminal polyalanine tag; or with no tag. Study participants were asked to return a data file containing the uncorrected amino acid picomole yields for the first 17 cycles. Initial and repetitive yield (R.Y.) information and the amount of lag were evaluated. Information about instrumentation and sample treatment was also collected as part of the study. For this study, the majority of participating laboratories successfully called the amino acid sequence for 17 cycles for all three test proteins. In general, laboratories found it more difficult to call the sequence containing the polyhistidine tag. Lag was observed earlier and more consistently with the polyhistidine-tagged protein than the polyalanine-tagged protein. Histidine yields were significantly less than the alanine yields in the tag portion of each analysis. The polyhistidine and polyalanine protein-R.Y. calculations were found to be equivalent. These calculations showed that the nontagged portion from each protein was equivalent. The terminal histidines from the tagged portion of the protein
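    Initial and repetitive yields are commonly estimated by fitting the picomole yields of one residue against cycle number on a log scale. The sketch below assumes the model yield(n) = I0 * RY^(n - 1), with I0 the initial yield extrapolated to cycle 1; it illustrates the general idea and is not the ESRG's actual calculation, and the function name and sample yields are hypothetical:

```python
import math


def repetitive_yield(cycles: list[int], yields: list[float]) -> tuple[float, float]:
    """Estimate (initial_yield, repetitive_yield) for one residue type.

    Assumes yield(n) = I0 * RY**(n - 1), so ln(yield) is linear in (n - 1):
    the least-squares intercept is ln(I0) and the slope is ln(RY).
    """
    if len(cycles) != len(yields) or len(cycles) < 2:
        raise ValueError("need at least two (cycle, yield) pairs")
    if any(y <= 0 for y in yields):
        raise ValueError("yields must be positive to take logarithms")
    xs = [c - 1 for c in cycles]
    ys = [math.log(y) for y in yields]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cycles must not all be the same")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), math.exp(slope)


if __name__ == "__main__":
    # Hypothetical picomole yields for one residue at four cycles.
    i0, ry = repetitive_yield([2, 6, 9, 14], [48.0, 39.5, 34.1, 27.0])
    print(f"initial yield ~ {i0:.1f} pmol, repetitive yield ~ {ry:.1%}")
```

    Fitting on the log scale turns the geometric decay into a straight line, so ordinary least squares gives ln(RY) as the slope; tag residues with poor recovery, such as the histidines noted above, would pull such a fit down.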

  14. Deformation history and load sequence effects on cumulative fatigue damage and life predictions

    NASA Astrophysics Data System (ADS)

    Colin, Julie

    Fatigue loading is seldom of constant amplitude. This is especially true in the cooling systems of nuclear power plants, typically made of stainless steel, where thermal fluctuations and turbulent water flow create variable amplitude loads with mean stresses and overloads. These complex loading sequences lead to the formation of networks of microcracks (crazing) that can propagate. As stainless steel is a material with strong deformation history effects and phase transformation resulting from plastic straining, such load sequence and variable amplitude loading effects are significant to its fatigue behavior and life predictions. The goal of this study was to investigate the effects of cyclic deformation on fatigue behavior of stainless steel 304L as a deformation history sensitive material and determine how to quantify and accumulate fatigue damage to enable life predictions under variable amplitude loading conditions for such materials. A comprehensive experimental program including testing under fully-reversed, as well as mean stress and/or mean strain conditions, with initial or periodic overloads, along with step testing and random loading histories was conducted on two grades of stainless steel 304L, under both strain-controlled and load-controlled conditions. To facilitate comparisons with a material without deformation history effects, similar tests were also carried out on aluminum 7075-T6. Experimental results are discussed, including peculiarities observed with stainless steel behavior, such as a phenomenon referred to as secondary hardening, characterized by a continuous increase in the stress response in a strain-controlled test and often leading to runout fatigue life. Possible mechanisms for secondary hardening observed in some tests are also discussed. The behavior of aluminum is shown not to be affected by preloading, whereas the behavior of stainless steel is greatly influenced by prior loading. Mean stress relaxation in

  15. Evaluating long-term relationship of protein sequence by use of D-interval conditional probability and its impact on protein structural class prediction.

    PubMed

    Gu, Fei; Chen, Hang

    2009-01-01

    To narrow the large and expanding gap between the number of proteins with known sequences and the number with known structures, it is important to study protein structural class prediction (PSCP), given its foundational role in protein structure analysis. In this paper, the d-interval conditional probability index was proposed to reflect the long-term correlation between amino acids. Based on this index, the impact of the long-term relationship between residues on PSCP was analyzed. Two new information-theory-based algorithms were proposed and combined with the long-term information between residues to predict protein structural class (PSC). The 5714 dataset was used for testing because of its low sequence similarity and high reliability. The results showed that, with the same algorithms, the new index gave 3-6% higher accuracy than the traditional index, and that the new algorithms improved PSCP accuracy by 4-10%. The presented index, algorithms and long-term relationship of residues in PSCP could be applied broadly in other sequence-based protein structure analyses.
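    The abstract does not spell out how the d-interval conditional probability index is computed. One plausible reading, sketched below purely as an assumption, is the empirical probability that residue b occurs d positions after residue a, i.e. P(b at i + d | a at i), counted over every position i where both residues exist; the function name is hypothetical:

```python
from collections import Counter


def d_interval_conditional_probability(seq: str, d: int) -> dict[tuple[str, str], float]:
    """Estimate P(residue b at i + d | residue a at i) from one sequence.

    Only positions i with i + d < len(seq) are counted, so every "first"
    residue a has a partner d positions downstream.
    """
    if d < 1:
        raise ValueError("d must be a positive integer")
    pair_counts: Counter = Counter()
    first_counts: Counter = Counter()
    for i in range(len(seq) - d):
        a, b = seq[i], seq[i + d]
        pair_counts[(a, b)] += 1
        first_counts[a] += 1
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}


if __name__ == "__main__":
    # Toy sequence in one-letter code (hypothetical, not from dataset 5714).
    probs = d_interval_conditional_probability("MKVLAAGIVGKAA", d=3)
    for (a, b), p in sorted(probs.items()):
        print(f"P({b} at i+3 | {a} at i) = {p:.2f}")
```

    Computing these probabilities for several values of d yields a feature vector that captures residue correlations beyond immediate neighbours; how the paper's algorithms actually consume the index is not stated in the abstract.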

  16. The amino acid sequence around the active-site cysteine and histidine residues, and the buried cysteine residue in ficin.

    PubMed

    Husain, S S; Lowe, G

    1970-04-01

    Ficin that had been prepared from the latex of Ficus glabrata by salt fractionation and chromatography on carboxymethylcellulose was completely and irreversibly inhibited with 1,3-dibromo[2-(14)C]acetone and then treated with N-(4-dimethylamino-3,5-dinitrophenyl)maleimide in 6 M guanidinium chloride. The labelled protein was reduced, carboxymethylated and then digested with trypsin and alpha-chymotrypsin. Two radioactive peptides and two coloured peptides were isolated chromatographically and their sequences determined. The radioactive peptides revealed the amino acid sequences around the active-site cysteine and histidine residues and showed a high degree of homology with the amino acid sequence around the active-site cysteine and histidine residues in papain. The coloured peptides allowed the amino acid sequence around the buried cysteine residue in ficin to be determined.

  17. The `heavy' subunit of the photosynthetic reaction centre from Rhodopseudomonas viridis: isolation of the gene, nucleotide and amino acid sequence

    PubMed Central

    Michel, H.; Weyer, K. A.; Gruenberg, H.; Lottspeich, F.

    1985-01-01

    The gene coding for the `heavy' subunit of the photosynthetic reaction centre from Rhodopseudomonas viridis was isolated in an expression vector. Expression of the heavy subunit in Escherichia coli was detected with antibodies raised against crystalline reaction centres. The entire subunit, and not a fusion protein, was expressed in E. coli. The protein coding region of the gene was sequenced and the amino acid sequence derived. Part of the amino acid sequence was confirmed by chemical sequence analysis of the protein. The heavy subunit consists of 258 amino acids and its mol. wt. is 28 345. It possesses one membrane-spanning α-helical segment, as was revealed by the concomitant X-ray structure analysis. PMID:16453623

  18. Purification, amino acid sequence and immunological characterization of Ole e 6, a cysteine-enriched allergen from olive tree pollen.

    PubMed

    Batanero, E; Ledesma, A; Villalba, M; Rodríguez, R

    1997-06-30

    The Ole e 6 allergen from olive tree pollen has been isolated by combining gel permeation and reverse-phase chromatographies. It is a highly acidic (pI 4.2) protein consisting of a single polypeptide chain. Its NH2-terminal amino acid sequence has been determined by Edman degradation. Total RNA from the olive tree pollen was isolated, and a specific cDNA was amplified by the polymerase chain reaction using a degenerate oligonucleotide primer designed according to the NH2-terminal sequence of the protein. Nucleotide sequencing of the cDNA revealed an open reading frame encoding a 50 amino acid polypeptide chain, in which two sets of the sequential motif Cys-X3-Cys-X3-Cys are present. No sequence similarity has been found between this protein and other previously described polypeptides.
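
    The Cys-X3-Cys-X3-Cys notation means a cysteine, any three residues, a cysteine, any three residues, and a cysteine. A regular expression is one straightforward way to locate such motifs in a translated sequence. The sketch below (hypothetical function name, toy input rather than the Ole e 6 sequence) reports 0-based start positions, including overlapping hits.

```python
import re

# Cys-X3-Cys-X3-Cys: C, any three residues, C, any three residues, C.
# Wrapping the pattern in a lookahead lets overlapping occurrences be reported.
MOTIF = re.compile(r"(?=(C.{3}C.{3}C))")


def find_motif(seq: str) -> list:
    """Return 0-based start positions of every Cys-X3-Cys-X3-Cys occurrence."""
    return [m.start() for m in MOTIF.finditer(seq.upper())]


if __name__ == "__main__":
    # Toy sequence with two motif copies, not the real allergen sequence.
    print(find_motif("CAAACAAACGGCTTTCTTTC"))
```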

  19. The effects of demography and long-term selection on the accuracy of genomic prediction with sequence data.

    PubMed

    MacLeod, Iona M; Hayes, Ben J; Goddard, Michael E

    2014-12-01

    The use of dense SNPs to predict the genetic value of an individual for a complex trait is often referred to as "genomic selection" in livestock and crops, but is also relevant to human genetics to predict, for example, complex genetic disease risk. The accuracy of prediction depends on the strength of linkage disequilibrium (LD) between SNPs and causal mutations. If sequence data were used instead of dense SNPs, accuracy should increase because causal mutations are present, but demographic history and long-term negative selection also influence accuracy. We therefore evaluated genomic prediction, using simulated sequence in two contrasting populations: one reducing from an ancestrally large effective population size (Ne) to a small one, with high LD common in domestic livestock, while the second had a large constant-sized Ne with low LD similar to that in some human or outbred plant populations. There were two scenarios in each population; causal variants were either neutral or under long-term negative selection. For large Ne, sequence data led to a 22% increase in accuracy relative to ∼600K SNP chip data with a Bayesian analysis and a more modest advantage with a BLUP analysis. This advantage increased when causal variants were influenced by negative selection, and accuracy persisted when 10 generations separated reference and validation populations. However, in the reducing Ne population, there was little advantage for sequence even with negative selection. This study demonstrates the joint influence of demography and selection on accuracy of prediction and improves our understanding of how best to exploit sequence for genomic prediction.

  20. Nucleotide and derived amino acid sequences of the major porin of Comamonas acidovorans and comparison of porin primary structures.

    PubMed Central

    Gerbl-Rieger, S; Peters, J; Kellermann, J; Lottspeich, F; Baumeister, W

    1991-01-01

    The DNA sequence of the gene which codes for the major outer membrane porin (Omp32) of Comamonas acidovorans has been determined. The structural gene encodes a precursor consisting of 351 amino acid residues with a signal peptide of 19 amino acid residues. Comparisons with amino acid sequences of outer membrane proteins and porins from several other members of the class Proteobacteria and of the Chlamydia trachomatis porin and the Neurospora crassa mitochondrial porin revealed a motif of eight regions of local homology. The results of this analysis are discussed with regard to common structural features of porins. PMID:1848840

  1. Prediction of virological response by pretreatment hepatitis B virus reverse transcriptase quasispecies heterogeneity: the advantage of using next-generation sequencing.

    PubMed

    Han, Y; Gong, L; Sheng, J; Liu, F; Li, X-H; Chen, L; Yu, D-M; Gong, Q-M; Hao, P; Zhang, X-X

    2015-08-01

    Prediction of antiviral efficacy prior to treatment remains largely unavailable. We have previously demonstrated the clinical value of on-treatment hepatitis B virus (HBV) reverse transcriptase (RT) quasispecies (QS) evolution patterns. In this study, we aimed to elucidate the relevance for prediction of pretreatment HBV RT QS characteristics by comparing the performance of next-generation sequencing (NGS) and clone-based Sanger sequencing (CBS). Thirty-six lamivudine-treated patients were retrospectively studied, including 18 responders and 18 non-responders. CBS and NGS data of pretreatment serum HBV were used to generate RT QS genetic complexity and diversity scores, according to our previous studies. The ability of both methods to predict responsiveness was evaluated with receiver operating characteristic (ROC) curves. A cut-off value was generated on the basis of prediction ability. Responders had significantly higher pretreatment RT QS genetic complexity and diversity (in the first two parts, which overlapped with the S gene, at both the nucleotide and amino acid levels) than non-responders by NGS-based testing. NGS-based algorithms predicted response better than CBS in the ROC curve analysis. The mean distance of the second contig had the highest area under the curve (AUC) value. When the cut-off value was set to 0.007186, the difference between survival curves was significant (p = 0.0090). Pretreatment HBV RT QS heterogeneity in the overlapping region of the RT and S genes could be a predictor of antiviral efficacy. NGS improves the predictions of virological outcomes relative to CBS algorithms. This may have important implications for the clinical management of subjects chronically infected with HBV.
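
    The abstract does not say how the cut-off was chosen from the ROC analysis. One common approach, assumed here and not necessarily the authors', is to pick the threshold that maximizes Youden's J (sensitivity + specificity - 1). The AUC can be computed as the probability that a randomly chosen responder scores higher than a randomly chosen non-responder. Function names are hypothetical and the scores are invented for illustration.

```python
def roc_auc(pos, neg):
    """AUC as the probability that a random responder outscores a random
    non-responder (ties count one half), i.e. the Mann-Whitney statistic."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_cutoff(pos, neg):
    """Return (threshold, J) maximising J = sensitivity + specificity - 1,
    where a score >= threshold is called a responder. On ties the lowest
    such threshold is kept."""
    best_t, best_j = None, -1.0
    for t in sorted(set(pos) | set(neg)):
        sens = sum(p >= t for p in pos) / len(pos)
        spec = sum(n < t for n in neg) / len(neg)
        j = sens + spec - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


if __name__ == "__main__":
    # Invented diversity scores, not patient data.
    responders = [0.009, 0.012, 0.008]
    non_responders = [0.004, 0.006, 0.0085]
    print(roc_auc(responders, non_responders))
    print(youden_cutoff(responders, non_responders))
```

    Because higher diversity was associated with response, the sketch calls scores at or above the threshold "responder"; the direction would need flipping for a marker that decreases with response.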

  2. Fragmentation Characteristics of Deprotonated N-linked Glycopeptides: Influences of Amino Acid Composition and Sequence

    NASA Astrophysics Data System (ADS)

    Nishikaze, Takashi; Kawabata, Shin-ichirou; Tanaka, Koichi

    2014-06-01

    Glycopeptide structural analysis using tandem mass spectrometry is becoming a common approach for elucidating site-specific N-glycosylation. The analysis is generally performed in positive-ion mode. Therefore, fragmentation of protonated glycopeptides has been extensively investigated; however, few studies are available on deprotonated glycopeptides, despite the usefulness of negative-ion mode analysis in detecting glycopeptide signals. Here, large sets of glycopeptides derived from well-characterized glycoproteins were investigated to understand the fragmentation behavior of deprotonated N-linked glycopeptides under low-energy collision-induced dissociation (CID) conditions. The fragment ion species were found to be significantly variable depending on their amino acid sequence and could be classified into three types: (i) glycan fragment ions, (ii) glycan-lost fragment ions and their secondary cleavage products, and (iii) fragment ions with intact glycan moiety. The CID spectra of glycopeptides having a short peptide sequence were dominated by type (i) glycan fragments (e.g., $^{2,4}A_R$, $^{2,4}A_{R-1}$, D, and E ions). These fragments define detailed structural features of the glycan moiety such as branching. For glycopeptides with medium or long peptide sequences, the major fragments were type (ii) ions (e.g., $[\text{peptide} + {}^{0,2}X_0 - \text{H}]^-$ and $[\text{peptide} - \text{NH}_3 - \text{H}]^-$). The appearance of type (iii) ions strongly depended on the peptide sequence, and especially on the presence of Asp, Asn, and Glu. When a glycosylated Asn is located on the C-terminus, an interesting fragment having an Asn residue with intact glycan moiety, $[\text{glycan} + \text{Asn} - 36]^-$, was abundantly formed. Observed fragments are reasonably explained by a combination of existing fragmentation rules suggested for N-glycans and peptides.

  3. An amino acid sequence motif sufficient for subnuclear localization of an arginine/serine-rich splicing factor.

    PubMed

    Hedley, M L; Amrein, H; Maniatis, T

    1995-12-05

    We have identified an amino acid sequence in the Drosophila Transformer (Tra) protein that is capable of directing a heterologous protein to nuclear speckles, regions of the nucleus previously shown to contain high concentrations of spliceosomal small nuclear RNAs and splicing factors. This sequence contains a nucleoplasmin-like bipartite nuclear localization signal (NLS) and a repeating arginine/serine (RS) dipeptide sequence adjacent to a short stretch of basic amino acids. Sequence comparisons from a number of other splicing factors that colocalize to nuclear speckles reveal the presence of one or more copies of this motif. We propose a two-step subnuclear localization mechanism for splicing factors. The first step is transport across the nuclear envelope via the nucleoplasmin-like NLS, while the second step is association with components in the speckled domain via the RS dipeptide sequence.
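
    As an illustration of scanning a sequence for the repeating arginine/serine dipeptide, the sketch below finds runs of consecutive Arg-Ser pairs. It assumes a strict (RS)n definition with a hypothetical minimum of three repeats; natural RS domains are often less regular, and the sketch does not check for the nucleoplasmin-like NLS or the adjacent stretch of basic residues. The function name and input are hypothetical.

```python
import re


def rs_repeats(seq: str, min_repeats: int = 3) -> list:
    """Return (start, end) spans (0-based, end exclusive) of runs of at least
    `min_repeats` consecutive Arg-Ser dipeptides."""
    if min_repeats < 1:
        raise ValueError("min_repeats must be at least 1")
    # Builds e.g. (?:RS){3,}; matches are greedy, so each run is reported once.
    pattern = re.compile(rf"(?:RS){{{min_repeats},}}")
    return [m.span() for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    print(rs_repeats("MKRSRSRSRSPPRSRSK"))
    print(rs_repeats("MKRSRSRSRSPPRSRSK", min_repeats=2))
```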

  4. Purification and partial amino acid sequence of the chloroplast cytochrome b-559.

    PubMed

    Widger, W R; Cramer, W A; Hermodson, M; Meyer, D; Gullifor, M

    1984-03-25

    The hydrophobic cytochrome b-559, purified from unstacked, ethanol-washed spinach thylakoid membranes, using extraction with 2% Triton X-100 in 4 M urea and three chromatographic steps in the presence of protease inhibitors, has a dominant band on sodium dodecyl sulfate-urea gels corresponding to Mr = 10,000. The yield of this preparation is 30-50% (5-10 mg) starting with 600 mg of chlorophyll. The heme content yields a calculated molecular weight of no more than 17,500/heme, and perhaps somewhat smaller after correction for impurities. The Mr = 10,000 band is stained by the tetramethylbenzidine-H2O2 heme reagent on lithium dodecyl sulfate gels run at 0 °C. The Mr = 10,000 protein, further separated by high performance liquid chromatography, contains a unique NH2 terminus that is not blocked, and the amino acid sequence for the first 27 residues is NH2-Ser-Gly-Ser-Thr-Gly-Glu-Arg-Ser-Phe-Ala-Asp-Ile-Ile-Thr-Ser-Ile-Arg-Tyr-Trp-Val-Ile-X-Ser-Ile-Thr-Ile-Pro...COOH. Approximately 55% of the amino acids are hydrophobic, based on amino acid analysis of the Mr = 10,000 peptide, which also indicated the presence of at least one histidine. Only one cytochrome b-559 component could be identified, whose yield indicated that it arises from a single b-559 protein in chloroplasts corresponding to the in situ high potential cytochrome of the chloroplast photosystem II.

  5. Chapter 8. Methods for in silico prediction of microbial polyketide and nonribosomal peptide biosynthetic pathways from DNA sequence data.

    PubMed

    Bachmann, Brian O; Ravel, Jacques

    2009-01-01

    Fore-knowledge of the secondary metabolic potential of cultivated and previously uncultivated microorganisms can potentially facilitate the process of natural product discovery. By combining sequence-based knowledge with biochemical precedent, translated gene sequence data can be used to rapidly derive structural elements encoded by secondary metabolic gene clusters from microorganisms. These structural elements provide an estimate of the secondary metabolic potential of a given organism and a starting point for identification of potential lead compounds in isolation/structure elucidation campaigns. The accuracy of these predictions for a given translated gene sequence depends on the biochemistry of the metabolite class, similarity to known metabolite gene clusters, and depth of knowledge concerning its biosynthetic machinery. This chapter introduces methods for prediction of structural elements for two well-studied classes: modular polyketides and nonribosomally encoded peptides. A bioinformatics tool is presented for rapid preliminary analysis of these modular systems, and prototypical methods for converting these analyses into substructural elements are described.

  6. Mathematical and Live Meningococcal Models for Simple Sequence Repeat Dynamics – Coherent Predictions and Observations

    PubMed Central

    Alfsnes, Kristian; Raynaud, Xavier; Tønjum, Tone; Ambur, Ole Herman

    2014-01-01

    Evolvability by means of simple sequence repeat (SSR) instability is a feature under the constant influence of opposing selective pressures to expand and compress the repeat tract and is mechanistically influenced by factors that affect genetic instability. In addition to direct selection for protein expression and structural integrity, other factors that influence tract length evolution were studied. The genetic instability of SSRs that switch the expression of antibiotic resistance ON and OFF was modelled mathematically and monitored in a panel of live meningococcal strains. The mathematical model showed that the SSR length of a theoretical locus in an evolving population may be shaped by direct selection of expression status (ON or OFF), tract length dependent (α) and tract length independent factors (β). According to the model an increase in α drives the evolution towards shorter tracts. An increase in β drives the evolution towards a normal distribution of tract lengths given that an upper and a lower limit are set. Insertion and deletion biases were shown to skew allelic distributions in both directions. The meningococcal SSR model was tested in vivo by monitoring the frequency of spectinomycin resistance OFF→ON switching in a designed locus. The instability of a comprehensive panel of the homopolymeric SSRs, constituted of a range of 5–13 guanine nucleotides, was monitored in wildtype and mismatch repair deficient backgrounds. Both the repeat length itself and mismatch repair deficiency were shown to influence the genetic instability of the homopolymeric tracts. A possible insertion bias was observed in tracts ≤G10. Finally, an inverse correlation between the number of tract-encoded amino acids and growth in the presence of ON-selection illustrated a limitation to SSR expansion in an essential gene associated with the designed model locus and the protein function mediating antibiotic resistance. PMID:24999629
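
    The published mathematical model is not specified in the abstract. The toy birth-death chain below is an assumption used only to illustrate why length-dependent instability (α) can favor shorter tracts: if a tract of length L slips with probability β + αL and gains or loses one repeat unit with equal chance, detailed balance makes the stationary frequency of length L proportional to 1/(β + αL). It ignores selection on expression status and insertion/deletion bias, both of which the authors include; the 5-13 range mirrors the G5-G13 panel only for concreteness, and the function name is hypothetical.

```python
def stationary_tract_lengths(l_min: int, l_max: int, alpha: float, beta: float) -> dict:
    """Stationary distribution of a toy symmetric +/-1 slippage chain.

    Assumption (not the published model): each generation a tract of length L
    slips with probability beta + alpha * L, gaining or losing one repeat unit
    with equal chance; moves outside [l_min, l_max] are suppressed. Detailed
    balance then gives pi(L) * rate(L) = constant, i.e. pi(L) ∝ 1 / rate(L).
    """
    if not (0 < l_min <= l_max):
        raise ValueError("need 0 < l_min <= l_max")
    weights = {}
    for length in range(l_min, l_max + 1):
        rate = beta + alpha * length
        if not (0 < rate <= 1):
            raise ValueError("slippage probability must lie in (0, 1] for every length")
        weights[length] = 1.0 / rate
    total = sum(weights.values())
    return {length: w / total for length, w in weights.items()}


if __name__ == "__main__":
    for a in (0.0, 0.05):
        dist = stationary_tract_lengths(5, 13, alpha=a, beta=0.01)
        mean_len = sum(length * p for length, p in dist.items())
        print(f"alpha={a}: mean stationary tract length {mean_len:.2f}")
```

    With α = 0 every length in the window is equally likely; any positive α tilts the distribution toward the short end, matching only the qualitative trend reported in the abstract.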

  7. A framework for establishing predictive relationships between specific bacterial 16S rRNA sequence abundances and biotransformation rates.

    PubMed

    Helbling, Damian E; Johnson, David R; Lee, Tae Kwon; Scheidegger, Andreas; Fenner, Kathrin

    2015-03-01

    The rates at which wastewater treatment plant (WWTP) microbial communities biotransform specific substrates can differ by orders of magnitude among WWTP communities. Differences in taxonomic compositions among WWTP communities may predict differences in the rates of some types of biotransformations. In this work, we present a novel framework for establishing predictive relationships between specific bacterial 16S rRNA sequence abundances and biotransformation rates. We selected ten WWTPs with substantial variation in their environmental and operational metrics and measured the in situ ammonia biotransformation rate constants in nine of them. We isolated total RNA from samples from each WWTP and analyzed 16S rRNA sequence reads. We then developed multivariate models between the measured abundances of specific bacterial 16S rRNA sequence reads and the ammonia biotransformation rate constants. We constructed model scenarios that systematically explored the effects of model regularization, model linearity and non-linearity, and aggregation of 16S rRNA sequences into operational taxonomic units (OTUs) as a function of sequence dissimilarity threshold (SDT). A large percentage (greater than 80%) of model scenarios resulted in well-performing and significant models at intermediate SDTs of 0.13-0.14 and 0.26. The 16S rRNA sequences consistently selected into the well-performing and significant models at those SDTs were classified as Nitrosomonas and Nitrospira groups. We then extend the framework by applying it to the biotransformation rate constants of ten micropollutants measured in batch reactors seeded with the ten WWTP communities. We identified phylogenetic groups that were robustly selected into all well-performing and significant models constructed with biotransformation rates of isoproturon, propachlor, ranitidine, and venlafaxine. These phylogenetic groups can be used as predictive biomarkers of WWTP microbial community activity towards these specific
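
    The abstract does not name the algorithm used to aggregate reads into OTUs at a given sequence dissimilarity threshold (SDT). A minimal sketch, assuming pre-aligned reads of equal length and simple greedy centroid clustering on the proportion of mismatched sites, shows how the SDT controls how many OTUs result. Function names and reads are hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Fraction of mismatched positions between two aligned, equal-length reads."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs: list, threshold: float) -> list:
    """Assign each read to the first OTU whose centroid lies within `threshold`
    dissimilarity; otherwise the read founds a new OTU and becomes its centroid.
    Returns a list of OTUs, each a list of read indices."""
    centroids = []
    members = []
    for i, s in enumerate(seqs):
        for k, c in enumerate(centroids):
            if p_distance(s, c) <= threshold:
                members[k].append(i)
                break
        else:
            centroids.append(s)
            members.append([i])
    return members


if __name__ == "__main__":
    reads = ["AAAAAAAAAA", "AAAAAAAAAT", "TTTTTTTTTT", "TTTTTTTTTA"]
    for sdt in (0.0, 0.13, 0.26):
        print(sdt, greedy_otus(reads, sdt))
```

    OTU abundances for a regression model would then be the summed read counts of each cluster's members; greedy clustering depends on input order, which is one reason real pipelines often sort reads by abundance first.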

  8. The Genome Sequence of the Highly Acetic Acid-Tolerant Zygosaccharomyces bailii-Derived Interspecies Hybrid Strain ISA1307, Isolated From a Sparkling Wine Plant

    PubMed Central

    Mira, Nuno P.; Münsterkötter, Martin; Dias-Valada, Filipa; Santos, Júlia; Palma, Margarida; Roque, Filipa C.; Guerreiro, Joana F.; Rodrigues, Fernando; Sousa, Maria João; Leão, Cecília; Güldener, Ulrich; Sá-Correia, Isabel

    2014-01-01

    This work describes the sequencing and annotation of the genome of the yeast strain ISA1307, isolated from a sparkling wine continuous production plant. This strain, formerly considered to belong to the species Zygosaccharomyces bailii, has been used to study Z. bailii physiology, in particular, its extreme tolerance to acetic acid stress at low pH. The analysis of the genome sequence described in this work indicates that strain ISA1307 is an interspecies hybrid between Z. bailii and a closely related species. The genome sequence of ISA1307 is distributed through 154 scaffolds and has a size of around 21.2 Mb, corresponding to 96% of the genome size estimated by flow cytometry. Annotation of the ISA1307 genome includes 4385 duplicated genes (∼90% of the total number of predicted genes) and 1155 predicted single-copy genes. The functional categories containing the most genes are ‘Metabolism and generation of energy’, ‘Protein folding, modification and targeting’ and ‘Biogenesis of cellular components’. Knowledge of the genome sequence of the ISA1307 strain is expected to help accelerate systems-level understanding of stress resistance mechanisms in Z. bailii and to inspire and guide novel biotechnological applications of this yeast species/strain in fermentation processes, given its high resilience to acidic stress. The availability of the ISA1307 genome sequence also paves the way to a better understanding of the genetic mechanisms underlying the generation and selection of more robust hybrid yeast strains in the stressful environment of wine fermentations. PMID:24453040

  9. The genome sequence of the highly acetic acid-tolerant Zygosaccharomyces bailii-derived interspecies hybrid strain ISA1307, isolated from a sparkling wine plant.

    PubMed

    Mira, Nuno P; Münsterkötter, Martin; Dias-Valada, Filipa; Santos, Júlia; Palma, Margarida; Roque, Filipa C; Guerreiro, Joana F; Rodrigues, Fernando; Sousa, Maria João; Leão, Cecília; Güldener, Ulrich; Sá-Correia, Isabel

    2014-06-01

    This work describes the sequencing and annotation of the genome of the yeast strain ISA1307, isolated from a sparkling wine continuous production plant. This strain, formerly considered to belong to the species Zygosaccharomyces bailii, has been used to study Z. bailii physiology, in particular, its extreme tolerance to acetic acid stress at low pH. The analysis of the genome sequence described in this work indicates that strain ISA1307 is an interspecies hybrid between Z. bailii and a closely related species. The genome sequence of ISA1307 is distributed through 154 scaffolds and has a size of around 21.2 Mb, corresponding to 96% of the genome size estimated by flow cytometry. Annotation of the ISA1307 genome includes 4385 duplicated genes (∼90% of the total number of predicted genes) and 1155 predicted single-copy genes. The functional categories containing the most genes are 'Metabolism and generation of energy', 'Protein folding, modification and targeting' and 'Biogenesis of cellular components'. Knowledge of the genome sequence of the ISA1307 strain is expected to help accelerate systems-level understanding of stress resistance mechanisms in Z. bailii and to inspire and guide novel biotechnological applications of this yeast species/strain in fermentation processes, given its high resilience to acidic stress. The availability of the ISA1307 genome sequence also paves the way to a better understanding of the genetic mechanisms underlying the generation and selection of more robust hybrid yeast strains in the stressful environment of wine fermentations.

  10. Predicting Folding Sequences Based on the Maximum Rock Strength and Mechanical Equilibrium

    NASA Astrophysics Data System (ADS)

    Cubas, N.; Souloumiac, P.; Maillot, B.; Leroy, Y. M.

    2007-12-01

    The objective is to propose and validate procedures, simpler than the finite-element method, for selecting and optimizing the dominant mode of folding in fold-and-thrust belts and accretionary wedges and for determining its stress distribution. Mechanical equilibrium as well as the constraints due to the limited rock strength of the bulk material and of major discontinuities, such as décollements, are accounted for. The first part of the proposed procedure, which is at the core of the external approach of classical limit analysis, consists of estimating the least upper bound on the tectonic force by minimisation of the internal dissipation and part of the external work. The new twist to the method is that the optimization is also done with respect to the geometry of the evolving fold. If several folding events are possible, the dominant mode is the one leading to the least upper bound. The second part of the procedure is based on the Equilibrium Element Method, which is an application of the internal approach of limit analysis. The optimum stress field, obtained by spatial discretisation of the fold, provides the best lower bound on the tectonic force. The difference between the two bounds defines an error estimate of the exact unknown tectonic force. To show the merits of the proposed procedure, its first part is applied to predict the life span of a thrust within an accretionary prism, from its onset, through its development with relief build-up, to its arrest caused by the onset of a more favorable new thrust (Cubas et al., 2007). This life span is sensitive to the friction angles over the ramp and the décollement. It is shown how the normal sequence of thrusting in a supercritical wedge ends with the first out-of-sequence event. The second part of the procedure provides the stress state over each thrust, showing that the active back thrust is a narrow fan whose dip is sensitive to the friction angle over the ramp and the amount of relief build-up (Souloumiac et

  11. Recombination frequency in plasmid DNA containing direct repeats--predictive correlation with repeat and intervening sequence length.

    PubMed

    Oliveira, Pedro H; Lemos, Francisco; Monteiro, Gabriel A; Prazeres, Duarte M F

    2008-09-01

    In this study, a simple non-linear mathematical function is proposed to accurately predict recombination frequencies in bacterial plasmid DNA harbouring directly repeated sequences. The mathematical function, which was developed on the basis of published data on deletion formation in multicopy plasmids containing direct repeats (14-856 bp) and intervening sequences (0-3872 bp), also accounts for the strain genotype in terms of its recA function. A bootstrap resampling technique was used to estimate confidence intervals for the correlation parameters. More than 92% of the predicted values were found to be within a pre-established ±5-fold interval of deviation from experimental data. The correlation not only provides a way to predict recombination frequency with good accuracy but also offers improved insight into these processes.
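
    The fitted non-linear function is not given in the abstract, so the sketch below shows only the two generic checks the abstract mentions: the share of predictions falling within a ±5-fold band of the observations, and a percentile bootstrap interval for a parameter. `statistics.mean` stands in for the real parameter-fitting routine, the function names are hypothetical, and all numbers are invented.

```python
import random
import statistics


def fraction_within_fold(predicted, observed, fold=5.0):
    """Share of (positive) predictions within a +/- `fold`-fold band of the
    matching observed value, i.e. max(p/o, o/p) <= fold."""
    hits = sum(1 for p, o in zip(predicted, observed) if max(p / o, o / p) <= fold)
    return hits / len(observed)


def bootstrap_ci(data, estimator, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample `data` with replacement,
    re-apply `estimator` each time, and read off the alpha/2 and 1 - alpha/2
    quantiles of the resulting estimates."""
    rng = random.Random(seed)
    stats = sorted(estimator(rng.choices(data, k=len(data))) for _ in range(n_boot))
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot) - 1)]
    return lo, hi


if __name__ == "__main__":
    # Invented recombination frequencies, not data from the study.
    print(fraction_within_fold([1e-5, 2e-4, 3e-6], [1e-5, 1e-5, 1e-5]))
    print(bootstrap_ci([1.0, 2.0, 3.0, 4.0, 5.0], statistics.mean))
```

    Measuring deviation as a fold ratio rather than an absolute difference suits recombination frequencies, which span several orders of magnitude.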

  12. Alignment of 700 globin sequences: extent of amino acid substitution and its correlation with variation in volume.

    PubMed Central

    K